1. Theme | Sanitation. This can be a hot issue in NYC. Think Pizza Rat. |
2. Units > type | Neighborhoods are a common unit of disaggregation in NYC open data and have a more conversational feel for reporting than ZIP codes or community districts. Because of residential segregation, using neighborhoods may reveal racial impacts of municipal policies. |
3. Metrics > stable |
Are neighborhood tabulation areas similarly sized? I was able to answer this question directly in NYC Open Data using visualizations (below). What about population density? If you have been to both Laurelton and the Lower East Side, it is clear that neighborhoods are not similarly dense. To quantify this, I downloaded a spreadsheet from the Department of City Planning (below). |
Following the steps below, I found that Neighborhood Tabulation Areas range from about 20,000 to 80,000 people, with some notable outliers. (A link to the visualization and a list of outliers are below.)
This spreadsheet shows that Neighborhood Tabulation Areas (NTAs) range in density from 6 people per acre up to 200 people per acre, a more than thirtyfold difference. For the purpose of generating story ideas, I decided to compare sanitation-related data for NTAs with similar population and density, by selecting neighborhoods from a single quadrant in the scatterplot below.
I created the scatterplot by:
4. Units > Points | For the purpose of generating story ideas, I decided to compare sanitation-related data for NTAs with similar population and density. |
To restrict my data to just these points I used filters in Excel.
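If you prefer a script to Excel filters, the quadrant selection could be sketched as follows. This is only an illustration: the rows are made-up values rather than real NTA figures, the column names are hypothetical, and splitting the scatterplot at the median of each axis is an assumption, since the cut points are not stated here.

```python
from statistics import median

# Made-up rows for illustration only; these are NOT real NTA figures.
ntas = [
    {"nta": "A", "population": 25000, "acres": 1000.0},
    {"nta": "B", "population": 60000, "acres": 500.0},
    {"nta": "C", "population": 70000, "acres": 2000.0},
    {"nta": "D", "population": 30000, "acres": 300.0},
    {"nta": "E", "population": 65000, "acres": 650.0},
]

# Density in people per acre.
for row in ntas:
    row["density"] = row["population"] / row["acres"]

# Assumption: the quadrants are split at the median of each axis.
pop_cut = median(r["population"] for r in ntas)
dens_cut = median(r["density"] for r in ntas)


def quadrant(row):
    """Label a row by which side of each cut point it falls on."""
    pop_side = "high-pop" if row["population"] >= pop_cut else "low-pop"
    dens_side = "high-density" if row["density"] >= dens_cut else "low-density"
    return f"{pop_side}/{dens_side}"


# Keep only the neighborhoods that share one quadrant.
selected = [r["nta"] for r in ntas if quadrant(r) == "high-pop/high-density"]
print(selected)
```

Any other rule for "similar" (for example, a fixed population band) would slot into `quadrant` the same way.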
Because I couldn't easily visualize all ten neighborhoods, I used the Department of City Planning's Population FactFinder to make a map.
5. Metrics > Variables |
Passed/failed rat inspections. |
The data set is very large. Excel can open at most 1,048,576 rows per worksheet, and about a million records is probably enough for a data story idea. We want to be able to open the data in Excel so we can merge it with our existing Excel files that contain neighborhood information.
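One way to check whether a download will fit before opening it is to count its rows first. The sketch below assumes the export is a CSV file; the sample text and column names are hypothetical stand-ins for the real file.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # Excel's per-worksheet row limit, header row included


def count_csv_rows(csv_file):
    """Count rows (header included) without loading the whole file into memory."""
    return sum(1 for _ in csv.reader(csv_file))


def fits_in_excel(row_count):
    """True if a file with this many rows fits on a single worksheet."""
    return row_count <= EXCEL_MAX_ROWS


# Hypothetical stand-in for the downloaded file; with a real export you
# would pass open("your_file.csv", newline="") instead.
sample = io.StringIO("inspection_type,result\nInitial,Passed\nInitial,Failed\n")
rows = count_csv_rows(sample)
print(rows, fits_in_excel(rows))
```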
Following the steps below, I found that initial rat inspections range from 160,000 to 180,000 per year.
I filtered to the most recent full year available (2022) before downloading.
In Excel, I used a pivot table to calculate the 2022 pass rate.
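The pivot-table step amounts to counting passed inspections and total inspections per neighborhood, then dividing. A minimal sketch, assuming each record carries a neighborhood name and a result of "Passed" or "Failed" (the labels and the sample rows are made up for illustration):

```python
from collections import defaultdict

# Made-up inspection records: (neighborhood, result). Not real data.
inspections = [
    ("NTA A", "Passed"),
    ("NTA A", "Failed"),
    ("NTA A", "Passed"),
    ("NTA A", "Passed"),
    ("NTA B", "Failed"),
    ("NTA B", "Passed"),
]


def pass_rates(rows):
    """Return the share of inspections marked "Passed" for each neighborhood."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for nta, result in rows:
        totals[nta] += 1
        if result == "Passed":
            passed[nta] += 1
    return {nta: passed[nta] / totals[nta] for nta in totals}


rates = pass_rates(inspections)
print(rates)
```

If the real data uses other result values, decide up front whether they count toward the denominator.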
I used the INDEX and MATCH functions in Excel to merge the pass rates into the demographic data file. You could also use the VLOOKUP function. If you are not comfortable with these functions, you can copy and paste the data for the ten neighborhoods you have selected.
Finally, I used the scatterplot feature in Excel to compare these ten neighborhoods' racial composition and rat inspection pass rates.
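The same lookup-style merge can be sketched outside Excel with a dictionary keyed on neighborhood name. The field names and values below are hypothetical. The sketch also lists neighborhoods with no match, which is the case where a spreadsheet lookup would return an error.

```python
# Made-up demographic rows and pass rates, for illustration only.
demographics = [
    {"nta": "NTA A", "population": 40000},
    {"nta": "NTA B", "population": 42000},
    {"nta": "NTA C", "population": 41000},
]
pass_rate_by_nta = {"NTA A": 0.75, "NTA B": 0.5}


def merge_pass_rates(rows, lookup):
    """Copy each row and attach its pass rate, or None when there is no match."""
    return [{**row, "pass_rate": lookup.get(row["nta"])} for row in rows]


merged = merge_pass_rates(demographics, pass_rate_by_nta)

# Names that failed to match usually point to spelling differences between files.
unmatched = [r["nta"] for r in merged if r["pass_rate"] is None]
print(unmatched)
```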
Story idea | Dyker Heights and East Flatbush. These neighborhoods are similar in density and income. Why does the one with fewer Black residents pass more rat inspections? |