Introduction

This project analyzes differences in neighborhood characteristics across New York City. Using choropleth maps, it highlights disparities in race/ethnicity, educational attainment and poverty levels among NYC neighborhoods.

Data

The analysis uses estimates from the American Community Survey (ACS) and spatial data of NYC neighborhoods, accessed through the nycgeo package in R. The package contains spatial data files for various geographic and administrative boundaries in New York City as well as selected demographic, social, and economic estimates from the ACS.

Extract the data and take a look at the top few rows.

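# Packages used throughout the analysis
library(tidyverse)   # dplyr, tidyr, ggplot2
library(sf)          # printing and plotting of sf objects
library(nycgeo)      # nyc_boundaries()
library(ggthemes)    # theme_few(), scale_fill_tableau()
library(scales)      # percent, percent_format()
library(gridExtra)   # grid.arrange()

# Neighborhood tabulation area (NTA) boundaries with ACS estimates attached
data <- 
  nyc_boundaries(geography = "nta",
                 add_acs_data = TRUE)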


data %>% head(5)
## Simple feature collection with 5 features and 35 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 980630.8 ymin: 147001.4 xmax: 1007000 ymax: 195449.4
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 5 × 36
##   nta_id nta_n…¹ state…² count…³ count…⁴ borou…⁵ borou…⁶ puma_id puma_…⁷ pop_t…⁸
##   <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>     <dbl>
## 1 BK09   Brookl… 36      047     Kings   Brookl… 3       4004    Brookl…   24212
## 2 BK17   Sheeps… 36      047     Kings   Brookl… 3       4016    Brookl…   67681
## 3 BK19   Bright… 36      047     Kings   Brookl… 3       4018    Brookl…   35811
## 4 BK21   Seagat… 36      047     Kings   Brookl… 3       4018    Brookl…   31132
## 5 BK23   West B… 36      047     Kings   Brookl… 3       4018    Brookl…   16436
## # … with 26 more variables: pop_total_moe <dbl>, pop_white_est <dbl>,
## #   pop_white_moe <dbl>, pop_white_pct_est <dbl>, pop_white_pct_moe <dbl>,
## #   pop_black_est <dbl>, pop_black_moe <dbl>, pop_black_pct_est <dbl>,
## #   pop_black_pct_moe <dbl>, pop_hisp_est <dbl>, pop_hisp_moe <dbl>,
## #   pop_hisp_pct_est <dbl>, pop_hisp_pct_moe <dbl>, pop_asian_est <dbl>,
## #   pop_asian_moe <dbl>, pop_asian_pct_est <dbl>, pop_asian_pct_moe <dbl>,
## #   pop_ba_above_est <dbl>, pop_ba_above_moe <dbl>, …

Data Wrangling

In this section, I do some exploratory data analysis (EDA) to understand the data. This helps gauge how much cleaning and manipulation the data needs before mapping.

Data Cleaning

First, I check to see if there are any missing values in the data and figure out how to handle them.

library(Amelia)

data %>% missmap(main = "Observed vs Missing Values")

The plot above shows that about two percent of the data are missing. From the table below, we see that there are only six rows with missing data, all of which have missing ACS data.

data %>% filter(is.na(pop_total_est))
## Simple feature collection with 6 features and 35 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 945163.9 ymin: 132264.1 xmax: 1058145 ymax: 271352.1
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 6 × 36
##   nta_id nta_n…¹ state…² count…³ count…⁴ borou…⁵ borou…⁶ puma_id puma_…⁷ pop_t…⁸
## * <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>     <dbl>
## 1 BK99   park-c… 36      047     Kings   Brookl… 3       <NA>    <NA>         NA
## 2 BX99   park-c… 36      005     Bronx   Bronx   2       <NA>    <NA>         NA
## 3 MN99   park-c… 36      061     New Yo… Manhat… 1       <NA>    <NA>         NA
## 4 QN98   Airport 36      081     Queens  Queens  4       <NA>    <NA>         NA
## 5 QN99   park-c… 36      081     Queens  Queens  4       <NA>    <NA>         NA
## 6 SI99   park-c… 36      085     Richmo… Staten… 5       <NA>    <NA>         NA
## # … with 26 more variables: pop_total_moe <dbl>, pop_white_est <dbl>,
## #   pop_white_moe <dbl>, pop_white_pct_est <dbl>, pop_white_pct_moe <dbl>,
## #   pop_black_est <dbl>, pop_black_moe <dbl>, pop_black_pct_est <dbl>,
## #   pop_black_pct_moe <dbl>, pop_hisp_est <dbl>, pop_hisp_moe <dbl>,
## #   pop_hisp_pct_est <dbl>, pop_hisp_pct_moe <dbl>, pop_asian_est <dbl>,
## #   pop_asian_moe <dbl>, pop_asian_pct_est <dbl>, pop_asian_pct_moe <dbl>,
## #   pop_ba_above_est <dbl>, pop_ba_above_moe <dbl>, …

As shown in the table above, the rows with missing values are mostly parks, cemeteries and airports. The data treats them as neighborhoods of their own; however, there are no ACS demographic or economic data associated with them.
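The share of missing values can also be checked numerically. The following is a minimal sketch in base R on a made-up data frame (the `toy` table and its values are assumptions for illustration, not ACS figures); the same calls could be applied to the real table after dropping its geometry column.

```r
# Toy stand-in for the neighborhood table: one row has no ACS estimates.
toy <- data.frame(
  nta_id        = c("N1", "N2", "N3", "N4"),
  pop_total_est = c(1200, NA, 800, 950),
  pop_inpov_est = c(300, NA, 120, 200)
)

# Share of all cells that are missing
overall_missing <- mean(is.na(toy))

# Share of missing values in each column
by_column <- colMeans(is.na(toy))

# Number of rows with at least one missing value
rows_missing <- sum(!complete.cases(toy))

print(overall_missing)
print(by_column)
print(rows_missing)
```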

Rows with missing ACS data are dropped before calculating borough-wide summaries, since a single missing value would otherwise propagate into the borough result. However, I keep them when mapping neighborhood differences, where these areas (parks, cemeteries, airports, etc.) are grayed out on the maps.
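To illustrate why the rows are dropped before aggregating, here is a small sketch in base R. The `toy` table and its numbers are invented for illustration; the point is only that `sum()` returns `NA` for any group that contains a missing value.

```r
# Toy stand-in: the Bronx has one park row with no population estimate.
toy <- data.frame(
  borough_name  = c("Bronx", "Bronx", "Queens", "Queens"),
  nta_name      = c("A", "Park (toy)", "B", "C"),
  pop_total_est = c(1000, NA, 2000, 3000)
)

# Without dropping the NA row, the Bronx total is itself NA.
with_na <- tapply(toy$pop_total_est, toy$borough_name, sum)

# Dropping rows with missing estimates first gives usable totals.
complete <- toy[!is.na(toy$pop_total_est), ]
without_na <- tapply(complete$pop_total_est, complete$borough_name, sum)

print(with_na)
print(without_na)
```

In the pipelines used in this analysis, `drop_na()` plays the role of the row filter shown here.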

Bivariate Analysis

Plot borough against the demographic, educational and economic variables to understand the differences among NYC residents from different boroughs.

Population by Race

data %>% 
  select(borough_name, pop_white_est, pop_asian_est, pop_black_est, pop_hisp_est) %>% 
  as.data.frame() %>% 
  drop_na(pop_white_est) %>%
  group_by(borough_name) %>% 
  summarise(White = sum(pop_white_est),
            Asian = sum(pop_asian_est),
            Black = sum(pop_black_est),
            Hispanic = sum(pop_hisp_est)
            ) %>% 
  pivot_longer(cols = 2:5,
               names_to = "Race",
               values_to = "total_pop") %>%
  ggplot(aes(x = borough_name, y = total_pop, fill = Race)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(labels = percent) +
  scale_fill_tableau() +
  theme_few() +
  labs(title = "Share of Borough Population by Race/Ethnicity",
       x = NULL, y = "Share of Total Borough Population")
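A note on how to read the stacked bars: `position = "fill"` rescales each bar so that its segments sum to one. The sketch below reproduces that normalization by hand with `prop.table()` on made-up counts (the numbers are assumptions, not ACS figures).

```r
# Toy counts for a single borough (illustrative only).
counts <- c(White = 400, Asian = 100, Black = 300, Hispanic = 200)

# position = "fill" divides each segment by the bar total;
# prop.table() performs the same normalization.
shares <- prop.table(counts)

print(round(shares, 2))
```

One consequence is that the shares are relative to the sum of the groups plotted. If the four groups do not add up to `pop_total_est` for a borough, the bars show each group's share of the four-group total rather than of the full borough population.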

Educational Levels by Borough

data %>% 
  select(borough_name,pop_total_est, pop_ba_above_est) %>% 
  as.data.frame() %>% 
  drop_na(pop_ba_above_est) %>%
  mutate(no_bachelors = pop_total_est - pop_ba_above_est) %>% 
  group_by(borough_name) %>%
  summarise(`Bachelors or Higher` = sum(pop_ba_above_est),
            `No Bachelors` = sum(no_bachelors)) %>%
  pivot_longer(cols = 2:3,
               names_to = "Educ. Level",
               values_to = "Total") %>%
  ggplot(aes(x = borough_name, y = Total, fill = `Educ. Level`)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(labels = percent) +
  scale_fill_manual(values = c("#76B7B2", "#E15759")) +
  theme_few() +
  labs(title = "Borough Population by Educational Level",
       x = NULL, y = "Share of Total Borough Population")

Poverty Levels by Borough

data %>% 
  select(borough_name,pop_total_est, pop_inpov_est) %>% 
  as.data.frame() %>% 
  drop_na(pop_total_est) %>%
  mutate(not_poor = pop_total_est - pop_inpov_est) %>% 
  group_by(borough_name) %>%
  summarise("Not Poor" = sum(not_poor),
            "Poor" = sum(pop_inpov_est)) %>% 
  pivot_longer(cols = 2:3,
               names_to = "Poverty Level",
               values_to = "Total") %>%
  ggplot(aes(x = borough_name, y = Total, fill = `Poverty Level`)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(labels = percent) +
  scale_fill_manual(values = c("#76B7B2", "#E15759")) +
  coord_flip() +
  theme_few() +
  labs(title = "Borough Population by Poverty Level",
       x = NULL, y = "Share of Total Borough Population")

Race by Neighborhood

This section looks at the racial and ethnic composition of neighborhoods. The maps depict the share of each neighborhood's population that is White, Black, Hispanic or Asian.

grid.arrange(
  
## White  
  data %>% 
  ggplot()+
  geom_sf(aes(fill = pop_white_pct_est), color = "white", lwd = .2) + 
  scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% White",
                                labels = percent_format()
                                ) + 
  theme_void() +
  theme(plot.title.position = 'plot', 
        plot.title = element_text(hjust = 0.5)) +
  theme(panel.grid = element_line(color = "transparent")) +
  labs(title = "Share of Neighborhood Population that is White"),
  
## Black  
  data %>% 
  ggplot()+
  geom_sf(aes(fill = pop_black_pct_est), color = "white", lwd = .2) + 
  scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% Black",
                                labels = percent_format()
                                ) + 
  theme_void() +
  theme(plot.title.position = 'plot', 
        plot.title = element_text(hjust = 0.5)) +
  theme(panel.grid = element_line(color = "transparent")) +
  labs(title = "Share of Neighborhood Population that is Black"),
  
## Hispanics  
  data %>% 
  ggplot()+
  geom_sf(aes(fill = pop_hisp_pct_est), color = "white", lwd = .2) + 
  scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% Hispanic",
                                labels = percent_format()
                                ) + 
  theme_void() +
  theme(plot.title.position = 'plot', 
        plot.title = element_text(hjust = 0.5)) +
  theme(panel.grid = element_line(color = "transparent")) +
  labs(title = "Share of Neighborhood Population that is Hispanic"),
  
## Asians  
  data %>% 
  ggplot()+
  geom_sf(aes(fill = pop_asian_pct_est), color = "white", lwd = .2) + 
  scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% Asian",
                                labels = percent_format()
                                ) + 
  theme_void() +
  theme(plot.title.position = 'plot', 
        plot.title = element_text(hjust = 0.5)) +
  theme(panel.grid = element_line(color = "transparent")) +
  labs(title = "Share of Neighborhood Population that is Asian"),
  
  
  ncol = 2, nrow = 2
)
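The four maps above differ only in the column mapped to `fill`, the legend name and the title, so the repeated code could be folded into a small function. The following is one possible sketch: `map_share()` is a hypothetical helper (not part of nycgeo, ggthemes or the analysis above), and the two-square `toy` object merely stands in for the neighborhood data so the example runs on its own.

```r
library(ggplot2)
library(ggthemes)
library(scales)
library(sf)

# Hypothetical helper: draw one choropleth for the column named in `var`.
map_share <- function(df, var, legend, title) {
  ggplot(df) +
    geom_sf(aes(fill = .data[[var]]), color = "white", lwd = .2) +
    scale_fill_continuous_tableau(palette = "Blue-Teal", name = legend,
                                  labels = percent_format()) +
    theme_void() +
    theme(plot.title.position = "plot",
          plot.title = element_text(hjust = 0.5)) +
    labs(title = title)
}

# Build a unit square whose lower-left corner is at (x0, 0).
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Two toy "neighborhoods" with made-up shares (not ACS figures).
toy <- st_sf(
  pop_white_pct_est = c(0.25, 0.75),
  geometry = st_sfc(square(0), square(1))
)

p <- map_share(toy, "pop_white_pct_est", "% White",
               "Share of Neighborhood Population that is White")
```

With the real data, each panel passed to `grid.arrange()` would then reduce to a single call such as `map_share(data, "pop_black_pct_est", "% Black", "Share of Neighborhood Population that is Black")`.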

Share of Neighborhood Population with Bachelors or Higher

data %>% 
  ggplot()+
  geom_sf(aes(fill = pop_ba_above_pct_est), color = "white", lwd = .2) + 
  scale_fill_continuous_tableau(palette = "Blue-Teal", name = "Bachelor's or Higher",
                                labels = percent_format()) + 
  theme_void() +
  theme(plot.title.position = 'plot', 
        plot.title = element_text(hjust = 0.5)) +
  theme(panel.grid = element_line(color = "transparent")) +
  labs(title = "Which NYC Neighborhood is the Most Educated?",
       caption = "Grayed out areas are parks, cemeteries, airports, etc.")

Share of Neighborhood Population in Poverty

data %>% 
  ggplot()+
  geom_sf(aes(fill = pop_inpov_pct_est), color = "white", lwd = .2) + 
  scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% in Poverty",
                                labels = percent_format()) + 
  theme_void() +
  theme(plot.title.position = 'plot', 
        plot.title = element_text(hjust = 0.5)) +
  theme(panel.grid = element_line(color = "transparent")) +
  labs(title = "Share of Neighborhood Population in Poverty",
       caption = "Grayed out areas are parks, cemeteries, airports, etc.")

Takeaways