This project analyzes differences in neighborhood characteristics across New York City. Using choropleth maps, it highlights disparities in race/ethnicity, educational attainment, and poverty levels among NYC neighborhoods.
The analysis uses estimates from the American Community Survey (ACS)
and spatial data of NYC neighborhoods, accessed through the
nycgeo package in R. The package contains spatial data
files for various geographic and administrative boundaries in New York
City as well as selected demographic, social, and economic estimates
from the ACS.
Extract the data and take a look at the top few rows.
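library(tidyverse)  # dplyr, tidyr, ggplot2 and the %>% pipe
library(sf)         # methods for the simple feature objects nycgeo returns
library(nycgeo)     # NYC boundaries with optional ACS estimates
# Neighborhood tabulation areas (NTAs) with ACS estimates attached
data <-
  nyc_boundaries(geography = "nta",
                 add_acs_data = TRUE)
data %>% head(5)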
## Simple feature collection with 5 features and 35 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 980630.8 ymin: 147001.4 xmax: 1007000 ymax: 195449.4
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 5 × 36
## nta_id nta_n…¹ state…² count…³ count…⁴ borou…⁵ borou…⁶ puma_id puma_…⁷ pop_t…⁸
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 BK09 Brookl… 36 047 Kings Brookl… 3 4004 Brookl… 24212
## 2 BK17 Sheeps… 36 047 Kings Brookl… 3 4016 Brookl… 67681
## 3 BK19 Bright… 36 047 Kings Brookl… 3 4018 Brookl… 35811
## 4 BK21 Seagat… 36 047 Kings Brookl… 3 4018 Brookl… 31132
## 5 BK23 West B… 36 047 Kings Brookl… 3 4018 Brookl… 16436
## # … with 26 more variables: pop_total_moe <dbl>, pop_white_est <dbl>,
## # pop_white_moe <dbl>, pop_white_pct_est <dbl>, pop_white_pct_moe <dbl>,
## # pop_black_est <dbl>, pop_black_moe <dbl>, pop_black_pct_est <dbl>,
## # pop_black_pct_moe <dbl>, pop_hisp_est <dbl>, pop_hisp_moe <dbl>,
## # pop_hisp_pct_est <dbl>, pop_hisp_pct_moe <dbl>, pop_asian_est <dbl>,
## # pop_asian_moe <dbl>, pop_asian_pct_est <dbl>, pop_asian_pct_moe <dbl>,
## # pop_ba_above_est <dbl>, pop_ba_above_moe <dbl>, …
In this section, I do some EDA to understand the data. This will help identify the amount of data cleaning and manipulation needed to get it ready for mapping.
First, I check to see if there are any missing values in the data and figure out how to handle them.
library(Amelia)
data %>% missmap(main = "Observed vs Missing Values")
The plot above shows that about two percent of the data are missing. From the table below, we see that there are only six rows with missing data, all of which have missing ACS data.
data %>% filter(is.na(pop_total_est))
## Simple feature collection with 6 features and 35 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 945163.9 ymin: 132264.1 xmax: 1058145 ymax: 271352.1
## Projected CRS: NAD83 / New York Long Island (ftUS)
## # A tibble: 6 × 36
## nta_id nta_n…¹ state…² count…³ count…⁴ borou…⁵ borou…⁶ puma_id puma_…⁷ pop_t…⁸
## * <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 BK99 park-c… 36 047 Kings Brookl… 3 <NA> <NA> NA
## 2 BX99 park-c… 36 005 Bronx Bronx 2 <NA> <NA> NA
## 3 MN99 park-c… 36 061 New Yo… Manhat… 1 <NA> <NA> NA
## 4 QN98 Airport 36 081 Queens Queens 4 <NA> <NA> NA
## 5 QN99 park-c… 36 081 Queens Queens 4 <NA> <NA> NA
## 6 SI99 park-c… 36 085 Richmo… Staten… 5 <NA> <NA> NA
## # … with 26 more variables: pop_total_moe <dbl>, pop_white_est <dbl>,
## # pop_white_moe <dbl>, pop_white_pct_est <dbl>, pop_white_pct_moe <dbl>,
## # pop_black_est <dbl>, pop_black_moe <dbl>, pop_black_pct_est <dbl>,
## # pop_black_pct_moe <dbl>, pop_hisp_est <dbl>, pop_hisp_moe <dbl>,
## # pop_hisp_pct_est <dbl>, pop_hisp_pct_moe <dbl>, pop_asian_est <dbl>,
## # pop_asian_moe <dbl>, pop_asian_pct_est <dbl>, pop_asian_pct_moe <dbl>,
## # pop_ba_above_est <dbl>, pop_ba_above_moe <dbl>, …
As shown in the table above, the rows with missing values are mostly parks, cemeteries and airports. The data treats them as neighborhoods of their own; however, there are no ACS demographic or economic data associated with them.
Rows with missing ACS data are dropped when calculating borough-wide totals and shares, as they would otherwise distort the results. However, I keep these rows when mapping neighborhood differences, where they (parks, cemeteries, airports, etc.) are grayed out on the maps.
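The handling described above can be sketched on a small stand-in table. This is only an illustration: the `toy` data frame and its numbers are made up, and only the column names follow the nycgeo output shown earlier.

```r
# Toy stand-in for the NTA table; all values are invented for illustration.
toy <- data.frame(
  nta_id        = c("BK09", "BK17", "BK99", "QN98"),
  borough_name  = c("Brooklyn", "Brooklyn", "Brooklyn", "Queens"),
  pop_total_est = c(24000, 67000, NA, NA),
  pop_white_est = c(15000, 40000, NA, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Rows without ACS estimates (parks, airports, ...): kept for mapping
missing_rows <- toy[is.na(toy$pop_total_est), ]

# Rows with ACS estimates: used for borough-wide summaries
complete_rows <- toy[!is.na(toy$pop_total_est), ]
```

Splitting the table this way keeps the summary step and the mapping step working from the same source while making the exclusion explicit.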
Plot borough against the demographic, educational and economic variables to understand the differences among NYC residents from different boroughs.
library(ggthemes)  # scale_fill_tableau(), theme_few()
library(scales)    # percent labels

# Share of each borough's population by race/ethnicity
data %>%
  select(borough_name, pop_white_est, pop_asian_est, pop_black_est, pop_hisp_est) %>%
  as.data.frame() %>%
  drop_na(pop_white_est) %>%
  group_by(borough_name) %>%
  summarise(White = sum(pop_white_est),
            Asian = sum(pop_asian_est),
            Black = sum(pop_black_est),
            Hispanic = sum(pop_hisp_est)
  ) %>%
  pivot_longer(cols = 2:5,
               names_to = "Race",
               values_to = "total_pop") %>%
  ggplot(aes(x = borough_name, y = total_pop, fill = Race)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(labels = percent) +
  scale_fill_tableau() +
  theme_few() +
  labs(title = "Share of Borough Population by Race/Ethnicity",
       x = NULL, y = "Share of Total Borough Population")
# Share of each borough's population with a bachelor's degree or higher
data %>%
  select(borough_name, pop_total_est, pop_ba_above_est) %>%
  as.data.frame() %>%
  drop_na(pop_ba_above_est) %>%
  mutate(no_bachelors = pop_total_est - pop_ba_above_est) %>%
  group_by(borough_name) %>%
  summarise(`Bachelors or Higher` = sum(pop_ba_above_est),
            `No Bachelors` = sum(no_bachelors)) %>%
  pivot_longer(cols = 2:3,
               names_to = "Educ. Level",
               values_to = "Total") %>%
  ggplot(aes(x = borough_name, y = Total, fill = `Educ. Level`)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(labels = percent) +
  scale_fill_manual(values = c("#76B7B2", "#E15759")) +
  theme_few() +
  labs(title = "Borough Population by Educational Level",
       x = NULL, y = "Share of Total Borough Population")
# Share of each borough's population living in poverty
data %>%
  select(borough_name, pop_total_est, pop_inpov_est) %>%
  as.data.frame() %>%
  drop_na(pop_total_est) %>%
  mutate(not_poor = pop_total_est - pop_inpov_est) %>%
  group_by(borough_name) %>%
  summarise("Not Poor" = sum(not_poor),
            "Poor" = sum(pop_inpov_est)) %>%
  pivot_longer(cols = 2:3,
               names_to = "Poverty Level",
               values_to = "Total") %>%
  ggplot(aes(x = borough_name, y = Total, fill = `Poverty Level`)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(labels = percent) +
  scale_fill_manual(values = c("#76B7B2", "#E15759")) +
  coord_flip() +
  theme_few() +
  labs(title = "Borough Population by Poverty Level",
       x = NULL, y = "Share of Total Borough Population")
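The bar charts rely on `position = "fill"` to rescale each borough's bar so that it sums to one. The same shares can be computed explicitly, which is handy when the numbers themselves are needed. The sketch below assumes a made-up `toy` table that reuses the `pop_total_est` and `pop_inpov_est` column names; the values are invented and `poverty_share` is a hypothetical name.

```r
library(dplyr)

# Toy neighborhood-level table; all values are invented for illustration.
toy <- tibble(
  borough_name  = c("Bronx", "Bronx", "Queens", "Queens"),
  pop_total_est = c(1000, 3000, 2000, 2000),
  pop_inpov_est = c(300, 900, 200, 400)
)

# Sum the counts within each borough first, then divide, so that larger
# neighborhoods carry proportionally more weight in the borough share.
poverty_share <- toy %>%
  group_by(borough_name) %>%
  summarise(total = sum(pop_total_est),
            poor  = sum(pop_inpov_est),
            .groups = "drop") %>%
  mutate(share_poor = poor / total)

poverty_share
```

Summing counts before dividing gives the borough-wide share; averaging the neighborhood percentages instead would weight a small neighborhood the same as a large one.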
This section looks at the racial and ethnic composition of neighborhoods. The maps depict the share of each neighborhood’s population that is White, Black, Hispanic or Asian.
library(gridExtra)  # grid.arrange()

grid.arrange(
  # White
  data %>%
    ggplot() +
    geom_sf(aes(fill = pop_white_pct_est), color = "white", lwd = .2) +
    scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% White",
                                  labels = percent_format()
    ) +
    theme_void() +
    theme(plot.title.position = 'plot',
          plot.title = element_text(hjust = 0.5)) +
    theme(panel.grid = element_line(color = "transparent")) +
    labs(title = "Share of Neighborhood Population that is White"),
  # Black
  data %>%
    ggplot() +
    geom_sf(aes(fill = pop_black_pct_est), color = "white", lwd = .2) +
    scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% Black",
                                  labels = percent_format()
    ) +
    theme_void() +
    theme(plot.title.position = 'plot',
          plot.title = element_text(hjust = 0.5)) +
    theme(panel.grid = element_line(color = "transparent")) +
    labs(title = "Share of Neighborhood Population that is Black"),
  # Hispanic
  data %>%
    ggplot() +
    geom_sf(aes(fill = pop_hisp_pct_est), color = "white", lwd = .2) +
    scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% Hispanic",
                                  labels = percent_format()
    ) +
    theme_void() +
    theme(plot.title.position = 'plot',
          plot.title = element_text(hjust = 0.5)) +
    theme(panel.grid = element_line(color = "transparent")) +
    labs(title = "Share of Neighborhood Population that is Hispanic"),
  # Asian
  data %>%
    ggplot() +
    geom_sf(aes(fill = pop_asian_pct_est), color = "white", lwd = .2) +
    scale_fill_continuous_tableau(palette = "Blue-Teal", name = "% Asian",
                                  labels = percent_format()
    ) +
    theme_void() +
    theme(plot.title.position = 'plot',
          plot.title = element_text(hjust = 0.5)) +
    theme(panel.grid = element_line(color = "transparent")) +
    labs(title = "Share of Neighborhood Population that is Asian"),
  ncol = 2, nrow = 2
)
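The four panels differ only in the column being mapped and the group label, so the repeated code could be folded into a small helper. The function below is a sketch, not part of the original analysis: `map_share()` and its arguments are hypothetical names, and it assumes the same ggplot2, ggthemes and scales calls used above.

```r
library(ggplot2)
library(ggthemes)
library(scales)

# Hypothetical helper: one choropleth for a given share column.
# `data` is an sf object, `column` the name of a share column (a string),
# and `group_label` the text used in the legend and title.
map_share <- function(data, column, group_label) {
  ggplot(data) +
    geom_sf(aes(fill = .data[[column]]), color = "white", lwd = .2) +
    scale_fill_continuous_tableau(palette = "Blue-Teal",
                                  name = paste("%", group_label),
                                  labels = percent_format()) +
    theme_void() +
    theme(plot.title.position = "plot",
          plot.title = element_text(hjust = 0.5)) +
    labs(title = paste("Share of Neighborhood Population that is", group_label))
}

# Possible usage with the NTA data loaded earlier:
# gridExtra::grid.arrange(
#   map_share(data, "pop_white_pct_est", "White"),
#   map_share(data, "pop_black_pct_est", "Black"),
#   map_share(data, "pop_hisp_pct_est", "Hispanic"),
#   map_share(data, "pop_asian_pct_est", "Asian"),
#   ncol = 2, nrow = 2
# )
```

Passing the column as a string and looking it up with `.data[[column]]` keeps the helper usable in a loop or with `lapply()` over several column names.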
A larger share of residents in predominantly White and Asian neighborhoods have a bachelor’s degree or higher.
The predominantly White and Asian neighborhoods have smaller shares of their residents living in poverty. The reverse is true for predominantly Black and Hispanic neighborhoods.
Neighborhoods with a high share of college graduates tend to have low poverty rates.
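One way to put a number on the relationship between education and poverty is a simple correlation between the two neighborhood-level shares. The sketch below uses a made-up `toy` table, so its result says nothing about NYC; the shares are computed against `pop_total_est`, following the approach used for the bar charts, and `share_ba`, `share_poor` and `r` are hypothetical names.

```r
# Toy neighborhood-level table; all values are invented for illustration.
toy <- data.frame(
  pop_total_est    = c(1000, 2000, 1500, 3000),
  pop_ba_above_est = c(600, 400, 750, 300),
  pop_inpov_est    = c(100, 600, 150, 1200)
)

# Convert counts to shares of each neighborhood's population
toy$share_ba   <- toy$pop_ba_above_est / toy$pop_total_est
toy$share_poor <- toy$pop_inpov_est / toy$pop_total_est

# Pearson correlation; "complete.obs" skips rows with missing estimates
r <- cor(toy$share_ba, toy$share_poor, use = "complete.obs")
r
```

A negative value of `r` on the real data would be consistent with the pattern seen in the maps, although a correlation across neighborhoods does not by itself describe individual residents.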