This project examines the factors that may account for differences in college enrollment odds among Black people, with a focus on native-born Blacks. Literature on the topic shows that in the Black community (everyone considered as having African origins), college enrollment rates for students with immigrant parents are higher than those for students with parents born in the U.S. Understanding this gap may help measure the scale of societal discrimination versus other factors that account for the education success gap.
Among other questions, this analysis mainly seeks to answer: is there a difference in college enrollment odds between children from U.S.-born Black households and those from immigrant households? Lower college enrollment rates for students with native-born Black parents than for Black students with immigrant parents would imply that discrimination alone does not explain the success gap.
The data used in this project come from the Current Population Survey (CPS), a monthly survey of U.S. households conducted by the Census Bureau for the Bureau of Labor Statistics. The CPS data used here are from the March 2013 survey and were accessed through the IPUMS project of the Minnesota Population Center at the University of Minnesota.
Before loading the data, I load the packages containing the various functions we will be using for the analysis. They include tidyverse, visdat, summarytools, and plotly, among others.
Let’s load the data.
Here are the top and bottom rows of the data.
Now, let us look at the summary and structure of the data.
No | Variable | Freqs (% of Valid) | Valid | Missing |
---|---|---|---|---|
1 | STATEFIP [numeric] | 51 distinct values | 202634 (100.0%) | 0 (0.0%) |
2 | AGE [numeric] | 82 distinct values | 202634 (100.0%) | 0 (0.0%) |
3 | SEX [numeric] | | 202634 (100.0%) | 0 (0.0%) |
4 | RACE [numeric] | 26 distinct values | 202634 (100.0%) | 0 (0.0%) |
5 | MARST [numeric] | | 202634 (100.0%) | 0 (0.0%) |
6 | ASIAN [numeric] | | 202634 (100.0%) | 0 (0.0%) |
7 | BPL [numeric] | 162 distinct values | 202634 (100.0%) | 0 (0.0%) |
8 | CITIZEN [numeric] | | 202634 (100.0%) | 0 (0.0%) |
9 | NATIVITY [numeric] | | 202634 (100.0%) | 0 (0.0%) |
10 | HISPAN [numeric] | | 202634 (100.0%) | 0 (0.0%) |
11 | EMPSTAT [numeric] | | 202634 (100.0%) | 0 (0.0%) |
12 | LABFORCE [numeric] | | 202634 (100.0%) | 0 (0.0%) |
13 | EDUC99 [numeric] | 16 distinct values | 202634 (100.0%) | 0 (0.0%) |
14 | SCHLCOLL [numeric] | | 202634 (100.0%) | 0 (0.0%) |
15 | FTOTVAL [numeric] | 37560 distinct values | 202634 (100.0%) | 0 (0.0%) |
16 | OFFPOV [numeric] | | 202634 (100.0%) | 0 (0.0%) |
17 | POVERTY [numeric] | | 202634 (100.0%) | 0 (0.0%) |
18 | GOTWIC [numeric] | | 202634 (100.0%) | 0 (0.0%) |
19 | HOURWAGE [numeric] | 1032 distinct values | 202634 (100.0%) | 0 (0.0%) |
20 | SEX_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
21 | RACE_HEAD [numeric] | 24 distinct values | 202537 (100.0%) | 97 (0.0%) |
22 | MARST_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
23 | BPL_HEAD [numeric] | 161 distinct values | 202537 (100.0%) | 97 (0.0%) |
24 | CITIZEN_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
25 | NATIVITY_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
26 | HISPAN_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
27 | LABFORCE_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
28 | EDUC99_HEAD [numeric] | 15 distinct values | 202537 (100.0%) | 97 (0.0%) |
29 | FTOTVAL_HEAD [numeric] | 36025 distinct values | 202537 (100.0%) | 97 (0.0%) |
30 | OFFPOV_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
31 | POVERTY_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
32 | GOTWIC_HEAD [numeric] | | 202537 (100.0%) | 97 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.2.1)
2023-02-11
In this section, I clean the data to get it ready for modelling.
The first thing I do is create new variables out of the existing ones. It involves breaking up categorical variables with multiple levels into binary variables that will be used in the regression. Some of the dummy variables created include AFRICAN_HEAD and CARRIBEAN_HEAD.
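As a rough illustration of the dummy-variable step, here is a minimal base R sketch. The toy data frame `heads` and its birthplace labels are assumptions made for the example, not the actual CPS codes, and the original analysis may have built these variables differently (for instance with tidyverse verbs).

```r
# Toy stand-in for the household-head birthplace variable.
# The labels are illustrative assumptions, not real CPS/IPUMS codes.
heads <- data.frame(
  BPL_HEAD = c("US", "AFRICA", "CARIBBEAN", "OTHER", "AFRICA"),
  stringsAsFactors = FALSE
)

# One binary YES/NO indicator per level of interest
heads$AFRICAN_HEAD   <- ifelse(heads$BPL_HEAD == "AFRICA", "YES", "NO")
heads$CARRIBEAN_HEAD <- ifelse(heads$BPL_HEAD == "CARIBBEAN", "YES", "NO")

# In this sketch, any head born outside the U.S. counts as foreign
heads$FOREIGN_HEAD <- ifelse(heads$BPL_HEAD != "US", "YES", "NO")

print(heads)
```

Each multi-level variable collapses into one or more two-level indicators that can enter a regression directly.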
Some of the main variables used in this work have observations that fall outside the scope of the analysis. I remove them from the data.
AGE: The analysis focuses on respondents of college-going age and their families. Hence, I restrict the sample to respondents aged 18 to 25, excluding those who fall outside that range.
FTOTVAL: Some respondents reported income levels of $0 or less (see summary table in Section 3). Since the analysis focuses on working families, I treat those with no income as outliers and remove them from the analysis.
Now, let’s execute the cleaning listed above.
After removing the unwanted age groups and those with no income, we are left with 18994 observations.
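The two filters described above can be sketched in base R as follows. The toy data frame `people` is made up for illustration, and I assume the age bounds are inclusive; the original code is not shown, so treat this as one plausible reading.

```r
# Made-up respondents; only AGE and FTOTVAL matter for this sketch
people <- data.frame(
  AGE     = c(17, 18, 22, 25, 26, 40),
  FTOTVAL = c(52000, 0, 31000, 78000, 45000, -1200)
)

# Keep respondents aged 18 to 25 (assumed inclusive) with positive family income
in_scope <- subset(people, AGE >= 18 & AGE <= 25 & FTOTVAL > 0)

print(in_scope)
print(nrow(in_scope))
```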
I keep only variables needed for the analysis - there are about 12 of them. The rest are excluded from this work.
numeric to factor
All the categorical variables have been classified as character variables in R. I change them to factors, as this matters for the regression analysis.
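A small base R sketch of the conversion is below. The data frame and its values are invented for the example, and setting WHITE as the reference level is an assumption inferred from the regression output later on, where race coefficients appear for every group except whites.

```r
# Invented example data with character-coded categorical variables
df <- data.frame(
  SEX  = c("MALE", "FEMALE", "FEMALE"),
  RACE = c("WHITE", "BLACK", "ASIAN"),
  AGE  = c(19, 21, 24),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor, leaving numeric columns alone
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Choose the baseline category that the other levels are compared against
df$RACE <- relevel(df$RACE, ref = "WHITE")

str(df)
```

The reference level matters because `glm()` reports each factor coefficient relative to the first level.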
The summary table indicated that some of the columns had missing values. Let’s check to see if they are still present after the cleaning steps.
From the chart above, there are no missing values in our data. It seems removing respondents outside the age group we need and those with no incomes took care of the missing values.
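For readers without a plotting package at hand, the same check can be approximated numerically in base R. The toy data frame below is an assumption for illustration only.

```r
# Toy data with a couple of deliberately missing values
df <- data.frame(
  AGE      = c(18, NA, 25),
  SEX_HEAD = c(1, 2, NA),
  FTOTVAL  = c(1000, 2000, 3000)
)

# Count missing values per column and in total
na_per_column <- colSums(is.na(df))
total_na <- sum(na_per_column)

# Rows with no missing value in any column
complete <- df[complete.cases(df), ]

print(na_per_column)
print(total_na)
```

A total of zero would confirm numerically what the missingness chart shows.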
The summary table above showed that there were duplicated rows. Let’s check to see if they are still present and consolidate them.
Number of Duplicated Rows |
---|
419 |
Now, let’s keep only unique rows and confirm that all the duplicates have been dealt with.
Number of Duplicated Rows |
---|
0 |
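The count-then-deduplicate step can be sketched with base R's `duplicated()` and `unique()`. The data here are invented, and the original may have used a different function such as one from the tidyverse.

```r
# Invented data containing repeated rows
df <- data.frame(
  AGE = c(18, 18, 20, 20, 20),
  SEX = c("F", "F", "M", "M", "F")
)

# duplicated() marks every repeat of an earlier identical row
n_dup_before <- sum(duplicated(df))

# Keep the first occurrence of each distinct row
df_unique <- unique(df)
n_dup_after <- sum(duplicated(df_unique))

print(c(before = n_dup_before, after = n_dup_after))
```

Note that identical rows in survey data can be distinct respondents who happen to share the same values on the retained variables, so dropping them is a modelling choice rather than a pure clean-up.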
Now that we have our final data for the analysis, let’s do some EDA.
One of the main assumptions for logistic regressions is that there be no collinearity/multicollinearity among the explanatory variables. This means that the predictor variables must not have a high correlation or association. Hence, I check the correlation among all the variables I intend to use as predictor variables in the logistic regression model.
From the plot, there appears to be strong positive correlation between some pairs of variables. RACE_OTHER and HISPANICS are highly correlated. This makes sense since most Hispanics probably identify as “other” when it comes to race.
There is also high correlation between the birthplace of heads of household (BPL_HEAD) and their status as a foreigner (FOREIGN_HEAD). Again, this makes sense as most non-U.S. born heads of households are from other continents besides Africa and the Caribbean.
It is important to keep track of which two or more variables are highly correlated. Using them as explanatory variables in the same model would cause multicollinearity issues, affecting the overall validity of the model.
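One way to keep track of such pairs is to scan the correlation matrix for entries above a cutoff. The sketch below uses simulated numeric predictors and a cutoff of 0.8; both are assumptions for illustration, and categorical predictors would first need a numeric 0/1 coding.

```r
set.seed(1)
n <- 200

# Simulated predictors: x2 is nearly a copy of x1, x3 is independent
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

corr <- cor(predictors)

# Look only above the diagonal so each pair is reported once
high <- which(abs(corr) > 0.8 & upper.tri(corr), arr.ind = TRUE)

flagged <- data.frame(
  var1 = rownames(corr)[high[, "row"]],
  var2 = colnames(corr)[high[, "col"]],
  r    = corr[high]
)

print(flagged)
```

Any pair listed in `flagged` is a candidate for keeping only one of the two variables in a given model.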
Using logit regression models, I attempt to answer the main question: is there a difference in college enrollment rates between children from native-born Black households and children from African immigrant households?
But before modeling enrollment rates among Black people, I look at college enrollment rates in the overall population with emphasis on the differences in enrollment rates between whites and other races and Hispanics. I also look at college enrollment rates in California to determine if the State’s ban on affirmative action in 1996 has had any impact on college enrollment rates.
The dependent variable for the regression is COLL_ATT.
Let’s start with enrollment rates in the overall sample and compare enrollment rates among other races to that of whites.
##
## Call:
## glm(formula = COLL_ATT ~ AGE + SEX + FTOTVAL + BACHELORS_HEAD +
## STABLEMARRIAGE + RACE + HISPANIC, family = binomial(link = "logit"),
## data = IPUMS)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8336 -0.9200 -0.7477 1.2642 2.1406
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.609e+00 1.536e-01 10.473 < 2e-16 ***
## AGE -1.299e-01 6.977e-03 -18.617 < 2e-16 ***
## SEXFEMALE 3.842e-01 3.169e-02 12.124 < 2e-16 ***
## FTOTVAL 2.425e-06 2.340e-07 10.360 < 2e-16 ***
## BACHELORS_HEADYES 5.072e-01 3.770e-02 13.452 < 2e-16 ***
## STABLEMARRIAGEYES 4.818e-02 3.476e-02 1.386 0.165766
## RACEASIAN 6.363e-01 6.732e-02 9.451 < 2e-16 ***
## RACEBLACK -3.225e-02 5.096e-02 -0.633 0.526796
## RACEMIXED RACE 8.942e-02 9.618e-02 0.930 0.352523
## RACENATIVEAMERICAN -7.125e-01 1.891e-01 -3.768 0.000164 ***
## RACEOTHER -1.853e-01 2.618e-01 -0.708 0.479014
## RACEPACIFICISLANDER -2.047e-01 2.232e-01 -0.917 0.359057
## HISPANICYES 1.294e-01 2.589e-01 0.500 0.617016
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 24410 on 18777 degrees of freedom
## Residual deviance: 23203 on 18765 degrees of freedom
## AIC: 23229
##
## Number of Fisher Scoring iterations: 4
From the regression, the following variables are statistically significant at the 5% level (95% confidence): AGE, SEX (FEMALE), FTOTVAL, BACHELORS_HEAD, and RACE (ASIAN and NATIVE AMERICAN). These variables influence college enrollment odds, some positively and others negatively.
Odds Ratio
## (Intercept) AGE SEXFEMALE FTOTVAL
## 4.9980399 0.8781926 1.4684601 1.0000024
## BACHELORS_HEADYES STABLEMARRIAGEYES RACEASIAN RACEBLACK
## 1.6605527 1.0493574 1.8894788 0.9682623
## RACEMIXED RACE RACENATIVEAMERICAN RACEOTHER RACEPACIFICISLANDER
## 1.0935369 0.4903964 0.8308155 0.8149197
## HISPANICYES
## 1.1382000
Interpreting the Regression Results
Assuming all else is equal, we can make the following inferences from the model:
Note: Because they are not statistically significant, we cannot make any inferences about enrollment rates when comparing college enrollment among Blacks, Pacific Islanders, Hispanics, Mixed Race and Other races to that among Whites. We also cannot draw any conclusion about the odds of enrollment for people coming from stable homes.
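To read the odds ratios as percentage changes, exponentiate each logit coefficient and subtract one. The sketch below applies this to the significant coefficients copied from the regression output above. The helper names `odds_ratio` and `pct_change` are mine, and FTOTVAL is left out because its per-dollar effect is too small to read comfortably at this rounding.

```r
# Coefficients copied from the overall-sample regression output
coefs <- c(
  AGE                = -0.1299,
  SEXFEMALE          =  0.3842,
  BACHELORS_HEADYES  =  0.5072,
  RACEASIAN          =  0.6363,
  RACENATIVEAMERICAN = -0.7125
)

# Odds ratio = exp(coefficient); percent change in odds = (OR - 1) * 100
odds_ratio <- exp(coefs)
pct_change <- (odds_ratio - 1) * 100

print(round(cbind(odds_ratio, pct_change), 2))
```

An odds ratio below 1 (AGE, RACENATIVEAMERICAN) means lower odds of enrollment relative to the baseline, and one above 1 means higher odds. With a fitted model object, `exp(coef(model))` gives the same odds ratios directly.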
In 1996, the State of California banned the use of affirmative action. To test the effects of the ban on college enrollment rates, let’s restrict our sample in this regression to the state of California.
The size of the sample for our analysis is 2200.
##
## Call:
## glm(formula = COLL_ATT ~ AGE + SEX + FTOTVAL + RACE + HISPANIC +
## BACHELORS_HEAD + STABLEMARRIAGE, family = binomial(link = "logit"),
## data = IPUMS_CALI)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1869 -1.0429 -0.8291 1.2146 1.8320
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.437e+00 4.392e-01 5.548 2.88e-08 ***
## AGE -1.388e-01 1.983e-02 -6.998 2.60e-12 ***
## SEXFEMALE 3.151e-01 8.853e-02 3.559 0.000372 ***
## FTOTVAL 1.122e-06 5.940e-07 1.888 0.059002 .
## RACEASIAN 3.831e-01 1.538e-01 2.492 0.012702 *
## RACEBLACK -3.668e-02 2.037e-01 -0.180 0.857068
## RACEMIXED RACE 7.578e-01 2.893e-01 2.619 0.008808 **
## RACENATIVEAMERICAN -1.082e+00 8.128e-01 -1.331 0.183221
## RACEOTHER 4.795e-01 5.753e-01 0.834 0.404559
## RACEPACIFICISLANDER -3.203e-01 5.343e-01 -0.599 0.548919
## HISPANICYES -7.269e-01 5.659e-01 -1.285 0.198933
## BACHELORS_HEADYES 1.554e-01 1.167e-01 1.332 0.182824
## STABLEMARRIAGEYES 1.350e-01 9.417e-02 1.433 0.151754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3015.6 on 2199 degrees of freedom
## Residual deviance: 2899.6 on 2187 degrees of freedom
## AIC: 2925.6
##
## Number of Fisher Scoring iterations: 4
Odds Ratio
## (Intercept) AGE SEXFEMALE FTOTVAL
## 11.4365513 0.8704066 1.3704233 1.0000011
## RACEASIAN RACEBLACK RACEMIXED RACE RACENATIVEAMERICAN
## 1.4668978 0.9639843 2.1335949 0.3389883
## RACEOTHER RACEPACIFICISLANDER HISPANICYES BACHELORS_HEADYES
## 1.6152338 0.7259535 0.4834042 1.1681600
## STABLEMARRIAGEYES
## 1.1445206
Interpreting the Regression Results
In the second part of this analysis, I look at college enrollment rates among black people only. To begin, let’s pull a subset of respondents who identify solely as black and nothing else.
First, I look at college enrollment odds among all black people.
##
## Call:
## glm(formula = COLL_ATT ~ AGE + SEX + FTOTVAL + BACHELORS_HEAD +
## STABLEMARRIAGE, family = binomial(link = "logit"), data = IPUMS_BLK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7263 -0.8697 -0.7178 1.2398 1.9768
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.891e-01 4.458e-01 0.649 0.5166
## AGE -8.413e-02 2.060e-02 -4.084 4.44e-05 ***
## SEXFEMALE 6.071e-01 9.318e-02 6.515 7.27e-11 ***
## FTOTVAL 4.518e-06 9.421e-07 4.795 1.63e-06 ***
## BACHELORS_HEADYES 6.212e-01 1.202e-01 5.170 2.34e-07 ***
## STABLEMARRIAGEYES 2.286e-01 1.061e-01 2.154 0.0312 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2926.2 on 2325 degrees of freedom
## Residual deviance: 2764.2 on 2320 degrees of freedom
## AIC: 2776.2
##
## Number of Fisher Scoring iterations: 4
Odds Ratio
## (Intercept) AGE SEXFEMALE FTOTVAL
## 1.3352525 0.9193131 1.8350230 1.0000045
## BACHELORS_HEADYES STABLEMARRIAGEYES
## 1.8612173 1.2568511
Interpreting the Regression Results
All variables included are statistically significant when it comes to college enrollment rates among black people. From the odds ratios, we can draw the following conclusions about the effects of each variable on black college enrollment rates.
What if the head of a black household is a foreigner, meaning they are not from the U.S. or any of its territories? Does that affect the odds of college enrollment for a black person from such a household?
##
## Call:
## glm(formula = COLL_ATT ~ AGE + SEX + FTOTVAL + BACHELORS_HEAD +
## STABLEMARRIAGE + FOREIGN_HEAD, family = binomial(link = "logit"),
## data = IPUMS_BLK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.599 -0.866 -0.706 1.206 1.999
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.059e-01 4.479e-01 0.460 0.6458
## AGE -8.279e-02 2.067e-02 -4.005 6.20e-05 ***
## SEXFEMALE 6.179e-01 9.360e-02 6.601 4.07e-11 ***
## FTOTVAL 4.266e-06 9.427e-07 4.526 6.02e-06 ***
## BACHELORS_HEADYES 5.957e-01 1.207e-01 4.936 7.97e-07 ***
## STABLEMARRIAGEYES 2.052e-01 1.066e-01 1.925 0.0542 .
## FOREIGN_HEADYES 5.634e-01 1.342e-01 4.197 2.70e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2926.2 on 2325 degrees of freedom
## Residual deviance: 2746.9 on 2319 degrees of freedom
## AIC: 2760.9
##
## Number of Fisher Scoring iterations: 4
Odds Ratio
## (Intercept) AGE SEXFEMALE FTOTVAL
## 1.2285950 0.9205414 1.8550405 1.0000043
## BACHELORS_HEADYES STABLEMARRIAGEYES FOREIGN_HEADYES
## 1.8142718 1.2278002 1.7565551
Interpreting the Regression Results
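One way to judge whether adding FOREIGN_HEAD improves on the previous model is a likelihood-ratio test on the two residual deviances. This is a sketch using the figures printed in the two outputs above, not necessarily a test the original analysis ran. It relies on both models being fitted to the same sample, which the identical null deviance (2926.2 on 2325 degrees of freedom) supports.

```r
# Residual deviance and degrees of freedom copied from the two model outputs
dev_base    <- 2764.2; df_base    <- 2320   # without FOREIGN_HEAD
dev_foreign <- 2746.9; df_foreign <- 2319   # with FOREIGN_HEAD

# The drop in deviance is chi-squared distributed under the null
# hypothesis that the added coefficient is zero
lr_stat <- dev_base - dev_foreign
lr_df   <- df_base - df_foreign
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p_value = p_value))
```

With the fitted model objects available, `anova(model_base, model_foreign, test = "Chisq")` performs the same comparison; the object names here are hypothetical.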
Let’s look at the citizenship status of the respondents. Does citizenship affect a black person’s chances of enrolling in college?
##
## Call:
## glm(formula = COLL_ATT ~ AGE + SEX + FTOTVAL + BACHELORS_HEAD +
## STABLEMARRIAGE + FOREIGN_HEAD + CITIZEN, family = binomial(link = "logit"),
## data = IPUMS_BLK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.599 -0.866 -0.706 1.206 1.999
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.091e-01 5.306e-01 0.394 0.693493
## AGE -8.280e-02 2.068e-02 -4.003 6.25e-05 ***
## SEXFEMALE 6.179e-01 9.360e-02 6.601 4.07e-11 ***
## FTOTVAL 4.266e-06 9.429e-07 4.524 6.06e-06 ***
## BACHELORS_HEADYES 5.956e-01 1.207e-01 4.934 8.07e-07 ***
## STABLEMARRIAGEYES 2.052e-01 1.066e-01 1.925 0.054222 .
## FOREIGN_HEADYES 5.627e-01 1.472e-01 3.822 0.000132 ***
## CITIZENYES -3.061e-03 2.689e-01 -0.011 0.990918
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2926.2 on 2325 degrees of freedom
## Residual deviance: 2746.9 on 2318 degrees of freedom
## AIC: 2762.9
##
## Number of Fisher Scoring iterations: 4
Odds Ratio
## (Intercept) AGE SEXFEMALE FTOTVAL
## 1.2325786 0.9205343 1.8550375 1.0000043
## BACHELORS_HEADYES STABLEMARRIAGEYES FOREIGN_HEADYES CITIZENYES
## 1.8141966 1.2277773 1.7553466 0.9969437
Interpreting the Regression Results
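A Wald confidence interval makes the CITIZEN result easier to read than the p-value alone. The sketch below builds a 95% interval for the CITIZENYES odds ratio from the estimate and standard error printed above. The normal approximation is the usual one for `glm` output, though it is an assumption of this sketch rather than something stated in the analysis.

```r
# Estimate and standard error for CITIZENYES, copied from the output above
est <- -3.061e-03
se  <-  2.689e-01

# 95% Wald interval on the log-odds scale, then exponentiated
z <- qnorm(0.975)
ci_log_odds   <- est + c(-1, 1) * z * se
ci_odds_ratio <- exp(ci_log_odds)

print(round(ci_odds_ratio, 3))
```

The interval spans 1, which is consistent with the large p-value. The residual deviance is also unchanged at 2746.9 while the AIC rises from 2760.9 to 2762.9, so citizenship adds no explanatory power to this model.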
In this section, I assess how college enrollment odds differ for black children with an immigrant head of household compared to those with a non-immigrant head of household. I pay particular attention to those with an African or Caribbean head of household, since the majority of the black immigrant population comes from African and Caribbean countries.
##
## Call:
## glm(formula = COLL_ATT ~ AGE + SEX + FTOTVAL + BACHELORS_HEAD +
## STABLEMARRIAGE + BPL_HEAD, family = binomial(link = "logit"),
## data = IPUMS_BLK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5933 -0.8649 -0.7047 1.2064 2.0007
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.051e-01 4.481e-01 0.458 0.64715
## AGE -8.295e-02 2.068e-02 -4.011 6.06e-05 ***
## SEXFEMALE 6.188e-01 9.370e-02 6.604 4.00e-11 ***
## FTOTVAL 4.257e-06 9.429e-07 4.515 6.32e-06 ***
## BACHELORS_HEADYES 5.954e-01 1.208e-01 4.931 8.20e-07 ***
## STABLEMARRIAGEYES 2.062e-01 1.067e-01 1.931 0.05344 .
## BPL_HEADAFRICA 5.807e-01 1.885e-01 3.080 0.00207 **
## BPL_HEADCARIBBEAN 5.701e-01 1.985e-01 2.873 0.00407 **
## BPL_HEADOTHER 6.646e-01 3.545e-01 1.875 0.06084 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2926.2 on 2325 degrees of freedom
## Residual deviance: 2745.2 on 2317 degrees of freedom
## AIC: 2763.2
##
## Number of Fisher Scoring iterations: 4
Odds Ratio
## (Intercept) AGE SEXFEMALE FTOTVAL
## 1.2276315 0.9204001 1.8566887 1.0000043
## BACHELORS_HEADYES STABLEMARRIAGEYES BPL_HEADAFRICA BPL_HEADCARIBBEAN
## 1.8137854 1.2289504 1.7872992 1.7684728
## BPL_HEADOTHER
## 1.9437560
Interpreting the Regression Results
Compared to black respondents with a non-immigrant head of household, those from homes with African or Caribbean heads of household have 78.7% and 76.8% higher odds of college enrollment, respectively (odds ratios of 1.787 and 1.768).
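Higher odds are not the same as higher probabilities, so it can help to translate the model into predicted probabilities for a concrete case. The sketch below uses the coefficients from the output above and a hypothetical profile of my own choosing: a 20-year-old woman with a family income of $50,000, a head of household without a bachelor's degree, and no stable marriage. I also assume the reference categories are male, no bachelor's degree, and a U.S.-born head.

```r
# Coefficients copied from the BPL_HEAD model output
coefs <- c(
  "(Intercept)"     = 0.2051,
  AGE               = -0.08295,
  SEXFEMALE         = 0.6188,
  FTOTVAL           = 4.257e-06,
  BACHELORS_HEADYES = 0.5954,
  STABLEMARRIAGEYES = 0.2062,
  BPL_HEADAFRICA    = 0.5807
)

# Hypothetical profiles, in the same order as the coefficients:
# intercept, age 20, female, $50,000, no bachelor's, no stable marriage
us_profile     <- c(1, 20, 1, 50000, 0, 0, 0)  # U.S.-born head (assumed baseline)
africa_profile <- c(1, 20, 1, 50000, 0, 0, 1)  # African-born head

# plogis() is the inverse logit: it maps log-odds to a probability
p_us     <- plogis(sum(coefs * us_profile))
p_africa <- plogis(sum(coefs * africa_profile))

# The ratio of the two odds recovers exp(coefficient) for BPL_HEADAFRICA
odds <- function(p) p / (1 - p)
odds_ratio <- odds(p_africa) / odds(p_us)

print(round(c(p_us = p_us, p_africa = p_africa, odds_ratio = odds_ratio), 3))
```

The odds ratio is the same for every profile, but the gap in predicted probability depends on where the baseline probability sits.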
The literature (Bennett and Lutz, 2009) suggests that immigrant parents have higher educational attainment than the parents of native-born blacks, which may contribute to the difference in college enrollment odds between their children.