Data Import and Manipulation

  • We first read in our cleaned data brfss_with_air2.csv and mutate the variable types as needed.

  • We have asthma and asthma_now as the outcome variables, mean_aqi_month,mean_so2_month, mean_no2_month, mean_co_month and mean_pm2_5_month as predictors. mental_health, physical_health, county, age, race, smoker and income as the covariates.

asthma_df2 = 
  read_csv("./data/brfss_with_air2.csv") %>%
  mutate(
    #outcome
    asthma = as.numeric(asthma),
    asthma_now = as.numeric(asthma_now),
    #predictors below
    mean_aqi_month = as.numeric(mean_aqi_month),
    mental_health = as.numeric(mental_health),
    physical_health = as.numeric(physical_health),
    county = as.factor(county),
    age = as.factor(age),
    race = as.factor(race),
    smoker = as.factor(smoker),
    income = as.factor(income)
    )

Linear Regression with asthma emergency visits time as outcome

  • Based on our exploratory analysis, We first ran a linear regression model with asthma_emergency as the outcome variable and check the correlation with mean_so2_month, mean_no2_month, mean_co_month and mean_pm2_5_month adjusting for other predictors.

SO2

linear_fit_so2_emergency = 
  asthma_df2 %>%
  lm(asthma_emergency ~ mental_health + physical_health + sex + smoker + race + age + income + county + mean_so2_month, data = .) %>% 
  broom::tidy() 

linear_fit_so2_emergency %>% 
  knitr::kable(digits = 3)
term estimate std.error statistic p.value
(Intercept) 1.087 0.976 1.113 0.266
mental_health 0.024 0.016 1.450 0.148
physical_health 0.013 0.015 0.890 0.374
sex -0.263 0.288 -0.913 0.362
smoker2 0.042 0.598 0.069 0.945
smoker3 0.570 0.434 1.314 0.189
smoker4 0.093 0.399 0.232 0.816
race2 0.982 0.514 1.911 0.057
race3 -0.638 0.885 -0.721 0.471
race4 -1.336 2.889 -0.463 0.644
race5 -0.549 2.048 -0.268 0.789
race6 0.397 1.449 0.274 0.784
race7 0.558 0.895 0.624 0.533
race8 0.029 0.532 0.055 0.956
age2 -1.558 0.767 -2.031 0.043
age3 -0.745 0.711 -1.049 0.295
age4 -0.514 0.734 -0.700 0.484
age5 -0.512 0.671 -0.763 0.446
age6 -0.888 0.697 -1.274 0.203
age7 -0.750 0.686 -1.094 0.275
age8 -0.232 0.714 -0.324 0.746
age9 -1.767 0.766 -2.307 0.022
age10 -0.876 0.811 -1.080 0.281
age11 -1.705 0.837 -2.038 0.042
age12 -0.848 0.910 -0.933 0.352
age13 -1.621 1.132 -1.432 0.153
income2 -0.472 0.488 -0.967 0.334
income3 -0.645 0.529 -1.219 0.224
income4 -1.132 0.533 -2.124 0.034
income5 -1.509 0.445 -3.392 0.001
countyBronx -0.521 0.836 -0.623 0.533
countyChautauqua -0.418 1.006 -0.415 0.678
countyErie 0.360 0.686 0.524 0.601
countyMonroe 0.010 0.700 0.014 0.989
countyNassau 0.476 0.713 0.667 0.505
countyNew York -0.185 0.763 -0.242 0.809
countyNiagara 0.183 0.896 0.204 0.839
countyOnondaga 0.321 0.744 0.431 0.667
countyQueens 0.228 0.750 0.304 0.761
countyRensselaer 0.864 0.969 0.891 0.373
countySchenectady -0.375 1.058 -0.354 0.723
countySuffolk -0.087 0.698 -0.124 0.901
countyUlster 0.919 0.936 0.982 0.327
mean_so2_month 0.166 0.047 3.564 0.000

Interpretation

  • The results show that mean_so2_month, the average SO2 concentration has a positive coefficient of 0.166 with Asthma_emergency occurrence. This means that higher SO2 concentration contributes to Asthma emergency visit times with an 0.166 in cases per unit increase of average SO2 concentration in one month, adjusting for other covariate. The p-value is 0.000, which means that this is significant at a significance level of 0.05.

NO2

linear_fit_no2_emergency = asthma_df2 %>%
  lm(asthma_emergency ~ mental_health + physical_health + sex + smoker + race + age + income + county + mean_no2_month, data = .) %>% 
  broom::tidy()

linear_fit_no2_emergency %>% 
  knitr::kable(digits = 3)
term estimate std.error statistic p.value
(Intercept) -3.547 2.132 -1.664 0.097
mental_health 0.029 0.027 1.098 0.273
physical_health 0.020 0.022 0.895 0.372
sex -0.303 0.436 -0.694 0.488
smoker2 0.800 0.900 0.890 0.375
smoker3 1.067 0.688 1.551 0.122
smoker4 0.545 0.627 0.870 0.385
race2 0.826 0.711 1.161 0.247
race3 -1.584 1.276 -1.242 0.215
race4 -1.908 3.548 -0.538 0.591
race5 -0.545 2.495 -0.219 0.827
race6 -0.137 2.506 -0.055 0.956
race7 -0.014 1.207 -0.012 0.991
race8 0.207 0.727 0.285 0.776
age2 -1.927 1.114 -1.729 0.085
age3 -0.981 1.095 -0.896 0.371
age4 -0.663 1.142 -0.580 0.562
age5 -0.360 1.048 -0.344 0.731
age6 -1.344 1.099 -1.223 0.223
age7 -0.844 1.025 -0.824 0.411
age8 0.332 1.065 0.312 0.756
age9 -2.004 1.175 -1.705 0.089
age10 -0.398 1.336 -0.298 0.766
age11 -2.031 1.364 -1.490 0.138
age12 -0.443 1.458 -0.304 0.762
age13 -2.240 1.596 -1.404 0.162
income2 -0.187 0.772 -0.242 0.809
income3 -0.512 0.800 -0.639 0.523
income4 -1.643 0.868 -1.892 0.060
income5 -1.759 0.675 -2.605 0.010
countyErie 2.541 1.086 2.339 0.020
countyNassau 1.697 1.059 1.601 0.111
countyNew York -0.864 0.935 -0.924 0.357
countyQueens 0.472 0.805 0.587 0.558
countySuffolk 2.961 1.240 2.388 0.018
mean_no2_month 0.195 0.056 3.473 0.001

Interpretation

  • The results show that mean_no2_month, the average NO2 concentration has a positive coefficient of 0.195 with Asthma_emergency occurrence. This means that higher NO2 concentration contributes to Asthma emergency visit times with an 0.195 in cases per unit increase of average NO2 concentration in one month, adjusting for other covariate. The p-value is 0.001, which means that this is significant at a significance level of 0.05.

CO

linear_fit_co_emergency = asthma_df2 %>%
  lm(asthma_emergency ~ mental_health + physical_health + sex + smoker + race + age + income + county + mean_co_month, data = .) %>% 
  broom::tidy()

linear_fit_co_emergency %>% 
  knitr::kable(digits = 3)
term estimate std.error statistic p.value
(Intercept) 1.183 1.146 1.033 0.302
mental_health 0.008 0.017 0.492 0.623
physical_health 0.023 0.015 1.516 0.130
sex -0.033 0.304 -0.108 0.914
smoker2 0.052 0.592 0.088 0.930
smoker3 0.721 0.448 1.608 0.109
smoker4 -0.084 0.407 -0.207 0.836
race2 0.648 0.473 1.369 0.172
race3 -0.627 0.875 -0.716 0.474
race4 -1.157 2.957 -0.391 0.696
race5 -1.467 2.903 -0.505 0.614
race6 -0.246 1.004 -0.245 0.807
race7 0.070 0.866 0.081 0.935
race8 0.388 0.484 0.801 0.423
age2 -1.256 0.700 -1.795 0.073
age3 -0.617 0.682 -0.905 0.366
age4 -0.401 0.688 -0.583 0.560
age5 -0.111 0.655 -0.170 0.865
age6 -0.886 0.688 -1.287 0.199
age7 -0.721 0.657 -1.097 0.273
age8 0.028 0.690 0.041 0.967
age9 -1.300 0.791 -1.644 0.101
age10 -0.873 0.801 -1.090 0.276
age11 -1.364 0.885 -1.541 0.124
age12 -1.878 0.901 -2.084 0.038
age13 -1.454 1.228 -1.184 0.237
income2 -0.083 0.483 -0.173 0.863
income3 -0.562 0.539 -1.043 0.298
income4 -1.084 0.537 -2.018 0.044
income5 -1.394 0.457 -3.051 0.002
countyBronx -0.050 0.946 -0.052 0.958
countyErie 0.146 0.705 0.207 0.836
countyKings -1.031 1.185 -0.870 0.385
countyMonroe -0.370 0.727 -0.509 0.611
countyNew York 0.281 0.907 0.310 0.757
countyNiagara -0.136 0.994 -0.137 0.891
countyOnondaga -0.557 0.782 -0.712 0.477
countyQueens 0.029 0.794 0.037 0.970
countySchenectady -0.537 1.084 -0.495 0.621
countySuffolk 0.375 0.853 0.440 0.660
mean_co_month 1.635 1.764 0.927 0.354

Interpretation

  • The results show that mean_co_month, the average CO concentration has a positive coefficient of 1.64 with Asthma_emergency occurrence. This means that higher CO concentration contributes to Asthma emergency visit times with an 1.64 in cases per unit increase of average CO concentration in one month, adjusting for other covariate. The p-value is 0.354, which means that this is not significant at a significance level of 0.05.

PM2.5

linear_fit_pm2_5_emergency = asthma_df2 %>%
  lm(asthma_emergency ~ mental_health + physical_health + sex + smoker + race + age + income + county + mean_pm2_5_month, data = .)%>% 
  broom::tidy()

linear_fit_pm2_5_emergency %>% 
  knitr::kable(digits = 3)
term estimate std.error statistic p.value
(Intercept) -5.497 4.343 -1.266 0.214
mental_health 0.006 0.064 0.089 0.930
physical_health -0.042 0.051 -0.812 0.422
sex -0.993 1.095 -0.907 0.371
smoker2 -0.303 2.813 -0.108 0.915
smoker3 1.580 1.875 0.843 0.405
smoker4 -0.863 1.658 -0.521 0.606
race2 3.309 1.842 1.796 0.081
race3 1.173 2.792 0.420 0.677
race4 4.376 4.407 0.993 0.327
race5 -0.762 3.328 -0.229 0.820
race6 2.920 5.575 0.524 0.604
race7 1.209 2.705 0.447 0.658
race8 3.669 2.023 1.813 0.078
age2 3.177 2.442 1.301 0.202
age3 1.018 2.591 0.393 0.697
age4 1.664 2.953 0.563 0.577
age5 1.991 2.464 0.808 0.424
age6 -0.130 2.363 -0.055 0.956
age7 1.610 2.473 0.651 0.519
age8 4.053 2.414 1.679 0.102
age9 2.122 3.166 0.670 0.507
age10 NA NA NA NA
age11 0.680 4.089 0.166 0.869
age12 -1.321 3.702 -0.357 0.723
income2 0.165 1.982 0.083 0.934
income3 2.677 1.668 1.605 0.117
income4 2.960 2.243 1.320 0.195
income5 1.100 1.659 0.663 0.511
countyOrange 1.231 2.058 0.598 0.554
countyQueens -0.083 1.820 -0.046 0.964
countyRichmond 0.832 2.008 0.414 0.681
countySt. Lawrence 1.867 2.519 0.741 0.463
countySteuben 3.112 5.689 0.547 0.588
mean_pm2_5_month 0.237 0.219 1.081 0.287

Interpretation

  • The results show that mean_pm2_5_month, the average PM2.5 concentration has a positive coefficient of 0.237 with Asthma_emergency occurrence. This means that higher PM2.5 concentration contributes to Asthma emergency visit times with an 0.237 in cases per unit increase of average PM2.5 concentration in one month, adjusting for other covariate. The p-value is 0.287, which means that this is not significant at a significance level of 0.05.


Logistic regression with asthma as exposure

  • We still want to see if there is any relationship between asthma and mean_aqi_month. We then conducted a Logistic regression with asthma as the main outcome varibale and check the relationship between mean_aqi_month with asthma status. A logistic regression model was used because of the binary outcome variable.
fit_logistic1b = 
  asthma_df2 %>% 
  glm(asthma ~ mental_health + physical_health + sex + smoker + race + age + income + county + mean_aqi_month, data = ., family = binomial()) 

fit_logistic1b %>% 
  broom::tidy() %>% 
  mutate(OR = exp(estimate)) %>%
  select(term, log_OR = estimate, OR, p.value) %>% 
  knitr::kable(digits = 3)
term log_OR OR p.value
(Intercept) -1.123 0.325 0.000
mental_health 0.014 1.014 0.000
physical_health 0.033 1.034 0.000
sex -0.338 0.713 0.000
smoker2 0.160 1.173 0.024
smoker3 0.185 1.203 0.000
smoker4 0.011 1.011 0.815
race2 0.217 1.243 0.000
race3 -0.392 0.676 0.000
race4 0.056 1.058 0.833
race5 0.129 1.138 0.514
race6 0.129 1.138 0.411
race7 0.624 1.866 0.000
race8 0.150 1.162 0.004
age2 -0.407 0.665 0.000
age3 -0.339 0.713 0.000
age4 -0.464 0.629 0.000
age5 -0.550 0.577 0.000
age6 -0.639 0.528 0.000
age7 -0.609 0.544 0.000
age8 -0.607 0.545 0.000
age9 -0.663 0.515 0.000
age10 -0.741 0.477 0.000
age11 -0.802 0.449 0.000
age12 -0.939 0.391 0.000
age13 -1.238 0.290 0.000
income2 -0.252 0.777 0.000
income3 -0.208 0.812 0.000
income4 -0.321 0.726 0.000
income5 -0.272 0.762 0.000
countyBronx -0.058 0.943 0.592
countyBroome -0.331 0.718 0.605
countyChautauqua -0.223 0.800 0.131
countyChemung 0.014 1.014 0.950
countyDutchess -0.128 0.880 0.330
countyErie -0.153 0.858 0.135
countyEssex 0.144 1.155 0.573
countyFranklin 0.528 1.696 0.024
countyHerkimer 0.013 1.013 0.957
countyJefferson 0.193 1.213 0.247
countyKings -0.210 0.811 0.035
countyMadison 0.168 1.183 0.449
countyMonroe 0.003 1.003 0.980
countyNassau -0.178 0.837 0.087
countyNew York -0.113 0.893 0.254
countyNiagara -0.202 0.817 0.140
countyOneida 0.062 1.064 0.641
countyOnondaga 0.001 1.001 0.991
countyOrange 0.185 1.204 0.126
countyOswego 0.101 1.106 0.548
countyPutnam 0.088 1.092 0.691
countyQueens -0.378 0.685 0.000
countyRensselaer 0.116 1.123 0.416
countyRichmond -0.112 0.894 0.380
countyRockland -0.086 0.918 0.686
countySaratoga 0.001 1.001 0.992
countySchenectady 0.053 1.054 0.733
countySt. Lawrence -0.029 0.972 0.864
countySteuben 0.102 1.107 0.570
countySuffolk -0.207 0.813 0.041
countyTompkins 0.248 1.281 0.143
countyUlster 0.009 1.009 0.951
countyWayne -0.442 0.643 0.062
countyWestchester -0.186 0.830 0.086
mean_aqi_month 0.000 1.000 0.985

Interpretation

The results show that mean_aqi_month, the average Air quality index value, has an odd ratio of 1 towards Asthma status. This implies that mean_aqi_month does not have any influence on Asthma occurrences within our investigation.