Pearson’s Chi-squared test

Create two categories of aqi values:

  • Those aqi values < 50 are categorized as good.

  • Those aqi values >= 50 are categorized as not good.

Hypothesis:

  • H0:The proportion of the individuals who experienced current asthma is the same across two aqi categories.

  • H1:The proportion of the individuals who experienced current asthma is not the same across two aqi categories.

asthma_df2 = asthma_df2 %>%
  mutate(
    aqi_cat = 
         case_when(
           mean_aqi_month <= 50 ~ "Good",
           mean_aqi_month > 50  ~ "Not Good",
           )
      ) 
chisq.test(asthma_df2$aqi_cat, asthma_df2$asthma_now, correct=FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  asthma_df2$aqi_cat and asthma_df2$asthma_now
## X-squared = 2.0033, df = 1, p-value = 0.157

Test statistics: X-squared = 2.0033, df = 1, p-value = 0.157

  • Do not have enough evidence to conclude the proportion of individuals who experienced current asthma is different across two aqi categories.


Welch Two Sample t-test

  • H0: The average number of the visits to asthma emergency is the same for aqi good category and aqi not good category.

  • H1: The average number of the visits to asthma emergency is not the same for aqi good category and aqi not good category.

asthma_df2_ttest = asthma_df2 %>%
  mutate(
    aqi_cat = 
         case_when(
           mean_aqi_month <= 50 ~ "Good",
           mean_aqi_month > 50  ~ "Not_Good",
           )
      ) %>%
  pivot_wider(
    names_from = "aqi_cat",
    values_from = "asthma_emergency"
  )

t.test(asthma_df2_ttest$Good, asthma_df2_ttest$Not_Good)
## 
##  Welch Two Sample t-test
## 
## data:  asthma_df2_ttest$Good and asthma_df2_ttest$Not_Good
## t = -2.5689, df = 421.41, p-value = 0.01055
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.3715692 -0.1824764
## sample estimates:
## mean of x mean of y 
##  0.718845  1.495868
  • t = -2.5689, df = 421.41, p-value = 0.01055
  • alternative hypothesis: true difference in means is not equal to 0
  • 95 percent confidence interval: (-1.3715692 -0.1824764)
  • sample estimates: mean of x: 0.718845; mean of y: 1.495868
  • The p-value is less than 0.01 so we can conclude that the average number of the visits to asthma emergency is not the same for aqi good category and aqi not good category.


Chi-squared test on asthma_now & heart disease

  • H0: The proportion of the individuals who experienced heart diseases is the same across those who had asthma now and those who did not have asthma now .

  • H1: The proportion of the individuals who experienced heart diseases is not the same across those who had asthma now and those who did not have asthma now .

asthma_df2_chi = asthma_df2 %>%
  mutate(
    heart_disease = ifelse(coronary_heart_disease == "1" | heart_attack == "1" | stroke == "1", "1", "0")
  ) 
chisq.test(asthma_df2_chi$heart_disease, asthma_df2_chi$asthma_now, correct=FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  asthma_df2_chi$heart_disease and asthma_df2_chi$asthma_now
## X-squared = 54.976, df = 1, p-value = 1.22e-13
  • X-squared = 54.976, df = 1, p-value = 1.22e-13
  • The p-value is less than 0.05. So we can conclude that the proportion of the individuals who experienced heart disease is not the same in those who had asthma now and those who did not have asthma now.


Proportion Test

Now we want to see whether having asthma now is am equally common occurrence within the residents of each county. To fufil this goal, we would conduct a proportion test.

  • H0:The proportion of the individuals who experienced asthma now is the same across all counties.

  • H1:The proportion of the individuals who experienced asthma now is not the same across all counties.

asthma_now_1 =
  asthma_df2 %>%
  drop_na(asthma_now) %>%
  group_by(county) %>%
  filter(asthma_now==1) %>%
  count(asthma_now==1) 

asthma_now_all =
  asthma_df2 %>%
  drop_na(asthma_now) %>%
  group_by(county) %>%
  count() 

data_for_proptest = 
  left_join(asthma_now_1, asthma_now_all, by = "county")

prop.test(data_for_proptest$n.x, data_for_proptest$n.y)
## 
##  36-sample test for equality of proportions without continuity
##  correction
## 
## data:  data_for_proptest$n.x out of data_for_proptest$n.y
## X-squared = 83.022, df = 35, p-value = 8.893e-06
## alternative hypothesis: two.sided
## sample estimates:
##    prop 1    prop 2    prop 3    prop 4    prop 5    prop 6    prop 7    prop 8 
## 0.6945813 0.7164179 0.6666667 0.8073394 0.7368421 0.6266667 0.7641326 0.7272727 
##    prop 9   prop 10   prop 11   prop 12   prop 13   prop 14   prop 15   prop 16 
## 0.7575758 0.8666667 0.8082192 0.7090164 0.4827586 0.6976744 0.6505747 0.6563758 
##   prop 17   prop 18   prop 19   prop 20   prop 21   prop 22   prop 23   prop 24 
## 0.7680000 0.7785714 0.7564576 0.6782178 0.8059701 0.7222222 0.6617916 0.7383178 
##   prop 25   prop 26   prop 27   prop 28   prop 29   prop 30   prop 31   prop 32 
## 0.6770186 0.6571429 0.7482014 0.7415730 0.7462687 0.8333333 0.6954813 0.7000000 
##   prop 33   prop 34   prop 35   prop 36 
## 0.6967213 0.8620690 0.6851312 0.7271330
  • X-squared = 83.022, df = 35, p-value = 8.893e-06

  • alternative hypothesis: two.sided

  • sample estimates:

  • prop 1 prop 2 prop 3 prop 4 prop 5 prop 6 prop 7 prop 8 prop 9 prop 10 prop 11 prop 12 prop 13

  • 0.6945813 0.7164179 0.6666667 0.8073394 0.7368421 0.6266667 0.7641326 0.7272727 0.7575758 0.8666667 0.8082192 0.7090164 0.4827586

  • prop 14 prop 15 prop 16 prop 17 prop 18 prop 19 prop 20 prop 21 prop 22 prop 23 prop 24 prop 25 prop 26

  • 0.6976744 0.6505747 0.6563758 0.7680000 0.7785714 0.7564576 0.6782178 0.8059701 0.7222222 0.6617916 0.7383178 0.6770186 0.6571429

  • prop 27 prop 28 prop 29 prop 30 prop 31 prop 32 prop 33 prop 34 prop 35 prop 36

  • 0.7482014 0.7415730 0.7462687 0.8333333 0.6954813 0.7000000 0.6967213 0.8620690 0.6851312 0.7271330

  • From the above results, p-values are small and so we we can say that the proportions of people getting asthma now are different across boroughs.