Create two categories of AQI values:
AQI values <= 50 are categorized as good.
AQI values > 50 are categorized as not good.
Hypotheses:
H0: The proportion of individuals who currently have asthma is the same across the two AQI categories.
H1: The proportion of individuals who currently have asthma is not the same across the two AQI categories.
asthma_df2 = asthma_df2 %>%
  mutate(
    aqi_cat =
      case_when(
        mean_aqi_month <= 50 ~ "Good",
        mean_aqi_month > 50 ~ "Not Good"
      )
  )
chisq.test(asthma_df2$aqi_cat, asthma_df2$asthma_now, correct=FALSE)
##
## Pearson's Chi-squared test
##
## data: asthma_df2$aqi_cat and asthma_df2$asthma_now
## X-squared = 2.0033, df = 1, p-value = 0.157
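Test statistic: X-squared = 2.0033, df = 1, p-value = 0.157. Since the p-value is above 0.05, we fail to reject H0 at the conventional 5% significance level: there is no evidence that the proportion of individuals with current asthma differs between the two AQI categories.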
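Before reading the p-value, it can help to look at the underlying 2x2 table and the expected counts, because the chi-squared approximation is usually considered reliable only when the expected counts are not too small. The following is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration and only mimic the column names of `asthma_df2`; they are not the study data.

```r
# Hypothetical toy data standing in for asthma_df2 (not the study data)
toy <- data.frame(
  aqi_cat    = rep(c("Good", "Not Good"), times = c(60, 40)),
  asthma_now = c(rep(c(1, 0), times = c(40, 20)),   # Good: 40 yes, 20 no
                 rep(c(1, 0), times = c(30, 10)))   # Not Good: 30 yes, 10 no
)

# Observed 2x2 contingency table (rows: AQI category, columns: asthma_now)
tab <- table(toy$aqi_cat, toy$asthma_now)
print(tab)

# Same kind of test as in the text, without continuity correction
res <- chisq.test(tab, correct = FALSE)

# Expected counts under H0, then the test statistic and p-value
print(res$expected)
print(res$statistic)
print(res$p.value)
```

If some expected counts were very small, `fisher.test(tab)` would be one possible alternative for a 2x2 table.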
H0: The average number of asthma emergency visits is the same for the "Good" and "Not Good" AQI categories.
H1: The average number of asthma emergency visits is not the same for the "Good" and "Not Good" AQI categories.
asthma_df2_ttest = asthma_df2 %>%
  mutate(
    aqi_cat =
      case_when(
        mean_aqi_month <= 50 ~ "Good",
        mean_aqi_month > 50 ~ "Not_Good"
      )
  ) %>%
  pivot_wider(
    names_from = "aqi_cat",
    values_from = "asthma_emergency"
  )
t.test(asthma_df2_ttest$Good, asthma_df2_ttest$Not_Good)
##
## Welch Two Sample t-test
##
## data: asthma_df2_ttest$Good and asthma_df2_ttest$Not_Good
## t = -2.5689, df = 421.41, p-value = 0.01055
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.3715692 -0.1824764
## sample estimates:
## mean of x mean of y
## 0.718845 1.495868
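Reshaping with `pivot_wider` is one way to obtain two vectors for `t.test`. An alternative sketch, assuming the data stay in long form with one row per observation, uses the formula interface of `t.test`, which splits the outcome by the grouping column directly. The data frame `toy` below is simulated and hypothetical; it only reuses the column names from the text.

```r
# Hypothetical simulated data with the same column names as in the text
set.seed(1)
toy <- data.frame(
  aqi_cat          = rep(c("Good", "Not_Good"), each = 30),
  asthma_emergency = c(rpois(30, lambda = 1), rpois(30, lambda = 2))
)

# Welch two-sample t-test via the formula interface
# (var.equal = FALSE is the default, so this is the Welch version)
res_formula <- t.test(asthma_emergency ~ aqi_cat, data = toy)

# Equivalent call on two separate vectors, as done in the text
res_vectors <- t.test(
  toy$asthma_emergency[toy$aqi_cat == "Good"],
  toy$asthma_emergency[toy$aqi_cat == "Not_Good"]
)

print(res_formula)
```

Both calls should give the same statistic and p-value; the formula version avoids creating a wide data frame padded with missing values.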
H0: The proportion of individuals with heart disease is the same among those who currently have asthma and those who do not.
H1: The proportion of individuals with heart disease is not the same among those who currently have asthma and those who do not.
asthma_df2_chi = asthma_df2 %>%
  mutate(
    heart_disease = ifelse(coronary_heart_disease == "1" | heart_attack == "1" | stroke == "1", "1", "0")
  )
chisq.test(asthma_df2_chi$heart_disease, asthma_df2_chi$asthma_now, correct=FALSE)
##
## Pearson's Chi-squared test
##
## data: asthma_df2_chi$heart_disease and asthma_df2_chi$asthma_now
## X-squared = 54.976, df = 1, p-value = 1.22e-13
Now we want to see whether currently having asthma is equally common among the residents of each county. To fulfil this goal, we conduct a test for equality of proportions.
H0: The proportion of individuals who currently have asthma is the same across all counties.
H1: The proportion of individuals who currently have asthma is not the same across all counties.
asthma_now_1 =
  asthma_df2 %>%
  drop_na(asthma_now) %>%
  group_by(county) %>%
  filter(asthma_now==1) %>%
  count(asthma_now==1)
asthma_now_all =
  asthma_df2 %>%
  drop_na(asthma_now) %>%
  group_by(county) %>%
  count()
data_for_proptest =
  left_join(asthma_now_1, asthma_now_all, by = "county")
prop.test(data_for_proptest$n.x, data_for_proptest$n.y)
##
## 36-sample test for equality of proportions without continuity
## correction
##
## data: data_for_proptest$n.x out of data_for_proptest$n.y
## X-squared = 83.022, df = 35, p-value = 8.893e-06
## alternative hypothesis: two.sided
## sample estimates:
## prop 1 prop 2 prop 3 prop 4 prop 5 prop 6 prop 7 prop 8
## 0.6945813 0.7164179 0.6666667 0.8073394 0.7368421 0.6266667 0.7641326 0.7272727
## prop 9 prop 10 prop 11 prop 12 prop 13 prop 14 prop 15 prop 16
## 0.7575758 0.8666667 0.8082192 0.7090164 0.4827586 0.6976744 0.6505747 0.6563758
## prop 17 prop 18 prop 19 prop 20 prop 21 prop 22 prop 23 prop 24
## 0.7680000 0.7785714 0.7564576 0.6782178 0.8059701 0.7222222 0.6617916 0.7383178
## prop 25 prop 26 prop 27 prop 28 prop 29 prop 30 prop 31 prop 32
## 0.6770186 0.6571429 0.7482014 0.7415730 0.7462687 0.8333333 0.6954813 0.7000000
## prop 33 prop 34 prop 35 prop 36
## 0.6967213 0.8620690 0.6851312 0.7271330
From the above results, the p-value (8.893e-06) is small, so we reject H0 and conclude that the proportion of people who currently have asthma differs across counties.
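The inputs to `prop.test` are simply a vector of "success" counts and a vector of group totals in the same order. As a rough sketch of that idea in base R, the counts can also be computed in one pass per county, which keeps counties that happen to have zero cases. The data frame `toy`, its three counties, and its values are hypothetical and are not the study data.

```r
# Hypothetical toy data: three counties, asthma_now coded 1 (yes) / 0 (no)
toy <- data.frame(
  county     = rep(c("A", "B", "C"), times = c(100, 120, 80)),
  asthma_now = c(rep(c(1, 0), times = c(70, 30)),
                 rep(c(1, 0), times = c(60, 60)),
                 rep(c(1, 0), times = c(64, 16)))
)
toy <- toy[!is.na(toy$asthma_now), ]   # mirrors drop_na(asthma_now)

# Successes and totals per county, returned in the same county order
successes <- as.vector(tapply(toy$asthma_now == 1, toy$county, sum))
totals    <- as.vector(tapply(toy$asthma_now, toy$county, length))

# k-sample test for equality of proportions (here k = 3, so df = 2)
res <- prop.test(successes, totals)
print(res)
```

Note that a small p-value from this test only indicates that at least one county differs; it does not identify which ones.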