Data Activity 6 / Unit 8
Task
Using the Health_Data, please perform the following functions in R:
Find out the mean, median and mode of ‘age’ variable. Find out whether median diastolic blood pressure is same among diabetic and non-diabetic participants. Find out whether systolic BP is different across occupational group.
Process and Findings
library(haven)
health_data <- read_sav("path location")
mean(health_data$age, na.rm = TRUE)
median(health_data$age, na.rm = TRUE)
mode(health_data$age) # this does not give required result as in R it provides storage mode of the objects rather than the statistical mode (as shown below)
> mean(health_data$age, na.rm = TRUE)
[1] 26.51429
> median(health_data$age, na.rm = TRUE)
[1] 27
> mode(health_data$age)
[1] "numeric"
#in order to get mode we need to first create a frequency table
find_mode <-table(health_data$age)
mode <- names(find_mode)[which.max(find_mode)]
mode
> mode
[1] "26"
#Find out whether median diastolic blood pressure is same among diabetic and non-diabetic participants.
#first test for normality of data
shapiro.test(health_data$sbp)
Shapiro-Wilk normality test
data: health_data$sbp
W = 0.95474, p-value = 3.345e-06
#P<0.5 therefore reject null hypothesis, data is not normally distributed
shapiro.test(health_data$dbp)
Shapiro-Wilk normality test
data: health_data$dbp
W = 0.97052, p-value = 0.0002204
#P<0.5 therefore reject null hypothesis, data is not normally distributed
#non parametric test required. 2 non-paired samples, Mann-Whitney U test
wilcox.test(dbp~diabetes,health_data)
#H0 median dbp is the same in diabeteic and non-diabetic participants
#H1 there is a difference in dbp between diabetic and non-diabetic participants
Wilcoxon rank sum test with continuity correction
data: dbp by diabetes
W = 3804.5, p-value = 0.7999
alternative hypothesis: true location shift is not equal to 0
#result P = 0.799 therefore there is no evidence to reject H0, DBP is the same amongst diabetic and non-diabetic patients
#Find out whether systolic BP is different across occupational group
#ocupation has multiple categories, therefore use kruskal wallice test
kruskal.test(dbp~occupation,health_data)
#H0 dbp is not different across occupational groups
#H1 dbp is different across occupational groups
Kruskal-Wallis rank sum test
data: dbp by occupation
Kruskal-Wallis chi-squared = 2.6281, df = 3, p-value = 0.4526
#result P = 0.4526 therefore there is no evidence to reject H0, DBP is not different across ocuppational groups