This document performs some basic exploratory analyses of the ToothGrowth dataset.
ToothGrowth ships with R's built-in datasets package; the UsingR package is loaded here as well.
## Loading required package: MASS
## Loading required package: HistData
## Loading required package: Hmisc
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
## [1] "len" "supp" "dose"
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The data consist of 60 observations on guinea pigs and 3 variables. The animals received vitamin C by one of two delivery methods: supp=OJ is orange juice and supp=VC is ascorbic acid. The goal is to test whether supplement type and dose affect tooth growth, measured by length.
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
Let’s see whether mean tooth length differs by supplement type (supp).
mean(ToothGrowth[ToothGrowth$supp=="OJ",]$len)
## [1] 20.66333
mean(ToothGrowth[ToothGrowth$supp=="VC",]$len)
## [1] 16.96333
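The same group means can be computed in one call, and broken down further by dose. The following is a minimal sketch using base R only; the names `supp_means` and `cell_means` are illustrative and do not appear in the original analysis.

```r
# Mean tooth length for each supplement type
# (ToothGrowth is available from R's built-in datasets package)
supp_means <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
print(supp_means)

# Mean tooth length for each supplement-by-dose combination
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

Looking at the supplement-by-dose table alongside the two overall means helps show whether the gap between supplements is similar at every dose level.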
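To visualize the difference, we plot tooth length against dose, with one panel per supplement type.
library(ggplot2)
# Individual observations as points, with a line through the mean at each dose
p <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_point() +
  stat_summary(fun = mean, geom = "line") +
  facet_wrap(~ supp) +
  labs(title = "Tooth Length vs Dose by Supplement Type",
       x = "Dose",
       y = "Tooth Length") +
  theme_minimal()
# Print the plot
print(p)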
We calculate a confidence interval for the difference in means between two independent groups. We take supp=OJ as one group and supp=VC as the other, create the respective subsets, and calculate the mean and variance of each group. We also calculate the pooled standard deviation.
oj <- subset(ToothGrowth, supp == "OJ")
oj_mean <- mean(oj$len)
oj_var <- var(oj$len)
vc <- subset(ToothGrowth, supp == "VC")
vc_mean <- mean(vc$len)
vc_var <- var(vc$len)
# difference of means
diff_mean <- oj_mean - vc_mean
# pooled standard deviation (var() already returns s^2, so it is not squared again)
sp <- sqrt(((30 - 1) * oj_var + (30 - 1) * vc_var) / (30 + 30 - 2))
# 95% confidence interval for the difference of means
diff_mean + c(-1, 1) * qt(.975, 58) * sp * (1/30 + 1/30)^.5
## [1] -0.1670064 7.5670064
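A hand-computed pooled interval is easy to get subtly wrong, so it can help to wrap the formula in a function and compare it with base R's `t.test()` called with `var.equal = TRUE`. The sketch below assumes a hypothetical helper named `pooled_ci`; it is not part of the original analysis.

```r
# Hypothetical helper: pooled-variance t interval for mean(x) - mean(y)
pooled_ci <- function(x, y, conf = 0.95) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  # var() returns s^2, so it enters the pooled estimate unsquared
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / df)
  half_width <- qt(1 - (1 - conf) / 2, df) * sp * sqrt(1 / nx + 1 / ny)
  (mean(x) - mean(y)) + c(-1, 1) * half_width
}

oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc_len <- ToothGrowth$len[ToothGrowth$supp == "VC"]
ci_manual <- pooled_ci(oj_len, vc_len)

# Cross-check against t.test with equal variances assumed
ci_ttest <- as.numeric(t.test(oj_len, vc_len, var.equal = TRUE)$conf.int)
print(ci_manual)
print(ci_ttest)
```

If the two printed intervals disagree, the manual formula is the first place to look, typically at the group sizes or at whether a variance or a standard deviation was passed in.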
Do dose levels make a difference in tooth growth? Let’s test. We use dose05, dose10, and dose20 for the 0.5, 1.0, and 2.0 dose levels, respectively, and ask whether going from dose 0.5 to 1.0 or to 2.0 changes mean tooth length. We calculate two confidence intervals: one for the mean difference between dose10 and dose05, and one for the mean difference between dose20 and dose05.
dose05 <- subset(ToothGrowth, dose == 0.5)
dose05_mean <- mean(dose05$len)
dose05_var <- var(dose05$len)
dose10 <- subset(ToothGrowth, dose == 1.0)
dose10_mean <- mean(dose10$len)
dose10_var <- var(dose10$len)
dose20 <- subset(ToothGrowth, dose == 2.0)
dose20_mean <- mean(dose20$len)
dose20_var <- var(dose20$len)
# difference of means
diff_dose <- dose10_mean - dose05_mean
diff_dose0520 <- dose20_mean - dose05_mean
# pooled standard deviations, one per comparison (20 animals per dose level; var() is already s^2)
sp_0510 <- sqrt(((20 - 1) * dose10_var + (20 - 1) * dose05_var) / (20 + 20 - 2))
sp_0520 <- sqrt(((20 - 1) * dose20_var + (20 - 1) * dose05_var) / (20 + 20 - 2))
# 95% confidence intervals for the differences of means
diff_dose + c(-1, 1) * qt(.975, 38) * sp_0510 * (1/20 + 1/20)^.5
## [1] 6.276252 11.983748
diff_dose0520 + c(-1, 1) * qt(.975, 38) * sp_0520 * (1/20 + 1/20)^.5
## [1] 12.83648 18.15352
The confidence interval for the difference in means between supplement types includes 0, so we cannot rule out that supplement type has no impact on tooth growth.
Neither confidence interval for the dose comparisons includes 0. Going from dose 0.5 to 1.0 increases mean tooth length by roughly 6.3 to 12.0, and going from 0.5 to 2.0 increases it by roughly 12.8 to 18.2, so a higher dose is associated with greater tooth growth.
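The dose comparisons can be cross-checked the same way with base R's `t.test()`. This is a minimal sketch; `dose_ci` is a hypothetical wrapper introduced here for illustration, and it assumes equal variances in the two dose groups being compared.

```r
# Hypothetical wrapper: pooled-variance 95% interval for
# mean(len at dose `high`) - mean(len at dose `low`)
dose_ci <- function(low, high) {
  x <- ToothGrowth$len[ToothGrowth$dose == high]
  y <- ToothGrowth$len[ToothGrowth$dose == low]
  as.numeric(t.test(x, y, var.equal = TRUE)$conf.int)
}

ci_05_10 <- dose_ci(0.5, 1.0)  # dose 1.0 versus dose 0.5
ci_05_20 <- dose_ci(0.5, 2.0)  # dose 2.0 versus dose 0.5
print(ci_05_10)
print(ci_05_20)
```

An interval that lies entirely above 0 indicates that the higher dose is associated with longer teeth at the chosen confidence level.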
These conclusions assume that the animals were assigned randomly to supplement type and dose level. The pooled-variance intervals also assume that the variance of tooth length is the same in the two groups being compared. We further assume that other factors contributing to tooth growth were controlled when choosing the test animals. If supplement type and dose interact in their effect on tooth growth, other types of analysis are needed.
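Two of these assumptions can be probed with base R. The sketch below is offered as one possible check rather than as part of the original analysis. It compares the pooled interval with Welch's interval, which does not assume equal variances, and fits a two-way model with a supplement-by-dose interaction term. Treating dose as a factor is a choice made here for illustration.

```r
# Equal-variance assumption: compare pooled and Welch intervals for supp
pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
welch  <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
print(pooled$conf.int)
print(welch$conf.int)

# Interaction: two-way ANOVA with a supp-by-dose interaction term
fit <- aov(len ~ supp * factor(dose), data = ToothGrowth)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

If the pooled and Welch intervals are close, the equal-variance assumption has little influence on the conclusion. The `supp:factor(dose)` row of the ANOVA table indicates whether the effect of supplement type appears to depend on dose.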