```{r setup, include=FALSE} knitr::opts_chunk$set(echo=TRUE, fig.align="center") options(width=110) ``` #### Statistics for Laboratory Scientists ( 140.615 ) ## Analysis of Variance We are starting to use functions and data from the SPH.140.615 GitHub package. Uncomment and execute the line calling the function `install.packages` if you have not yet installed devtools. ```{r} # install.packages("devtools") library(devtools,quiet=TRUE) devtools::install_github("bllfrg/SPH.140.615",quiet=TRUE) library(SPH.140.615) ``` #### Example from class: test for equality of variances The data. ```{r} xa <- c(672, 747, 749, 792, 875, 888, 930, 962, 994,1295) xb <- c(290, 359, 384, 466, 510, 516, 522, 532, 595, 706) ``` Plot the data, using the custom function `dot.plot` from the SPH.140.615 package. ```{r} dot.plot(xa,xb, includeCI=FALSE) ``` Plot the data, using the `stripchart` function. Look at the help file to explore what all the arguments do. ```{r,fig.width=4} par(las=1) stripchart(list(xa,xb), method="jitter", pch=1, vertical=TRUE, xlim=c(0.5,2.5), group.names=c("A","B")) abline(v=1:2,lty=2) ``` The 95% confidence interval for the ratio of the population variances, the pedestrian way. ```{r} ratio <- var(xa)/var(xb) ratio na <- length(xa) na nb <- length(xb) nb L <- qf(0.025,na-1,nb-1) L U <- qf(0.975,na-1,nb-1) U c(ratio/U,ratio/L) ``` The 95% confidence interval for the ratio of the population standard deviations. ```{r} sqrt(c(ratio/U,ratio/L)) ``` The built-in function. ```{r} var.test(xa,xb) attributes(var.test(xa,xb)) var.test(xa,xb)$estimate var.test(xa,xb)$p.value var.test(xa,xb)$conf.int sqrt(var.test(xa,xb)$conf.int) ``` #### Example from class: diets and blood coagulation times ```{r} coag <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59) ttt <- c("A","A","A","A", "B","B","B","B","B","B", "C","C","C","C","C","C", "D","D","D","D","D","D","D","D") ttt LETTERS[1:4] ttt <- rep(LETTERS[1:4],c(4,6,6,8)) ttt ``` Make the treatment "ttt" a factor. ```{r} str(ttt) ttt <- factor(ttt,levels=c("A","B","C","D")) str(ttt) ``` Plot the data. In the `stripchart` function you want to jitter the data points horizontally, vertically, or stack them, since there are identical coagulation times recorded in some of the groups. ```{r,fig.width=5} set.seed(1) par(las=1) stripchart(coag~ttt, method="jitter", pch=1, vertical=TRUE, xlab="diet", ylab="coagulation time [ sec ]") stripchart(jitter(coag,factor=2)~ttt, pch=1, vertical=TRUE, xlab="diet", ylab="coagulation time [ sec ]") stripchart(coag~ttt, method="stack", pch=1, vertical=TRUE, xlab="diet", ylab="coagulation time [ sec ]") ``` Note: the function `tapply` splits the first vector into groups according to the values in the second vector, and then "applies" the function, here 'mean', to each group. The within-group means. ```{r} tapply(coag,ttt,mean) ``` The number of data points per group. ```{r} tapply(coag,ttt,length) ``` One-way ANOVA, the pedestrain way. ```{r} k <- 4 m <- tapply(coag,ttt,mean) m gm <- mean(coag) gm n <- tapply(coag,ttt,length) n N <- sum(n) N m-gm SB <- sum(n*(m-gm)^2) SB coag rep(m,n) coag-rep(m,n) SW <- sum((coag-rep(m,n))^2) SW MB <- SB/(k-1) MB MW <- SW/(N-k) MW Fstat <- MB/MW Fstat p <- pf(Fstat,k-1,N-k,lower=FALSE) p ``` The built-in function. ```{r} aov(coag~ttt) summary(aov(coag~ttt)) anova(aov(coag~ttt)) ``` #### Example from class: meioses in females and males The `meiosis` data used in this lecture are available in the SPH.140.615 R package you installed and loaded above. Reference: Broman et al. (1998) Am J Hum Genet 63:861-869. The data have three columns: family, female, male. These last two give the total number of crossovers on the 22 autosomes in each of the male and female meioses. ```{r} head(meioses) str(meioses) ``` We need to make the family column a "factor" to get the ANOVA to work properly. ```{r} meioses$family <- as.factor(meioses$family) str(meioses) ``` Plot the data for the female meioses, get the within-group means, and add the means to the plot. ```{r} par(las=1) stripchart(meioses$female~meioses$family,method="jitter",pch=1) female.me <- tapply(meioses$female,meioses$family,mean) segments(female.me,(1:8)-0.25,female.me,(1:8)+0.25,lwd=2,col="blue") ``` Same for the male counts. ```{r} par(las=1) stripchart(meioses$male~meioses$family,method="jitter",pch=1) male.me <- tapply(meioses$male,meioses$family,mean) segments(male.me,(1:8)-0.25,male.me,(1:8)+0.25,lwd=2,col="blue") ``` The ANOVA tables. ```{r} out.fem <- aov(female~family,data=meioses) summary(out.fem) out.mal <- aov(male~family,data=meioses) summary(out.mal) ``` #### Another example ```{r,fig.width=5} xA <- c(47,48,51,54) xB <- c(39,44,38,39) xC <- c(47,49,41,43) x <- c(xA,xB,xC) treat <- rep(1:3,rep(4,3)) x treat ex <- data.frame(rsp=x,trt=treat) ex par(las=1) stripchart(ex$rsp~ex$trt, method="jitter", pch=1, vertical=TRUE) stripchart(jitter(ex$rsp,factor=2)~ex$trt, pch=1, vertical=TRUE) ``` The following is wrong. ```{r} aov.out <- aov(rsp~trt,data=ex) summary(aov.out) ``` The correct way. ```{r} treat <- factor(rep(1:3,rep(4,3))) ex.c <- data.frame(rsp=x,trt=treat) aov.out.c <- aov(rsp~trt,data=ex.c) summary(aov.out.c) ``` Compare. ```{r} summary(aov.out) summary(aov.out.c) str(ex) str(ex.c) summary(ex) summary(ex.c) ```