```{r setup, include=FALSE} knitr::opts_chunk$set(echo=TRUE, fig.align="center") options(width=110) ``` #### Statistics for Laboratory Scientists ( 140.615 ) ## Confidence Intervals ### Working with the Normal distribution Plot the standard Gaussian N(0,1), and look at some normal quantiles. ```{r} curve(dnorm,-3.5,3.5) abline(h=0) qnorm(0.025) abline(v=qnorm(0.025),col="red") qnorm(0.975) abline(v=qnorm(0.975),col="red") qnorm(0.005) abline(v=qnorm(0.005),col="blue") qnorm(0.995) abline(v=qnorm(0.995),col="blue") ``` ### Confidence interval for the population mean, population SD known Take ten random samples from a Normal distribution with population mean 5 and population standard deviation 2. Calculate a 95% confidence interval for the population mean from the sample, under the assumption that the standard deviation is known. Repeat once more, and compare the two confidence intervals. First confidence interval: ```{r} set.seed(1) x <- rnorm(10,mean=5,sd=2) x mean(x) qnorm(0.975) mean(x) - qnorm(0.975) * 2/sqrt(10) mean(x) + qnorm(0.975) * 2/sqrt(10) ci1 <- c(mean(x) - qnorm(0.975) * 2/sqrt(10), mean(x) + qnorm(0.975) * 2/sqrt(10)) ci1 ``` Second confidence interval: ```{r} y <- rnorm(10,mean=5,sd=2) y mean(y) ci2 <- c(mean(y) - qnorm(0.975) * 2/sqrt(10), mean(y) + qnorm(0.975) * 2/sqrt(10)) ci2 ``` Compare: ```{r} ci1 ci2 mean(ci1) mean(ci2) diff(ci1) diff(ci2) 2*qnorm(0.975)*2/sqrt(10) ``` The intervals had a 95% a priori probability of covering the population mean 5 (in this simulation, both did). Both have the same length, as the only random quantity in the confidence interval is the sample mean. ### Working with the t distribution Some t quantiles. ```{r} qt(0.975,9) qt(0.975,19) qt(0.975,29) qt(0.975,2:50) ``` A t distribution with nine degrees of freedom. ```{r} curve(dt(x,9),-3.5,3.5) abline(h=0) curve(dnorm,-3.5,3.5,add=TRUE,lty="dotted") qt(0.025,9) abline(v=qt(0.025,9),col="red") qt(0.975,9) abline(v=qt(0.975,9),col="red") ``` ### Confidence interval for the population mean, population SD unknown ##### First example from class ```{r} x <- c(0.2, 1.3, 1.4, 2.3, 4.2, 4.7, 4.7, 5.1, 5.9, 7.0) x stripchart(x,method="stack",pch=1,ylim=c(0,2)) abline(h=1,lty=2,col="gray40") segments(mean(x),0.9,mean(x),1.1, col="blue",lwd=2) ``` Calculating the 95% confidence interval. ```{r} length(x) mean(x) sd(x) sd(x)/sqrt(length(x)) qt(0.975,length(x)-1) mean(x) - qt(0.975,length(x)-1) * sd(x)/sqrt(length(x)) mean(x) + qt(0.975,length(x)-1) * sd(x)/sqrt(length(x)) mean(x) + c(-1,1) * qt(0.975,length(x)-1) * sd(x)/sqrt(length(x)) round(mean(x) + c(-1,1) * qt(0.975,length(x)-1) * sd(x)/sqrt(length(x)), 2) ``` Using the function t.test(). ```{r} t.test(x) attributes(t.test(x)) t.test(x)$conf.int round(t.test(x)$conf.int,2) ``` Plotting the confidence interval. ```{r} ci <- t.test(x)$conf.int stripchart(x,method="stack",pch=1,ylim=c(0,2)) segments(mean(x),0.9,mean(x),1.1, col="blue",lwd=3) segments(ci[1],1,ci[2],1, col="blue",lwd=3) ``` Calculating a 99% confidence interval. ```{r} qt(0.975,length(x)-1) qt(0.995,length(x)-1) mean(x) + c(-1,1) * qt(0.995,length(x)-1) * sd(x)/sqrt(length(x)) t.test(x,conf.level=0.99) t.test(x,conf.level=0.99)$conf.int ``` ##### Second example from class ```{r} x <- c(1.17, 6.35, 7.76) x stripchart(x,pch=1,xlim=c(-5,15),ylim=c(0,2)) abline(h=1,lty=2,col="gray40") segments(mean(x),0.9,mean(x),1.1, col="blue",lwd=2) ``` Calculating the 95% confidence interval. ```{r} length(x) mean(x) sd(x) sd(x)/sqrt(length(x)) qt(0.975,length(x)-1) mean(x) + c(-1,1) * qt(0.975,length(x)-1) * sd(x)/sqrt(length(x)) t.test(x)$conf.int ``` Plotting the confidence interval. ```{r} ci <- t.test(x)$conf.int stripchart(x,pch=1,xlim=c(-5,15),ylim=c(0,2)) segments(mean(x),0.9,mean(x),1.1, col="blue",lwd=3) segments(ci[1],1,ci[2],1, col="blue",lwd=3) ``` ##### Third example from class ```{r} x <- c(34.9, 28.5, 34.3, 38.4, 29.6, 29.6, 38.7, 22.4, 30.1, 23.1, 29.6, 33.4, 20.6, 33.6, 42.4, 28.2, 25.3, 22.1, 37.3, 32.1) x stripchart(x,method="stack",pch=1,ylim=c(0,2)) abline(h=1,lty=2,col="gray40") segments(mean(x),0.9,mean(x),1.1, col="blue",lwd=2) ``` Calculating the 95% confidence interval. ```{r} length(x) mean(x) sd(x) sd(x)/sqrt(length(x)) qt(0.975,length(x)-1) mean(x) + c(-1,1) * qt(0.975,length(x)-1) * sd(x)/sqrt(length(x)) t.test(x)$conf.int ``` Plotting the confidence interval. ```{r} ci <- t.test(x)$conf.int stripchart(x,method="stack",pch=1,ylim=c(0,2)) segments(mean(x),0.9,mean(x),1.1, col="blue",lwd=3) segments(ci[1],1,ci[2],1, col="blue",lwd=3) ``` ### Confidence intervals for the difference in means ##### Example from class ```{r} x <- c(59.2, 54.6, 58.1, 41.8, 64.7, 50.8, 51.6, 47.0, 66.3, 58.1) y <- c(73.3, 91.4, 69.1, 104.2, 64.6, 78.4, 56.4, 55.6, 61.1, 39.8, 43.9, 63.6, 78.6, 76.5, 45.1, 90.2) list(x,y) par(las=1) stripchart(list(x,y),method="stack",pch=1,group.names=c("A","B"),ylim=c(0,3)) stripchart(list(x,y),method="jitter",jitter=0.05,pch=1,group.names=c("A","B"),ylim=c(0,3)) ``` Calculating the confidence interval. ```{r} mean(x) mean(y) mean(x) - mean(y) sd(x) sd(y) var(x) var(y) nx <- length(x) nx ny <- length(y) ny spooled <- sqrt( (var(x)*(nx-1)+var(y)*(ny-1)) / (nx+ny-2)) spooled spooled * sqrt(1/nx+1/ny) qt(0.975,nx+ny-2) mean(x)-mean(y) - qt(0.975,nx+ny-2) * spooled * sqrt(1/nx+1/ny) mean(x)-mean(y) + qt(0.975,nx+ny-2) * spooled * sqrt(1/nx+1/ny) mean(x)-mean(y) + c(-1,1) * qt(0.975,nx+ny-2) * spooled * sqrt(1/nx+1/ny) ``` 95\% CI for the population mean of the first population using the function t.test(). ```{r} t.test(x)$conf ``` 95\% CI for the population mean of the second population. ```{r} t.test(y)$conf ``` 95% CI for difference in population means assuming the population SDs are equal. ```{r} t.test(x,y,var.equal=TRUE)$conf ``` 95\% CI for difference in population means not assuming the population SDs are equal. ```{r} t.test(x,y)$conf ``` ### Confidence intervals for the population standard deviations ##### Example from class First population. ```{r} x length(x) sd(x) ``` The chi-square distribution. ```{r} curve(dchisq(x,df=nx-1),from=0,to=30) abline(h=0) qchisq(0.025,nx-1) abline(v=qchisq(0.025,nx-1),lty="dotted",col="blue") qchisq(0.975,nx-1) abline(v=qchisq(0.975,nx-1),lty="dotted",col="blue") ``` 95\% CI for the **population standard deviation** of the first population. ```{r} L <- qchisq(0.025,nx-1) U <- qchisq(0.975,nx-1) sd(x) * sqrt((nx-1)/U) sd(x) * sqrt((nx-1)/L) sd(x) * sqrt((nx-1)/c(U,L)) ``` 95\% CI for the **population variance** of the first population. ```{r} var(x) ( sd(x)*sqrt((nx-1)/c(U,L)) )^2 var(x) * (nx-1) / c(U,L) ``` We can write a function to make this more convenient! ```{r} sdci <- function(x,conf.level=0.95){ nx <- length(x) alpha <- 1-conf.level ci <- sd(x)*sqrt((nx-1)/qchisq(c(1-alpha/2,alpha/2),nx-1)) return(ci) } ``` 95\% CI for the population standard deviation of the first population. ```{r} sdci(x) ``` 99\% CI for the population standard deviation of the first population. ```{r} sdci(x,conf.level=0.99) ``` 95\% CI for the population variance of the first population. ```{r} sdci(x)^2 ``` 95\% CI for the population standard deviation of the **second** population. ```{r} sdci(y) ```