```{r setup, include=FALSE} knitr::opts_chunk$set(echo=TRUE, fig.align="center") ``` #### Statistics for Laboratory Scientists ( 140.615 ) ## Prediction and Calibration ### Prediction #### A simple example Generate some random data, 20 design points equally spaced on the x-axis, and assume E[y|x] = 10 + 5x. Assume Gaussian noise with mean zero and standard deviation 10. ```{r} set.seed(1) x <- 1:20 x eps <- rnorm(20, sd=10) eps y <- 10 + 5*x + eps y plot(x,y) ``` Fit a linear regression model. ```{r} xydat <- data.frame(x,y) head(xydat) lm.fit <- lm(y ~ x, data=xydat) summary(lm.fit) confint(lm.fit) ``` The fitted values. ```{r} lm.fit$fitted fitted(lm.fit) predict(lm.fit) ``` Make a picture with the regression line and the fitted values. ```{r} plot(xydat$x, xydat$y, xlab="x", ylab="y") abline(lsfit(xydat$x, xydat$y)) points(xydat$x, predict(lm.fit), pch=21, bg="grey") ``` You can use the 'predict' function to get predicted values and confidence limits for the mean response for any value of x. ```{r} predict(lm.fit, data.frame(x=c(5,10,20)), interval="confidence") predict(lm.fit, data.frame(x=c(5,10,20)), interval="confidence", level=0.99) ``` Plot the confidence band for the mean response. ```{r} xx <- seq(1, 20, by=0.1) predict.mean <- predict(lm.fit, data.frame(x=xx), interval="confidence") plot(xydat$x, xydat$y, xlab="x", ylab="y", ylim=range(predict.mean), col="lightgrey") lines(xx, predict.mean[,1], lwd=2) lines(xx, predict.mean[,2], lty=2) lines(xx, predict.mean[,3], lty=2) ``` You can also use the 'predict' function to get prediction intervals for new observations. ```{r} predict(lm.fit, data.frame(x=c(5,10,20)), interval="prediction") ``` Plot the confidence band for the mean response, and the prediction interval for new observations. ```{r} predict.new <- predict(lm.fit, data.frame(x=xx), interval="prediction") plot(xydat$x, xydat$y, xlab="x", ylab="y", ylim=range(predict.new), col="lightgrey") lines(xx, predict.mean[,1], lwd=2) lines(xx, predict.mean[,2], lty=2) lines(xx, predict.mean[,3], lty=2) lines(xx, predict.new[,2], lty=2, col="blue") lines(xx, predict.new[,3], lty=2, col="blue") ``` #### Example: David Sullivan's heme data ```{r} h2o2 <- rep(c(0,10,25,50), each=3) pf3d7 <- c(0.3399,0.3563,0.3538, 0.3168,0.3054,0.3174, 0.2460,0.2618,0.2848, 0.1535,0.1613,0.1525) dat <- data.frame(conc=h2o2, od=pf3d7) ``` The regression of optical density on hydrogen peroxide concentration. ```{r} lm.fit <- lm(od ~ conc, data=dat) summary(lm.fit)$coef ``` Predict the optical density at H202 concentration 30. ```{r} predict(lm.fit, data.frame(conc=30)) ``` Predict the optical density at H202 concentrations 20, 30, and 40. ```{r} predict(lm.fit, data.frame(conc=c(20,30,40))) ``` Predict the optical density at H202 concentrations 20, 30, and 40, and get the 95% confidence intervals. ```{r} predict(lm.fit, data.frame(conc=c(20,30,40)), interval="confidence") ``` Predict the optical density at H202 concentrations 20, 30, and 40, and get the 95% prediction intervals. ```{r} predict(lm.fit, data.frame(conc=c(20,30,40)),interval="prediction") ``` Plot the confidence band for the mean response, and the prediction interval for new observations. ```{r} xx <- seq(0, 50, length=101) predict.mean <- predict(lm.fit, data.frame(conc=xx), interval="confidence") predict.new <- predict(lm.fit, data.frame(conc=xx), interval="prediction") plot(dat$conc, dat$od, xlab="H2O2 concentration", ylab="OD", ylim=range(predict.new), col="lightgrey") lines(xx, predict.mean[,1], lwd=2) lines(xx, predict.mean[,2], lty=2) lines(xx, predict.mean[,3], lty=2) lines(xx, predict.new[,2], lty=2, col="blue") lines(xx, predict.new[,3], lty=2, col="blue") ``` ### Calibration #### Example from class: concentration of quinine in a set of standards ```{r} dat <- data.frame(quinine=c( 0, 0, 0, 12,12,12, 24,24,24, 36,36,36), fluor=c(100.45, 98.92,101.33, 133.19,127.33,126.78, 152.72,157.43,160.81, 188.73,191.96,183.70)) dat ``` Plot the data. ```{r} plot(dat) abline(lsfit(dat$quinine, dat$fluor), col="blue") ``` Using the 'calibrate' function from the SPH.140.615 package, estimate the concentration corresponding to the fluorescence of 143.70. ```{r} library(SPH.140.615) calibrate(dat$quinine, dat$fluor, 143.70) ``` Estimate the concentration on the basis of 3 independent measurements. ```{r} newy <- c(148.56, 149.36, 150.29) calibrate(dat$quinine, dat$fluor, newy) ```