```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, fig.align="center")
```

#### Statistics for Laboratory Scientists ( 140.615 )

## Logistic Regression

#### Example from class: spider mites 

The data are in the SPH.140.615 package.

```{r}
library(SPH.140.615)
example(spiders)
```

Fit the logistic regression model.

```{r}
glm.out <- glm(n.dead/n ~ dose, data=spiders, weights=n, family=binomial(link=logit))
summary(glm.out)$coef
```

Plot the data with the fitted curve.

```{r}
par(las=1)
plot(n.dead/n ~ dose, data=spiders, lwd=2, ylim=c(0,1), ylab="proportion dead")
u <- par("usr")
x <- seq(0,u[2],len=250)
y <- predict(glm.out, data.frame(dose=x), type="response")
lines(x, y, col="blue", lwd=2)
```

##### Interpretation of the model

The model parameters from the logistic model.

```{r}
params <- glm.out$coef
params
```

When the dose is equal to zero, the log odds of death among the spider mites is

```{r}
params[1]
```

When the dose is equal to zero, the probability of death among the spider mites is

```{r}
exp(params[1])/(1+exp(params[1]))
```

Comparing a spider mite receiving a dose one unit larger than another spider mite, the log odds ratio (or equivalently, difference in log odds) of death is 

```{r}
params[2]
```

In other words, the odds of death are

```{r}
exp(params[2])
```

times larger for the spider mite receiving the higher dose.

##### What is the LD50?

```{r}
ld50 <- -params[1]/params[2]
ld50
```

Calculate the 95% confidence interval using a function from the SPH.140.615 package to calculate the estimated standard error.

```{r}
se.ld50 <- ld50se(glm.out)
se.ld50
ci.ld50 <- ld50 + c(-1,1) * qnorm(0.975) * se.ld50
ci.ld50
```

##### Alternative fitting of the logistic regression

Sometimes the data are recorded for each experimental unit, with individual outcome (e.g., 0/1, alive/dead, etc).

```{r}
d <- spiders$dose
n0 <- 25-spiders$n.dead
n1 <- spiders$n.dead
d0 <- rep(d,n0)
d1 <- rep(d,n1)
y0 <- rep(0,sum(n0))
y1 <- rep(1,sum(n1))
dat <- data.frame(y=c(y0,y1),dose=c(d0,d1))
dim(dat)
head(dat)
tail(dat)
table(dat$y,dat$dose)
```

The logistic regression for this data format.

```{r}
glm.out <- glm(y ~ dose, data=dat, family=binomial(link=logit))
summary(glm.out)$coef
```

#### Example from class: tobacco budworms

The data are in the SPH.140.615 package, and use log2 dose.

```{r}
worms
worms$log2dose <- log2(worms$dose)
worms
```

Fit the model, sexes completely different.

```{r}
glm.out <- glm(n.dead/n ~ sex*log2dose, weights=n, data=worms, family=binomial(link=logit))
summary(glm.out)$coef
```

Fit the model with different slopes but common intercept.

```{r}
glm.out <- glm(n.dead/n ~ log2dose + sex:log2dose, weights=n, data=worms, family=binomial(link=logit))
summary(glm.out)$coef
```

Plot the data with the fitted curves.

```{r}
par(las=1)
plot(n.dead/n ~ log2dose, data=worms, col=ifelse(sex=="male","blue","red"), lwd=2, ylim=c(0,1), 
     xlab="log2 dose", ylab="proprtion dead")
u <- par("usr")
x <- seq(0,u[2],len=250)
ym <- predict(glm.out, data.frame(log2dose=x,sex=factor(rep("male",length(x)))), type="response")
lines(x, ym, col="blue")
yf <- predict(glm.out, data.frame(log2dose=x,sex=factor(rep("female",length(x)))), type="response")
lines(x, yf, col="red")
legend(u[2], u[3], col=c("blue","red"), lty=1, xjust=1, yjust=0, c("Male","Female"))
```

##### Interpretation of the model

The model parameters from the logistic model.

```{r}
params <- glm.out$coef
params
```

The intercept and slope for the female tobacco budworms.

```{r}
params[1:2]
```

The slope for the male tobacco budworms.

```{r}
params[2]+params[3]
```

The intercept and slope for the male tobacco budworms.

```{r}
c(params[1], params[2]+params[3])
```

When the dose is equal to one (i.e., the log2 dose is 0), the log odds of death among the tobacco budworms (both male and female) is

```{r}
params[1]
```

When the dose is equal to one (i.e., the log2 dose is 0), the probability of death among the tobacco budworms (both male and female) is

```{r}
exp(params[1])/(1+exp(params[1]))
```

Comparing a female tobacco budworm receiving a dose twice as large as another female tobacco budworm (i.e., the log2 dose difference is 1), the log odds ratio of death is 

```{r}
params[2]
```

In other words, the odds of death are

```{r}
exp(params[2])
```

times larger for the female tobacco budworm receiving the higher dose.

Comparing a male tobacco budworm receiving a dose twice as large as another male tobacco budworm (i.e., the log2 dose difference is 1), the log odds ratio of death is 

```{r}
params[2]+params[3]
```

In other words, the odds of death are

```{r}
exp(params[2]+params[3])
```

times larger for the male tobacco budworm receiving the higher dose.

#### Example from class: the ticks on the clay islands, revisited

The data.

```{r}
ticks
```

Calculating the total number of ticks under the respective conditions.

```{r}
ticks$n <- ticks$T + ticks$U
ticks
```

This is a $\text{2}^\text{3}$ factorial design (three factors with two levels each). These designs in particular allow you to investigate the interactions between the factors.

Two-way interactions: 

Does the tick sex effect differ across the levels of the factor leg? 

Does the tick sex effect differ across the levels of the factor deer sex? 

Does the leg effect differ across the levels of the factor deer sex? 

Three-way interaction: 

Does the tick sex / leg interaction differ across the levels of the factor deer sex? 

Equivalently:

Does the tick sex / deer sex interaction differ across the levels of the factor leg? 

Does the leg / deer sex interaction differ across the levels of the factor tick sex? 

Fit the full model, will all higher order interactions. If you fit higher order interactions (e.g., the three-way interaction) always include the lower order interactions (e.g., the two-way interactions).

```{r}
glm.out1 <- glm(T/n ~ Tsex * Leg * Dsex, data=ticks, weights=n, family=binomial(link=logit))
summary(glm.out1)$coef
```

The three-way interaction is not significant. Fit the model without the three-way interaction.

```{r}
glm.out2 <- glm(T/n ~ Tsex * Leg * Dsex - Tsex:Leg:Dsex, data=ticks, weights=n, family=binomial(link=logit))
summary(glm.out2)$coef
```

Alternative syntax.

```{r}
glm.out2 <- glm(T/n ~ (Tsex + Leg + Dsex)^2, data=ticks, weights=n, family=binomial(link=logit))
summary(glm.out2)$coef
```

The two-way interactions are not very significant. Fit an additive model, and test for a significant decrease in the likelihood.
  
```{r}
glm.out3 <- glm(T/n ~ Tsex + Leg + Dsex, data=ticks, weights=n, family=binomial(link=logit))
summary(glm.out3)$coef
anova(glm.out3, glm.out2, test="Chisq")
```

Calculate the confidence intervals.

```{r}
confint(glm.out3)
```

Summary: the ticks seems to have a preference for the treated tubes (since 0 is not in the 95% CI for the intercept), but neither tick sex, deer sex, or type of leg seem to matter as far as rates are concerned. So we fit a model with an intercept only.

```{r}
glm.out <- glm(T/n ~ 1, data=ticks, weights=n, family=binomial(link=logit))
summary(glm.out)$coef
```

Estimate the proportion of ticks running to the treated tube on average.

```{r}
b0 <- summary(glm.out)$coef[1,1]
b0
exp(b0)/(1+exp(b0))
```

Get the confidence interval for the proportion of ticks running to the treated tube on average.

```{r}
ci <- confint(glm.out)
ci
exp(ci)/(1+exp(ci))
```