Statistics for Laboratory Scientists ( 140.615 )

Survival Analysis

We will use the ‘survival’ and the ‘MASS’ R packages for the analyses. If you have not installed these packages, uncomment the first two lines in the code chunk.

# install.packages("survival") 
# install.packages("MASS") 
library(survival)
library(MASS)

Example from class: the Gehan data

Gehan, E.A. (1965) A generalized Wilcoxon test for comparing arbitrarily single-censored samples. Biometrika 52, 203–233.

gehan
##    pair time cens   treat
## 1     1    1    1 control
## 2     1   10    1    6-MP
## 3     2   22    1 control
## 4     2    7    1    6-MP
## 5     3    3    1 control
## 6     3   32    0    6-MP
## 7     4   12    1 control
## 8     4   23    1    6-MP
## 9     5    8    1 control
## 10    5   22    1    6-MP
## 11    6   17    1 control
## 12    6    6    1    6-MP
## 13    7    2    1 control
## 14    7   16    1    6-MP
## 15    8   11    1 control
## 16    8   34    0    6-MP
## 17    9    8    1 control
## 18    9   32    0    6-MP
## 19   10   12    1 control
## 20   10   25    0    6-MP
## 21   11    2    1 control
## 22   11   11    0    6-MP
## 23   12    5    1 control
## 24   12   20    0    6-MP
## 25   13    4    1 control
## 26   13   19    0    6-MP
## 27   14   15    1 control
## 28   14    6    1    6-MP
## 29   15    8    1 control
## 30   15   17    0    6-MP
## 31   16   23    1 control
## 32   16   35    0    6-MP
## 33   17    5    1 control
## 34   17    6    1    6-MP
## 35   18   11    1 control
## 36   18   13    1    6-MP
## 37   19    4    1 control
## 38   19    9    0    6-MP
## 39   20    1    1 control
## 40   20    6    0    6-MP
## 41   21    8    1 control
## 42   21   10    0    6-MP
Surv(gehan$time,gehan$cens)
##  [1]  1  10  22   7   3  32+ 12  23   8  22  17   6   2  16  11  34+  8  32+ 12  25+  2  11+  5  20+  4  19+
## [27] 15   6   8  17+ 23  35+  5   6  11  13   4   9+  1   6+  8  10+

The Kaplan-Meier plots.

par(las=1)
gehan.surv <- survfit(Surv(time,cens) ~ treat, data=gehan)
plot(gehan.surv)

A fancier plot.

par(las=1, yaxs="i")
plot(gehan.surv, xlab="time of remission (weeks)", ylab="S(t)", col=c("red","blue"), ylim=c(0,1.05), lwd=2) 
legend(25, 1, c("Control","6-MP"), lty=1, col=c("blue","red"), lwd=2) 

Kaplan-Meier plots with confidence intervals.

par(mfrow=c(1,2), las=1, yaxs="i")
gehan.0 <- survfit(Surv(time,cens) ~ 1, data=subset(gehan, treat=="control"))
gehan.1 <- survfit(Surv(time,cens) ~ 1, data=subset(gehan, treat=="6-MP"))
plot(gehan.0, xlab="time of remission (weeks)", ylab="S(t)", xlim=c(0,35), ylim=c(0,1.05), col=c("blue")) 
plot(gehan.1, xlab="time of remission (weeks)", ylab="S(t)", xlim=c(0,35), ylim=c(0,1.05), col=c("red")) 

The log-rank test.

survdiff(Surv(time,cens) ~ treat, data=gehan)
## Call:
## survdiff(formula = Surv(time, cens) ~ treat, data = gehan)
## 
##                N Observed Expected (O-E)^2/E (O-E)^2/V
## treat=6-MP    21        9     19.3      5.46      16.8
## treat=control 21       21     10.7      9.77      16.8
## 
##  Chisq= 16.8  on 1 degrees of freedom, p= 4e-05

The Cox proportional hazards model.

gehan.cox <- coxph(Surv(time,cens) ~ treat, data=gehan)
summary(gehan.cox)
## Call:
## coxph(formula = Surv(time, cens) ~ treat, data = gehan)
## 
##   n= 42, number of events= 30 
## 
##                coef exp(coef) se(coef)     z Pr(>|z|)    
## treatcontrol 1.5721    4.8169   0.4124 3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##              exp(coef) exp(-coef) lower .95 upper .95
## treatcontrol     4.817     0.2076     2.147     10.81
## 
## Concordance= 0.69  (se = 0.041 )
## Likelihood ratio test= 16.35  on 1 df,   p=5e-05
## Wald test            = 14.53  on 1 df,   p=1e-04
## Score (logrank) test = 17.25  on 1 df,   p=3e-05
Interpretation of the model

The hazard for coming out of remission is

round(exp(gehan.cox$coefficients),2)
## treatcontrol 
##         4.82

times higher in the control group, with a 95% confidence interval of

round(exp(confint(gehan.cox)),2)
##              2.5 % 97.5 %
## treatcontrol  2.15  10.81

Changing the contrast.

gehan$treat <- factor(gehan$treat, levels=c("control","6-MP"))
gehan.cox <- coxph(Surv(time,cens) ~ treat, data=gehan)
summary(gehan.cox)
## Call:
## coxph(formula = Surv(time, cens) ~ treat, data = gehan)
## 
##   n= 42, number of events= 30 
## 
##              coef exp(coef) se(coef)      z Pr(>|z|)    
## treat6-MP -1.5721    0.2076   0.4124 -3.812 0.000138 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## treat6-MP    0.2076      4.817   0.09251    0.4659
## 
## Concordance= 0.69  (se = 0.041 )
## Likelihood ratio test= 16.35  on 1 df,   p=5e-05
## Wald test            = 14.53  on 1 df,   p=1e-04
## Score (logrank) test = 17.25  on 1 df,   p=3e-05

Comparing the treatment to the control group, the relative risk of coming out of remission is

round(exp(gehan.cox$coefficients),2)
## treat6-MP 
##      0.21

The 95% confidence interval for the relative risk is

round(exp(confint(gehan.cox)),2)
##           2.5 % 97.5 %
## treat6-MP  0.09   0.47

Example from class: leukemia patients

Survival times are given for 33 patients who died from acute myelogenous leukemia. Also measured was the patient’s white blood cell count at the time of diagnosis. The patients were also factored into 2 groups according to the presence or absence of a morphologic characteristic of white blood cells.

leuk
##       wbc      ag time
## 1    2300 present   65
## 2     750 present  156
## 3    4300 present  100
## 4    2600 present  134
## 5    6000 present   16
## 6   10500 present  108
## 7   10000 present  121
## 8   17000 present    4
## 9    5400 present   39
## 10   7000 present  143
## 11   9400 present   56
## 12  32000 present   26
## 13  35000 present   22
## 14 100000 present    1
## 15 100000 present    1
## 16  52000 present    5
## 17 100000 present   65
## 18   4400  absent   56
## 19   3000  absent   65
## 20   4000  absent   17
## 21   1500  absent    7
## 22   9000  absent   16
## 23   5300  absent   22
## 24  10000  absent    3
## 25  19000  absent    4
## 26  27000  absent    2
## 27  28000  absent    3
## 28  31000  absent    8
## 29  26000  absent    4
## 30  21000  absent    3
## 31  79000  absent   30
## 32 100000  absent    4
## 33 100000  absent   43
summary(leuk)
##       wbc               ag          time       
##  Min.   :   750   absent :16   Min.   :  1.00  
##  1st Qu.:  5300   present:17   1st Qu.:  4.00  
##  Median : 10500                Median : 22.00  
##  Mean   : 29165                Mean   : 40.88  
##  3rd Qu.: 32000                3rd Qu.: 65.00  
##  Max.   :100000                Max.   :156.00

The Cox proportional hazards model.

leuk.cox <- coxph(Surv(time) ~ ag + log10(wbc), data=leuk)
summary(leuk.cox)
## Call:
## coxph(formula = Surv(time) ~ ag + log10(wbc), data = leuk)
## 
##   n= 33, number of events= 33 
## 
##               coef exp(coef) se(coef)      z Pr(>|z|)   
## agpresent  -1.0691    0.3433   0.4293 -2.490  0.01276 * 
## log10(wbc)  0.8467    2.3318   0.3132  2.703  0.00687 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##            exp(coef) exp(-coef) lower .95 upper .95
## agpresent     0.3433     2.9126     0.148    0.7964
## log10(wbc)    2.3318     0.4288     1.262    4.3083
## 
## Concordance= 0.726  (se = 0.047 )
## Likelihood ratio test= 15.64  on 2 df,   p=4e-04
## Wald test            = 15.06  on 2 df,   p=5e-04
## Score (logrank) test = 16.49  on 2 df,   p=3e-04
Interpretation of the model

The relative risk comparing patients with the morphologic characteristic of the white blood cells to the ones without the characteristic is

round(exp(leuk.cox$coefficients[1]),2)
## agpresent 
##      0.34

assuming the patients have the same white blood cell count. The 95% confidence interval for this relative risk is

round(exp(confint(leuk.cox)[1,]),2)
##  2.5 % 97.5 % 
##   0.15   0.80

Comparing patients in the same morphologic characteristic group that differ ten-fold in the white blood cell count, the relative risk is

round(exp(leuk.cox$coefficients[2]),2)
## log10(wbc) 
##       2.33

The 95% confidence interval for this relative risk is

round(exp(confint(leuk.cox)[2,]),2)
##  2.5 % 97.5 % 
##   1.26   4.31
Alternative analysis

There are no censored observations in these data, so we could also use a linear model to analyze the data, with a log-transformed response.

par(mfrow=c(1,2))
hist(leuk$time, xlab="", main="survival time")
hist(log2(leuk$time), xlab="", main="log2 survival time")

Let’s explore the relationship between the predictors (test results and log-transformed white blood cell counts), and the log-transformed response.

par(mfrow=c(1,2))
plot(log10(leuk$wbc), log2(leuk$time), xlab="log10 white blood cell count", ylab="log2 survival time")
abline(lsfit(log10(leuk$wbc),log2(leuk$time)), col="red", lty=2)
boxplot(split(log2(leuk$time),leuk$ag), ylab="log2 survival time")

The linear model.

leuk.lm <- lm(log2(time) ~ ag + log10(wbc), data=leuk)
summary(leuk.lm)
## 
## Call:
## lm(formula = log2(time) ~ ag + log10(wbc), data = leuk)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1355 -1.3521  0.2903  0.9249  3.7165 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.1914     2.3297   4.804 4.05e-05 ***
## agpresent     1.4258     0.6291   2.266  0.03081 *  
## log10(wbc)   -1.8963     0.5470  -3.467  0.00161 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.802 on 30 degrees of freedom
## Multiple R-squared:  0.3799, Adjusted R-squared:  0.3386 
## F-statistic: 9.191 on 2 and 30 DF,  p-value: 0.00077
leuk.lm$coefficients
## (Intercept)   agpresent  log10(wbc) 
##   11.191360    1.425766   -1.896327
confint(leuk.lm)
##                  2.5 %     97.5 %
## (Intercept)  6.4334939 15.9492258
## agpresent    0.1409149  2.7106166
## log10(wbc)  -3.0134224 -0.7792313
Interpretation of the linear model

Comparing patients with the morphologic characteristic of the white blood cells to the ones without the characteristic, we expect the survival time to be

round(2^leuk.lm$coefficients[2],2)
## agpresent 
##      2.69

times higher in the group with the characteristic, assuming the patients have the same white blood cell count. The 95% confidence interval for this fold change is

round(2^(confint(leuk.lm)[2,]),2)
##  2.5 % 97.5 % 
##   1.10   6.55

Comparing patients in the same morphologic characteristic group that differ ten-fold in the white blood cell count, we expect the ratio of survival times to be

round(2^leuk.lm$coefficients[3],2)
## log10(wbc) 
##       0.27

The 95% confidence interval for this fold change is

round(2^(confint(leuk.lm)[3,]),2)
##  2.5 % 97.5 % 
##   0.12   0.58
Remember

All models are wrong, but some are useful.

https://en.wikipedia.org/wiki/All_models_are_wrong