We will use the ‘survival’ and the ‘MASS’ R packages for the analyses. If you have not installed these packages, uncomment the first two lines in the code chunk.
# install.packages("survival")
# install.packages("MASS")
library(survival)
library(MASS)
Gehan, E.A. (1965) A generalized Wilcoxon test for comparing arbitrarily single-censored samples. Biometrika 52, 203–233.
gehan
## pair time cens treat
## 1 1 1 1 control
## 2 1 10 1 6-MP
## 3 2 22 1 control
## 4 2 7 1 6-MP
## 5 3 3 1 control
## 6 3 32 0 6-MP
## 7 4 12 1 control
## 8 4 23 1 6-MP
## 9 5 8 1 control
## 10 5 22 1 6-MP
## 11 6 17 1 control
## 12 6 6 1 6-MP
## 13 7 2 1 control
## 14 7 16 1 6-MP
## 15 8 11 1 control
## 16 8 34 0 6-MP
## 17 9 8 1 control
## 18 9 32 0 6-MP
## 19 10 12 1 control
## 20 10 25 0 6-MP
## 21 11 2 1 control
## 22 11 11 0 6-MP
## 23 12 5 1 control
## 24 12 20 0 6-MP
## 25 13 4 1 control
## 26 13 19 0 6-MP
## 27 14 15 1 control
## 28 14 6 1 6-MP
## 29 15 8 1 control
## 30 15 17 0 6-MP
## 31 16 23 1 control
## 32 16 35 0 6-MP
## 33 17 5 1 control
## 34 17 6 1 6-MP
## 35 18 11 1 control
## 36 18 13 1 6-MP
## 37 19 4 1 control
## 38 19 9 0 6-MP
## 39 20 1 1 control
## 40 20 6 0 6-MP
## 41 21 8 1 control
## 42 21 10 0 6-MP
Surv(gehan$time,gehan$cens)
## [1] 1 10 22 7 3 32+ 12 23 8 22 17 6 2 16 11 34+ 8 32+ 12 25+ 2 11+ 5 20+ 4 19+
## [27] 15 6 8 17+ 23 35+ 5 6 11 13 4 9+ 1 6+ 8 10+
The Kaplan-Meier plots.
par(las=1)
gehan.surv <- survfit(Surv(time,cens) ~ treat, data=gehan)
plot(gehan.surv)
A fancier plot.
par(las=1, yaxs="i")
plot(gehan.surv, xlab="time of remission (weeks)", ylab="S(t)", col=c("red","blue"), ylim=c(0,1.05), lwd=2)
legend(25, 1, c("Control","6-MP"), lty=1, col=c("blue","red"), lwd=2)
Kaplan-Meier plots with confidence intervals.
par(mfrow=c(1,2), las=1, yaxs="i")
gehan.0 <- survfit(Surv(time,cens) ~ 1, data=subset(gehan, treat=="control"))
gehan.1 <- survfit(Surv(time,cens) ~ 1, data=subset(gehan, treat=="6-MP"))
plot(gehan.0, xlab="time of remission (weeks)", ylab="S(t)", xlim=c(0,35), ylim=c(0,1.05), col=c("blue"))
plot(gehan.1, xlab="time of remission (weeks)", ylab="S(t)", xlim=c(0,35), ylim=c(0,1.05), col=c("red"))
The log-rank test.
survdiff(Surv(time,cens) ~ treat, data=gehan)
## Call:
## survdiff(formula = Surv(time, cens) ~ treat, data = gehan)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## treat=6-MP 21 9 19.3 5.46 16.8
## treat=control 21 21 10.7 9.77 16.8
##
## Chisq= 16.8 on 1 degrees of freedom, p= 4e-05
The Cox proportional hazards model.
gehan.cox <- coxph(Surv(time,cens) ~ treat, data=gehan)
summary(gehan.cox)
## Call:
## coxph(formula = Surv(time, cens) ~ treat, data = gehan)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## treatcontrol 1.5721 4.8169 0.4124 3.812 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## treatcontrol 4.817 0.2076 2.147 10.81
##
## Concordance= 0.69 (se = 0.041 )
## Likelihood ratio test= 16.35 on 1 df, p=5e-05
## Wald test = 14.53 on 1 df, p=1e-04
## Score (logrank) test = 17.25 on 1 df, p=3e-05
The hazard for coming out of remission is
round(exp(gehan.cox$coefficients),2)
## treatcontrol
## 4.82
times higher in the control group, with a 95% confidence interval of
round(exp(confint(gehan.cox)),2)
## 2.5 % 97.5 %
## treatcontrol 2.15 10.81
Changing the contrast.
gehan$treat <- factor(gehan$treat, levels=c("control","6-MP"))
gehan.cox <- coxph(Surv(time,cens) ~ treat, data=gehan)
summary(gehan.cox)
## Call:
## coxph(formula = Surv(time, cens) ~ treat, data = gehan)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## treat6-MP -1.5721 0.2076 0.4124 -3.812 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## treat6-MP 0.2076 4.817 0.09251 0.4659
##
## Concordance= 0.69 (se = 0.041 )
## Likelihood ratio test= 16.35 on 1 df, p=5e-05
## Wald test = 14.53 on 1 df, p=1e-04
## Score (logrank) test = 17.25 on 1 df, p=3e-05
Comparing the treatment to the control group, the relative risk of coming out of remission is
round(exp(gehan.cox$coefficients),2)
## treat6-MP
## 0.21
The 95% confidence interval for the relative risk is
round(exp(confint(gehan.cox)),2)
## 2.5 % 97.5 %
## treat6-MP 0.09 0.47
Survival times are given for 33 patients who died from acute myelogenous leukemia. Also measured was the patient’s white blood cell count at the time of diagnosis. The patients were also factored into 2 groups according to the presence or absence of a morphologic characteristic of white blood cells.
leuk
## wbc ag time
## 1 2300 present 65
## 2 750 present 156
## 3 4300 present 100
## 4 2600 present 134
## 5 6000 present 16
## 6 10500 present 108
## 7 10000 present 121
## 8 17000 present 4
## 9 5400 present 39
## 10 7000 present 143
## 11 9400 present 56
## 12 32000 present 26
## 13 35000 present 22
## 14 100000 present 1
## 15 100000 present 1
## 16 52000 present 5
## 17 100000 present 65
## 18 4400 absent 56
## 19 3000 absent 65
## 20 4000 absent 17
## 21 1500 absent 7
## 22 9000 absent 16
## 23 5300 absent 22
## 24 10000 absent 3
## 25 19000 absent 4
## 26 27000 absent 2
## 27 28000 absent 3
## 28 31000 absent 8
## 29 26000 absent 4
## 30 21000 absent 3
## 31 79000 absent 30
## 32 100000 absent 4
## 33 100000 absent 43
summary(leuk)
## wbc ag time
## Min. : 750 absent :16 Min. : 1.00
## 1st Qu.: 5300 present:17 1st Qu.: 4.00
## Median : 10500 Median : 22.00
## Mean : 29165 Mean : 40.88
## 3rd Qu.: 32000 3rd Qu.: 65.00
## Max. :100000 Max. :156.00
The Cox proportional hazards model.
leuk.cox <- coxph(Surv(time) ~ ag + log10(wbc), data=leuk)
summary(leuk.cox)
## Call:
## coxph(formula = Surv(time) ~ ag + log10(wbc), data = leuk)
##
## n= 33, number of events= 33
##
## coef exp(coef) se(coef) z Pr(>|z|)
## agpresent -1.0691 0.3433 0.4293 -2.490 0.01276 *
## log10(wbc) 0.8467 2.3318 0.3132 2.703 0.00687 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## agpresent 0.3433 2.9126 0.148 0.7964
## log10(wbc) 2.3318 0.4288 1.262 4.3083
##
## Concordance= 0.726 (se = 0.047 )
## Likelihood ratio test= 15.64 on 2 df, p=4e-04
## Wald test = 15.06 on 2 df, p=5e-04
## Score (logrank) test = 16.49 on 2 df, p=3e-04
The relative risk comparing patients with the morphologic characteristic of the white blood cells to the ones without the characteristic is
round(exp(leuk.cox$coefficients[1]),2)
## agpresent
## 0.34
assuming the patients have the same white blood cell count. The 95% confidence interval for this relative risk is
round(exp(confint(leuk.cox)[1,]),2)
## 2.5 % 97.5 %
## 0.15 0.80
Comparing patients in the same morphologic characteristic group that differ ten-fold in the white blood cell count, the relative risk is
round(exp(leuk.cox$coefficients[2]),2)
## log10(wbc)
## 2.33
The 95% confidence interval for this relative risk is
round(exp(confint(leuk.cox)[2,]),2)
## 2.5 % 97.5 %
## 1.26 4.31
There are no censored observations in these data, so we could also use a linear model to analyze the data, with a log-transformed response.
par(mfrow=c(1,2))
hist(leuk$time, xlab="", main="survival time")
hist(log2(leuk$time), xlab="", main="log2 survival time")
Let’s explore the relationship between the predictors (test results and log-transformed white blood cell counts), and the log-transformed response.
par(mfrow=c(1,2))
plot(log10(leuk$wbc), log2(leuk$time), xlab="log10 white blood cell count", ylab="log2 survival time")
abline(lsfit(log10(leuk$wbc),log2(leuk$time)), col="red", lty=2)
boxplot(split(log2(leuk$time),leuk$ag), ylab="log2 survival time")
The linear model.
leuk.lm <- lm(log2(time) ~ ag + log10(wbc), data=leuk)
summary(leuk.lm)
##
## Call:
## lm(formula = log2(time) ~ ag + log10(wbc), data = leuk)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1355 -1.3521 0.2903 0.9249 3.7165
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.1914 2.3297 4.804 4.05e-05 ***
## agpresent 1.4258 0.6291 2.266 0.03081 *
## log10(wbc) -1.8963 0.5470 -3.467 0.00161 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.802 on 30 degrees of freedom
## Multiple R-squared: 0.3799, Adjusted R-squared: 0.3386
## F-statistic: 9.191 on 2 and 30 DF, p-value: 0.00077
leuk.lm$coefficients
## (Intercept) agpresent log10(wbc)
## 11.191360 1.425766 -1.896327
confint(leuk.lm)
## 2.5 % 97.5 %
## (Intercept) 6.4334939 15.9492258
## agpresent 0.1409149 2.7106166
## log10(wbc) -3.0134224 -0.7792313
Comparing patients with the morphologic characteristic of the white blood cells to the ones without the characteristic, we expect the survival time to be
round(2^leuk.lm$coefficients[2],2)
## agpresent
## 2.69
times higher in the group with the characteristic, assuming the patients have the same white blood cell count. The 95% confidence interval for this fold change is
round(2^(confint(leuk.lm)[2,]),2)
## 2.5 % 97.5 %
## 1.10 6.55
Comparing patients in the same morphologic characteristic group that differ ten-fold in the white blood cell count, we expect the ratio of survival times to be
round(2^leuk.lm$coefficients[3],2)
## log10(wbc)
## 0.27
The 95% confidence interval for this fold change is
round(2^(confint(leuk.lm)[3,]),2)
## 2.5 % 97.5 %
## 0.12 0.58
All models are wrong, but some are useful.