* CL10EX1.DO Grouped survival data * AML data: weeks in remission -vs- treatment group * * * Raw data: AML data included below * Assumes files are in folder [path]\bio623 * If files are in another folder, change cd command below to * point Stata to the correct folder * To run this program, use the following Stata commands: * cd [path]\bio623 ... change directory to folder bio623 * do cl10ex1 * OUTLINE: * Part a. Input data, define as a survival dataset * Part b. Define survival variables: stset * Part c. Descriptive summaries: stdes, stsum * Part d. Bin the time for grouped survival analysis: stsplit * Part e. Tabulate rates by categorical variable group(x) and bins: strate * Part f. Calculate survivor function, S(t) from grouped data * Part g. Plot Survivor function S(t) for grouped data * Part h. Fit different log-linear models for group(x) , get deviances and AIC * Part i. Plot estimated hazard functions for models A-E * Part j. Fit non-proportional hazard for group effect -- Model F * Part k. Use Model D to estimate and plot smoothed S(t) * Housekeeping * Clear workspace clear * Turn off -more- pause set more off * Save log file on disk, use .log so Notepad will open it capture log close log using cl10ex1.log, replace * Make subfolder for graphs shell md cl10ex1 * Extend linesize for log set linesize 100 * Part a. Input data, define as a survival dataset * id, x(0=no maint 1=maint), t = time to relapse, failed=(1=relaspsed 0=censored) input id x t failed 1 1 9 0 2 1 13 0 3 1 13 1 4 1 18 0 5 1 23 0 6 1 28 1 7 1 31 0 8 1 34 0 9 1 45 1 10 1 48 0 11 1 161 1 12 0 5 0 13 0 5 0 14 0 8 0 15 0 8 0 16 0 12 0 17 0 16 1 18 0 23 0 19 0 27 0 20 0 30 1 21 0 33 0 22 0 43 0 23 0 45 0 end * Part b. Define survival variables: stset stset t , failure(failed==1) id(id) * Save as Stata dataset save cl10ex1.dta , replace * Part c. Descriptive summaries: stdes, stsum * Simple counts of persons, events, time at risk stdes if x==1 stdes if x==0 * Summary stats: time at risk, rates, subject, 25,50,75 %tiles (K-M estimates) stsum , by(x) * Compare overall incidence rates by group: stir stir x * Part d. Bin the time for grouped survival analysis: stsplit * Note: Expands dataset, 1 record for each person-time interval combination * Specify ends of intervals, last interval extends to infinity stsplit tbin , at( 2.5 (2.5) 20, 25,30,35,40,45,50,161 ) * Part e. Tabulate rates by categorical variable group(x) and bins: strate * Output to new dataset: _D=events _Y=time at risk _Rate=rate * NOTE: The strate command REQUIRES STATA 6 or 7 strate tbin x , output(binrates.dta,replace) * Part f. Calculate survivor function, S(t) from grouped data * Access rates, time at risk dataset use binrates.dta , clear * First, calculate interval lengths, L, for grouped survival analysis * Make sure in order by group(x) and time bin sort x tbin * L = subtract lower limtits for interval _n+1 -vs- _n ; last(_N) interval is undefined by x: gen L = cond( _n < _N , tbin[_n+1] - tbin , . ) * Calculate midpoints f intevals for log-linear models -- last intervals must be * treated as special cases gen midT = tbin + L/2 replace midT = 42.5 if (x==1 & midT==.) replace midT = 105.5 if (x==0 & midT==.) * Calculate survival probs P for each interval: rate x length , (correct if P <0) gen P = min(1 - _Rate*L,1) * Calculate S(t) = Prob (Surviving beyond t) = Product P1 P2 ... Pt gen S = P by x: replace S = cond( _n>1, P*S[_n-1] , S) * Show results _Y=time at risk _D=failures list midT x _Y _D P S * Part g. Plot Survivor function S(t) for grouped data * Plot S(t) for grouped data at end of intervals; connect with lines * To plot S(t) for each of two groups, need two variables * Plot S(t) at end of interval = lower limit + length/2 , last interval not used * by convention, plot S(0)=1 gen T = tbin by x: gen MAINT = cond(_n>1, S[_n-1], 1) if x==1 by x: gen NOMAINT = cond(_n>1, S[_n-1], 1) if x==0 * Check shifted plotting points *list tbin L T S MAINT NOMAINT set textsize 140 twoway scatter MAINT NOMAINT T , ms(O S) c(l l) ylab(0(0.1)1) ytitle("S(t) = PROBABILITY RELAPSE > t ") xtitle("WEEKS") title("Survivor Function for Binned AML Data") graph export "cl10ex1\figg1.wmf", replace drop T P S MAINT NOMAINT * Close log file -- Only when all errors have been fixed *log close * Cl12EX1.DO Survival analysis * Kaplan-Meier curves, log-rank test, Cox PH regression model * Data: AML Weeks in remission -vs- Treatment group * * * Raw data: AML data included below * Contents: * Part a. Input data, define as a survival dataset * Part b. Define survival variables: stset * Part c. Descriptive summaries: stdes, stsum * Part d. Calculate and print Kaplan-Meier estimates for each group * Part e. Plot Kaplan-Meier estimates for each group * Part f. Log-rank test for comparing survival experience across groups * Part g. Fit Cox proportional hazards model * Assumes files are in folder [path]\bio623 * Use Stata command cd "[path]\bio623" to point to the correct folder * To run this program, use the following Stata command: * do cl12ex1 * Housekeeping * Clear workspace clear * Turn off -more- pause set more off * Save log file on disk, use .txt so Notepad will open it capture log close log using cl12ex1.txt, replace * Make subfolder for graphs shell md cl12ex1 * Extend linesize for log set linesize 100 * Part a. Input data, define as a survival dataset * id, x(0=no maint 1=maint), t = time to relapse, failed=(1=relapsed 0=censored) input id x t failed 1 1 9 1 2 1 13 1 3 1 13 0 4 1 18 1 5 1 23 1 6 1 28 0 7 1 31 1 8 1 34 1 9 1 45 0 10 1 48 1 11 1 161 0 12 0 5 1 13 0 5 1 14 0 8 1 15 0 8 1 16 0 12 1 17 0 16 0 18 0 23 1 19 0 27 1 20 0 30 0 21 0 33 1 22 0 43 1 23 0 45 1 end * Part b. Define survival variables: stset stset t , failure(failed==1) id(id) * Save as Stata dataset save cl12ex1.dta , replace * Part c. Descriptive summaries: stdes, stsum * Summary stats: time at risk, rates, subject, 25,50,75 %tiles (K-M estimates) stsum , by(x) * Part d. Calculate and print Kaplan-Meier estimates for each group sts list if x==1 sts list if x==0 sts list , by(x) compare * Part e. Plot Kaplan-Meier curves for each group sts graph , by(x) lost t1("Kaplan-Meier Survival Curves") t2(" Maintained -vs- Not Maintained") b2("t, Weeks") l2("Pr Survived Beyond Time t") graph export "cl12ex1\fige2.wmf", replace * Part f. Log-rank test for comparing survival experience across groups sts test x * Part g. Fit Cox proportional hazards model stcox x * Show coefficients stcox , nohr * Close log file -- Only when all errors have been fixed *log close