►First lab: Wednesday, January 28, 9:15 to 10:15 am in W3204. TA: Sorina Eftim
►First TA office hours: Wednesday, January 28, 12:15 to 1:15 pm in W4007. TA: Yi Huang
- Course info
- Announcements
- Exams
- Lecture Notes
- Lab Notes
- Data sets
- Software (S+, STATA, SAS)
- LDA books
- Acknowledgments
- COURSE OBJECTIVES [ps] [pdf]
- Dr. Dominici's office hours: Monday, 12:30 to 1:30 pm, Room E3148
- Sorina's and Yi's office hours: Wednesday, 12:15 to 1:15 pm, Room W4007
ANNOUNCEMENTS AND IMPORTANT DATES
►First class: Wednesday, January 21, 10:30 am to 12:00 pm in W3204
First lab: Wednesday, January 28, 9:15 to 10:15 am in W3204
First office hour: Wednesday, January 28, 12:15 to 1:15 pm in W4007
►Please send your e-mail address to yhuang@jhsph.edu to be included in the class mailing list.
Problem Set 1 due Friday, February 6, by 5:00 pm in Yi's mailbox
Midterm Assignment due Friday, February 20, by 5:00 pm in Francesca's mailbox
Problem Set 2 due Friday, March 5, by 5:00 pm in Sorina's mailbox
Final Assignment due Friday, March 19, by 5:00 pm in Francesca's mailbox
1. Examples of Longitudinal Data Sets [ps] [pdf]
2. Exploratory Data Analysis [ps] [pdf]
3. Linear Regression: a review [ps] [pdf]
4. Linear Models for Correlated data: examples [ps] [pdf]
5. Linear Models for Correlated data: inference [ps] [pdf]
6. Parametric Models for Covariance Structure [ps] [pdf]
7. Parametric Models for Covariance Structure: examples [ps] [pdf]
8. Generalized Linear Models for Longitudinal Data [ps] [pdf]
READING ASSIGNMENT
Longitudinal Data Analysis Using Generalized Linear Models, by K.Y. Liang and S.L. Zeger, Biometrika (1986) [pdf]
9. Marginal Logistic Regression Model and GEE [ps] [pdf]
10. Marginal Poisson Regression Model and GEE [ps] [pdf]
11. Generalized Linear Models with Random Effects [ps] [pdf]
12. Transition Models [ps] [pdf]
Download each of the STATA *.ado files and the corresponding *.hlp files. Please use "Save As Source" when you save them to your hard disk from your web browser. To use the *.ado files, put them in your current working directory, in your STATA "ado" directory, or in any other directory on STATA's ado-path, where STATA looks for user-written commands. These are ***not*** thoroughly tested functions. Please let me know of any bugs you find in them.
INTRODUCTION AND EXPLORATORY DATA ANALYSIS
►LAB 1, Monday 1/26: CANCELLED DUE TO WEATHER CONDITIONS
►LAB 2, Wednesday 1/28: Introduction to Statistical Software: STATA and SAS
Introduction to Matrix Algebra [matrix_intro.pdf]; extra topics: the trace and determinant of a matrix and the inverse of a 2 by 2 matrix
Introduction to STATA [stata_intro.pdf] [stata_intro2.pdf]
Exploratory Data Analysis [lab1_2.pdf] (Autocorrelation, Scatterplot Matrix, Line Plots, and Lowess Smoother)
Examples using the CD4+ cell numbers data set [cd4.example.pdf] [cd4.example2.pdf]
STATA analysis for calculating the autocorrelation function of the CD4 data [cd4.do]
Variogram Plot [variogram.ado] [variogram.hlp]
This function requires [xtdiff.ado] and ksmapprox.ado (listed under Additional Material below)
STATA analysis for calculating the variogram of cows data [cows.do]
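The variogram function above is posted as Stata code. As a language-neutral illustration only, here is a minimal Python sketch of the sample variogram as described by Diggle, Liang and Zeger: for residuals r_ij observed at times t_ij within subject i, each within-subject pair contributes a lag u = |t_ij - t_ik| and a half squared difference v = (r_ij - r_ik)^2 / 2, and the process variance is estimated by the same half squared difference averaged over pairs of residuals from different subjects. The function names and the input layout (a dict mapping a subject id to (time, residual) pairs) are hypothetical. The sketch simply averages v at each exact lag; variogram.ado presumably smooths instead, given that it relies on ksmapprox.ado.

```python
from collections import defaultdict
from itertools import combinations


def sample_variogram(subjects):
    """Return (lag, half squared difference) pairs for all within-subject pairs.

    `subjects` maps a (hypothetical) subject id to a list of (time, residual) tuples.
    """
    pairs = []
    for obs in subjects.values():
        for (t_j, r_j), (t_k, r_k) in combinations(obs, 2):
            pairs.append((abs(t_j - t_k), 0.5 * (r_j - r_k) ** 2))
    return pairs


def mean_by_lag(pairs):
    """Average the half squared differences at each distinct lag (no smoothing)."""
    totals = defaultdict(lambda: [0.0, 0])
    for u, v in pairs:
        totals[u][0] += v
        totals[u][1] += 1
    return {u: s / n for u, (s, n) in sorted(totals.items())}


def process_variance(subjects):
    """Average half squared difference between residuals from different subjects."""
    total, count = 0.0, 0
    for a, b in combinations(list(subjects), 2):
        for _, r_a in subjects[a]:
            for _, r_b in subjects[b]:
                total += 0.5 * (r_a - r_b) ** 2
                count += 1
    return total / count


# Toy residuals for two subjects, observed at times (0, 1) and (0, 2).
subjects = {1: [(0, 1.0), (1, 2.0)], 2: [(0, -1.0), (2, 0.0)]}
print(mean_by_lag(sample_variogram(subjects)))  # {1: 0.5, 2: 0.5}
print(process_variance(subjects))               # 2.25
```

Plotting the averaged values against the lag, with the process variance as a horizontal reference line, shows how quickly the within-subject correlation decays as observations get further apart in time.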
Additional Material
Introductions to SAS [sas_intro1.pdf] [sas_intro2.pdf]
Glossary of Macros [sascode.pdf]
Faster function to generate smooth model fits [ksmapprox.ado][ksmapprox.hlp]
Function for making plots of means over time [xtgraph.ado] [xtgraph.hlp], with a demonstration file [pdf]
Function to compute the sample autocorrelation function for data observed at fixed, equally spaced time points [autocor.ado], with a help file [pdf]
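The exact algorithm in autocor.ado is not described on this page. As an illustration only, a minimal Python sketch of one common estimator for balanced data: pool every pair of residuals that are `lag` time steps apart within a subject and take their Pearson correlation. The function names and the list-of-rows input are hypothetical, and pooling across time points assumes the residual variance is roughly constant over time (computing a separate correlation for each pair of time points would not need that assumption).

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def autocorrelation(residuals, lag):
    """Pooled sample autocorrelation at a given lag.

    `residuals` has one row per subject; every row holds residuals at the same
    equally spaced time points, so `lag` counts time steps.
    """
    xs, ys = [], []
    for row in residuals:
        for j in range(len(row) - lag):
            xs.append(row[j])
            ys.append(row[j + lag])
    return pearson(xs, ys)


# Toy residuals: three subjects, four equally spaced visits each.
residuals = [[0.5, 0.7, 0.2, -0.1], [-0.4, -0.6, -0.3, 0.2], [0.1, 0.0, 0.4, 0.3]]
for u in (1, 2, 3):
    print(u, round(autocorrelation(residuals, u), 3))
```

The residuals should come from a fit of the mean response (for example, an OLS fit on time), so that the correlation reflects within-subject dependence rather than the trend.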
LINEAR MODELS FOR INDEPENDENT AND CORRELATED DATA
►LAB 3, Monday 2/2: Linear regression using STATA
Multiple regression in matrix notation [matrix.pdf]
Estimating variance within subjects and between subjects [xtsumcorr.ado] [xtsumcorr.hlp]
Ordinary Least Squares in STATA [pdf]
STATA data analysis using Ordinary Least Squares [pdf]
SAS and STATA analyses of the CD4+ data [output]
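As a quick reference for the matrix-notation handout, the standard least squares results are sketched below. The symbols are ours and assume that X is an N by p design matrix of full column rank and that the errors are independent with common variance σ². The last line is the 2 by 2 inverse from Lab 2, which is all that is needed when the model has an intercept and a single covariate, because then X^T X is 2 by 2.

```latex
\begin{aligned}
Y &= X\beta + \epsilon, \qquad E(\epsilon) = 0, \quad \operatorname{Var}(\epsilon) = \sigma^2 I_N \\
\hat\beta_{\mathrm{OLS}} &= (X^\top X)^{-1} X^\top Y, \qquad
  \operatorname{Var}(\hat\beta_{\mathrm{OLS}}) = \sigma^2 (X^\top X)^{-1} \\
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}
  &= \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \qquad ad - bc \neq 0
\end{aligned}
```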
LINEAR MODELS FOR CORRELATED DATA
STATA analysis of the weights of pigs data set: Autocorrelation function, Uniform Correlation Model, OLS, and WLS [pdf]
Exponential Correlation Model [pdf]
STATA analysis of the dental data set (solutions to Problem Set 2) [output]
STATA analysis of the Nepal data set (solutions to Problem Set 3) [output]
Fitting regression splines to the Nepal data set in SAS [output]
Ordinary Least Squares and Weighted Least Squares
STATA analysis of sitka spruce trees (population average model) [output]
STATA analysis of CD4+ cell numbers (random effect model) [output]
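The OLS and WLS analyses above differ only in the weight matrix. A sketch in the usual longitudinal notation (symbols are our assumption, not taken from the posted output): with all subjects' responses stacked as Y = Xβ + ε, Var(Y) = V block diagonal with one block per subject, and a symmetric positive definite weight matrix W,

```latex
\begin{aligned}
\hat\beta_W &= (X^\top W X)^{-1} X^\top W Y \\
\operatorname{Var}(\hat\beta_W) &= (X^\top W X)^{-1} \, X^\top W V W X \, (X^\top W X)^{-1}
\end{aligned}
```

Taking W = I gives OLS, and taking W = V^{-1} gives the most efficient (generalized least squares) estimator, whose variance simplifies to (X^T V^{-1} X)^{-1}.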
Robust Estimation
Robust estimation for the sitka spruce data set and fitting splines to the CD4 data set: STATA analysis [output]
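Robust (sandwich) standard errors keep the weighted least squares estimate but replace the assumed covariance in its variance with each subject's observed residual cross-products. A sketch, assuming m independent subjects with design matrices X_i, weight matrices W_i, and residual vectors r_i = Y_i - X_i \hat\beta_W (notation ours):

```latex
\widehat{\operatorname{Var}}(\hat\beta_W)
  = \Big(\sum_{i=1}^{m} X_i^\top W_i X_i\Big)^{-1}
    \Big(\sum_{i=1}^{m} X_i^\top W_i \, r_i r_i^\top \, W_i X_i\Big)
    \Big(\sum_{i=1}^{m} X_i^\top W_i X_i\Big)^{-1}
```

This estimate remains consistent when W_i is not the inverse of the true covariance, provided the mean model is correct and the number of subjects is large; the same idea underlies the robust standard errors used with GEE (Liang and Zeger, 1986).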
Parametric Models for Covariance Matrices and Introduction to Logistic Regression
STATA analysis with an exponential correlation model and SAS PROC MIXED analysis of the cow's milk data [pdf]
Handout with STATA Commands for analysis of continuous longitudinal data [pdf]
Introduction to Logistic Regression: STATA Analysis of the Myocardial Infarction Data [pdf]
Introduction to Logistic Regression: SAS Analysis of the Myocardial Infarction Data [program] and output [output]
Logistic Regression for Longitudinal Data
STATA Analysis of the 3x3 Pain Crossover Trial Data [pdf]
Problem Set 4: Analysis of the Indonesian Children's Health Study [pdf]
STATA do file [ichs.do]
Poisson Regression and GEE
STATA Analyses of the Epileptic Seizures data set (Marginal Poisson Regression and GEE) [pdf]
Analysis of epileptic seizure data using a population-averaged model and GEE (PROC GENMOD) [program] and output [output]
STATA Analyses of the Epileptic Seizures data set (Poisson Regression with Random Effects) [pdf]
The data sets are posted in raw (plain-text) format so that they can be analyzed in SAS, STATA, S-plus, and R. Please see the readme file for column names.
Readme file of all the data sets below [readme]
Example 1.1. CD4+ cell numbers [cd4.raw]
Example 1.3. Growth of Sitka spruce [trees.raw] [sitka.raw]
Example 1.4. Protein content of milk [barley.raw] [lupins.raw] [mixed.raw]
Example 1.6. Epileptic seizures [seize.raw]
Example 3.1. Weights of pigs [pigs.raw], in STATA format [pigs.stata.dat]
Nepal Clinical Trial Data set [readme] [nepal.raw], in STATA format [nepal.stata.dat]
Dental Data [dental.raw]
Weight Loss Data [weightloss.raw]
HIV Study Data [hivstudy.raw]
Multiple Sclerosis Data [afcr.raw]
Back Pain Data [back.raw]
Myocardial infarction data [infarc.raw]
Indonesian children's health study [ICHS.raw]
Wheezing data [wheeze2.raw]
3 by 3 Pain crossover trial [crossover33.raw]
S-plus functions:
Plotting longitudinal data sets [exploratory.s]
Data analysis of the sitka spruce trees data set [sitka.s] and handout [ps] [pdf]
Link to Oswald: Software for the Analysis of Longitudinal Data in S-plus [website]
Stata functions:
Convert ASCII file into STATA format using Stat/Transfer [plaintext]
Inputting Your Data into STATA [website]
Reshaping Data from Wide to Long [website]
Resources to help you learn and use STATA [website]
A wonderful archive of STATA programs (requires Stata version 6.0) [website]
STATA Frequently Asked Questions [website]
The xtgee command [website]
STATA 6.0 can read datasets and install command updates directly over the Web. The how-to is here.
Stata Analysis with GEE of the Epileptic Seizures data [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]
SAS functions:
Macro for calculating the autocorrelation function in SAS [pdf] [readme]
Macro for fitting splines to Nepal Data [splinfit.sas]
Generate correlated normal data [gendat.sas]
PROC MIXED for the sitka.data [sitka.sas] and handout [ps] [pdf]
Fit OLS and WLS models for gendat.sas data [owlsfit.sas]
SAS analysis of the dental data set [program] [output]
Analysis of dental data using a random coefficient model (PROC MIXED) [program] and output [output]
Analysis of dental data using a linear mixed effects model (PROC MIXED) [program] and output [output]
Fit a Logistic Regression Model to the Myocardial infarction data [program] and output [output]
Analysis of epileptic seizure data using a population-averaged model and GEE (PROC GENMOD) [program] and output [output]
Comparing the SAS GLM and Mixed Procedures for Repeated Measures [pdf]
Analysis of Longitudinal Data, Peter J. Diggle, Kung-Yee Liang and Scott L. Zeger, Oxford (1999) (TEXTBOOK) [table of contents] [errata]
Nonlinear Models for Repeated Measurement Data, Marie Davidian and David Giltinan, Chapman and Hall (1995) [table of contents]
Linear Mixed Models for Longitudinal Data, Geert Verbeke and Geert Molenberghs, Springer Series in Statistics (2000) [table of contents] [book data sets]
Linear Mixed Models in Practice: A SAS-Oriented Approach, Geert Verbeke and Geert Molenberghs, Springer-Verlag (2000) [table of contents]
A Handbook of Statistical Analyses using Stata, Sophia Rabe-Hesketh and
Brian Everitt, Chapman & Hall/CRC (2004) [table of contents]
This web page contains lecture notes, examples, data sets, and software, some of which were developed by students and colleagues. In particular, I would like to thank biostatistics students Nikhil Gupte and Hongfei Guo for their help in posting the LDA materials; Dr. Scott L. Zeger for providing his course notes and problem sets; Dr. Marie Davidian for sharing SAS software, course notes, and data sets; Dr. Paul Rathouz for sharing STATA code, course notes, and data sets; Dr. Irizarry for providing S-plus software; Dr. McDermott for assistance with HTML programming; and the biostatistics students who provided Stata output from their homework. For comments and suggestions, please e-mail me at fdominic@jhsph.edu.