John McGready

Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health


Research

Classical Methods versus Resampling Methods in Small Sample Comparisons

As statisticians, we are often flummoxed about what to advise when comparing a parameter of interest between two populations using information from two small samples. For example, while we collectively understand the sampling behavior of a difference in sample means from large samples (> 30? > 60? It depends on whom you ask), there is no standard party line on how to estimate confidence intervals for a true mean difference using smaller samples. Some will suggest nonparametric methods (not very user-friendly for CI construction); some will assert that if you are sure of the normality of your populations (not easy when you have little data to assess this) you can safely use classical, t-test-based methods; and some recommend alternative methods such as a bootstrap approach.

However, little research has been done to compare such approaches, and in fact there are some methodological challenges to bootstrapping a difference in sample means, especially in small-sample scenarios. The built-in programs in R, for example, are really only equipped to handle bootstrapping for a single quantity from a single sample.
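One common way to bootstrap a difference in means from two independent samples is to resample each sample separately, with replacement, and recompute the difference each time. The sketch below illustrates that idea with a percentile interval. It is only an illustration under stated assumptions: the function name `bootstrap_percentile_ci`, the number of resamples, and the data values are all made up for the example and are not taken from the research described here.

```python
import random
import statistics


def bootstrap_percentile_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(x) - mean(y).

    Each sample is resampled independently, with replacement,
    keeping its original size (hypothetical helper for illustration).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))  # resample group 1
        yb = rng.choices(y, k=len(y))  # resample group 2
        diffs.append(statistics.fmean(xb) - statistics.fmean(yb))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up small samples, for illustration only.
x = [5.1, 4.8, 6.3, 5.9, 4.4, 7.0, 5.5, 6.1]
y = [4.2, 3.9, 5.0, 4.6, 3.5, 4.9, 4.1]

lo, hi = bootstrap_percentile_ci(x, y)
observed = statistics.fmean(x) - statistics.fmean(y)
print(f"observed difference: {observed:.3f}")
print(f"95% percentile bootstrap CI: ({lo:.3f}, {hi:.3f})")
```

With samples this small, each bootstrap resample can contain only a handful of distinct values, which is one reason the percentile interval may be too narrow in practice.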

I am currently running simulations to compare the performance (coverage probabilities) and the size of the confidence intervals estimated by classical t-test approaches (with and without the Satterthwaite correction) and by the bootstrap-t, bootstrap percentile, and bootstrap bias-corrected and accelerated (BCa) methods. I am comparing population means via small samples drawn from various population distributions (normal, lognormal, exponential, beta) with known parameters. My goal is to offer practical advice to practitioners and to better quantify just how "badly" (or well) the methods perform in these small-sample comparisons.
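A coverage simulation of this general kind can be sketched as follows: draw two small samples from populations with a known mean difference, build an interval, and record how often the interval contains the true value. This is a minimal sketch, not the simulation code used in this research. It covers only the percentile bootstrap, it interprets interval "size" as width (an assumption), and the helper names, sample sizes, and replication counts are hypothetical and kept small so the example runs quickly.

```python
import random
import statistics


def percentile_ci(x, y, rng, n_boot=300, alpha=0.05):
    """Percentile bootstrap CI for mean(x) - mean(y) (illustrative helper)."""
    diffs = sorted(
        statistics.fmean(rng.choices(x, k=len(x)))
        - statistics.fmean(rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    return diffs[int(alpha / 2 * n_boot)], diffs[int((1 - alpha / 2) * n_boot) - 1]


def estimate_coverage(draw_x, draw_y, true_diff, n_x, n_y, n_sim=200, seed=1):
    """Estimate coverage probability and mean width by simulation."""
    rng = random.Random(seed)
    hits = 0
    total_width = 0.0
    for _ in range(n_sim):
        # Draw one small sample from each population.
        x = [draw_x(rng) for _ in range(n_x)]
        y = [draw_y(rng) for _ in range(n_y)]
        lo, hi = percentile_ci(x, y, rng)
        hits += lo <= true_diff <= hi  # did the interval cover the truth?
        total_width += hi - lo
    return hits / n_sim, total_width / n_sim


# Assumed scenario: exponential populations with means 2 and 1,
# so the true mean difference is 1.
coverage, mean_width = estimate_coverage(
    draw_x=lambda rng: rng.expovariate(1 / 2.0),
    draw_y=lambda rng: rng.expovariate(1.0),
    true_diff=1.0,
    n_x=10,
    n_y=10,
)
print(f"estimated coverage: {coverage:.3f}, mean width: {mean_width:.3f}")
```

Passing the populations in as functions makes it straightforward to swap in other distributions, and `percentile_ci` could be replaced by any other interval method to compare coverage under the same scenario. A real study would use far more replications than the 200 assumed here.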

Statistical Education

I currently teach two sections of an introductory biostatistics course, "Statistical Reasoning in Public Health". Both sections have the same learning materials, assignments, and tests; the difference is in the mode of information delivery. One section is offered "live" on campus, with two weekly lectures and face-to-face TA office hours. The other section is offered online, with students viewing lecture materials at their preferred times and instructor/student communication generally conducted asynchronously via email and a class bulletin board system.

Having taught both sections concurrently for the past 3 years, I have plenty of anecdotal information about some of the student characteristics in each class and about the overall experience of students in both sections. However, until now, I had never collected data systematically to update my impressions more formally. I have now collected data on the students via two surveys (at the beginning and end of the two-quarter sequence) and have linked them with course performance data. I am in the process of analyzing these data, and I look forward to providing some evidence about the similarities and differences between on-campus and online learning.

Applications of Modern Statistical Methodology to Current Public Health and Medical Research

I am currently working with collaborators on a variety of projects including: