Statistical Genomics
Asian Institute in Statistical Genetics and Genomics, July 21-22, 2017.

Last update: Thu Jul 20 04:39:03 EDT 2017


Outline - We will discuss a selection of high-throughput technologies for obtaining genomic measurements, and the subsequent analyses. Emphasis will be on data processing to eliminate technological biases and address unwanted variability, and on simple principles for proper and powerful biostatistical inference. Discussion topics will include the role of genomic measurements in individualized and public health, experimental design, data pre-processing, dimension reduction, variance stabilization, statistical inference (including disease association), multiple comparisons, and enrichment analyses. We emphasize the importance of data provenance and reproducible research. Practical illustrations using data from DNA and RNA sequencing studies, genomic arrays, methylation, and proteomics will be given during the lectures. Additional software examples using freely available Bioconductor tools will be posted on the course web page for students to practice with.

This course is intended both for researchers actively engaged in genomics research and for interested beginners, including laboratory scientists and clinicians with a basic quantitative background, and quantitatively trained researchers with basic knowledge of human genetics. The lectures will be given in English. Participants are expected to have some basic knowledge of core statistical principles (such as hypothesis testing and p-values). Basic knowledge of the open source statistical environment R will help participants take full advantage of the examples given in class.


Additional material - Please see the additional material page for more on relevant software, case studies, online classes, etc.


Credits - The slides and code contain material from classes for biologists and statisticians that we teach at Johns Hopkins [ 140.615 ] [ 140.688 ] [ 140.751 ] [ 140.752 ], along with material from Karl Broman [ KB ], Kasper Hansen [ KH ], Rafa Irizarry [ RI ], Ben Langmead [ BL ], Jeff Leek [ JL ], Luigi Marchionni [ LM ], and Rob Scharpf [ RS ], whose contributions are gratefully acknowledged.


Introduction to statistical genomics
Summarizing and presenting genomic data
Statistical modeling I: means and two-group comparisons
Multiple hypothesis testing
Differential expression
Pathway and gene set analyses
Experimental design
Dimension reduction
Batch effects
Statistical modeling II: linear models in genomics
Statistical modeling III: pre-processing genomic data