Data
Datasets from Hastie, Tibshirani and Friedman's book are available as an R package. Try the following in R to get a list of datasets:
> install.packages("ElemStatLearn")
> library(help=ElemStatLearn)
There are a number of datasets available in the mlbench R package as well:
> install.packages("mlbench")
> library(help=mlbench)
Olive oils data (text file)
Code
Other
Héctor Corrada Bravo
Bloomberg School of Public Health E3140
Office hours: M 3-4, Th 12-1, F 1-2
Rafael A. Irizarry
Bloomberg School of Public Health E3619
TA: Hao Wu
Bloomberg School of Public Health E3037
Meeting times
Monday and Wednesday 1:30pm-2:50pm
JHU Bloomberg School of Public Health
615 N. Wolfe Street
Room W4019
Evaluation
There will be three assignments including programming, data analysis and pen-and-paper problems. There will also be a graded open project.
Materials
There is no required textbook. Lecture notes and slides will be made available before each lecture. Recommended books for reference are
Programming assignments must be done in R. For the project, programs may be written in any language, provided the instructors and the TA can compile and run them in the Biostats department computing environment.
Description
Machine learning methods have found their way into the modern data analyst's toolbox. This course introduces popular methods with an emphasis on their practical use in data analysis. Basic coverage of the statistical and computational theory behind these methods gives students the tools to evaluate statistical machine learning models defined in terms of algorithms or function approximations.
Topics covered include: regression and prediction, tree-based methods, overview of supervised learning theory, support vector machines, kernel methods, ensemble methods, clustering, visualization of large datasets and graphical models.
Examples of method applications covered include cancer prognosis from microarray data, visualization and analysis of social network data, and graphical models for clinical decision-making.
Prerequisites: Linear algebra and statistical principles at a 651-654 level. Proficiency in R, or in another programming language with suitable numerical capabilities (e.g. MATLAB, Python). Listed in catalog as 160.644.
There is an open project. Details here