Supplementary Material

Scharpf RB, Parmigiani G, Pevsner J, Ruczinski I (2008).
Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays
The Annals of Applied Statistics, 2(2): 687-713.

Color Figures

Figure 1:

HapMap genotype calls for a bad SNP (left) and a good SNP (right) for 269 samples measured on Affymetrix 100k SNP chips. The HapMap consensus genotype call (taken to be the gold standard) is indicated by color: AA (blue), AB (yellow), and BB (red). The separation between genotype clusters is SNP-specific. This figure motivates an approach that incorporates uncertainty estimates to control smoothing.


Figure 2:

A simulated chromosome with 9165 SNPs. Top: The simulated genotype estimates with uniform noise added to reduce overplotting (vertical axis) plotted against physical position (horizontal axis). Bottom: A magnification of region A. Two SNPs in region A with high simulated confidence scores are indicated by the square plotting symbol. Regions A-E are described in detail in the manuscript. In truth, there are 4 different segments in state loss. The predicted hidden states from the vanilla (Van) and ICE HMMs are denoted by color in the two bars beneath the data points. The ICE HMM detects each of the 4 loss segments, whereas the vanilla HMM smooths over a segment in A containing two heterozygous SNPs at position 52.8 Mb. Utilizing confidence scores for the genotype predictions, the ICE HMM may provide more precise locations for loss breakpoints.
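
The effect described in the Figure 2 caption can be illustrated with a toy two-state HMM. The sketch below is not the model from the manuscript: the states, the per-SNP call error probabilities, the heterozygosity rate (0.3), and the switch probability (0.01) are all assumptions chosen for illustration. It only shows how down-weighting the chance of a genotyping error at high-confidence SNPs can stop the Viterbi path from smoothing over two heterozygous calls.

```python
import math

NORMAL, LOSS = 0, 1  # hypothetical hidden states for this toy example


def emission(call_is_het, state, err, p_het=0.3):
    """P(observed call | state), given a per-SNP call error probability `err`.

    Assumption: in the loss state the true genotype is never heterozygous,
    so a heterozygous call can only arise from a genotyping error.
    """
    true_het = p_het if state == NORMAL else 0.0
    p_call_het = true_het * (1 - err) + (1 - true_het) * err
    return p_call_het if call_is_het else 1 - p_call_het


def viterbi(calls, errs, switch=0.01):
    """Most likely state path for a sequence of hom/het calls."""
    states = (NORMAL, LOSS)
    log_trans = [[math.log(1 - switch), math.log(switch)],
                 [math.log(switch), math.log(1 - switch)]]
    score = [math.log(0.5) + math.log(emission(calls[0], s, errs[0]))
             for s in states]
    back = []
    for call, err in zip(calls[1:], errs[1:]):
        new_score, pointers = [], []
        for s in states:
            # Best predecessor of state s at this SNP.
            prev = max(states, key=lambda r: score[r] + log_trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + math.log(emission(call, s, err)))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# 20 homozygous calls, 2 heterozygous calls, 20 homozygous calls.
calls = [False] * 20 + [True] * 2 + [False] * 20

# "Vanilla"-style fit: the same assumed error rate at every SNP.
vanilla_errs = [0.05] * 42
# "ICE"-style fit: the two heterozygous calls carry high confidence,
# encoded here as a much smaller assumed error rate.
ice_errs = [0.05] * 20 + [1e-4] * 2 + [0.05] * 20

vanilla_path = viterbi(calls, vanilla_errs)
ice_path = viterbi(calls, ice_errs)
```

With these assumed values, `vanilla_path` labels all 42 SNPs as loss, while `ice_path` labels the two high-confidence heterozygous SNPs as normal and splits the region into two loss segments.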


Figure 3:

Top: Copy number estimates (vertical axis) versus physical position (horizontal axis) for 9165 SNPs on a simulated chromosome. Bottom: A magnification of regions D, B, and E. High confidence scores for the copy number estimates were simulated for the square points in regions D, B, and E. The two bars beneath the data points in each figure show the predicted hidden states from the vanilla (Van) and ICE HMMs. Note that where the predictions differ in regions D, B, and E, the ICE HMM correctly classifies the hidden states. Note also that the vanilla HMM indicates a (spurious) deletion to the left of region A, which the ICE HMM does not indicate because of the high variability in those copy number estimates.


Figure 4:

Top: The copy number estimates in Figure 3 are superimposed on the genotype estimates in Figure 2. We fit HMMs to the joint observation sequence of copy number and genotype estimates without (vanilla) and with (ICE) confidence scores of the SNP-level summaries. The predictions from these two HMMs are represented by different shades of grey in the two bars beneath the data points in each panel. We used square plotting symbols to indicate SNPs for which we assigned high confidence scores to the genotype and copy number estimates.


Figure 5:

Top: A confirmed UPD between 190 and 200 Mb is detected by both HMMs in a HapMap sample from the CEPH dataset. Note that the vanilla HMM incorrectly predicts a small deletion of 3 base pairs in the middle of this region, whereas the ICE HMM provides a more global smoothing of the copy number estimates. Bottom left: A magnified view of three possible LOH regions (not confirmed). Only the middle region (143 Mb) is identified by both HMMs as LOH. Because the CRLMM genotype calls agree with the HapMap consensus, the chromosomal segment containing the two heterozygous SNPs at 140 Mb is not a region of LOH, contrary to the prediction of the vanilla HMM. Bottom right: A magnification of the vanilla (top) and ICE (bottom) predictions for the feature at 150 Mb. Again, the true genotype calls are heterozygous, and so the ICE HMM correctly identifies the chromosomal segment containing the two heterozygous SNPs as normal.


Figure 6:

An image of the predictions from the vanilla HMM fit to chromosome 2 of the 60 parental samples in the CEPH trios dataset (top). The x- and y-coordinates used for the image are physical position and subject, respectively. Subject NA07056 has a confirmed UPD at 195 Mb. Also plotted are the frequencies of LOH across the 60 samples (middle) and the cytoband (bottom).
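
As a rough illustration of the middle panel, the frequency of LOH at each SNP can be computed as the fraction of samples whose predicted state at that SNP is LOH. The state labels and the small matrix below are made up for the example and are not taken from the CEPH data.

```python
def loh_frequency(predictions, loh_label="LOH"):
    """Fraction of samples predicted to be in the LOH state at each SNP.

    `predictions` is a list of rows, one per sample, each row holding the
    predicted state label at every SNP (all rows have the same length).
    """
    n_samples = len(predictions)
    n_snps = len(predictions[0])
    return [
        sum(row[j] == loh_label for row in predictions) / n_samples
        for j in range(n_snps)
    ]


# Hypothetical predictions: 4 samples (rows) by 5 SNPs (columns).
toy_predictions = [
    ["normal", "LOH", "LOH", "normal", "normal"],
    ["normal", "LOH", "normal", "normal", "normal"],
    ["normal", "normal", "normal", "normal", "LOH"],
    ["normal", "LOH", "LOH", "normal", "normal"],
]

toy_frequencies = loh_frequency(toy_predictions)
```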

