Sameer's Home Page

Welcome

The analysis in this project produced some interesting results. Lin et al. have already shown that CRLMM outperforms Birdseed; the ADPs plotted in this project show that CRLMM also outperforms Birdseed across different Affymetrix platforms and sample sizes. The Accuracy vs Confidence plots show that both algorithms are quite robust to variability across the datasets used, although CRLMM has higher accuracy overall. The cluster plot in Figure 8 shows SNPs that Birdseed called incorrectly and for which the CRLMM calls and the HapMap calls agreed. The plots show that Birdseed calls these homozygous when they are in fact heterozygous; the 2 calls that Birdseed makes as heterozygous are in fact homozygous. The cluster plots for the SNP5 samples with plate information show similar differences: genotypes called heterozygous by CRLMM are called homozygous by Birdseed. This is similar to the problem faced by the Dynamic Model (DM) algorithm (Shin Lin, Benilton Carvalho, David J Cutler, Dan E Arking, Aravinda Chakravarti and Rafael A Irizarry: Validation and extension of an empirical Bayes method for SNP calling on Affymetrix microarrays. Genome Biology 2008, 9:R63).
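
As a rough illustration of the comparison behind Figure 8, the sketch below tallies one caller's errors by zygosity, restricted to positions where the other caller agrees with the reference. The function names and the toy genotype lists are hypothetical and are not taken from the project; the sketch assumes every call is one of AA, AB or BB.

```python
from collections import Counter


def zygosity(call):
    """Return 'hom' for AA/BB and 'het' for AB (assumes no missing calls)."""
    if call in ("AA", "BB"):
        return "hom"
    if call == "AB":
        return "het"
    raise ValueError(f"unexpected genotype call: {call!r}")


def tally_discordant_calls(test_calls, other_calls, reference_calls):
    """Count the test caller's errors as (true zygosity, called zygosity) pairs.

    Only positions where the other caller matches the reference and the
    test caller does not are counted.
    """
    counts = Counter()
    for test, other, ref in zip(test_calls, other_calls, reference_calls):
        if other == ref and test != ref:
            counts[(zygosity(ref), zygosity(test))] += 1
    return counts


# Hypothetical toy calls, not data from the project.
reference = ["AB", "AB", "AA", "BB", "AB", "AA"]
crlmm_like = ["AB", "AB", "AA", "BB", "AB", "AA"]
birdseed_like = ["AA", "BB", "AB", "BB", "AB", "BB"]

errors = tally_discordant_calls(birdseed_like, crlmm_like, reference)
print("het called hom:", errors[("het", "hom")])
print("hom called het:", errors[("hom", "het")])
```

A large count of heterozygous genotypes called homozygous, relative to the reverse, would correspond to the pattern described above.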

Figure 9 is a slightly different comparison: CRLMM with and without the spline correction step. I replaced the spline with a linear model and compared the results. There were 8 SNPs for which CRLMM with the linear model had a higher percent error (relative to the HapMap calls) than CRLMM with the spline model, and I plotted 3 of them. CRLMM without the spline correction called some genotypes heterozygous (AB) that are in fact homozygous and that CRLMM with the spline correction called correctly. This leads me to believe that, for some SNPs, fitting a spline instead of a linear model does pay off.
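
The per-SNP comparison described above can be sketched as follows. This is only a minimal illustration: the data layout (a dictionary mapping SNP identifiers to lists of calls), the function names and the toy calls are all assumptions, not the code used for Figure 9.

```python
def percent_error(calls, reference):
    """Percentage of calls that disagree with the reference calls.

    Samples whose reference call is None (missing) are skipped.
    """
    pairs = [(c, r) for c, r in zip(calls, reference) if r is not None]
    if not pairs:
        raise ValueError("no reference calls to compare against")
    wrong = sum(1 for c, r in pairs if c != r)
    return 100.0 * wrong / len(pairs)


def snps_where_first_is_worse(calls_a, calls_b, reference):
    """Return {snp: (error_a, error_b)} where calls_a has the higher error."""
    worse = {}
    for snp, ref in reference.items():
        err_a = percent_error(calls_a[snp], ref)
        err_b = percent_error(calls_b[snp], ref)
        if err_a > err_b:
            worse[snp] = (err_a, err_b)
    return worse


# Hypothetical toy calls for two SNPs across four samples.
reference = {"snp1": ["AA", "AA", "AB", "BB"], "snp2": ["AB", "AB", "BB", "BB"]}
linear_calls = {"snp1": ["AB", "AA", "AB", "BB"], "snp2": ["AB", "AB", "BB", "BB"]}
spline_calls = {"snp1": ["AA", "AA", "AB", "BB"], "snp2": ["AB", "AB", "BB", "BB"]}

worse_with_linear = snps_where_first_is_worse(linear_calls, spline_calls, reference)
print(worse_with_linear)
```

In this toy example only snp1 is flagged, because the linear-model calls place one homozygous sample in the AB cluster.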

The accuracy vs shift plots show that accuracy increases with increasing shift. The shift coefficients for both AA and BB are statistically significant (P < 2e-16), even though their values are very small. The Accuracy vs SNR plot shows that CRLMM gives higher accuracy than Birdseed on arrays with the same signal-to-noise ratio (SNR). CRLMM's accuracy is also less variable than Birdseed's.
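
To illustrate how a very small coefficient can still be highly significant, here is a minimal sketch that fits a simple linear regression and reports the slope with its t statistic. The text does not say which model was fitted, so ordinary least squares with a single predictor is an assumption, and the synthetic data below are hypothetical.

```python
import math


def slope_and_t_statistic(x, y):
    """Fit y = a + b*x by ordinary least squares; return (b, t statistic of b).

    Requires some spread in x and a fit that is not exact, otherwise a
    division by zero occurs.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance uses n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_error


# Hypothetical synthetic data: a true slope of 0.01 plus small alternating noise.
shift = [i / 1000 for i in range(1000)]
accuracy = [0.95 + 0.01 * s + (0.001 if i % 2 == 0 else -0.001)
            for i, s in enumerate(shift)]

slope, t_stat = slope_and_t_statistic(shift, accuracy)
print(f"slope = {slope:.4f}, t = {t_stat:.1f}")
```

With many observations and little residual noise the standard error of the slope shrinks, so a small slope can have a large t statistic and therefore a very small P value.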

Finally, the M vs S plots show how plates of differing quality can affect genotyping and the separation between genotype regions. Figure 16a shows the array with the worst SNR, 16b the array with the mean SNR, and 16c the array with the best SNR. The separation between the genotype regions becomes clearer as SNR increases.
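
For readers unfamiliar with M vs S plots, the sketch below computes the two coordinates for a single SNP on a single array. It assumes the usual convention that M is the log2 ratio of the allele A and allele B intensities and S is their average log2 intensity; the function name and the example intensities are hypothetical.

```python
import math


def m_and_s(intensity_a, intensity_b):
    """Return (M, S) for one SNP from its allele A and allele B intensities.

    M = log2(A) - log2(B) is the log ratio; S = (log2(A) + log2(B)) / 2
    is the average log intensity.
    """
    if intensity_a <= 0 or intensity_b <= 0:
        raise ValueError("intensities must be positive")
    log_a = math.log2(intensity_a)
    log_b = math.log2(intensity_b)
    return log_a - log_b, (log_a + log_b) / 2


# Hypothetical intensities for three SNPs on one array.
for a, b in [(1024, 256), (512, 512), (256, 1024)]:
    m, s = m_and_s(a, b)
    print(f"A={a:5d} B={b:5d} M={m:+.1f} S={s:.1f}")
```

Under this convention AA genotypes fall at positive M, BB genotypes at negative M and AB genotypes near M = 0, so a noisier array blurs the gaps between the three regions.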