Welcome
Datasets
Affy HapMap 500K high quality arrays
Chakravarti HapMap 6.0 first pass arrays
Broad HapMap 500K first pass arrays
Affy HapMap 6.0 high quality arrays
Genetic Association Information Network (GAIN) Affy SNP 6.0 arrays
Genetics of Kidneys in Diabetes (GoKinD) Affy SNP 5.0 arrays (part of GAIN)
You can download these datasets at rafalab.biostat.jhsph.edu
Methodology
ADR plots
CRLMM and Birdseed were run on both the Affy SNP 6.0 and 500K platforms for the datasets listed above. C code (GNU) was written that takes the calls file and the confidence file as input. From the distribution of the confidences, quantiles were calculated, ranging from the minimum value to the 20th percentile. The calls were then compared with the "truth" (the HapMap calls), and the mean of the confidences of the correct calls above each threshold (drop rate) was calculated. The final plotting was done using the R statistical package.
ACM plots
Same as the ADR plots, except that the drop rate was replaced by a threshold on the confidence metric.
Cluster plots
CRLMM and Birdseed were run on the Affy SNP 6.0 and Affy SNP 5.0 platforms for the datasets listed above. C code was written to select SNPs whose percent error across the samples, measured against the HapMap calls, differed significantly between the two algorithms. For samples with no gold standard available, SNPs were selected that the two programs called differently at least 40% of the time. The normalized log2 intensities of allele A and allele B were then plotted against each other (log A vs log B) in R. For the Birdseed calls, ellipses were drawn around the genotype regions using the means and covariances of alleles A and B, which Birdseed writes to one of its output files.
Accuracy vs Shift
For genotypes AA and BB, accuracy was plotted against shift, using the shift file generated by CRLMM. This file (centersAfter.txt) gives, for each SNP, the shift for each genotype region (AA, AB, BB). The plotting was done in R.
Accuracy vs Signal to Noise Ratio plots
For each of the 54 arrays in the Genetic Association Information Network (GAIN) Affy SNP 6.0 sample, accuracy was plotted against SNR for both CRLMM and Birdseed. The SNR was read from the binary SNR file generated by CRLMM. The plotting was done in R.
M vs S plots
For the arrays with the best, worst and mean SNR among the 54 Genetic Association Information Network (GAIN) Affy SNP 6.0 arrays, M = log2(A/B) was plotted against S = (log2 A + log2 B)/2 (for a description of the SNR, see Carvalho et al.). The plots were made in R.
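M and S are simple transforms of the two allele intensities, and choosing the three arrays is a small selection over the SNR values. The following is only an illustrative sketch, not the code used for these plots. It assumes base-2 logarithms, strictly positive intensities, and SNR values already loaded into a dictionary keyed by hypothetical array names (it does not parse CRLMM's binary SNR file). It also assumes that the "mean SNR" array is the one whose SNR is closest to the average.

```python
import math
from statistics import mean


def pick_arrays_by_snr(snr_by_array):
    """Return the names of the best, worst and "mean" arrays by SNR.

    snr_by_array maps a hypothetical array name to its SNR. The "mean"
    array is assumed to be the one whose SNR is closest to the average
    SNR over all arrays.
    """
    best = max(snr_by_array, key=snr_by_array.get)
    worst = min(snr_by_array, key=snr_by_array.get)
    avg = mean(snr_by_array.values())
    typical = min(snr_by_array, key=lambda name: abs(snr_by_array[name] - avg))
    return best, worst, typical


def m_and_s(a, b):
    """Return (M, S) for one SNP, assuming base-2 logs of positive intensities.

    M = log2(A/B) measures the contrast between the alleles;
    S = (log2 A + log2 B)/2 measures the average log intensity.
    """
    log_a, log_b = math.log2(a), math.log2(b)
    return log_a - log_b, (log_a + log_b) / 2


def ms_points(intensities):
    """Convert the (A, B) intensity pairs of one array into (M, S) points."""
    return [m_and_s(a, b) for a, b in intensities]


if __name__ == "__main__":
    # Toy SNR values for three hypothetical arrays (not real data).
    snr = {"array01": 5.2, "array02": 7.9, "array03": 6.1}
    print(pick_arrays_by_snr(snr))  # ('array02', 'array01', 'array03')
    print(m_and_s(1024, 256))       # (2.0, 9.0)
```

For example, A = 1024 and B = 256 give M = 10 - 8 = 2 and S = (10 + 8)/2 = 9. A heterozygous SNP with balanced intensities has M near 0. The resulting (S, M) points can then be passed to any plotting tool. In the original work, R was used for this step.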