Srfim is an R package for model-based quality assessment and base-calling of second-generation sequencing data produced by Illumina's GA technology. It implements the methods described here:
H. Corrada Bravo, R.A. Irizarry. "Model-based quality assessment and base-calling for second-generation sequencing data" (November 2009). Biometrics. Published online before print, November 13, 2009. doi:10.1111/j.1541-0420.2009.01353.x
[Biometrics]
[pdf]
This software is under heavy development. Please email with any comments or questions. We also provide Rsolid for ABI's SOLiD second-generation sequencing platform.
Requirements
- R: You can download and install R here
- MPI (suggested): Second-generation sequencing produces large volumes of data, so we highly recommend running Srfim on a cluster. If Rmpi is available, Srfim uses MPI to distribute base-calling across multiple processes.
- R packages: Srfim uses code from other R packages, in particular Bioconductor's ShortRead package and the sfit and aroma.light packages. We also suggest the Rmpi package. Run the following commands in R:
> source("http://bioconductor.org/biocLite.R")
> biocLite("ShortRead")
> install.packages(c("Rmpi", "matrixStats", "aroma.light"))
> install.packages("sfit", repos="http://R-forge.R-project.org")
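Before installing Srfim, it can help to confirm which of the packages listed above are already present. The following is a minimal sketch using only base R; the helper name check_srfim_deps is hypothetical and is not part of Srfim, and the default package list is simply the set of names mentioned in this section.

```r
# Hypothetical helper (not part of srfim): report which packages are installed.
# system.file() returns "" for a package that is not installed, and it does
# not load the package, so no MPI initialization is triggered by the check.
check_srfim_deps <- function(pkgs = c("ShortRead", "sfit", "aroma.light",
                                      "matrixStats", "Rmpi")) {
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

status <- check_srfim_deps()
print(status)

# List anything that still needs to be installed
missing_pkgs <- names(status)[!status]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Rmpi is only suggested, so a FALSE entry for it does not prevent Srfim from running in a single process.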
Installation
Download the source tarball here and install srfim using the standard R source installation procedure:
> R CMD INSTALL srfim_0.0-931.tar.gz
This has been tested with R-2.11 on both Linux and Mac OS X.
Basic Usage
The easiest way to use Srfim is to take advantage of ShortRead to traverse Illumina GA result directories, and of MPI for job scheduling.
The following R code looks for intensity files in the run directory run_dir and writes fastq files to dest_dir, using MPI for job scheduling:
> library(srfim)
> library(Rmpi)
> sbatch <- SrfimBatch(SolexaPath(run_dir))
> srfim(sbatch, dest_dir)
The fastq files can be processed by your favorite second-generation sequencing analysis pipeline (for example, use bowtie to map to a large reference genome). Removing the library(Rmpi) line from the code above runs srfim locally in a single process (not recommended for full sequencing runs).
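Before handing the fastq files to a downstream pipeline, a quick sanity check of the output can be useful. The sketch below uses only base R and assumes the common four-lines-per-record FASTQ layout; the helper count_fastq_records and the demo file are hypothetical and are not part of Srfim. It reads the whole file into memory, so it is meant for small files or samples rather than full sequencing runs.

```r
# Hypothetical helper (not part of srfim): count records in a FASTQ file,
# assuming each record occupies exactly four lines and starts with "@".
count_fastq_records <- function(path) {
  lines <- readLines(path)
  if (length(lines) == 0) {
    return(0L)
  }
  # A well-formed file under this assumption has a multiple of four lines
  stopifnot(length(lines) %% 4 == 0)
  headers <- lines[seq(1, length(lines), by = 4)]
  # Every record header should begin with "@"
  stopifnot(all(substr(headers, 1, 1) == "@"))
  length(headers)
}

# Self-contained demonstration on a tiny made-up file
demo <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "TTGA", "+", "IIHH"), demo)
n_records <- count_fastq_records(demo)
print(n_records)
```

To check real output, pass the path of a file written to dest_dir in place of the demo file.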