Authors: Héctor Corrada Bravo, Hao Wu and Rafael A. Irizarry

Srfim is an R package for model-based quality assessment and base-calling of second-generation sequencing data produced by Illumina's Genome Analyzer (GA) technology. It implements the methods described in:


H. Corrada Bravo, R.A. Irizarry. "Model-based quality assessment and base-calling for second-generation sequencing data" (November 2009). Biometrics. Published online before print, November 13, 2009. doi:10.1111/j.1541-0420.2009.01353.x


This software is under heavy development. Please email us with any comments or questions. We also provide Rsolid for data from ABI's SOLiD second-generation sequencing platform.


Requirements

  • R: download and install R from CRAN.
  • MPI (suggested): Second-generation sequencing runs produce large amounts of data, so we highly recommend running srfim on a cluster. If the Rmpi package is available, srfim uses MPI to distribute base-calling across multiple processes.
  • R packages: srfim uses code from several other R packages, in particular Bioconductor's ShortRead package and the sfit, matrixStats and aroma.light packages. The Rmpi package is suggested. Run the following commands in R to install them:

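    > source("http://bioconductor.org/biocLite.R")
    > biocLite("ShortRead")
    > # On current R/Bioconductor releases biocLite is no longer available; use BiocManager::install("ShortRead") instead
    > install.packages(c("Rmpi", "matrixStats", "aroma.light"))
    > install.packages("sfit", repos = "http://R-forge.R-project.org")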
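Before installing srfim, a quick check shows which of the packages above are already installed. This is a minimal sketch in base R: the package list is copied from the commands above (srfim's own DESCRIPTION file remains the authoritative list), and check_installed() is a hypothetical helper. It relies on system.file(package = ...) returning an empty string for packages that are not installed, so nothing gets loaded.

```r
# Packages named in the requirements above (Rmpi is optional).
srfim_required <- c("ShortRead", "sfit", "matrixStats", "aroma.light")
srfim_suggested <- c("Rmpi")

# Hypothetical helper: named logical vector, TRUE if the package is installed.
# system.file(package = p) returns "" when p is not found in .libPaths().
check_installed <- function(pkgs) {
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

status <- check_installed(c(srfim_required, srfim_suggested))
print(status)

# Report only the required packages that are still missing.
missing_required <- srfim_required[!status[srfim_required]]
if (length(missing_required) > 0) {
  message("Missing required packages: ", paste(missing_required, collapse = ", "))
}
```

If Rmpi shows up as FALSE, srfim still works but runs in a single process.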

Installation

Download the source tarball and install srfim using the standard R source installation procedure, run from a shell rather than from the R prompt:

$ R CMD INSTALL srfim_0.0-931.tar.gz

srfim has been tested with R 2.11 on Linux and Mac OS X.
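If you prefer to stay inside an R session, install.packages() with repos = NULL and type = "source" installs a local source tarball, much like R CMD INSTALL does from the shell. The sketch below wraps that call; install_srfim_tarball() is a hypothetical helper, and its default file name simply matches the release named above.

```r
# Hypothetical helper: install srfim from a downloaded source tarball.
install_srfim_tarball <- function(tarball = "srfim_0.0-931.tar.gz") {
  if (!file.exists(tarball)) {
    stop("source tarball not found: ", tarball)
  }
  # srfim was tested with R 2.11; warn if this R is older than that.
  if (getRversion() < "2.11.0") {
    warning("this R is older than 2.11, the version srfim was tested with")
  }
  # repos = NULL tells install.packages() that 'tarball' is a local file.
  install.packages(tarball, repos = NULL, type = "source")
}
```

For example, install_srfim_tarball("path/to/srfim_0.0-931.tar.gz"), where the path is a placeholder for wherever you saved the download.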

Basic Usage

The easiest way to use srfim is to let ShortRead traverse Illumina GA result directories and let MPI handle job scheduling. The following R code looks for intensity files in the run directory run_dir and writes FASTQ files to dest_dir, using MPI to schedule the base-calling jobs:

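> library(srfim)
> library(Rmpi)
> run_dir <- "/path/to/GA/run"       # placeholder: your Illumina GA run directory
> dest_dir <- "/path/to/fastq/out"   # placeholder: where the FASTQ files are written
> # SolexaPath() (from ShortRead) describes the layout of the run directory
> sbatch <- SrfimBatch(SolexaPath(run_dir))
> srfim(sbatch, dest_dir)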
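The steps above can be wrapped in a small function that checks the directories first and attaches Rmpi only when it is installed. This is a sketch, assuming a reasonably recent R (dir.exists() needs R 3.2.0 or later). prepare_srfim_dirs() and run_srfim() are hypothetical helpers, not part of srfim, and this page does not say whether srfim requires dest_dir to exist beforehand, so creating it is only a precaution.

```r
# Sketch only: prepare_srfim_dirs() and run_srfim() are hypothetical helpers,
# not functions exported by srfim.

# Check that the run directory exists and create the output directory if needed.
prepare_srfim_dirs <- function(run_dir, dest_dir) {
  if (!dir.exists(run_dir)) {
    stop("run directory not found: ", run_dir)
  }
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  list(run = normalizePath(run_dir), dest = normalizePath(dest_dir))
}

# Base-call one GA run; MPI is used only when use_mpi is TRUE.
run_srfim <- function(run_dir, dest_dir,
                      use_mpi = requireNamespace("Rmpi", quietly = TRUE)) {
  dirs <- prepare_srfim_dirs(run_dir, dest_dir)
  library(srfim)
  # Per the usage notes, attaching Rmpi is what enables MPI job scheduling;
  # without it srfim runs locally in a single process.
  if (use_mpi) library(Rmpi)
  sbatch <- SrfimBatch(ShortRead::SolexaPath(dirs$run))
  srfim(sbatch, dirs$dest)
}
```

For example, run_srfim("/path/to/GA/run", "/path/to/fastq/out", use_mpi = FALSE) would base-call locally (both paths are placeholders).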

The FASTQ files can then be processed by your favorite second-generation analysis pipeline (for example, use bowtie to map reads to a large reference genome). Omitting library(Rmpi) from the code above runs srfim locally in a single process, which is not recommended for full sequencing runs.
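A quick way to sanity-check the output is to decode the quality line of each FASTQ record into Phred scores. The sketch below uses base R only and assumes small, uncompressed files with exactly four lines per record. The ASCII offset is an assumption (33 for Sanger-style Phred+33, 64 for the older Illumina 1.3+ encoding), since this page does not say which encoding srfim writes. The function names are hypothetical; for full-size files, ShortRead's readFastq() is the better tool.

```r
# Convert one FASTQ quality string to integer Phred scores.
# offset = 33 assumes Phred+33; use 64 for Illumina 1.3+ style files.
phred_scores <- function(qual, offset = 33L) {
  utf8ToInt(qual) - as.integer(offset)
}

# Read a small, uncompressed FASTQ file with four lines per record.
read_fastq_records <- function(path) {
  lines <- readLines(path)
  stopifnot(length(lines) %% 4 == 0)
  idx <- seq.int(1L, by = 4L, length.out = length(lines) %/% 4L)
  data.frame(
    id = sub("^@", "", lines[idx]),   # header line without the leading '@'
    seq = lines[idx + 1L],            # base calls
    qual = lines[idx + 3L],           # quality string (line after '+')
    stringsAsFactors = FALSE
  )
}

# Mean Phred quality of each read.
mean_quality <- function(records, offset = 33L) {
  vapply(records$qual, function(q) mean(phred_scores(q, offset)),
         numeric(1), USE.NAMES = FALSE)
}
```

Reads with unusually low mean quality are a quick hint that a tile or cycle range deserves a closer look.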

Help and Comments: