This page will help you install ChIPXpress and run a simple example. Note that ChIPXpress has been accepted by Bioconductor and will be released in version 2.11 of Bioconductor.
[Preliminary Step] If you do not have the latest release of R (v2.15) installed on your computer, please first download and install R from here. Choose the version of R appropriate for your operating system. If you have R version 2.14 or lower, ChIPXpress may not install properly.
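If R is already installed, you can check its version from within R before continuing. The following is a minimal sketch using only base R. The helper name r_is_recent_enough is made up for this page, and the 2.15.0 threshold is taken from the note above.

```r
# Hypothetical helper (not part of ChIPXpress): TRUE if the given R version
# is at least the minimum version required by this page (2.15.0).
r_is_recent_enough <- function(current = getRversion(), minimum = "2.15.0") {
  numeric_version(as.character(current)) >= numeric_version(minimum)
}

r_is_recent_enough()                    # checks the running R session
r_is_recent_enough(current = "2.14.2")  # an older version, for comparison
```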
[Step 2] Change biocLite to download from the devel version of Bioconductor by typing in R:
source("http://bioconductor.org/biocLite.R")
useDevel(TRUE)
[Step 3] Install the ChIPXpress and ChIPXpressData packages by typing in R:
biocLite("ChIPXpress")
Please be patient: the installation may take a while, since the ChIPXpressData package contains two large compendiums of gene expression data.
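[Step 2] If ChIPXpress is already available in your release version of Bioconductor, you can skip the devel switch above and install the ChIPXpress and ChIPXpressData packages by typing in R: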
source("http://bioconductor.org/biocLite.R")
biocLite("ChIPXpress")
Congratulations, you have successfully installed ChIPXpress!
If the instructions above did not work, or you would prefer to install ChIPXpress manually, scroll down to the bottom for manual installation instructions.
Now you are ready for an introduction to ChIPXpress and a quick example.
Here, we illustrate how to use the ChIPXpress function to produce functional TF target gene rankings. Suppose we are interested in studying Oct4 regulation in mouse embryonic stem cells (ESCs). First, we process the ChIP-seq data using CisGenome (or another method) to obtain a list of predicted Oct4-bound target genes in ESCs. This has already been done, and the result is stored as a dataset in the package, ready for input into the ChIPXpress function.
For the steps below, type the following commands in R:
library(ChIPXpress)
data(Oct4ESC_ChIPgenes)
Next, we need to load the pre-built mouse database of gene expression profiles from the GPL1261 platform, which ships with the ChIPXpressData package. Because the database is stored in big.matrix format, we need to use the functions designed to work with big.matrix objects. This requires installing and loading the bigmemory package.
library(ChIPXpressData)
library(bigmemory)
path <- system.file("extdata", package="ChIPXpressData")
DB_GPL1261 <- attach.big.matrix("DB_GPL1261.bigmemory.desc",
path=path)
To be clear about what we just did: we first located the path in which the DB_GPL1261 database is stored (the extdata folder of the installed ChIPXpressData package) and then specified the file name and the path to load the DB_GPL1261 database. To load the DB_GPL570 database for human data, we would simply replace DB_GPL1261 with DB_GPL570.
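If attaching the database fails, a common first check is whether the descriptor file is actually present in the folder. The sketch below builds the descriptor file name and verifies that it exists. The helper db_descriptor is hypothetical (it is not part of ChIPXpressData), and the "<name>.bigmemory.desc" pattern is an assumption generalized from the GPL1261 file name used above. The demonstration uses a temporary folder with an empty stand-in file so that it runs without the data package.

```r
# Hypothetical helper (not part of ChIPXpressData): build the descriptor file
# name for a database and stop with a clear message if the file is missing.
# The "<name>.bigmemory.desc" pattern is assumed from the GPL1261 example.
db_descriptor <- function(db_name, path) {
  desc_file <- paste(db_name, ".bigmemory.desc", sep = "")
  full_path <- file.path(path, desc_file)
  if (!file.exists(full_path)) {
    stop("Descriptor file not found: ", full_path)
  }
  desc_file
}

# Demonstration on a temporary folder holding an empty stand-in file.
demo_path <- tempfile("extdata")
dir.create(demo_path)
file.create(file.path(demo_path, "DB_GPL1261.bigmemory.desc"))
db_descriptor("DB_GPL1261", demo_path)
```

With the real package, path would come from system.file("extdata", package="ChIPXpressData"), and the returned file name could be passed to attach.big.matrix together with path=path.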
We are now ready to run the ChIPXpress function. We specify the Entrez
GeneID of the TF-of-interest (18999 is the Entrez GeneID of Oct4), the
vector of Oct4 bound genes, and the database:
Output <-
ChIPXpress(TFID="18999",ChIP=Oct4ESC_ChIPgenes$EntrezID,DB=DB_GPL1261)
head(Output[[1]])
 18999  17865 381591  22702  22271  99377
   5.3    6.1   15.8   20.8   22.0   26.0
head(Output[[2]])
[1] "338369" "238555" "257963" "242860" "212569" "243881"
The output is a list of size two. The first item in the list is the
Oct4 target gene rankings, where the names of the vector correspond to
the Entrez GeneID of each gene and each individual value is the
ChIPXpress score of each gene in the database. The second item reports
the TF bound genes that were not found in the database (i.e. not
measured by the microarray platform).
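To make the structure of the output concrete, here is a small sketch that works on a toy stand-in list rather than on a real ChIPXpress result. The scores and GeneIDs are copied from the printed output above, the names toy_output, scores and not_found are made up for illustration, and the second element is truncated to three IDs.

```r
# Toy stand-in for the ChIPXpress output, built from the values printed above.
toy_output <- list(
  c("18999" = 5.3, "17865" = 6.1, "381591" = 15.8,
    "22702" = 20.8, "22271" = 22.0, "99377" = 26.0),
  c("338369", "238555", "257963")
)

scores    <- toy_output[[1]]  # named numeric vector: names are Entrez GeneIDs
not_found <- toy_output[[2]]  # GeneIDs of bound genes absent from the database

names(scores)[1:3]            # GeneIDs of the first three genes, in the order returned
scores["22702"]               # score of one gene, looked up by its Entrez GeneID
length(not_found)             # number of bound genes not found in the database
```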
For the final step, you can convert the Output into a clean table with gene names or any other preferred gene identifier by using your favorite annotation package (e.g., biomaRt). Here, we can use the original Oct4ESC_ChIPgenes data frame to do so directly.
GeneNames <-
Oct4ESC_ChIPgenes$Annotation[match(names(Output[[1]]),Oct4ESC_ChIPgenes$EntrezID)]
Result <-
data.frame(1:length(Output[[1]]),GeneNames,names(Output[[1]]),Output[[1]])
colnames(Result) <-
c("Rank","GeneNames","EntrezID","ChIPXpressScore")
head(Result)
Good job! You have just completed a ChIPXpress analysis of real Oct4 ChIP-seq data. Result contains the final ChIPXpress rankings.
[ChIPXpress Example 2]
For users who are less familiar with R, this example shows how to use read.delim to read in a tab-delimited file containing the list of predicted TF-bound genes. This file contains the peak detection output from CisGenome, where each peak is assigned to a corresponding gene by Entrez GeneID and sorted from the largest to smallest peak signal. Also, only the highest-ranked peak for each gene is retained in this input file.
First, download the tab-delimited file containing the analyzed Oct4 ChIPx data results. Next, to read the tab-delimited file into R, type in R:
Oct4ESC_ChIPgenes <- read.delim(".../Oct4ESC_ChIPgenesEX.txt")
Here, "..." stands for the path to the folder where you saved the Oct4 example file.
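If you are unsure what read.delim expects, the following sketch writes a tiny tab-delimited stand-in file and reads it back. The column names EntrezID and Annotation are the ones used later in this example. The rows, the label "GeneB", and the names demo_file and demo_genes are made up for illustration, and the real file may contain additional columns.

```r
# Write a tiny tab-delimited stand-in file to a temporary location.
# Only the column names mirror the real input; the rows are illustrative.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("EntrezID\tAnnotation",
             "18999\tOct4",
             "17865\tGeneB"), demo_file)

# read.delim assumes a header row and tab-separated fields by default.
demo_genes <- read.delim(demo_file, stringsAsFactors = FALSE)

str(demo_genes)      # shows the columns and the types R assigned to them
demo_genes$EntrezID  # the column that is passed on as the list of bound genes
```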
Now, you are once again ready to follow the ChIPXpress analysis steps from
Example 1. Load in the ChIPXpress package by typing in R:
library(ChIPXpress)
Next, we need to load the pre-built mouse database of gene expression profiles from the GPL1261 platform, which ships with the ChIPXpressData package. Because the database is stored in big.matrix format, we need to use the functions designed to work with big.matrix objects. This requires installing and loading the bigmemory package.
library(ChIPXpressData)
library(bigmemory)
path <- system.file("extdata", package="ChIPXpressData")
DB_GPL1261 <- attach.big.matrix("DB_GPL1261.bigmemory.desc",
path=path)
To be clear about what we just did: we first located the path in which the DB_GPL1261 database is stored (the extdata folder of the installed ChIPXpressData package) and then specified the file name and the path to load the DB_GPL1261 database. To load the DB_GPL570 database for human data, we would simply replace DB_GPL1261 with DB_GPL570.
We are now ready to run the ChIPXpress function. We specify the Entrez
GeneID of the TF-of-interest (18999 is the Entrez GeneID of Oct4), the
vector of Oct4 bound genes, and the database:
Output <-
ChIPXpress(TFID="18999",ChIP=Oct4ESC_ChIPgenes$EntrezID,DB=DB_GPL1261)
head(Output[[1]])
 18999  17865 381591  22702  22271  99377
   5.3    6.1   15.8   20.8   22.0   26.0
head(Output[[2]])
[1] "338369" "238555" "257963" "242860" "212569" "243881"
The output is a list of size two. The first item in the list is the
Oct4 target gene rankings, where the names of the vector correspond to
the Entrez GeneID of each gene and each individual value is the
ChIPXpress score of each gene in the database. The second item reports
the TF bound genes that were not found in the database (i.e. not
measured by the microarray platform).
For the final step, you can convert the Output into a clean table with gene names or any other preferred gene identifier by using your favorite annotation package (e.g., biomaRt). Here, we can use the original Oct4ESC_ChIPgenes data frame to do so directly.
GeneNames <-
Oct4ESC_ChIPgenes$Annotation[match(names(Output[[1]]),Oct4ESC_ChIPgenes$EntrezID)]
Result <-
data.frame(1:length(Output[[1]]),GeneNames,names(Output[[1]]),Output[[1]])
colnames(Result) <-
c("Rank","GeneNames","EntrezID","ChIPXpressScore")
head(Result)
You can also save the output into a tab-delimited file by typing in R:
write.table(Result,file=".../Oct4_Output.txt",row.names=FALSE,sep="\t",quote=FALSE)
where ... is the path to the folder in which you want to save the file.
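To see what the saved file will look like, here is a small round-trip sketch on a toy table with the same column names as Result. The rows and the names toy_result, out_file and check are made up for illustration, and the file is written to a temporary location rather than to your own folder.

```r
# Toy table with the same columns as Result (illustrative rows only).
toy_result <- data.frame(Rank = 1:2,
                         GeneNames = c("GeneA", "GeneB"),
                         EntrezID = c("18999", "17865"),
                         ChIPXpressScore = c(5.3, 6.1),
                         stringsAsFactors = FALSE)

# Same write.table options as above, written to a temporary file.
out_file <- tempfile(fileext = ".txt")
write.table(toy_result, file = out_file, row.names = FALSE, sep = "\t", quote = FALSE)

readLines(out_file, n = 1)  # header line: column names separated by tabs
check <- read.delim(out_file, stringsAsFactors = FALSE)
nrow(check)                 # number of rows read back from the file
```

Setting row.names=FALSE keeps the header aligned with the columns, and quote=FALSE leaves gene names unquoted, which makes the file easier to open in a spreadsheet program.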
Good job! You have just completed a ChIPXpress analysis of real Oct4 ChIP-seq data. Oct4_Output.txt contains the final ChIPXpress rankings.
[Step 2] Download the ChIPXpressData package from here.
[Step 3a] Open R.
[Step 3b] If you do not have the following Bioconductor packages that
ChIPXpress imports from, please install them by typing in R:
source("http://bioconductor.org/biocLite.R")
biocLite(c("affy","frma","GEOquery"))
[Step 3c] If you do not have the following CRAN packages that
ChIPXpress imports from, please install them by typing in R:
install.packages(c("bigmemory","biganalytics"))
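If you are not sure which of these packages you already have, the following sketch lists the ones that are missing. The helper missing_packages is hypothetical and uses only base R. The final install call is left commented out so that nothing is installed until you decide to run it.

```r
# Hypothetical helper: return the requested packages that are not installed.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# CRAN packages named in Step 3c that are absent from this R installation.
cran_needed <- missing_packages(c("bigmemory", "biganalytics"))
cran_needed

# Uncomment to install only what is missing:
# if (length(cran_needed) > 0) install.packages(cran_needed)
```

The same helper could be applied to the Bioconductor packages from Step 3b, with the result passed to biocLite instead of install.packages.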
[Step 4] Install the ChIPXpressData package by typing in R:
install.packages(".../ChIPXpressData_0.99.0.tar.gz",repos=NULL,type="source")
where ... is the path to the location of the package (where you downloaded the package to).
For Linux, you can instead install from the command line by typing:
R CMD INSTALL .../ChIPXpressData_0.99.0.tar.gz
*Note, the ChIPXpressData package is rather large and may take a long
time to download and install since it contains thousands of gene
expression profiles.
[Step 5] Install the ChIPXpress package by typing in R:
install.packages(".../ChIPXpress_0.99.5.tar.gz",repos=NULL,type="source")
where ... is the path to the location of the package.
For Linux, you can instead install from the command line by typing:
R CMD INSTALL .../ChIPXpress_0.99.6.tar.gz
In both commands, use the version number of the ChIPXpress file you actually downloaded.