dPCA: differential principal component analysis of ChIP-seq
[Introduction]
We propose Differential Principal Component Analysis (dPCA) for analyzing multiple ChIP-seq datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single statistical framework. It uses a small number of principal components to concisely summarize the major multi-protein differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a new tool for efficiently analyzing large amounts of ChIP-seq data to study dynamic changes of gene regulation across different biological conditions.
dPCA is part of the CisGenome project. Currently, it runs as a command-line program. We will consider incorporating it into the CisGenome GUI in the future.
[Supporting information for the dPCA paper]
Supporting Figures: FigureWeb1.pdf
Additional Simulations: dPCA_TechReport_2.pdf
[News]
Several new functions have been released:
(1) dpca_peakcalls: a program that uses the CisGenome peak-calling function to eliminate input genomic loci that are not bound by any protein in any dataset.
(2) -r option of dpca: computes the R^B statistic for identifying differential loci with significant absolute binding.
(3) -z option of dpca: uses dPCA-Z to filter out differences without significant binding activity before running dPCA.
See the readme for details.
[Download]
Software: Windows, Linux, Mac OS
Example Data: The data below are normalized read count data used in the dPCA paper. You can run dpca directly using these data without using dpca_importdata.
MYC analysis (Example I): Ebox_data; Ebox_peakprob (peak calls for R^B, dPCA-Z)
Promoter analysis (Example II): Prom_data; Prom_peakprob (for R^B, dPCA-Z)
ASB analysis (Example III): ASB_data; ASB_peakprob (for R^B, dPCA-Z)
Example commands for analyzing these data are:
(1) Promoter analysis (dPCA-P)
> dpca -i Prom_data.txt -d /home/ -o Prom_output -t 1
(2) Promoter analysis (dPCA-Z)
> dpca -i Prom_data.txt -d /home/ -o PromZ_output -t 1 -z 1 -r Prom_peakprob.txt
(3) Promoter analysis (dPCA-P and compute R^B)
> dpca -i Prom_data.txt -d /home/ -o PromP_output -t 1 -r Prom_peakprob.txt
(4) MYC analysis (dPCA-P)
> dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0
OR
> dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 -r Ebox_peakprob.txt
(5) ASB analysis (dPCA-P, paired sample)
> dpca -i ASB_data.txt -d /home/ -o ASB_output -t 1 -sm 1 -cm 0
OR
> dpca -i ASB_data.txt -d /home/ -o ASB_output -t 1 -sm 1 -cm 0 -r ASB_peakprob.txt
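The three dPCA-P commands that also compute R^B differ only in the dataset name and a few extra flags, so they can be generated from one template. The sketch below is a dry run: it prints the commands rather than executing them, so it works even before dpca is installed. The helper name build_dpca_cmd is hypothetical and not part of dPCA; every flag and file name is copied from the example commands above.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the dPCA-P + R^B command for each
# example dataset. build_dpca_cmd is a hypothetical helper, not part of dPCA.
build_dpca_cmd() {
    case "$1" in
        Prom) out="PromP_output"; extra="" ;;
        Ebox) out="Ebox_output";  extra=" -cm 0" ;;
        ASB)  out="ASB_output";   extra=" -sm 1 -cm 0" ;;
        *)    echo "unknown dataset: $1" >&2; return 1 ;;
    esac
    # Flags and file names mirror the example commands on this page.
    echo "dpca -i ${1}_data.txt -d /home/ -o ${out} -t 1${extra} -r ${1}_peakprob.txt"
}

for ds in Prom Ebox ASB; do
    build_dpca_cmd "$ds"
done
```

To actually run a command instead of printing it, one option is to copy the printed line into the terminal after checking that the data files are in the current folder.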
[Installation]
For Windows: An executable program is provided. To run dPCA, open the Start menu of your Windows system (typically at the bottom left corner of your screen), choose ‘Accessories > Run’, type ‘cmd’, and press Enter. A command window will open. In this window, change to the folder that contains dPCA, for example by typing:
> cd D:\Users\dPCA\
Now type:
> dpca_importdata
> dpca
> dpca_peakcalls
Each command should print its usage information, which indicates that dPCA is ready to use.
For Linux, Mac OS: dPCA is bundled with CisGenome, so you can follow the CisGenome installation procedure to install it. dPCA is written in C. Before installation, make sure a compiler such as g++ or gcc is installed on your computer.
1. Unzip using ‘gzip -d *.gz’ (here * is the name of the file you have downloaded).
2. Untar using ‘tar xvf *.tar’.
3. Enter the cisgenome folder.
4. Compile by typing ‘./makefile’.
5. Enter the subfolder named ‘bin’ by typing ‘cd bin’.
6. Type ‘ls’. You will find three files named ‘dpca_importdata’, ‘dpca’, and ‘dpca_peakcalls’.
7. Now type:
> dpca_importdata
> dpca
> dpca_peakcalls
If you installed cisgenome correctly, you will see usage information after typing each of these three commands.
8. You can now start to use dPCA.
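If a command is reported as not found, the shell may simply not be searching the ‘bin’ folder. The following sketch checks whether each program is visible on PATH using the POSIX ‘command -v’ builtin. The helper name check_tools is hypothetical and not part of dPCA or CisGenome.

```shell
#!/bin/sh
# check_tools is a hypothetical helper: report whether each named program
# can be found on PATH, using the POSIX `command -v` builtin.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when every tool was found.
    [ "$missing" -eq 0 ]
}

check_tools dpca_importdata dpca dpca_peakcalls ||
    echo "Add the bin folder to PATH, or run the programs as ./dpca from inside it."
```

On many Linux and Mac OS systems the current folder is not on PATH by default, in which case the programs can be started from inside ‘bin’ as ‘./dpca_importdata’, ‘./dpca’, and ‘./dpca_peakcalls’.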
[Readme]
To learn how to use dPCA, please read the readme file.
Examples and sample parameter files:
(1) Basic dPCA (Note: the test data for STEP1 are just toy examples illustrating the data formats and the dpca_importdata function. We keep them small to avoid overloading our web server. The test data for STEP2 is a different dataset. It is the data used in our paper. In real applications, you should use data generated by STEP 1 as input for STEP 2.)
STEP1: run dpca_importdata
Download the test data and regions here and run the command:
> dpca_importdata sample_importdata_arg.txt
(Another, more complicated sample file: sample_importdata2_arg.txt)
STEP2: run dpca
Download the data below and run the following command:
> dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1
(2) dPCA-P + R^B (Note: the test data for STEP1 and STEP 2 are just toy examples illustrating the data formats. The test data for STEP3 is a different dataset. It is the data used in our paper. In real applications, you should use data generated by STEP 2 as input for STEP 3.)
STEP1: run dpca_importdata
Download the test data and regions here and run the command:
> dpca_importdata sample_importdata_arg.txt
STEP2: run peak calling
> dpca_peakcalls -i sample_peakcall_sampledescription.txt -p sample_peakcall_experimentdesign.txt -d /user/cisgenome/bin
Here, -d specifies the folder that contains the cisgenome and dpca executable files.
(Here are two more complicated sample parameter files: sample_peakcall_sampledescription2.txt and sample_peakcall_experimentdesign2.txt)
STEP3: run dpca
Download the data below and run the following command:
> dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 -r Ebox_peakprob.txt
(3) dPCA-Z (Note: the test data for STEP1 and STEP 2 are just toy examples illustrating the data formats. The test data for STEP3 is a different dataset. It is the data used in our paper. In real applications, you should use data generated by STEP 2 as input for STEP 3.)
STEP1: run dpca_importdata
Download the test data and regions here and run the command:
> dpca_importdata sample_importdata_arg.txt
STEP2: run peak calling
> dpca_peakcalls -i sample_peakcall_sampledescription.txt -p sample_peakcall_experimentdesign.txt -d /user/cisgenome/bin
Here, -d specifies the folder that contains the cisgenome and dpca executable files.
(Here are two more complicated sample parameter files: sample_peakcall_sampledescription2.txt and sample_peakcall_experimentdesign2.txt)
STEP3: run dpca
Download the data below and run the following command (the -z 1 option selects dPCA-Z):
> dpca -i Ebox_data.txt -d /home/ -o Ebox_output -t 1 -cm 0 -z 1 -r Ebox_peakprob.txt
[Contact]
Hongkai Ji [hji@jhsph.edu]