1. Analysis workflow

Spiker is specifically designed to analyze ChIP-seq data with spike-in control.

2. Input and output

The input files can be single-end or paired-end FASTQ files, or pre-aligned BAM files. Spiker supports both narrow peak analysis (such as H3K27ac) and broad peak analysis (such as H3K79me2 in this tutorial).

Input single-end fastq files

$ spiker.py --t1 H3K27ac.fastq.gz --c1 control.fastq.gz --bt2-index /data/GRCh38 --spikeIn --csf 1.23 --tsf 0.95 -o H3K27ac
$ spiker.py --broad --t1 H3K27ac.fastq.gz --c1 control.fastq.gz --bt2-index /data/GRCh38 --spikeIn --csf 1.23 --tsf 0.95 -o H3K27ac

Input paired-end fastq files

$ spiker.py --t1 H3K27ac_R1.fastq.gz --t2 H3K27ac_R2.fastq.gz --c1 control_R1.fastq.gz --c2 control_R2.fastq.gz --bt2-index /data/GRCh38 --spikeIn --csf 1.23 --tsf 0.95 -o H3K27ac
$ spiker.py --broad --t1 H3K27ac_R1.fastq.gz --t2 H3K27ac_R2.fastq.gz --c1 control_R1.fastq.gz --c2 control_R2.fastq.gz --bt2-index /data/GRCh38 --spikeIn --csf 1.23 --tsf 0.95 -o H3K27ac

Input BAM files

$ spiker.py -t H3K27ac.sorted.bam -c control.sorted.bam --spikeIn --csf 1.23 --tsf 0.95 -o H3K27ac
$ spiker.py --broad -t H3K27ac.sorted.bam -c control.sorted.bam --spikeIn --csf 1.23 --tsf 0.95 -o H3K27ac

Output files

ENCODE narrowPeak

ENCODE gappedPeak

bigWig

bedGraph
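
For downstream scripting, it may help to recall that ENCODE narrowPeak is a tab-delimited BED6+4 format with ten columns. The sketch below parses one record into named fields. The names `NarrowPeak` and `parse_narrowpeak_line` are hypothetical and are not part of Spiker; the record used in the usage note is made up for illustration.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    """One ENCODE narrowPeak (BED6+4) record."""
    chrom: str
    start: int            # 0-based start coordinate
    end: int              # end coordinate (exclusive)
    name: str
    score: int
    strand: str
    signal_value: float   # overall enrichment for the region
    p_value: float        # -log10 p-value, or -1 if not assigned
    q_value: float        # -log10 q-value, or -1 if not assigned
    summit: int           # summit offset from start, or -1 if not called


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Parse a single tab-delimited narrowPeak line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, found {len(fields)}")
    return NarrowPeak(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        signal_value=float(fields[6]),
        p_value=float(fields[7]),
        q_value=float(fields[8]),
        summit=int(fields[9]),
    )
```

For example, `parse_narrowpeak_line("chr1\t100\t250\tpeak_1\t87\t.\t5.5\t12.3\t8.7\t60")` returns a record whose absolute summit position is `start + summit`.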

3. spiker.py options

Options:

--version: show program’s version number and exit
-h, --help: show this help message and exit
--t1=CHIP_R1: FASTQ file (read1) for ChIP sample. Can be a plain text file or a compressed file (.gz, .bz2). Mutually exclusive with ‘-t’.
--t2=CHIP_R2: FASTQ file (read2) for ChIP sample. Can be a plain text file or a compressed file (.gz, .bz2). Mutually exclusive with ‘-t’. Ignore this for single-end sequencing.
-t CHIP_BAM, --treat=CHIP_BAM: BAM file of ChIP sample. The BAM file must be sorted and indexed. Mutually exclusive with ‘--t1’ and ‘--t2’.
--c1=CTRL_R1: FASTQ file (read1) for Control sample. Can be a plain text file or a compressed file (.gz, .bz2). Mutually exclusive with ‘-c’.
--c2=CTRL_R2: FASTQ file (read2) for Control sample. Can be a plain text file or a compressed file (.gz, .bz2). Mutually exclusive with ‘-c’. Ignore this for single-end sequencing.
-c CTRL_BAM, --control=CTRL_BAM: BAM file of Control sample. The BAM file must be sorted and indexed. Mutually exclusive with ‘--c1’ and ‘--c2’.
-o OUTFILE, --output=OUTFILE: Prefix of output files.
--bt2-index=BT2_INDEX: The prefix (minus trailing .X.bt2) for bowtie2 index files. Ignore this option if BAM files were provided by ‘-t’ and ‘-c’.
-n N_READS: Number of alignments from the BAM file used to determine the sequencing layout (PE or SE) and to estimate the fragment size ‘d’. default=1000000
-g G_SIZE, --genome-size=G_SIZE: Effective genome size. It can be 1.0e+9 or 1000000000, or one of the shortcuts: ‘hs’ for human (2.7e9), ‘mm’ for mouse (1.87e9), ‘ce’ for C. elegans (9e7) and ‘dm’ for fruitfly (1.2e8). default=hs
-p N_THREADS, --proc=N_THREADS: Number of threads. default=8
--mfold=M_FOLD: Select the regions within MFOLD range of high-confidence enrichment ratio against background to build model. Fold-enrichment in regions must be lower than the upper limit and higher than the lower limit. Use as “-m 10 30”. default=5 50
--spikeIn: Set this flag if the ChIP and control samples contain exogenous reads as spike-in. Please note that you also need to specify ‘--tsf’ and ‘--csf’.
--tsf=TREAT_SF: Scaling factor for treatment. This will be applied to the pileup bedgraph file of treatment (*.treat.pileup.bdg).
--csf=CONTROL_SF: Scaling factor for control. This will be applied to the pileup bedgraph file of maximum background (*.control.pileup.max.bdg).
--q-peak=Q_CUTOFF: Qvalue cutoff for peaks. default=0.05
--q-link=Q_LINK_CUT: Qvalue cutoff for linking regions. default=0.1
--bw: If set, generate bigwig files for ChIP pileup and control pileup.
--maxgap=MAX_GAP: Maximum gap between significant points in a peak. default=100
--broad: If set, call broad peaks.
--frip: If set, calculate FRiP (the Fraction of Reads In called Peaks) score using the BAM and peak files.
--cleanup: If set, clean up the intermediate files. When not set, intermediate files are kept so that rerunning the workflow will be much faster.
--refine: If set, detect peak summit position.
--verbose: If set, print detailed information for debugging.
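
To illustrate what ‘--tsf’ and ‘--csf’ mean in practice, the sketch below multiplies the value column of a bedGraph pileup by a scaling factor. It is only a minimal illustration of the arithmetic: the function name `scale_bedgraph` is hypothetical, and how Spiker applies the factors internally is not described here.

```python
from typing import Iterable, Iterator


def scale_bedgraph(lines: Iterable[str], factor: float) -> Iterator[str]:
    """Yield bedGraph lines with the 4th column multiplied by `factor`.

    Hypothetical helper for illustration; not Spiker's own code.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and track/browser/comment headers through unchanged.
        if not stripped or stripped.startswith(("track", "browser", "#")):
            yield stripped
            continue
        chrom, start, end, value = stripped.split("\t")[:4]
        yield f"{chrom}\t{start}\t{end}\t{float(value) * factor:.5f}"


if __name__ == "__main__":
    # Made-up pileup intervals, scaled by the --tsf value used in the examples.
    pileup = ["chr1\t0\t100\t2.0", "chr1\t100\t180\t4.0"]
    for scaled in scale_bedgraph(pileup, 0.95):
        print(scaled)
```

With a factor below 1 the signal is scaled down, and with a factor above 1 (such as the ‘--csf 1.23’ used in the examples) it is scaled up.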

4. split_bam.py options

Options:

--version: show program’s version number and exit
-h, --help: show this help message and exit
-i BAM_FILE: BAM file of the composite genome (such as human + fly)
-o OUT_PREFIX, --output=OUT_PREFIX: Output prefix. The original BAM file will be split into four BAM files: ‘prefix_human.bam’, ‘prefix_exogenous.bam’, ‘prefix_both.bam’, ‘prefix_neither.bam’.
-p CHR_PREFIX, --exo-prefix=CHR_PREFIX: Prefix added to the exogenous chromosome IDs. For example, ‘chr2L’ -> ‘dm6_chr2L’. default=dm6_
-q MAP_QUAL, --mapq=MAP_QUAL: Mapping quality (phred scaled) threshold. Alignments with mapping quality score lower than this will be assigned to ‘prefix_neither.bam’. default=30
--threads=N_THREAD: Number of threads to use for BAM sorting. default=1
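
The options above suggest a simple decision rule for assigning a read pair to one of the four output files. The sketch below is one possible reading of that rule, not the actual split_bam.py implementation. It assumes that ‘both’ means the two mates map to different genomes, and that a pair goes to ‘neither’ if either mate is unmapped or falls below the mapping quality threshold. The function name `classify_pair` is hypothetical.

```python
from typing import Optional


def classify_pair(
    ref1: Optional[str],
    ref2: Optional[str],
    mapq1: int,
    mapq2: int,
    exo_prefix: str = "dm6_",
    min_mapq: int = 30,
) -> str:
    """Return 'human', 'exogenous', 'both' or 'neither' for a read pair.

    Assumed logic for illustration only; ref1/ref2 are the reference
    (chromosome) names of the two mates, or None if unmapped.
    """
    # Assumption: unmapped or low-quality mates send the pair to 'neither'.
    if ref1 is None or ref2 is None:
        return "neither"
    if mapq1 < min_mapq or mapq2 < min_mapq:
        return "neither"

    exo1 = ref1.startswith(exo_prefix)
    exo2 = ref2.startswith(exo_prefix)
    if exo1 and exo2:
        return "exogenous"
    if not exo1 and not exo2:
        return "human"
    # Assumption: mates on different genomes are reported as 'both'.
    return "both"


if __name__ == "__main__":
    print(classify_pair("chr1", "chr1", 42, 42))
    print(classify_pair("dm6_chr2L", "dm6_chr2L", 42, 42))
    print(classify_pair("chr1", "dm6_chr2L", 42, 42))
    print(classify_pair("chr1", "chr1", 10, 42))
```

For single-end data the same rule could be applied with the one alignment passed as both mates, in which case the ‘both’ category would never be produced.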