CIpipe:
CRISPR indel pipe

Apr 22, 2016
Yingxiang Li


Content

  • Introduction
  • Installation
  • Synopsis
  • Commands and Options
  • Workflow Charts
  • Annotation of example.input.tab
  • Test Cases
  • Thanks

Introduction

CRISPR-Cas9 is a powerful tool for sequence-specific genome editing. The Cas protein cuts genomic DNA at locations complementary to a single guide RNA. Insertions and deletions (indels) often result when the cuts are repaired. Currently, there is no easy-to-use computational pipeline to determine the locations, identities, and frequencies of the indels. We have developed a pipeline, named CIpipe (CRISPR Indel pipeline), to identify indels in high-throughput DNA sequencing data and provide the statistical characterization of these indels.

Installation

CIpipe runs only on Mac OS or Linux.
Install the following first: python 2.7.10, R 3.2.2, bwa 0.7.5a, fastqc v0.11.2, samtools 1.3, and java 1.7.0_95.
Then, with pip installed, type in your terminal:

(sudo) pip install CIpipe (--upgrade)
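
Before installing, it can help to confirm that the external tools listed above are reachable from the shell. The following is a minimal sketch, not part of CIpipe: the helper name find_missing_tools is hypothetical, the executable names are assumed to match the tool names above, and the script assumes Python 3 for shutil.which. It checks only presence on PATH, not versions.

```python
import shutil

# Executable names are assumptions based on the requirements listed above.
REQUIRED_TOOLS = ["python", "R", "bwa", "fastqc", "samtools", "java"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

Versions still need to be checked by hand, since each tool reports its version differently.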

Synopsis

CIpipe -R ref.fa -D data/ -O output/
CIpipe -R ref.fa -D data/ -O output/ -N test1
CIpipe -R ref.fa -D data/ -O output/ -N test1 -P 0.01 -B 15 -A 0.001
CIpipe -R ref.fa -D data/ -O output/ -N test1 -F -X -VI
CIpipe -R ref.fa -D data/ -O output/ -N test1 -VS -VC -VR
CIpipe -R ref.fa -D data/ -O output/ -N test1 -T chr1:100 -US 20 -DS 20


CIpipe -E
CIpipe -I test2.input.tab
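
When the same single-sample call has to be repeated with different settings, the command lines above can be assembled from a script. This is a minimal sketch: the function build_cipipe_command is hypothetical and only joins the flags shown in the synopsis; it does not run CIpipe.

```python
def build_cipipe_command(reference, data_dir, output_dir, name=None,
                         pvalue=None, basequality=None, varfreq=None,
                         target=None, upstream=None, downstream=None):
    """Assemble a single-sample CIpipe command as an argument list."""
    cmd = ["CIpipe", "-R", reference, "-D", data_dir, "-O", output_dir]
    # Optional settings are appended only when a value is given,
    # so CIpipe's own defaults apply otherwise.
    optional = [("-N", name), ("-P", pvalue), ("-B", basequality),
                ("-A", varfreq), ("-T", target), ("-US", upstream),
                ("-DS", downstream)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_cipipe_command("ref.fa", "data/", "output/", name="test1",
                           pvalue=0.01, basequality=15, varfreq=0.001)
print(" ".join(cmd))
```

The returned list can be passed to subprocess.call(cmd) once CIpipe is installed.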

Commands and Options

For a single sample analysis:

CIpipe -R reference -D data -O output

optional arguments:

  -h, --help           show this help message and exit
  -R, --reference      sample reference file, fasta format. (eg: my_ref.fa)
  -D, --data           sample data directory, fastq ONLY. one file for single end, two files for paired end. (eg: my_data/)
  -O, --output         output directory, will be created if it does not exist. (eg: my_output/)
  -N, --name           sample name, default is the name of the output directory. (eg: my_sample)
  -RK, --rank          sample rank. (eg: 1)
  -P, --pvalue         p value threshold, default: 0.05.
  -B, --basequality    minimum base quality, default: 30.
  -A, --varfreq        minimum variant frequency, default: 0.0001.
  -T, --target         CRISPR target position. indels in the target range will be picked out, multiple targets separated by ',', default: ''. (eg: gene1:100,gene2:200)
  -US, --upstream      upstream distance from the CRISPR target position, default: 20.
  -DS, --downstream    downstream distance from the CRISPR target position, default: 10.
  -F, --fastqc         fastq quality control by FastQC, default: ON. -F will turn OFF.
  -X, --index          build reference index by BWA, default: ON. -X will turn OFF.
  -U, --unlimited      no read depth limit in mpileup by SAMtools, default: OFF.
  -VI, --indel         search for indels by VarScan, default: ON. -VI will turn OFF.
  -VS, --snp           search for SNPs by VarScan, default: OFF.
  -VC, --consensus     search for consensus calls by VarScan, default: OFF.
  -VR, --readcount     search for read counts by VarScan, default: OFF.

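The target, upstream and downstream options together define the range in which indels are picked out. The sketch below shows one plausible reading of that range and is an assumption, not CIpipe's code: it treats positions as 1-based and takes the range to be [position - upstream, position + downstream], clamped at 1. The function name parse_targets is hypothetical.

```python
def parse_targets(target_string, upstream=20, downstream=10):
    """Turn 'gene1:100,gene2:200' into (name, position, start, end) tuples.

    Assumption: the target range spans `upstream` bases before and
    `downstream` bases after the target position, with 1-based coordinates.
    """
    windows = []
    for item in target_string.split(","):
        item = item.strip()
        if not item:
            continue  # the default target string is empty
        # Split on the last ':' so reference names containing ':' still work.
        name, position = item.rsplit(":", 1)
        position = int(position)
        start = max(1, position - upstream)
        end = position + downstream
        windows.append((name, position, start, end))
    return windows


print(parse_targets("gene1:100,gene2:200"))
```

With the default distances, the target gene1:100 maps to the range 80 to 110 under this assumption.
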
For multiple samples and advanced analysis:

CIpipe -E
CIpipe -I input

optional arguments:

  -h, --help           show this help message and exit
  -E, --example        create example input data. modify the example.input.tab to fit your data.
  -I, --input          information table of all input data. all settings should be in it. (eg: example.input.tab)


Workflow Charts


For a single sample analysis:

[Figure: workflow chart for a single sample analysis]


For multiple samples and advanced analysis:

[Figure: workflow chart for multiple samples and advanced analysis]

Annotation of example.input.tab 

Type CIpipe -E to create an example.input.tab, then modify it to fit your real data.
Downloadable templates: example.input.tab and example.input.xlsx (36 kb).

[Figure: annotated example.input.tab]

Test Cases

Data: a subset of the data from
Li,Y. et al. (2015) A versatile reporter system for CRISPR-mediated chromosomal rearrangements. Genome Biol., 16, 111.
The full data set is PRJNA283020. The test files contain only the first 20,000 lines of each fastq file.
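
A subset like this can be produced by copying the first lines of each fastq file. The sketch below is one way to do it and is not the script used to prepare the test data; the function name head_fastq and the file names in the usage comment are hypothetical. It assumes uncompressed fastq files with the standard four lines per read, so 20,000 lines correspond to 5,000 reads.

```python
from itertools import islice


def head_fastq(source_path, dest_path, n_lines=20000):
    """Copy the first n_lines lines of a fastq file; return the number copied."""
    if n_lines % 4 != 0:
        # A read occupies four lines; other values would cut a read in half.
        raise ValueError("n_lines should be a multiple of 4")
    copied = 0
    with open(source_path) as source, open(dest_path, "w") as dest:
        for line in islice(source, n_lines):
            dest.write(line)
            copied += 1
    return copied


# Hypothetical usage:
# head_fastq("full/sample_1.fastq", "data/sample/sample_1.fastq")
```

For paired-end data, apply the same line count to both files so the read pairs stay matched.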
​

1. Download the data: SRR2007490, SRR2007491, SRR2007493. (12.5MB, 12.4MB, 11.8MB)
2. Download the reference: refer.zip (LSL_1008bp.fa, iGFP_448bp.fa). (2KB)
3. For a single sample analysis (name: test1):
  3.1. Extract refer.zip to refer/
  3.2. Extract SRR2007490 to data/SRR2007490/
  3.3. After installing CIpipe, type in the terminal:

      CIpipe -R refer/LSL_1008bp.fa -D data/SRR2007490/ -O output/ -N test1

  3.4. CIpipe shows its progress on the terminal screen.
  3.5. The output/test1/result folder contains:
       test1.data.infor.txt  (the mapping and data information)
       test1.indel.brief.tab  (the brief indel result, derived from VarScan/test1.indel.tab)
       test1.indel.potential.LSL_1008bp:88.tab  (the indels in the target position range. If the user did not specify a cut position, CIpipe assumes that the position with the maximum varfreq is the cut position and adds 'potential' to the file name.)
       test1.indel.potential.LSL_1008bp:88.pdf  (detail plot of the indels in the target region, ordered by position from small to large and by indel type from deletion to insertion)
       test1.indel.potential.LSL_1008bp:88.sort.pdf  (detail plot of the indels in the target region, ordered by variant frequency from high to low)
4. For multiple samples and advanced analysis:
  4.1. Extract refer.zip to refer/
  4.2. Extract SRR2007490, SRR2007491, SRR2007493 to data/SRR2007490/, data/SRR2007491/, data/SRR2007493/
  4.3. In the terminal, type:

      CIpipe -E

       example.input.tab will be generated in the current working directory.
  4.4. Open example.input.tab, modify it to describe the test data, and save it as test2.input.tab.
  4.5. In the terminal, type:

      CIpipe -I input/test2.input.tab

  4.6. CIpipe shows its progress on the terminal screen.
  4.7. The result folder of each sample contains the following files (example: LSL2):
       LSL2.data.infor.txt  (the mapping and data information)
       LSL2.indel.brief.tab  (the brief indel result)
       LSL2.indel.LSL_1008bp:88.tab  (the indels in the target region only. If the user specified the cut position, there is no 'potential' in the file name.)
       LSL2.indel.LSL_1008bp:88.pdf  (detail plot of the indels in the target region)
       LSL2.indel.LSL_1008bp:88.sort.pdf  (detail plot of the indels in the target region, ordered by variant frequency from high to low)
  4.8. The result folder of the batch (test2.result/) contains:
       test2.indel.iGFP_448bp.mat  (indels across all iGFP samples)
       test2.indel.LSL_1008bp.mat  (indels across all LSL samples)

5. You can change the parameters to filter the results. For example, change the p value to 0.01 to get a stricter indel result table, or change the base quality to 15 to get more potential indels.
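
Step 3.5 notes that when no cut position is given, CIpipe assumes the position with the maximum varfreq is the cut position. The rule can be sketched as follows. This is an illustration of the rule as described, not CIpipe's implementation: the function name, the record layout, and the numbers are all invented for the example.

```python
def guess_cut_position(indels):
    """Return (reference, position) of the indel with the highest varfreq.

    `indels` is a list of dicts with the hypothetical keys
    'reference', 'position' and 'varfreq'. Returns None for an empty list.
    """
    if not indels:
        return None
    best = max(indels, key=lambda record: record["varfreq"])
    return best["reference"], best["position"]


# Invented toy records, not CIpipe output.
example_indels = [
    {"reference": "refA", "position": 10, "varfreq": 0.002},
    {"reference": "refA", "position": 12, "varfreq": 0.150},
    {"reference": "refA", "position": 40, "varfreq": 0.004},
]
print(guess_cut_position(example_indels))
```

Supplying the cut position explicitly with -T avoids this guess, and the result files then carry no 'potential' tag.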

Thanks

We thank the members of the Weng and Xue laboratories for helpful discussions, in particular Chunqing Song, Pengpeng Liu, Yu Fu, Tyler Borrman, Michael Purcaro and Arjan van der Velde for their insightful suggestions. 
