CRISPResso

Analysis of CRISPR-Cas9 genome editing outcomes from deep sequencing data

What is CRISPResso?

CRISPResso is a software pipeline for the analysis of targeted CRISPR-Cas9 deep sequencing data. It quantifies both non-homologous end joining (NHEJ) and homology-directed repair (HDR) events.

What can I do with CRISPResso?

CRISPResso automates the following steps, summarized in the figure below:
  1. filters low quality reads,
  2. trims adapters,
  3. aligns the reads to a reference amplicon,
  4. quantifies the proportion of HDR and NHEJ outcomes,
  5. quantifies frameshift/inframe mutations (if applicable) and identifies affected splice sites,
  6. produces a graphical report to visualize and quantify the distribution and position of indels.

As shown in the figure, we also provide four companion tools that cover other common sequencing strategies used to assess genome editing efficiency:

  • CRISPRessoPooled: a tool for the analysis of pooled amplicon experiments
  • CRISPRessoWGS: a tool for the analysis of WGS data or prealigned reads in .bam format
  • CRISPRessoCompare: a tool for the comparison of two CRISPResso analyses, useful for example to compare treated and untreated samples or to compare different experimental conditions
  • CRISPRessoPooledCompare: a tool to compare experiments involving several regions analyzed by either CRISPRessoPooled or CRISPRessoWGS
To use these companion tools, please download the command line version of CRISPResso from here.

Usage

CRISPResso requires two inputs: (1) paired-end reads (two files) or single-end reads (one file) in FASTQ format (fastq.gz files are also accepted) from a deep sequencing experiment and (2) a reference amplicon sequence against which the efficiency of the targeted mutagenesis is assessed and quantified. The amplicon sequence expected after HDR can be provided as an optional input to assess HDR frequency. An sgRNA sequence (without the PAM sequence) can be provided to compare the predicted cleavage position with the positions of the observed mutations. One or more coding sequences may be provided to quantify frameshift and potential splice site mutations.
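To illustrate what "predicted cleavage position" means for an sgRNA given without its PAM, here is a minimal Python sketch. It is not CRISPResso's code. It assumes an SpCas9-like blunt cut 3 bases from the PAM-proximal end of the guide, and the function names and the `offset_from_pam` parameter are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def predicted_cut_index(amplicon, guide, offset_from_pam=3):
    """Return the 0-based index of the first base to the right of the predicted cut.

    Assumption (not taken from CRISPResso): the nuclease cuts
    `offset_from_pam` bases inside the guide, counted from its PAM-proximal end.
    Returns None if the guide is not found on either strand.
    """
    amplicon = amplicon.upper()
    guide = guide.upper()

    start = amplicon.find(guide)
    if start != -1:
        # Guide matches the amplicon strand: the PAM lies to the right of the match.
        return start + len(guide) - offset_from_pam

    start = amplicon.find(reverse_complement(guide))
    if start != -1:
        # Guide matches the opposite strand: the PAM lies to the left of the match.
        return start + offset_from_pam

    return None
```

Observed indel positions can then be compared with the returned index. A guide that is missing on both strands usually points to a typo or to a PAM that was included by mistake.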

The reads are first filtered based on quality score (Phred33) to remove potentially false-positive indels. The quality filtering can be tuned through the optional parameters (see the additional notes below). Adapters are trimmed from the reads using Trimmomatic, and paired-end reads are then merged with FLASh. The remaining reads are aligned with needle from the EMBOSS suite, an optimal global sequence aligner based on the Needleman-Wunsch algorithm that can easily account for gaps. Finally, after the aligned reads are analyzed, a set of informative graphs is generated, allowing for the quantification and visualization of the position and type of outcomes within the amplicon sequence.
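The following Python sketch shows how a Phred33-encoded quality string maps to numeric scores and how a read-level filter could work. It assumes an average-quality criterion, which is only one possible way to filter. The function names and the default threshold of 30 are illustrative, not CRISPResso's internals.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(char) - 33 for char in quality_string]


def passes_quality_filter(quality_string, min_average_quality=30):
    """Return True if the mean Phred score of the read meets the threshold.

    Assumption: reads are kept or discarded based on their average quality.
    """
    scores = phred33_scores(quality_string)
    if not scores:
        # An empty read carries no evidence, so discard it.
        return False
    return sum(scores) / len(scores) >= min_average_quality
```

In Phred33, the character `I` corresponds to a score of 40 and `5` to a score of 20, so a read whose quality line is all `5` characters would be discarded at a threshold of 30.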

NHEJ events

The required inputs are: a single file for single-end reads or two files for paired-end reads in FASTQ format (fastq.gz files are also accepted). The reads are assumed to be already trimmed for adapters (‘No Trimming’ is selected under the ‘Optional Parameters’ heading). If reads are not trimmed, select the adapters used for trimming under the ‘Trimming Adapter’ heading under the ‘Optional Parameters’ heading. The second required input is the reference amplicon sequence.

HDR events

The required inputs are: a single file for single-end reads or two files for paired-end reads in FASTQ format (fastq.gz files are also accepted). The reads are assumed to be already trimmed for adapters (‘No Trimming’ is selected under the ‘Optional Parameters’ heading). If reads are not trimmed, select the adapters used for trimming under the ‘Trimming Adapter’ heading under the ‘Optional Parameters’ heading. The reference amplicons with and without the donor sequence substituted must also be provided (‘Expected HDR Amplicon sequence’).

CRISPResso also quantifies instances of mixed or sequential NHEJ-HDR in the reads. For example, a region may be repaired perfectly by HDR using the donor template and then modified with indels through NHEJ repair (or vice versa). This outcome can be observed if the donor template is not immune to re-cleavage by Cas9.

IMPORTANT: You must input the entire expected amplicon sequence (‘Expected HDR Amplicon sequence’ is the full sequenced amplicon as it would appear after HDR, not simply the donor sequence). If only the donor sequence is provided, an error will result.
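Before submitting, one way to check that the HDR input is a full amplicon rather than the donor alone is to confirm that it shares its outer flanks with the reference amplicon. The Python sketch below is a hypothetical pre-check, not part of CRISPResso, and the `min_flank` threshold of 10 bases is an arbitrary assumption.

```python
def shared_flank_lengths(reference, expected_hdr):
    """Return the lengths of the common prefix and common suffix of two sequences."""
    reference = reference.upper()
    expected_hdr = expected_hdr.upper()
    max_len = min(len(reference), len(expected_hdr))

    prefix = 0
    while prefix < max_len and reference[prefix] == expected_hdr[prefix]:
        prefix += 1

    suffix = 0
    # Stop before the suffix overlaps the bases already counted in the prefix.
    while (suffix < max_len - prefix
           and reference[-1 - suffix] == expected_hdr[-1 - suffix]):
        suffix += 1

    return prefix, suffix


def looks_like_full_amplicon(reference, expected_hdr, min_flank=10):
    """Heuristic: a full HDR amplicon shares both outer flanks with the reference."""
    prefix, suffix = shared_flank_lengths(reference, expected_hdr)
    return prefix >= min_flank and suffix >= min_flank
```

A donor-only sequence typically shares neither end with the reference amplicon, so the check returns False and flags the input for review.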

Frameshift/In-Frame and Splice Sites Analysis

To enable the frameshift analysis, provide the subsequences of the reference amplicon that correspond to coding sequences (not the whole exon sequence(s)!). If your amplicon sequence contains more than one coding exonic subsequence, please provide them separated by commas.
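The idea behind the frameshift analysis can be sketched in Python as follows: locate each comma-separated coding subsequence in the amplicon, then classify an indel by whether its net length change is a multiple of three. This is a simplified illustration under those assumptions. The function names are hypothetical, and CRISPResso's own classification may handle more cases.

```python
def locate_coding_sequences(amplicon, coding_seqs_csv):
    """Return half-open (start, end) intervals of each coding subsequence."""
    amplicon = amplicon.upper()
    intervals = []
    for coding_seq in coding_seqs_csv.upper().split(","):
        coding_seq = coding_seq.strip()
        if not coding_seq:
            continue  # ignore empty entries such as a trailing comma
        start = amplicon.find(coding_seq)
        if start == -1:
            raise ValueError(f"Coding sequence not found in amplicon: {coding_seq}")
        intervals.append((start, start + len(coding_seq)))
    return intervals


def classify_indel(net_length_change):
    """Classify the net length change (insertions minus deletions) in coding sequence."""
    if net_length_change == 0:
        return "no length change"
    if net_length_change % 3 == 0:
        return "in-frame"
    return "frameshift"
```

For example, a 3 bp deletion (net change of -3) preserves the reading frame, while a 1 bp insertion or a 2 bp deletion shifts it.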

Troubleshooting

  • Please check that your input file(s) are in FASTQ format (compressed fastq.gz also accepted).
  • If you get an empty report, please double check that your amplicon sequence is correct and in the right orientation. It can be helpful to inspect the first few lines of your FASTQ file - the start of the amplicon sequence should match the start of your sequences. If not, check to see if the files are trimmed (see point below).
  • It is important to check whether your reads are trimmed. CRISPResso assumes that the reads ARE ALREADY TRIMMED! If reads are not already trimmed, select the adapters used for trimming under the ‘Trimming Adapter’ heading under the ‘Optional Parameters’. This is FUNDAMENTAL to CRISPResso analysis. Failure to trim adapters may result in false positives: the report will show an unrealistic 100% NHEJ in the pie chart in figure 2 and a sharp peak at the edges of the reference amplicon in figure 4.
  • It is possible to use CRISPResso with single end reads. In this case, select ‘Single end reads’ under the ‘Experimental Design’ heading.
  • The quality filter assumes that your reads use the Phred33 scale, and it should be adjusted for each user’s specific application. A reasonable value for this parameter is 30.
  • If you need to process large or many files, please consider installing the command line utility of CRISPResso on your machine. It is free and you can get it here: https://github.com/lucapinello/CRISPResso.
  • Paired-end sequencing files must contain overlapping sequence between the two reads of each pair so that the reads can be merged.
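The orientation check suggested above can be done without any special tools. The Python sketch below reads the first few records of a FASTQ file (plain or gzipped) and counts how many reads start with the beginning of the amplicon or of its reverse complement. The function names and the `prefix_length` of 15 bases are assumptions made for illustration.

```python
import gzip
from itertools import islice


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def read_first_sequences(fastq_path, n_reads=5):
    """Return the sequence lines of the first n_reads records of a FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        lines = list(islice(handle, 4 * n_reads))
    # In a four-line FASTQ record, the sequence is the second line.
    return [line.strip().upper() for line in lines[1::4]]


def check_orientation(sequences, amplicon, prefix_length=15):
    """Count reads that start like the amplicon, like its reverse complement, or neither."""
    prefix = amplicon[:prefix_length].upper()
    rc_prefix = reverse_complement(amplicon)[:prefix_length]
    counts = {"forward": 0, "reverse_complement": 0, "no_match": 0}
    for seq in sequences:
        if seq.startswith(prefix):
            counts["forward"] += 1
        elif seq.startswith(rc_prefix):
            counts["reverse_complement"] += 1
        else:
            counts["no_match"] += 1
    return counts
```

If most reads fall under `no_match`, the reads may still carry adapters or the amplicon may be wrong. If most reads of a single-end or R1 file fall under `reverse_complement`, the amplicon was probably entered in the opposite orientation. For paired-end data, the second file is expected to match the reverse complement.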

The output of CRISPResso

The output of CRISPResso consists of a set of informative graphs that allow for the quantification and visualization of the position and type of outcomes within an amplicon sequence. An example is shown below:

All the processed raw data used to generate the figures are available in the following plain text files provided with the report:

  • Mapping_statistics.txt: number of reads in input, reads after preprocessing (merging or quality filtering) and reads properly aligned;
  • Quantification_of_editing_frequency.txt: number of reads aligned, reads with NHEJ, reads with HDR, and reads with mixed HDR-NHEJ. In addition to these categories, the file includes an overall summary of the total numbers of insertions, deletions and substitutions;
  • Alleles_frequency_table.txt: number of reads and percentage for each allele discovered in the sequencing data;
  • Frameshift_analysis.txt: number of modified reads with frameshift, in-frame and noncoding mutations;
  • Splice_sites_analysis.txt: number of reads corresponding to potential affected splicing sites;
  • effect_vector_combined.txt: location of mutations (including deletions, insertions, and substitutions) with respect to the reference amplicon;
  • effect_vector_deletion.txt: location of deletions;
  • effect_vector_insertion.txt: location of insertions;
  • effect_vector_substitution.txt: location of substitutions.
  • position_dependent_vector_avg_insertion_size.txt: average length of the insertions for each position.
  • position_dependent_vector_avg_deletion_size.txt: average length of the deletions for each position.
  • indel_histogram.txt: processed data used to generate figure 1 in the output report.
  • insertion_histogram.txt: processed data used to generate the insertion histogram in figure 3 in the output report.
  • deletion_histogram.txt: processed data used to generate the deletion histogram in figure 3 in the output report.
  • substitution_histogram.txt: processed data used to generate the substitution histogram in figure 3 in the output report.

Try it now!

Are you still waiting for the data from your experiment? In the meantime, you can learn how to use CRISPResso with our example dataset.

It requires only a few clicks:

  1. Download the two example files, reads1.fastq.gz and reads2.fastq.gz, to your computer.
  2. Open the CRISPResso website: http://crispresso.rocks
  3. Fill in the required fields and click 'Submit':
    • For the Fastq file R1 field: CLICK BROWSE and select the first file reads1.fastq.gz
    • For the Fastq file R2 field: CLICK BROWSE and select the second file reads2.fastq.gz
    • In the field Amplicon Sequence insert this without spaces:
      AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT
    • Insert your email address (Optional)
Just leave the page open and it will load the results once the analysis has finished running, or wait to receive a link to the analysis report by email.

Easy, no?

Explore the output of CRISPResso

To help you become familiar with the output of CRISPResso, we provide several precomputed analyses, run with the standard settings, for different simulated sequencing datasets with known editing efficiency and mutagenesis profiles. Sequencing artifacts in these datasets are modeled after the Illumina MiSeq platform (using the ART simulation tool: http://www.niehs.nih.gov/research/resources/software/biostatistics/art/ ):

How to cite CRISPResso

If you use CRISPResso in your work please cite:

Pinello L, Canver MC, Hoban MD, Orkin SH, Kohn DB, Bauer DE, Yuan GC. Analyzing CRISPR genome-editing experiments with CRISPResso. Nat Biotechnol. 2016 Jul 12;34(7):695-697. doi: 10.1038/nbt.3583. PubMed PMID: 27404874.

Acknowledgements

We are grateful to Feng Zhang and David Scott for useful feedback and suggestions; the FAS Research Computing Team, in particular Daniel Kelleher, for great support in hosting the web application of CRISPResso; and Sorel Fitz-Gibbon from UCLA for help in sharing data. Finally, we thank all members of the Guo-Cheng Yuan lab for testing the software.

Need to run CRISPResso on your machine or cluster?

Pooled targets or WGS data?

Compare multiple conditions or with untreated samples?

We got you covered!

Try our command line utilities freely available here.

© Copyright by Luca Pinello