CHOPCHOP can be run using the default mode or by specifying advanced options. In the default mode, users must select the target (gene ID, chromosome coordinates or pasted sequence), the organism and the CRISPR mode (e.g. Cas9 knockout). Advanced options can be specified by clicking the ‘Options’ tab. The available options are explained below.
- Common errors
- CRISPR modes
- General advanced options
- CRISPR-specific options
- TALEN-specific options
- Primer options
- Interpreting the results
1. Common errors
- Under some settings, the ‘Efficiency’ column in the results page will be populated with zeros. This happens when the selected settings (e.g. certain PAM sequences) are incompatible with the chosen efficiency score. In these cases, try another efficiency scoring system, such as ‘Xu et al. 2015’, which works with more exotic PAMs because its model places fewer constraints on the input.
- G20 is not a real model. It simply checks whether there is a G at position 20 of the gRNA target (i.e. 1 bp upstream of the PAM). Therefore, the ‘Efficiency’ column will return a 1 if there is a G at position 20, and a 0 if there is not.
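The G20 check described above is simple enough to write out. The following is a minimal sketch, not CHOPCHOP's own code; the function name `g20_score` is hypothetical, and it assumes a 20-nt target written 5' to 3' without the PAM.

```python
def g20_score(target: str) -> int:
    """Return 1 if position 20 of a 20-nt target (the base just 5' of the PAM) is G, else 0.

    Assumption: `target` is exactly 20 nt, written 5'->3', PAM excluded.
    """
    if len(target) != 20:
        raise ValueError("this sketch expects a 20-nt target without the PAM")
    return 1 if target[19].upper() == "G" else 0


print(g20_score("ACGTACGTACGTACGTACGG"))  # target ends in G
print(g20_score("ACGTACGTACGTACGTACGA"))  # target ends in A
```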
2. CRISPR modes
CHOPCHOP will adjust the settings depending on the selected application.
Knock-out - For frameshift mutations in the gene of interest.
For Cas9 applications, we can predict the frameshift rate of each gRNA (Shen et al. 2018), which is displayed on the detailed gRNA results page. For most knock-out applications, we recommend: (1) Using traditional Cas9 (20bp-NGG); (2) Making sure the selected gRNA has no off-targets; (3) Ensuring the gRNA is expected to target all isoforms of the target gene (only available for select genomes in which there is isoform-level information); (4) Selecting gRNAs downstream of any in-frame ATG sites (green boxes in the blue coding region) to avoid expression of truncated proteins.
Knock-down - For targeting mRNA with CRISPR/Cas13. Human and mouse only.
This mode searches for off-targets in the transcriptome. Specify your Cas13 system of choice in ‘Options’ by specifying the protospacer flanking site (PFS) and length of the gRNA.
- MM0, MM1, MM2 and MM3 columns specify the number of off-target transcripts (with 0, 1, 2 or 3 mismatches, respectively) that your gRNA may bind to, outside of your target gene. The Isoforms columns specify which isoforms are targeted by each gRNA (with 0, 1, 2 or 3 mismatches). You can use this information to design gRNAs that knock-down specific isoforms or all isoforms of the selected gene.
- The ‘Constitutive’ column informs you whether a gRNA is targeting all isoforms of the specified gene. If yes, the value is 1.
- ‘Local structure’ values are a form of RNA accessibility score computed using RNAfold from the ViennaRNA package. Accessibility in the RNA is calculated in windows of 70 nucleotides, obtaining the probability that a given nucleotide position in the transcript is unpaired (and therefore accessible to a gRNA). For each target, we take the mean probability across each position. The smaller the value, the less likely there is secondary structure in the transcript at that gRNA target site.
Knock-in - For knocking in DNA sequence at the locus of interest.
We highly recommend that users get acquainted with the different types of knock-in available so that they can choose the appropriate one for their experiments (for example, as reviewed in Nami et al. 2018). Below we provide a short overview of the aspects to consider when choosing a method.
CRISPR knock-in allows for site-specific gene engineering via a programmable nuclease-induced DNA double-strand break (DSB) and subsequent DSB repair by cellular repair pathways: nonhomologous end-joining (NHEJ) or homology-directed repair (HDR). One key aspect to consider when deciding which technique to use is whether the modification will take place in dividing or non-dividing cells. HDR is restricted to the S and G2 phases of the cell cycle, and is hence not suitable for knock-ins in non-dividing cells (Nami et al. 2018).
Homology arm sequences are provided in the detailed gRNA results page. You can adjust the position of the microhomology arms in relation to the 5’ end of the gRNA (default: -3 bp from the PAM), and specify the arm length. It is advisable to check for complementarity between the inserted sequence and your microhomology arms. With the default settings for knock-in, homology arms of up to 800 bp can be obtained. If longer arms are desired (up to 2 kb), change the size of the flanking sequence on the initial page (Options > General > Displayed flanking sequence length in detailed view).
Alternatively, for single-base changes we recommend using a base editor (Rees and Liu 2018; Gaudelli et al. 2017). Base editors (Hess et al. 2017) are Cas9 fusion proteins that enable C-G to T-A conversion and, more recently, A-T to G-C conversion, without the need for a double-strand break (Rees and Liu 2018; Gaudelli et al. 2017). Different methods can be used depending on the desired base change and the species. We recommend using Addgene for this type of Cas9-based editing.
Additionally, for traditional Cas9 only, insertion of a single base as a duplication of A or T at -4 bp from the PAM can be achieved as a probable repair outcome, as described in Shen et al. 2018. CHOPCHOP also offers basic prediction of repair outcomes based on Shen et al. 2018 (Options > Cas9 > Repair prediction); for the complete repair profile, visit the original Shen et al. 2018 website.
Activation/Repression - For targeting promoter regions.
Default targets: activation mode: 300 bp upstream of the transcription start site (TSS); repression mode: 200 bp downstream and upstream of the TSS. It is recommended to use more than one gRNA for these applications.
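To make the default windows concrete, here is a small sketch that converts a TSS position and strand into genomic coordinates. The function `promoter_window` and its coordinate conventions (plain integer positions, inclusive bounds) are assumptions made for illustration, not part of CHOPCHOP.

```python
def promoter_window(tss: int, strand: str, mode: str) -> tuple[int, int]:
    """Return (start, end) of the default search window around a TSS.

    Defaults follow the text: activation = 300 bp upstream of the TSS;
    repression = 200 bp upstream and 200 bp downstream of the TSS.
    """
    if mode == "activation":
        upstream, downstream = 300, 0
    elif mode == "repression":
        upstream, downstream = 200, 200
    else:
        raise ValueError("mode must be 'activation' or 'repression'")

    if strand == "+":
        # Upstream means smaller coordinates on the forward strand.
        return (tss - upstream, tss + downstream)
    if strand == "-":
        # On the reverse strand, upstream means larger coordinates.
        return (tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


print(promoter_window(1000, "+", "activation"))
print(promoter_window(1000, "-", "activation"))
print(promoter_window(1000, "+", "repression"))
```

Note that the window flips for genes on the reverse strand, which is easy to get wrong when entering coordinates by hand.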
Nanopore Enrichment - For Oxford Nanopore enrichment with the use of CRISPR.
It is recommended to have restrictive prefiltering conditions - by default we require gRNAs to have self-complementarity of 0 and GC content between 10 and 90%. In this mode, CHOPCHOP allows targeting of genomic regions up to 40 kb in size.
3. GENERAL ADVANCED OPTIONS
Gene identifiers: Gene IDs are retrieved from the ENSEMBL (ensGene) and RefSeq (refGene) tables from the UCSC genome browser, or from gene tables provided to us by users. Isoform targeting only supports the latest ENSEMBL (or GENCODE) identifiers and common gene names e.g. mt2a.
Target a specific region of the gene: You may choose to target: (i) only the coding region (default), (ii) the entire exonic sequence (including 5' and 3' UTR), (iii) splice sites, (iv) 5' UTR only, (v) 3' UTR only, (vi) the promoter (specify how many base pairs upstream and downstream of the TSS you would like to search), or (vii) a specific exon/subset of exons. If you wish to target an intron, specify the genomic coordinates of the intron (max size, 40,000 bp).
Restrict targeting: When searching for target sites in a region of interest, in the default mode CHOPCHOP allows the gRNA or one TALE to bind just outside of that region so that the cut (in most cases) still occurs within the targeted region. You can turn off this functionality.
Isoform consensus: When targeting the whole gene, genes with multiple isoforms can be targeted with the ‘Intersection’ or ‘Union’ mode: in Intersection mode, CHOPCHOP only searches for gRNAs present in every isoform, so you will target all of the isoforms with a given gRNA. In Union mode, CHOPCHOP searches for gRNAs in every exon of every isoform. Therefore, you can use this mode to target one (or more) specific isoforms.
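The difference between the two modes can be expressed as set operations. This is only an illustrative sketch: the function `consensus_targets`, the isoform names and the gRNA labels are hypothetical, and each isoform is represented simply as the set of gRNAs found in its exons.

```python
def consensus_targets(targets_by_isoform: dict[str, set[str]], mode: str) -> set[str]:
    """Combine per-isoform gRNA sets.

    'intersection' keeps gRNAs present in every isoform;
    'union' keeps gRNAs present in at least one isoform.
    """
    sets = list(targets_by_isoform.values())
    if not sets:
        return set()
    if mode == "intersection":
        return set.intersection(*sets)
    if mode == "union":
        return set.union(*sets)
    raise ValueError("mode must be 'intersection' or 'union'")


# Hypothetical example: three isoforms of one gene.
isoforms = {
    "iso-201": {"g1", "g2", "g3"},
    "iso-202": {"g2", "g3"},
    "iso-203": {"g2", "g4"},
}
print(sorted(consensus_targets(isoforms, "intersection")))
print(sorted(consensus_targets(isoforms, "union")))
```

In this example only `g2` hits every isoform, whereas `g4` from the union would be a candidate for targeting `iso-203` alone.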
Pre-filtering: It is possible to pre-filter gRNAs based on the GC content and the self-complementarity score (set to -1 to disable). The gRNAs that do not fulfill these requirements will not be reported in the final result table.
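A pre-filter of this kind might look like the sketch below. The function names are hypothetical, the self-complementarity score is assumed to have been computed elsewhere, and the GC bounds of 10 and 90% are the Nanopore-mode values quoted earlier, used here only as illustrative defaults.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_prefilter(seq: str, self_comp: int,
                     gc_min: float = 10, gc_max: float = 90,
                     max_self_comp: int = -1) -> bool:
    """Return True if a gRNA survives pre-filtering.

    max_self_comp = -1 disables the self-complementarity filter,
    mirroring the "set to -1 to disable" convention in the text.
    """
    if not gc_min <= gc_percent(seq) <= gc_max:
        return False
    if max_self_comp != -1 and self_comp > max_self_comp:
        return False
    return True


print(passes_prefilter("ACGTACGTACGTACGTACGT", self_comp=0, max_self_comp=0))
print(passes_prefilter("GGGGCCCCGGGGCCCCGGGG", self_comp=0, max_self_comp=0))
print(passes_prefilter("ACGTACGTACGTACGTACGT", self_comp=2, max_self_comp=0))
```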
Restriction company preference: Some users may choose to assess successful mutagenesis using restriction enzyme digestion. CHOPCHOP displays restriction sites at the target locus, and allows you to restrict the search to restriction sites of enzymes from a particular company.
4. CRISPR-SPECIFIC OPTIONS
sgRNA length: Using truncated sgRNAs may improve specificity (e.g. Fu et al., 2014). You can select different Cas9/Cpf1 sgRNA lengths or keep the standard 20 nt (default).
sgRNA PAM sequence: The Cas9 3’ PAM default is NGG; the Cpf1 5’ default is TTN. You can alternatively select a PAM from an orthologous type II CRISPR/Cas system (Fonfara et al., 2014) or enter a custom PAM.
Self-complementarity: Data suggest that self-complementarity within the gRNA or between the gRNA and RNA backbone can inhibit gRNA efficiency (Thyme et al., 2016). This option searches for complementarity within the gRNA, and between the gRNA and either a standard backbone (AGGCTAGTCCGT), an extended backbone (AGGCTAGTCCGT, ATGCTGGAA) or a custom backbone. Some users will choose to replace the leading nucleotides of their gRNA with “GG” for T7 transcription. Check this box to search for complementarity with the GG replacement.
Method for determining off-targets in the genome: The methods differ in which positions of the target are checked for mismatches. According to Cong et al. (2013), single-base mismatches up to 11 bp 5' of the PAM completely abolish cleavage by Cas9, whereas mismatches further upstream of the PAM retain cleavage activity. The first uniqueness method therefore searches for mismatches only in the first 9 bp (the PAM-distal end), since a mismatch closer to the PAM is predicted to prevent cleavage. According to Hsu et al. (2013), mismatches can be tolerated at any position except in the PAM motif. The second uniqueness method therefore searches for mismatches anywhere in the 20 bp upstream of the PAM. This is the default method.
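The two uniqueness methods can be contrasted with a short sketch. This is an illustration under stated assumptions rather than CHOPCHOP's implementation: the method labels `"cong"` and `"hsu"`, the function names, and the cap of 3 mismatches are chosen here for the example, and both sequences are 20-nt protospacers written 5' to 3' without the PAM.

```python
def mismatch_positions(guide: str, site: str) -> list[int]:
    """0-based positions (counted from the 5' end) where guide and site differ."""
    return [i for i, (a, b) in enumerate(zip(guide.upper(), site.upper())) if a != b]


def is_off_target(guide: str, site: str, method: str = "hsu",
                  max_mismatches: int = 3, seed_len: int = 11) -> bool:
    """Decide whether `site` counts as a potential off-target of `guide`.

    'hsu'  : mismatches are tolerated anywhere in the protospacer.
    'cong' : mismatches are tolerated only outside the PAM-proximal
             `seed_len` bases, i.e. in the first 9 bp of a 20-nt guide.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    mismatches = mismatch_positions(guide, site)
    if len(mismatches) > max_mismatches:
        return False
    if method == "hsu":
        return True
    if method == "cong":
        distal_end = len(guide) - seed_len  # 9 for a 20-nt guide
        return all(pos < distal_end for pos in mismatches)
    raise ValueError("method must be 'hsu' or 'cong'")


guide = "ACGTACGTACGTACGTACGT"
distal = "ACTTACGTACGTACGTACGT"    # mismatch at position 3 from the 5' end
proximal = "ACGTACGTACGTACGTACAT"  # mismatch 2 bp from the PAM
print(is_off_target(guide, distal, "cong"), is_off_target(guide, distal, "hsu"))
print(is_off_target(guide, proximal, "cong"), is_off_target(guide, proximal, "hsu"))
```

The PAM-proximal mismatch is ignored by the first method, because such a site is predicted not to be cleaved, but is still reported by the default method.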
Efficiency score: We have implemented a number of efficiency scores based on the current literature. They are normalized to a 0-1 interval and are displayed in the “Efficiency” column. The simplest form of efficiency score is “G20”, which prioritizes a guanine at position 20, just upstream of PAM. The other methods use more sophisticated metrics (see references below).
Doench et al. 2014 - only for NGG PAM
Doench et al. 2016 - only for NGG PAM (default)
Chari et al. 2015 - only NGG and NNAGAAW PAMs in hg19 and mm10
Xu et al. 2015 - designed for NGG PAM (but can be used for other PAMs)
Moreno-Mateos et al. 2015 - only for NGG PAM
Repair profile prediction: Using the model from Shen et al. 2018, it is possible to predict the DNA repair profile for a given gRNA. For this option, select the cell type you use (mESC - default, U2OS, HEK293, HCT116, K562), or use mESC if you don’t know the cell type you are targeting. You can also disable this option, as it is very time-consuming. Predictions are available in the detailed view of the selected gRNA.
Requirements for the 5' end of the sgRNA: Depending on the polymerase used for gRNA synthesis, you may wish to limit the 5' dinucleotides to, for example, 5' GN- (for the U6 promoter) or 5' GG- (for T7 polymerase).
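A 5' dinucleotide requirement of this kind reduces to a two-character pattern check. The sketch below is illustrative; the function name and the requirement codes `"NN"`, `"GN"` and `"GG"` are assumptions for the example.

```python
def meets_5prime_requirement(grna: str, requirement: str = "NN") -> bool:
    """Check the first two bases of a gRNA against a 5' requirement.

    requirement: 'NN' (no restriction), 'GN' (e.g. U6 promoter)
    or 'GG' (e.g. T7 polymerase).
    """
    allowed = {"N": "ACGT", "G": "G"}
    if len(requirement) != 2 or any(code not in allowed for code in requirement):
        raise ValueError("requirement must be two characters drawn from 'G' and 'N'")
    dinucleotide = grna[:2].upper()
    if len(dinucleotide) < 2:
        return False
    return all(base in allowed[code] for base, code in zip(dinucleotide, requirement))


print(meets_5prime_requirement("GATCGATCGATCGATCGATC", "GN"))
print(meets_5prime_requirement("GATCGATCGATCGATCGATC", "GG"))
```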
Distance between Cas9 guides: Cas9 nickases are effective within a range of distances. The default distance, as measured in Shen et al. (2014), is 10-31 bp, but you can change this to your preference.
Maximum distance between off-targets: CHOPCHOP searches the genome for off-targets with 0, 1, 2 or 3 mismatches. In addition, in nickase mode each pair of gRNAs is evaluated for off-targets where binding and cutting within a given distance (default: 100 bp) would result in a double-strand break. This distance can be changed to your preference.
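The paired off-target check can be sketched as follows. This is a simplified illustration, not CHOPCHOP's code: the function `paired_off_targets` is hypothetical, each off-target hit is reduced to a `(chromosome, position)` tuple, and strand orientation is ignored.

```python
def paired_off_targets(hits_a, hits_b, max_distance=100):
    """Return pairs of off-target hits (one per guide) that lie on the same
    chromosome within `max_distance` bp of each other, where two nicks
    could together produce a double-strand break.
    """
    pairs = []
    for chrom_a, pos_a in hits_a:
        for chrom_b, pos_b in hits_b:
            if chrom_a == chrom_b and abs(pos_a - pos_b) <= max_distance:
                pairs.append(((chrom_a, pos_a), (chrom_b, pos_b)))
    return pairs


# Hypothetical off-target hits for the two guides of a nickase pair.
guide_a_hits = [("chr1", 5_000), ("chr2", 80_000)]
guide_b_hits = [("chr1", 5_060), ("chr2", 95_000), ("chr3", 5_010)]
print(paired_off_targets(guide_a_hits, guide_b_hits))
```

Only the two chr1 hits, 60 bp apart, are flagged; the chr2 hits are too far apart and the chr3 hit has no partner.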
5'PAM: Cpf1 recognizes a T-rich 5’ PAM. CHOPCHOP offers a 5’ TTTN PAM search option, 5’ TTN (default), or you can specify your own PAM.
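The PAM options in this section (3' NGG for Cas9, 5' TTN or TTTN for Cpf1, or a custom PAM) all come down to matching a short pattern written in IUPAC nucleotide codes next to the target. The sketch below shows one way to do this. The function names are hypothetical and, for brevity, it scans the forward strand only.

```python
# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def matches_pam(seq: str, pam: str) -> bool:
    """True if `seq` is compatible with the IUPAC pattern `pam`."""
    seq, pam = seq.upper(), pam.upper()
    return len(seq) == len(pam) and all(b in IUPAC[c] for b, c in zip(seq, pam))


def find_targets(dna: str, pam: str = "NGG", guide_len: int = 20,
                 pam_side: str = "3prime"):
    """List (site_start, guide, pam_seq) for forward-strand sites.

    pam_side='3prime' places the PAM after the guide (Cas9-style);
    pam_side='5prime' places it before the guide (Cpf1-style).
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    dna = dna.upper()
    k = len(pam)
    hits = []
    for i in range(len(dna) - guide_len - k + 1):
        if pam_side == "3prime":
            guide = dna[i:i + guide_len]
            pam_seq = dna[i + guide_len:i + guide_len + k]
        else:
            pam_seq = dna[i:i + k]
            guide = dna[i + k:i + k + guide_len]
        if matches_pam(pam_seq, pam):
            hits.append((i, guide, pam_seq))
    return hits


print(find_targets("ACGTACGTACGTACGTACGTAGG", pam="NGG"))
print(find_targets("TTTAACGTACGTACGTACGTACGT", pam="TTTN", pam_side="5prime"))
```

Because TTN is less restrictive than TTTN, the same sequence generally yields more candidate sites with the default Cpf1 PAM.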
5. TALEN-SPECIFIC OPTIONS
Distance between TALENs: TALENs seem to be effective at cutting at a range of distances, but the architecture may depend upon the assembly kit being used. The default distance is 14-20 bp but you can change this to your preference.
Number of mismatches when searching off-targets: CHOPCHOP can search the genome for off-targets with 0, 1, 2 or 3 mismatches. The greater the number of mismatches, the more rigorous the calculation of off-target activity, but the longer it takes for the program to run.
RVD preference: Depending upon the assembly kit being used, you may choose to use either the repeat variable di-residue (RVD) 'NN' for guanines, or 'NH', which has been shown to bind guanines more specifically than 'NN' (Cong et al., 2012, Streubel et al. 2012).
6. PRIMER OPTIONS
In order to analyze whether Cas9, Cpf1 or TALENs have successfully cleaved the target locus, users may need to amplify the region of interest for further analysis by deep sequencing or a T7E1 assay. CHOPCHOP integrates primer design with gRNA/TALEN target site design using Primer3. Primers are designed to amplify the target cut site, and are mapped against the genome to avoid off-targets producing amplicons of similar length. You can adjust the amplicon size, primer Tm, primer length, and the minimum distance between each primer and the target site. We are currently unable to support primer design for the isoform targeting mode using Cas13.
7. INTERPRETING THE RESULTS
CHOPCHOP displays the results of the query in a dynamic visualization and interactive table. The dynamic visualization displays all of the target site options for the given region color-coded according to our scoring system: green (best), amber (okay), and red (bad). In all cases the gene is displayed 5' to 3'. All isoforms of the gene are displayed with their names, along with downstream in-frame ATG sites (green boxes). Click on a target in the visualization or an option in the table to be taken to an individual gRNA results page containing information about any off-targets, repair predictions (where applicable), microhomology arms (knock-in mode), and the restriction sites and primer designs for that region.
Off-targets: CHOPCHOP lists how many off-targets each target site has with 0, 1, 2 or 3 mismatches (columns “MM0”, “MM1”, “MM2” and “MM3”, respectively). Clicking on the result will reveal more information about the off-targets.
Ranking: Target sites are ranked according to: (i) efficiency score (Cas9 mode, see above),
(ii) number of off-targets and whether they have mismatches,
(iii) existence of self-complementarity regions longer than 3 nt (Thyme et al., 2016; the number indicates how many regions of self-complementarity are predicted),
(iv) GC-content (Cas9 mode): sgRNAs are most effective with a GC-content between 40 and 70% (Wang et al. 2014, Tsai et al. 2015),
(v) location of sgRNA within a gene (5’ (best) -> 3’ (worst)).
Because TALEN designs have few sequence restrictions, there are many options to choose from. CHOPCHOP therefore suppresses some output and groups similar TALEN pairs in a 'cluster'. The results table and visualization display only the highest-ranked member of each cluster, but other cluster members can be seen on the individual results pages.
More information about how we rank our results is available from the Scoring page.
Individual results page: This provides (i) a visualization of the target site (with the predicted cut site in blue in Cas9/Cpf1 mode), (ii) primer designs (purple), (iii) restriction sites (green - unique in the amplicon, red - not unique), (iv) details about the off-targets (genomic location, number of mismatches and sequence of off-targets), (v) primer designs.