TIDER provides rapid and reliable assessment of template-mediated genome editing. It quantifies the efficiency of homology-directed repair (HDR) in an edited sample by decomposing the sequence trace data from three simple Sanger sequencing reactions.
The input to TIDER is Sanger sequencing data.
The output of TIDER is a comprehensive profile of all insertions and deletions (indels) in the edited sample and the frequency of the desired reference (or template) sequence.
The TIDER software is provided as a free web service for research, educational, instructional and non-commercial purposes only. This web tool and the associated R code are open-source software under the GNU General Public License version 3. Your uploaded data are used only for the duration of the analysis session and are not stored or used for any other purpose.
All copyright is exclusively owned by Stichting het Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis (The Netherlands Cancer Institute). The availability and use of this software is subject to a license from the copyright holder. If you use this software for data analysis in a publication, please cite Brinkman et al., Nucl. Acids Res. (2018).
The R code of TIDER can be provided upon request; please contact the Bas van Steensel lab.
This web tool was developed by Eva Brinkman, Christ Leemans and Bas van Steensel from the Bas van Steensel lab.
For more information and to report bugs, please contact support@datacurators.nl
R
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org. R version 3.1.1.
Biostrings
H. Pages, P. Aboyoun, R. Gentleman and S. DebRoy. Biostrings: String objects representing biological sequences, and matching algorithms. R package version 2.32.1.
sangerseqR
J.T. Hill and B. Demarest (2014). sangerseqR: Tools for Sanger Sequencing Data in R. R package version 1.3.1. http://www.bioconductor.org/packages/devel/bioc/html/sangerseqR.html
nnls
K. M. Mullen and I. H. M. van Stokkum. nnls: The Lawson-Hanson algorithm for non-negative least squares (NNLS). R package version 1.4.
msa
E. Bonatesta, C. Horejs-Kainrath, and U. Bodenhofer. msa: Multiple Sequence Alignment. R package version 1.6.0.
plyr
H. Wickham. plyr: Tools for Splitting, Applying and Combining Data. R package version 1.8.4.
shiny
RStudio, Inc. (2013). shiny: Web Application Framework for R. R package version 1.0.0. http://shiny.rstudio.com
Currently, ABIF (.ab1) and SCF (.scf) files are supported. SCF is an open standard and several tools exist to convert other formats to SCF files.
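If you want to check locally which format a trace file has before uploading it, the two formats can be told apart by their first four bytes: ABIF files start with the ASCII signature `ABIF`, SCF files with `.scf`. The sketch below is a hypothetical helper (not part of TIDER) that only inspects these magic bytes; it does not validate the rest of the file, so a file that passes this check can still be malformed.

```python
def trace_format_from_header(header: bytes) -> str:
    """Classify a Sanger trace by its leading magic bytes (hypothetical helper)."""
    if header.startswith(b"ABIF"):
        return "ab1"  # Applied Biosystems ABIF container
    if header.startswith(b".scf"):
        return "scf"  # Standard Chromatogram Format
    raise ValueError("not an ABIF (.ab1) or SCF (.scf) trace")


def trace_format(path: str) -> str:
    """Read the first four bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return trace_format_from_header(handle.read(4))
```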
The following parameters have default settings but can be adjusted if necessary by checking the 'Advanced settings' box.
The window used to align control and test sequences to determine any offset between the two reads. Default settings are recommended, except when long repetitive sequences are present.
left boundary:
Default is 100, because the beginning of a Sanger sequence trace is often of poor quality.
right boundary:
Automatically set to the break site minus 10 bp.
The sequence segment used for decomposition. The decomposition window must cover the location of the designed mutation(s) in the donor template. The default is a 100 bp window that starts 20 bp upstream of the break site. TIDER performance may improve with smaller window sizes if the designed mutations are subtle (e.g. one or two single-nucleotide substitutions). The window may also be adjusted if part of the sequence read is of low quality or contains repetitive sequences. If possible, invalid values are corrected automatically.
left boundary:
20 bp upstream of the break site.
right boundary:
left boundary + 100bp
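The default boundaries described above can be summarised as a small function of the break-site position (in bp from the start of the read). The function name and the check for an empty alignment window are illustrative assumptions; TIDER's own correction of invalid values may behave differently.

```python
from typing import NamedTuple


class Windows(NamedTuple):
    align_left: int
    align_right: int
    decomp_left: int
    decomp_right: int


def default_windows(break_site: int) -> Windows:
    """Default alignment and decomposition windows, in bp from the read start."""
    align_left = 100               # skip the often low-quality start of the trace
    align_right = break_site - 10  # stop 10 bp before the break site
    decomp_left = break_site - 20  # start 20 bp upstream of the break site
    decomp_right = decomp_left + 100
    if align_right <= align_left:
        raise ValueError("break site too close to the read start for the default alignment window")
    return Windows(align_left, align_right, decomp_left, decomp_right)
```

For a break site at 200 bp this gives an alignment window of 100-190 bp and a decomposition window of 180-280 bp, which is why a break site ~200 bp into the read is recommended.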
Maximum number of base pairs upstream of the break site to be included in the decomposition window. Default is 20.
Maximum size of non-templated deletions to be modeled. Default is 10.
The maximum size of non-templated insertions is fixed to 5.
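With the default limits, the candidate sequences in the decomposition are the designed (template) sequence, wild type, deletions of 1-10 bp and insertions of 1-5 bp. The sketch below only enumerates labels for these models under that reading, and applies the rule that TIDER does not model natural insertions of the same size as a designed insertion larger than +1. The label format and function name are hypothetical.

```python
from typing import List


def candidate_models(max_deletion: int = 10, max_insertion: int = 5,
                     designed_insertion: int = 0) -> List[str]:
    """Labels of the models in the decomposition (hypothetical labels)."""
    models = ["template"]  # the designed sequence from the donor template
    models += [str(-size) for size in range(max_deletion, 0, -1)]  # "-10" .. "-1"
    models.append("0")  # wild type
    for size in range(1, max_insertion + 1):
        # A natural insertion of the same size as a designed insertion > +1
        # is not modelled separately.
        if designed_insertion > 1 and size == designed_insertion:
            continue
        models.append(f"+{size}")
    return models
```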
Once the data are uploaded and parameters are set, submit the data by clicking on the "update view" button. After 10-30 seconds the plots will appear in the "Decomposition" tab. If the settings are incorrect or too stringent, warnings or error messages will be displayed.
A depiction of the best combination of the models in the decomposition matrix (all possible indels and the designed mutation from the template) in the experimental sample, as determined by non-negative linear modelling. An R2 value is calculated as a measure of the goodness of fit, and the statistical significance of the estimated incorporation frequency of the template-directed mutation(s) is determined.
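The decomposition can be pictured as a non-negative least-squares problem: each column of a matrix holds the expected trace signal of one candidate model, and the observed signal of the experimental sample is approximated by a non-negative weighted sum of these columns. The minimal sketch below solves this with plain cyclic coordinate descent rather than the Lawson-Hanson algorithm of the R `nnls` package, and computes R2 as 1 - SS_res/SS_tot. The matrix layout and function names are assumptions for illustration, not TIDER's actual code.

```python
from typing import List

Matrix = List[List[float]]


def nnls_coordinate_descent(A: Matrix, y: List[float], sweeps: int = 500) -> List[float]:
    """Minimise ||y - A x||^2 subject to x >= 0 by cyclic coordinate descent."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = list(y)  # y - A x, with x = 0 initially
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue
            # Exact minimiser along coordinate j, then clipped at zero.
            step = sum(A[i][j] * residual[i] for i in range(n_rows)) / col_sq[j]
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


def r_squared(A: Matrix, x: List[float], y: List[float]) -> float:
    """Fraction of the variance in y explained by the fitted values A x."""
    fitted = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0.0:
        raise ValueError("observed signal is constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot
```

In a toy case with two overlapping model columns `[[1, 0], [1, 1], [0, 1]]` and an observed signal `[0.6, 1.0, 0.4]`, the fitted weights approach 0.6 and 0.4 and R2 approaches 1. A negative unconstrained weight is clipped to 0, which is what keeps each estimated fraction non-negative.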
The signals of all four nucleotides (A, G, T and C) at each position in the sequence file are used. In general, each position in the sequence trace is dominated by one nucleotide signal that indicates the actual nucleotide; the minor signals from the other three nucleotides are normally considered background. The percentage of these aberrant nucleotides is plotted along the sequence traces of the control and the experimental sample. A value of 0% at a position indicates that the detected nucleotide does not differ from the control sequence, whereas a value of 100% indicates that the expected nucleotide was not detected at all (only one or more of the other three nucleotides were). The percentage of aberrant nucleotides in the control should be low along the whole sequence trace. The experimental sample, however, consists of a mixture of sequences because of the presence of indels and possible point mutations. Around the break site these sequences start to deviate from the control, which shows up as a consistently elevated aberrant sequence signal. Note that, because there are only four different nucleotides, a mutated sequence has a 25% chance of carrying the same nucleotide as the control sequence at any given position.
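The aberrant-signal percentage at one position is the share of the total peak signal that does not belong to the nucleotide expected from the control sequence. The sketch below assumes that peak heights for A, C, G and T have already been extracted per base position and that trace and control sequence are already aligned; the function names are hypothetical.

```python
from typing import Dict, List


def aberrant_percent(peaks: Dict[str, float], expected_base: str) -> float:
    """Percentage of the signal at one position that is not the expected base."""
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("no signal at this position")
    return 100.0 * (total - peaks[expected_base]) / total


def aberrant_profile(trace: List[Dict[str, float]], control_seq: str) -> List[float]:
    """Aberrant-signal percentage along a trace, with the control sequence as expectation."""
    return [aberrant_percent(p, base) for p, base in zip(trace, control_seq)]
```

A clean position such as `{"A": 90, "C": 5, "G": 3, "T": 2}` with expected base A gives 10%; a position where all four signals are equal gives 75%, which illustrates why the signal cannot be expected to reach 100% even in heavily mutated regions.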
Two additional quality plots are generated. In one, the aberrant signal of the reference trace compared to the control trace is plotted. This can be used to verify whether the designed mutation(s) is/are present at the expected location. In the second one, the percentage of the designed mutation(s) present in the experimental sample is plotted, representing the relative incorporation of the donor template.
Quality measures: results depend on the quality of the sequence reads. As a rule of thumb, we recommend aiming for an average aberrant sequence signal before the break site of < 10% (in both the control and the test sample) and R2 > 0.9 for the decomposition result.
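The rule of thumb above can be expressed as a simple check. This is a hypothetical helper: it assumes per-position aberrant-signal percentages for the control and test sample (0-based positions, with `break_site` the number of positions before the break) and the R2 of the decomposition. The thresholds are the recommended values, not hard limits.

```python
from typing import Sequence


def meets_quality_rule(control: Sequence[float], test: Sequence[float],
                       break_site: int, r2: float,
                       max_mean_aberrant: float = 10.0, min_r2: float = 0.9) -> bool:
    """True if mean aberrant signal before the break is < 10% in both traces and R2 > 0.9."""
    if break_site <= 0:
        raise ValueError("break_site must be positive")

    def mean_before_break(values: Sequence[float]) -> float:
        upstream = values[:break_site]
        return sum(upstream) / len(upstream)

    return (mean_before_break(control) < max_mean_aberrant
            and mean_before_break(test) < max_mean_aberrant
            and r2 > min_r2)
```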
Check the following criteria to determine the quality of your data (use the quality plots):
Alignments
Aberrant sequence signal (quality plot 1)
Designed sequence signal - reference (quality plot 2)
Designed sequence signal - test sample (quality plot 3)
For TIDER, three PCR amplicons (all generated with the same primer set) are needed:
Amount | Component |
---|---|
21-x µL | H2O |
2 µL | primer a (10 µM stock) |
2 µL | primer b (10 µM stock) |
x µL | genomic DNA (~50 ng) |
25 µL | 2x pre-mix of buffer, Taq polymerase and dNTPs (e.g. BioLine MyTaq, BIO-25044) |
Step | Temperature | Time, min:sec | Number of cycles |
---|---|---|---|
Initial denaturation | 95 °C | 1:00 | 1 |
Denaturation | 95 °C | 0:15 | |
Annealing | 58 °C | 0:15 | 25 |
Extension | 72 °C | 0:10 | |
Hold | 4 °C | indefinite | |
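The "21-x µL" water volume in the table tops each reaction up to 50 µL once the genomic DNA volume x is known; each 2 µL of 10 µM primer gives 0.4 µM in the final 50 µL. As a hypothetical helper, assuming you know the gDNA concentration in ng/µL and aim for ~50 ng per reaction, the per-reaction volumes can be computed as follows.

```python
from typing import Dict


def pcr_volumes(gdna_ng_per_ul: float, target_ng: float = 50.0) -> Dict[str, float]:
    """Per-reaction volumes (µL) for the 50 µL PCR in the table above."""
    x = target_ng / gdna_ng_per_ul  # volume of genomic DNA needed
    if x > 21.0:
        raise ValueError("gDNA too dilute: more than 21 µL would be needed")
    return {
        "H2O": 21.0 - x,
        "primer 1 (10 µM)": 2.0,
        "primer 2 (10 µM)": 2.0,
        "genomic DNA": x,
        "2x pre-mix": 25.0,
    }
```

The same layout applies to the other single-step PCRs below; only the primer pair changes.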
Amount | Component |
---|---|
21-x µL | H2O |
2 µL | primer a (10 µM stock) |
2 µL | primer c (10 µM stock) |
x µL | genomic DNA (~50 ng) |
25 µL | 2x pre-mix of buffer, Taq polymerase and dNTPs (e.g. BioLine MyTaq, BIO-25044) |
Amount | Component |
---|---|
21-x µL | H2O |
2 µL | primer d (10 µM stock) |
2 µL | primer b (10 µM stock) |
x µL | genomic DNA (~50 ng) |
25 µL | 2x pre-mix of buffer, Taq polymerase and dNTPs (e.g. BioLine MyTaq, BIO-25044) |
Step | Temperature | Time, min:sec | Number of cycles |
---|---|---|---|
Initial denaturation | 95 °C | 1:00 | 1 |
Denaturation | 95 °C | 0:15 | |
Annealing | 58 °C | 0:15 | 25 |
Extension | 72 °C | 0:10 | |
Hold | 4 °C | indefinite | |
Amount | Component |
---|---|
48 µL | annealing buffer (10 mM Tris, 50 mM NaCl, 1 mM EDTA) |
1 µL | PCR mix1 |
1 µL | PCR mix2 |
Amount | Component |
---|---|
18 µL | H2O |
2 µL | primer a (10 µM stock) |
2 µL | primer b (10 µM stock) |
3 µL | annealed oligo mix |
25 µL | 2x pre-mix of buffer, Taq polymerase and dNTPs (e.g. BioLine MyTaq, BIO-25044) |
Step | Temperature | Time, min:sec | Number of cycles |
---|---|---|---|
Extension | 72 °C | 0:15 | 1 |
Denaturation | 95 °C | 0:15 | |
Annealing | 58 °C | 0:15 | 25 |
Extension | 72 °C | 0:10 | |
Hold | 4 °C | indefinite | |
We strongly recommend that all three PCR products (control, reference and experimental sample(s)) are sequenced in parallel. Either primer a or primer b may be used as the sequencing primer. Sequence trace files must be saved in .ab1 or .scf format.
The requirements for sequence length are flexible. The region upstream of the break site is used to align the sequencing traces. The region from -20 to +80 bp relative to the break site (the decomposition window) is used for the actual calculations, but it can be shortened or extended somewhat using the Advanced settings. We advise sequencing a stretch of DNA of ~700 bp enclosing the designed editing site. The projected break site should preferably be located ~200 bp downstream of the sequencing start site, and the designed mutations should be within 20 bp of the break site.
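These layout recommendations can be turned into a quick pre-check before sequencing or uploading. The sketch below is hypothetical; it uses the default -20/+80 bp decomposition window and treats the ~200 bp and 20 bp values as soft recommendations, so it returns warnings rather than errors. All positions are in bp from the start of the read.

```python
from typing import List, Sequence


def layout_warnings(read_length: int, break_site: int,
                    mutation_positions: Sequence[int]) -> List[str]:
    """Warnings for a planned read; all positions are in bp from the read start."""
    messages = []
    if break_site < 200:
        messages.append("break site is less than ~200 bp from the read start")
    if break_site + 80 > read_length:
        messages.append("read ends before the default decomposition window (+80 bp)")
    distal = [p for p in mutation_positions if abs(p - break_site) > 20]
    if distal:
        messages.append(f"designed mutations > 20 bp from the break site: {distal}")
    return messages
```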
Note that with sequences shorter than 700 bp, the break site is often too close to the start of the sequence read for the default settings (see figure). The alignment window can be changed under Advanced settings.
R2 is a measure of the reliability of the estimated values. For example, an R2 value of 0.95 means that 95% of the variance is explained by the model; the remaining 5% consists of random noise, very large indels, non-templated point mutations and possibly more complex mutations. An R2 < 0.9 usually means that the quality of the sequence reads is low, which compromises the accuracy of the TIDER estimates.
The overall efficiency is the estimated total fraction of DNA carrying mutations (templated mutations plus non-templated indels) around the break site. It is calculated as R2 (expressed as a percentage) minus the percentage of wild-type sequence.
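Written out with R2 expressed as a percentage, and using illustrative numbers rather than TIDER output:

$$
\text{overall efficiency} = 100 \cdot R^2 - \%\,\text{wild type}, \qquad \text{e.g. } 100 \cdot 0.95 - 40\,\% = 55\,\%
$$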
The different bars represent the different insertions, deletions and templated mutations in the population. For example, if the estimated HDR fraction is 20%, then 20% of the DNA molecules in the cell pool are predicted to carry the designed mutation. In a pool you cannot tell which specific mutation each allele of an individual cell carries; to obtain allele-specific information you have to isolate a cell clone and perform TIDER analysis on it.
For a cell clone, the different bars represent the different insertions, deletions or homology-directed repair (HDR) mutations in its alleles. In a diploid clone each allele contributes ~50% of the signal, so each mutation should be found at ~50% (or ~100% if both alleles carry the same mutation).
TIDER can determine how large the non-templated indels are, but not accurately which nucleotides are inserted or deleted. To know the precise sequence of the non-templated mutations you can use next generation sequencing or Sanger sequencing of individual cloned DNA molecules.
In general, TIDER is able to discriminate naturally occurring deletions and insertions from templated ("designed") indels. Only in the presence of a small designed deletion (-1, -2) near the expected break site may the designed mutation be somewhat underestimated. Also, when the designed mutation is an insertion larger than +1, TIDER does not consider natural insertions of the same size, because we found that the decomposition becomes less robust, and because we and others have rarely observed natural insertions larger than +1.
We recommend that results are verified by sequencing the opposite strand. Note that designed mutations located >20 bp away from the break site may confound TIDER estimates when such distal mutations are combined with mutations close to the break site. It has been reported that the donor template is incorporated less efficiently when the designed point mutations are further away from the break site. By comparing different settings for the decomposition window and by visually inspecting the TIDER plots, it is possible to infer such biases.
TIDER is currently designed only for regular Cas9, but it can be tricked into analyzing data from another nuclease, provided that the nuclease creates a blunt cut at a precisely predictable location. TIDER assumes that the double-strand break is induced between nucleotides 17 and 18 of the sgRNA sequence. If you do not know the exact cutting position, TIDER results are not reliable. In the future we hope to include functionality for multiple nuclease types.
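The cut-site convention can be made concrete: for a guide found on the forward strand of the control sequence, the break lies after guide nucleotide 17; for a guide that matches the reverse strand, the same position has to be mirrored. The sketch below is a hypothetical helper that returns the break site as the number of bases upstream of the cut (0-based), assuming an exact, unique match of the guide (without PAM) on one strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def break_site(control_seq: str, guide: str) -> int:
    """Bases upstream of a cut between guide nucleotides 17 and 18."""
    control_seq, guide = control_seq.upper(), guide.upper()
    if len(guide) < 18:
        raise ValueError("guide must be at least 18 nt long")
    start = control_seq.find(guide)
    if start != -1:
        return start + 17  # forward strand: cut after guide nt 17
    start = control_seq.find(reverse_complement(guide))
    if start != -1:
        # Reverse strand: guide nt k sits at forward index start + len(guide) - k.
        return start + len(guide) - 17
    raise ValueError("guide not found on either strand of the control sequence")
```

For a 20-nt guide this places the break 3 bp upstream of the PAM on the guide's own strand, which is the blunt-cut position the text assumes for regular Cas9.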
In theory, TIDER works for both short and long inserts. For the TIDER algorithm the size of the insert does not matter, since it decomposes a mixture of peaks at a specific location in the Sanger sequence trace, as long as a reference sequence file can be created (or ordered as synthetic DNA) that contains the insert to be introduced. To generate a reference file for inserts < 50 bp, the two-step PCR protocol can be used. We advise including at least 10 complementary nucleotides on the 3′ side of the insert in the primers carrying the insert. Alternatively, TIDE can also be used for these inserts: the maximum indel size should then be set to 50 bp, and the frequency of insertions of the same size as in the donor template represents the HDR frequency. This is not possible for inserts > 50 bp, because the read length of the sequencing then becomes limiting. For larger inserts, the donor template itself may be able to serve as the template for the reference PCR, provided that its homology arms contain the binding sites of primers a and b.
Potentially, a donor template that was transfected into the cells could co-purify with genomic DNA and be co-amplified in the PCR if it contains the primer sequences. This could result in an overestimation of the HDR events. This is generally not a problem with short ssODN donors, but with plasmid templates with long homology arms, primers a and b should be chosen outside these homology arms. Alternatively, the donor plasmid may be cleared from the cells by culturing them for a few passages.
If the page turns grey, there may be one of two problems. (1) You are using an incompatible web browser. For a list of compatible browsers, please check at the bottom left of the TIDER page. (2) The firewall of your institute does not allow WebSocket connections. This is essential for TIDER. Before you contact us, please first try to access TIDER from a different location outside your institute, and talk to your systems administrator. If you continue to experience problems please contact us and let us know the nature of the problem and the exact date and time when you tried to access TIDER. Your feedback is helpful for us to improve this webtool.
Unfortunately, some .ab1 sequence files do not adhere to the official format specification and may therefore not be compatible with the TIDER web tool. Various software programs that process the raw sequence data can cause this problem. We recommend exporting the data as an .scf file and uploading that to TIDER. The .ab1 format can also be converted to .scf using 4Peaks (Mac) or FinchTV (Windows & Mac).
Sometimes the peaks in the chromatogram look fine, but the file contains nucleotides that were not called (missing annotations) or extra, wrongly inserted base calls. These interfere with the mutation spectra (see figure wrongly unannotated nucleotide). TIDER gives a warning when the spacing between the nucleotides in the chromatogram of the sequence trace is not consistent, which often indicates missing or wrongly inserted base calls. In that case the sequence file cannot be used for a reliable TIDER analysis. If possible, try lowering the right boundary of the decomposition window. If the warning persists, carefully inspect your chromatogram.
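One way to picture such a spacing check is to compute the distance between consecutive peak (base-call) positions and flag gaps that deviate strongly from the typical spacing: a gap of roughly twice the median suggests a missing base call, roughly half suggests a spurious one. The tolerance and function name are assumptions; TIDER's actual criterion is not described here.

```python
from statistics import median
from typing import List, Sequence


def irregular_gaps(peak_positions: Sequence[int], tolerance: float = 0.4) -> List[int]:
    """Indices i where the gap between peaks i and i+1 deviates from the median gap."""
    gaps = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    typical = median(gaps)
    return [i for i, gap in enumerate(gaps) if abs(gap - typical) > tolerance * typical]
```

For peaks at `[10, 22, 34, 46, 70, 82, 88, 100]` the gaps are mostly 12, so the 24-unit gap (index 3, a likely missed call) and the 6-unit gap (index 5, a likely extra call) are flagged.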
A low R2 can be caused by sub-optimal settings or by poor sequence quality.
Settings
By default, the size of the decomposition window is 100bp and the indel size range is set to 10. The settings can be adjusted in advanced settings. Possible issues and solutions:
When the peak heights of the control and reference chromatogram are very different, it might happen that background signals are estimated as HDR events. Make sure that all three sequencing traces are generated in parallel.
Sometimes a mismatch occurs in the control sequence at the location of the sgRNA, which stops the TIDER analysis. In this case, edit the base calls in the chromatogram file (IUPAC nucleotide codes) so that they are identical to the expected control sequence.
This error message can occur when the settings are not optimal or when the break site is too close to the start or end of the sequence. If possible, try setting the decomposition window boundaries further apart, using smaller indel size limits, or lowering the alignment window boundaries. If that does not help, you may have to re-sequence to perform the TIDER analysis. We advise sequencing a stretch of DNA of ~700 bp enclosing the designed editing site. The projected break site should preferably be located ~200 bp downstream of the sequencing start site.
When the beginning of the sequence is of poor quality, the alignment function can make a mistake. This can be observed in the quality plot that has high aberrant sequence signal over the whole length of the sequence trace (see figure). The aberrant sequence signal should only increase around the expected cut site (blue dotted line).
In case of poor alignment, try to shift the start of the alignment window (Advanced settings).