The Dovetail Analysis Portal is available to all Dovetail Genomics customers and enables streamlined analysis of NGS FASTQ files generated from Dovetail linked-read libraries. The portal accepts standard paired-end sequences in FASTQ format and outputs results in a packaged, easily downloadable archive. Each analysis requires Dovetail credits: one credit for files totaling 60 GB or less, and two credits for files larger than 60 GB and up to 100 GB. If your files are larger than 100 GB, please contact support@cantatabio.com.
The analysis pipeline is fully supported and maintained by Dovetail Genomics. Should you have any inquiries, please refer to our FAQ page or contact support@cantatabio.com.
FASTQ files are submitted for analysis in three simple steps:
1. Upload new FASTQ files for analysis to your Dovetail Analysis Portal account.
2. Select your FASTQ files and your reference genome, then submit for analysis.
3. Download result files to your computer.
The analysis output includes the following files:
File | Description |
---|---|
Report (.html) | Summary of the genomic analysis findings presented in an easily digestible format |
Cooler / Multi-Resolution Cooler (.cool / .mcool) | Linked-read density data, also known as a contact matrix |
Binary Alignment Map (.bam) | Sequencing reads aligned to the human reference genome (hg38) |
Browser Extensible Data Paired-End (.bedpe) | Structural variants detected in the aligned sequencing data and/or contact matrices |
Copy Number (.cnv.somatic.tsv) | Genome-wide copy number variants and allele fractions in a tab-delimited format |
Variant Call Format (.vcf) | Single nucleotide variants, insertions, and deletions detected in the aligned sequencing data |
Mutation Annotation Format (.maf) | Variants annotated with their predicted impacts to genes, transcripts, protein sequences, and regulatory regions in a tab-delimited format |
Q: Is there a specific browser I should use for the Dovetail® Analysis Portal?
Yes, the Dovetail® Analysis Portal should be accessed using Chrome.
Q: Can I provide metadata information?
No. The Dovetail® Analysis Portal is not HIPAA-compliant. Please make sure your file (including filename) contains no protected health information or personally identifiable information.
Q: Can I use the Dovetail® Analysis Portal for diagnosis?
No. The Dovetail® Analysis Portal is for Research Use Only and not for use in diagnostic procedures on patients.
Q: I set up my account, but I'm not able to upload data or submit an analysis. Why?
Credits are required to upload and analyze your own data. However, you can explore the portal and review publicly available example results from the NIST-HG008T dataset without using any credits. Credits are added to your account once payment has been received. Please contact your sales representative or submit this form to place an order. If you've already purchased credits but don't see them reflected in your account, reach out to support@cantatabio.com.
Q: How many credits are required for one analytical run on the Dovetail® Analysis Portal?
A single credit covers one analytical run of up to 60 GB of total FASTQ data per sample (combined read 1 + read 2 files), which is typically ~30X genomic coverage for gzipped FASTQ files. Data sets between 60 and 100 GB (~80X genomic coverage) require two credits.
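The credit thresholds can be checked locally before upload. The following is a minimal sketch, not an official tool: the function names `credits_for_bytes` and `credits_for_files` are hypothetical, and it assumes 1 GB = 1,000,000,000 bytes, which may differ from how the portal measures file size.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: estimate credits from the combined size of R1 + R2.
# Assumption: 1 GB = 1,000,000,000 bytes; the portal may count size differently.

credits_for_bytes() {
  local total_bytes=$1
  local gb=1000000000
  if (( total_bytes <= 60 * gb )); then
    echo 1                  # up to 60 GB: one credit
  elif (( total_bytes <= 100 * gb )); then
    echo 2                  # above 60 GB and up to 100 GB: two credits
  else
    echo "contact support"  # above 100 GB: not covered by the standard tiers
  fi
}

credits_for_files() {
  # Sum the sizes in bytes of all files given as arguments.
  local total=0 f size
  for f in "$@"; do
    size=$(wc -c < "$f")
    total=$(( total + size ))
  done
  credits_for_bytes "$total"
}
```

For example, `credits_for_files merged.R1.fastq.gz merged.R2.fastq.gz` prints the estimated number of credits for one sample.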
Q: When are credits deducted from my account?
Credits are deducted only upon the successful completion of an analysis run. If incorrect input files are selected and the run completes successfully, a credit will still be deducted. Please ensure you are selecting the correct input files before submitting your analysis.
Q: What happens if my run fails?
If a run fails, you can retry it up to two times using the RETRY button. If the run is still unsuccessful after two attempts, please contact support@cantatabio.com for assistance. Retrying a failed run does not deduct additional credits from your account.
Q: When is it appropriate to use the RETRY button?
The RETRY button can be used if a run times out. Analysis timeouts can happen during computationally heavy processes or during data transfer after analysis. If your run fails to complete and display results, you can try to resume it with the RETRY button. The RETRY button does not allow you to reconfigure your run.
Q: Are credits refundable?
No, credits are non-refundable.
Q: Do I need bioinformatics training to use the portal?
No. The portal is designed to be user-friendly and does not require command-line knowledge. Analysis can be performed through a few simple button clicks.
Q: I did not use the Dovetail® LinkPrep™ Kit to generate my libraries, and I’m interested in somatic variation detection. Can I still use the Dovetail® Portal for the analysis?
While the Dovetail® Analysis Portal can be used for somatic variation detection with data not generated using the LinkPrep™ Kit, we cannot guarantee the success of the analysis run or provide support for results generated from non-LinkPrep data; this use is therefore not recommended. To find out more about the unique features of LinkPrep data, please see this technote.
Q: What input files are needed to run the analysis?
You will need the R1 and R2 FASTQ files generated from sequencing a Dovetail® LinkPrep™ Library. The portal currently supports human samples only.
Q: How do I upload data to the Dovetail® Analysis Portal?
Data can be uploaded directly from your computer to the portal. For datasets larger than 60 GB, SFTP transfer is recommended; contact support@cantatabio.com to arrange it. If you encounter upload issues, contact support@cantatabio.com.
Q: My sample has four R1 and four R2 FASTQ files. If I upload these to the portal, am I charged for 1 run or 4? How can I upload them?
If your 30X sequencing sample is split across multiple sequencing lanes or runs and therefore has multiple FASTQ files, you will need to combine the FASTQ files before upload.
You can use the following command to combine FASTQ files:
cat file1.R1.fastq.gz file2.R1.fastq.gz [file3.R1.fastq.gz ...] > merged.R1.fastq.gz
cat file1.R2.fastq.gz file2.R2.fastq.gz [file3.R2.fastq.gz ...] > merged.R2.fastq.gz
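When merging, list the R1 and R2 files in the same order so that read pairs stay matched; concatenated gzip files remain valid gzip. As a sanity check after merging, the sketch below counts reads in each merged file and compares them. The function names `count_reads` and `check_pair` are hypothetical, and the sketch assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking merged FASTQ pairs.
# Assumption: each FASTQ record spans exactly four lines (no wrapped sequences).

count_reads() {
  # Decompress to stdout, count lines, and divide by four.
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

check_pair() {
  # Compare read counts of the R1 and R2 files given as $1 and $2.
  local r1 r2
  r1=$(count_reads "$1")
  r2=$(count_reads "$2")
  if [ "$r1" -eq "$r2" ]; then
    echo "OK: $r1 read pairs"
  else
    echo "MISMATCH: R1=$r1 R2=$r2" >&2
    return 1
  fi
}
```

For example, `check_pair merged.R1.fastq.gz merged.R2.fastq.gz` reports the number of read pairs, or fails if the two counts differ. Equal counts do not prove that the lanes were concatenated in the same order, so the file order in the two `cat` commands should still be compared by eye.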
Q: My FASTQ pairs are larger than 60 GB even though they are supposed to be ~30X coverage. What do I do?
Make sure your files are gzipped before upload to reduce their size. Gzipped FASTQ files typically have the extension .fastq.gz. You can use the following command to gzip the file:
gzip file.R1.fastq
gzip file.R2.fastq
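If you are unsure whether a file is already compressed, `gzip -t` can test it before you compress. The following is a minimal sketch with a hypothetical helper name, `ensure_gzipped`. Note that `gzip` replaces the original file with the compressed `.gz` file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compress a FASTQ file only if it is not already gzipped.
# Prints the name of the gzipped file.

ensure_gzipped() {
  local f=$1
  # gzip -t reads from stdin here, so the check does not depend on the suffix.
  if gzip -t < "$f" 2>/dev/null; then
    echo "$f"        # already valid gzip data; leave it untouched
  else
    gzip "$f"        # replaces "$f" with "$f.gz"
    echo "$f.gz"
  fi
}
```

For example, `ensure_gzipped file.R1.fastq` compresses the file and prints `file.R1.fastq.gz`, while running it on a file that is already gzipped prints the name unchanged.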
Q: My sequencing files are larger than 60 GB. Can I down-sample (sub-sample) them to run through the 30X analysis workflow?
Yes, you can down-sample your FASTQ files before upload to perform a 30X analysis. FASTQ pairs of 60 to 100 GB are charged 2 credits. However, keep in mind that certain samples (e.g., highly heterogeneous or low-purity samples) and low-VAF variant detection benefit from higher sequencing depth. You can use the following approach to subsample the data:
Q: What deliverables will I receive from a run?
Deliverables from a Dovetail® Analysis run include the files listed in the analysis output table above.
Q: How do I prioritize SV call quality?
High-quality SV calls tend to:
1. Have high read support at high resolution.
2. Generate refined breakpoints.
3. Contain features that can be confirmed in IGV or Linked Read Density Plots.
Be suspicious of low-resolution calls (e.g., 1 Mb, sometimes 100 kb). These calls are often false positives (FPs), tend to have lower scores, and should be manually checked either in IGV or via Linked Read Density Plots.
Q: The purity solution output by the analysis portal does not match my pathology report. What should I do?
While modest differences are to be expected between any estimate of tumor purity, if the pathology-based purity estimate is substantially higher (e.g., more than 20%) than the bioinformatics-based estimate provided on the report, you can increase the minimum purity setting to align with the estimate from pathology. For instance, if the Dovetail analysis estimates a sample’s tumor purity to be 30%, but its estimate from pathology is 80%, an increased minimum purity threshold of 0.5 (50%) or even 0.7 (70%) may help guide the tool to identify an improved purity solution. Furthermore, if a sample is known a priori to have very high purity (e.g., cell line), then it is suggested that all analyses are run with a minimum purity threshold of at least 0.8 (80%).
Q: How long is my data stored on the portal?
Uploaded FASTQ files are stored for 10 days from the upload date. Analysis result files are available for 10 days after the run is completed. Download all outputs within that 10-day window if you need them for future analyses or for viewing outside the portal.
Q: How do I cite the analysis in my publication?
Please refer to the Appendix section at the bottom of the example report available on the portal for method and tool citation. You can access the report here.