The Dovetail® Analysis Portal is available to all Dovetail Genomics customers and enables streamlined analysis of NGS FASTQ files generated from Dovetail® linked-read libraries. Supported workflows include (1) somatic variant calling (human only) and (2) epigenetic feature calling (human and mouse only). The portal delivers results as a summary report and a packaged, easily downloadable archive. It also offers interactive data browsing directly in the portal. Each analysis requires Dovetail® Analysis Portal credits: one credit for files up to 60 GB in total, and two credits for files of 61-100 GB in total. If your files are larger than 100 GB, please contact support@cantatabio.com.
The analysis pipeline is fully supported and maintained by Dovetail Genomics. Should you have any inquiries, please refer to our FAQ below or contact support@cantatabio.com.
We support inter-AWS account file transfers for data upload and results delivery. Data delivered to your own AWS S3 bucket are not subject to the portal's 10-day storage limit. Please see our General FAQ for more information.
Q: Is there a specific browser I should use for the Dovetail® Analysis Portal?
Yes, the Dovetail® Analysis Portal should be accessed using Chrome.
Q: Can I provide metadata information?
No. The Dovetail® Analysis Portal is not HIPAA-compliant. Please make sure your file (including filename) contains no protected health information or personally identifiable information.
Q: Can I use the Dovetail® Analysis Portal for diagnosis?
No. The Dovetail® Analysis Portal is for Research Use Only and not for use in diagnostic procedures on patients.
Q: I set up my account, but I'm not able to upload data or submit an analysis. Why?
Credits are required to upload or analyze your own data. However, you can explore the portal and review the publicly available example datasets and results (WGS Variant Analysis: 30X NIST HG008T LinkPrep, or WGS Epigenetic Analysis: 80X GM12878 LinkPrep) without using any credits. Credits are added to your account once payment has been received. Please contact your sales representative or visit our webstore to place an order. If you've already purchased credits but don't see them reflected in your account, reach out to support@cantatabio.com.
Q: How many credits are required for one analytical run on the Dovetail® Analysis Portal?
A single credit covers one analytical run of up to 60 GB of total FASTQ data per sample (read 1 and read 2 files combined), which is typically ~30X genomic coverage for gzipped FASTQ files. Data sets between 60 and 100 GB (~80X genomic coverage) require two credits.
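The credit rule above can be checked before upload by totaling the size of a sample's FASTQ files. The following is a minimal sketch, not a portal tool: the function names `credits_for_gb` and `total_gb` are hypothetical, and treating 1 GB as 10^9 bytes is an assumption, so the portal's own accounting may differ slightly near the thresholds.

```shell
# Hypothetical helper (not part of the portal): map a total FASTQ size in
# whole GB to the number of credits described in the answer above.
credits_for_gb() {
    local total_gb=$1
    if [ "$total_gb" -le 60 ]; then
        echo 1
    elif [ "$total_gb" -le 100 ]; then
        echo 2
    else
        # Files above 100 GB are not covered by the credit rule.
        echo "contact support"
    fi
}

# Sum the sizes of all files passed as arguments, rounded up to whole GB.
# Assumption: 1 GB = 10^9 bytes.
total_gb() {
    local bytes=0 f size
    for f in "$@"; do
        size=$(wc -c < "$f")
        bytes=$((bytes + size))
    done
    echo $(( (bytes + 999999999) / 1000000000 ))
}
```

For a sample split across several lanes, pass every R1 and R2 file, for example `credits_for_gb "$(total_gb sample_L001_R1.fastq.gz sample_L001_R2.fastq.gz)"` (file names here are placeholders).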
Q: Are credits refundable?
No, credits are non-refundable.
Q: Do I need bioinformatics training to use the portal?
No. The portal is designed to be user-friendly and does not require command-line knowledge. Analysis can be performed through a few simple button clicks.
Q: What input files are needed to run the analysis?
You will need the R1 and R2 FASTQ files generated from sequencing a Dovetail LinkPrep™, Micro-C, HiChIP, or promoter capture library. The portal currently supports human samples only for Somatic variant calling and human and mouse samples only for Epigenetic feature calling.
Q: My sample has four R1 and R2 FASTQ files. How can I upload these to the portal?
If your 30X sequencing sample is split across multiple sequencing lanes or runs and results in multiple FASTQ files, there is no need to combine them before upload. You can upload the files separately; just be sure to select all relevant files when submitting the FASTQs for analysis.
Q: My FASTQ pairs are >60GB even though they are supposed to be ~ 30X coverage. What do I do?
Make sure your files are gzipped before upload to compress their file size.
Gzipped FASTQ files typically have the following extension: fastq.gz.
You can use the following command to gzip the file:
gzip file.R1.fastq
gzip file.R2.fastq
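To confirm that files are actually compressed before upload, one option is `gzip -t`, which tests the integrity of a gzip file without writing any output. The sketch below wraps it in a hypothetical function, `check_gzipped`; it is an illustration, not a portal requirement. Note that `gzip -t` reads the whole file, so it can take a while on large FASTQs.

```shell
# Hypothetical pre-upload check: report whether each file is a valid gzip file.
# Returns non-zero if any file fails the test.
check_gzipped() {
    local f status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK: $f"
        else
            echo "NOT GZIPPED (or corrupt): $f"
            status=1
        fi
    done
    return $status
}
```

Example: `check_gzipped file.R1.fastq.gz file.R2.fastq.gz`.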
Q: How do I use my own s3 bucket for storing result files?
Bucket policy (replace BUCKET_NAME with the name of your bucket):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::100977841808:root"
        ]
      },
      "Action": [
        "s3:PutObject",
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET_NAME/*",
        "arn:aws:s3:::BUCKET_NAME"
      ]
    }
  ]
}
CORS configuration:
[
  {
    "AllowedHeaders": [
      "Authorization",
      "x-amz-date",
      "x-amz-content-sha256",
      "content-type"
    ],
    "AllowedMethods": [
      "GET"
    ],
    "AllowedOrigins": [
      "https://portal.cantatabio.com"
    ],
    "ExposeHeaders": [
      "ETag",
      "Location"
    ],
    "MaxAgeSeconds": 3000
  }
]
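As an illustration of how the bucket policy might be filled in for a specific bucket, the sketch below prints the policy with BUCKET_NAME substituted. The function name `make_bucket_policy` and the bucket name in the example are hypothetical; the principal ARN and actions are copied from the policy above. Applying it with the AWS CLI, as shown in the comments, assumes the CLI is installed and configured for the account that owns the bucket.

```shell
# Hypothetical helper: print the bucket policy from this FAQ with BUCKET_NAME
# replaced by the bucket name passed as the first argument.
make_bucket_policy() {
    local bucket=$1
    cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": ["arn:aws:iam::100977841808:root"] },
      "Action": ["s3:PutObject", "s3:ListBucket", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::${bucket}/*",
        "arn:aws:s3:::${bucket}"
      ]
    }
  ]
}
EOF
}

# Example (assumes the AWS CLI is available; bucket name is a placeholder):
#   make_bucket_policy my-results-bucket > policy.json
#   aws s3api put-bucket-policy --bucket my-results-bucket --policy file://policy.json
```

The policy and CORS configuration can also be pasted into the bucket's permissions settings in the AWS console instead of using the command line.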
Q: How do I cite the analysis in my publication?
Please refer to the Appendix section at the bottom of the report that is included in the deliverables for method and tool citation.
Q: How do I submit an analysis through the portal?
Submitting an analysis through the Dovetail Analysis Portal is quick and easy. Just follow these three steps:
1. Upload your FASTQ files to your Dovetail Analysis Portal account.
2. Select the FASTQ files and submit them for analysis.
3. Once the analysis is complete, download the result files to your computer and explore the interactive features.
For a step-by-step walkthrough, please refer to our How-To video included above.
Q: How do I upload data to the Dovetail® Analysis Portal?
Data can be uploaded directly from your computer to the portal. For datasets larger than 60 GB, an SFTP transfer is recommended; contact support@cantatabio.com to arrange one. If you encounter upload issues, contact support@cantatabio.com.
If your FASTQ files are stored on your AWS S3 bucket, we can facilitate a transfer directly from your AWS S3 bucket to our portal. For assistance with AWS transfers, please contact support@cantatabio.com.
Q: My sequencing file is >60 GB. Can I down-sample (sub-sample) it to run it through the 30X analysis workflow?
Yes, you can down-sample your FASTQ files before upload to perform a 30X analysis. Otherwise, FASTQ pairs totaling 60-100 GB are charged two credits. However, keep in mind that certain samples (e.g., highly heterogeneous or low-purity samples) and low-VAF variant detection benefit from higher sequencing depth.
You can use the following approach to subsample the data:
First, install seqtk by following the instructions here: https://github.com/lh3/seqtk
After seqtk is installed, run the following commands to subsample 400M read pairs from the full FASTQ files:
seqtk sample -s100 full.R1.fastq.gz 400000000 | gzip -c > subsample.R1.fastq.gz
seqtk sample -s100 full.R2.fastq.gz 400000000 | gzip -c > subsample.R2.fastq.gz
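Using the same seed (`-s100`) for both commands is what keeps R1 and R2 paired after subsampling. As a quick sanity check, the read counts of the two subsampled files can be compared. The sketch below is an assumption-laden illustration: the function names `count_reads` and `check_pair_counts` are hypothetical, and it assumes standard four-line FASTQ records.

```shell
# Hypothetical check: count reads in a gzipped FASTQ file.
# Assumes each record is exactly four lines, so reads = lines / 4.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Compare read counts of an R1/R2 pair; returns non-zero on a mismatch.
check_pair_counts() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: R1=$n1 R2=$n2"
        return 1
    fi
}
```

Example: `check_pair_counts subsample.R1.fastq.gz subsample.R2.fastq.gz`. Equal counts do not prove that read names match, but a mismatch indicates the files should not be uploaded as a pair.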
Q: How long is my data stored on the portal?
FASTQ files are stored for 10 days from the upload date. Analysis result files are available for 10 days after the run is completed. All output files are available for download within 10 days for future analyses and viewing outside the portal.
Q: When are credits deducted from my account?
Credits are deducted only upon the successful completion of an analysis run. If incorrect input files are selected and the run completes successfully, a credit will still be deducted. Please ensure you are selecting the correct input files before submitting your analysis.
Q: What happens if my run fails?
If a run fails, you can retry it up to two times using the RETRY button. If the run is still unsuccessful after two attempts, please contact support@cantatabio.com for assistance. Retrying a failed run does not deduct additional credits from your account.
Q: When is it appropriate to use the RETRY button?
The RETRY button can be used if a run times out. Analysis timeouts can happen during computationally heavy processes or during data transfer after the analysis. If your run fails to complete and display results, you can try to resume it with the RETRY button. The RETRY button does not allow you to reconfigure your run.
Q: I did not use the Dovetail® LinkPrep™ Kit to generate my libraries, and I’m interested in somatic variation detection. Can I still use the Dovetail® Portal for the analysis?
The Dovetail® Analysis Portal can be used for somatic variation detection with data not generated using the LinkPrep™ Kit. However, we cannot guarantee the success of the analysis run or provide support for results generated from non-LinkPrep data, so this use is not recommended. To find out more about the unique features of LinkPrep data, please see this technote.
Q: What deliverables will I receive from a run?
Deliverables from a Dovetail® Analysis run include:
Q: How do I prioritize SV call quality?
High-quality SV calls tend to (1) have high read support at high resolution, (2) generate refined breakpoints, and (3) contain features that can be confirmed in IGV or Linked Read Density Plots. Be suspicious of low-resolution calls (e.g., 1 Mb, sometimes 100 kb). Such calls are often false positives (FPs), tend to have lower scores, and should be manually checked either in IGV or via Linked Read Density Plots.
Q: The purity solution output by the analysis portal does not match my pathology report. What should I do?
While modest differences are to be expected between any estimate of tumor purity, if the pathology-based purity estimate is substantially higher (e.g., more than 20%) than the bioinformatics-based estimate provided on the report, you can increase the minimum purity setting to align with the estimate from pathology. For instance, if the Dovetail analysis estimates a sample’s tumor purity to be 30%, but its estimate from pathology is 80%, an increased minimum purity threshold of 0.5 (50%) or even 0.7 (70%) may help guide the tool to identify an improved purity solution. Furthermore, if a sample is known a priori to have very high purity (e.g., cell line), then it is suggested that all analyses are run with a minimum purity threshold of at least 0.8 (80%).
Q: I purchased Dovetail Analysis Portal credits for epigenetics analysis. How do I get my analysis started?
Once your credits are purchased, one of our project managers will contact you to coordinate the data transfer and ask you to fill in an analysis sample submission form. They will then run the analysis and you will receive a notification once it is complete. At that point, you will be instructed to create an account on the portal, where you can log in to view your results and use the interactive features.
Q: How do I view my analysis results?
Once the analysis is completed, you will receive an email notification from one of our project managers. Log in to your account on the portal, then click on the "Analyze" tab in the top right corner of the page to access the results and explore the interactive features.
Q: How do I download my analysis result files once the analysis is completed?
Q: How long is my data stored on the portal?
Analysis result files are available for 10 days after the run is completed. All output files are available for download within 10 days for future analyses and viewing outside the portal.
Q: What deliverables will I receive from a run?
Whole Genome Epigenetics
| File description | File Format(s) |
|---|---|
| Matrix files | .hic, .cool, .mcool |
| Alignment files | .bam, .bai |
| Valid pairs files | .gz, .px2 |
| Pairtools stats file | .csv |
| Fan-C AB compartment calls | .ab, .bed; indexed files for IGV: .bed.gz, .bed.gz.tbi, .bedgraph.gz, .bedgraph.gz.tbi |
| Arrowhead TAD calls (5, 10, 25kb) | .txt; indexed files for IGV: .bed.gz, .bed.gz.tbi |
| Mustache loop calls (5, 10kb) | .tsv; indexed files for IGV: .bedpe.gz, .bedpe.gz.tbi |
| Hiccups loop calls (5, 10kb) | .bedpe |
| Whole genome epigenetics report | .html |
Pan Promoter Capture Epigenetics
| File description | File Format(s) |
|---|---|
| Matrix files | .hic, .cool, .mcool |
| Alignment files | .bam, .bai |
| Pairtools stats file | .csv |
| Valid pairs files | .gz, .px2 |
| Enrichment stats file | .txt |
| Chicago analysis files (10, 20kb) | .ibed, .txt, .png |
| Capture epigenetics report | .html |
HiChIP Epigenetics
| File description | File Format(s) |
|---|---|
| Matrix files | .hic, .cool, .mcool |
| Alignment files | .bam, .bai |
| Pairtools stats file | .csv |
| Valid pairs files | .gz, .px2 |
| FitHiChIP loop calls | .ibed, .txt, .png (packed in .tar.gz) |
| HiChIP epigenetics report | .html |
Whole Genome -- Differential analysis
| File description | File Format(s) |
|---|---|
| WGS differential analysis results | .txt |
| GO Pathway analysis results | .txt |
Pan Promoter Capture -- Differential analysis
| File description | File Format(s) |
|---|---|
| Chicdiff differential analysis results | .txt, .png |
| GO Pathway analysis results | .txt |
HiChIP -- Differential analysis
| File description | File Format(s) |
|---|---|
| FitHiChIP differential analysis results | .bed |
| GO Pathway analysis results | .txt |
Q: What epigenetic analyses does my data enable?
| Data Type | Genomic Coverage | # Read Pairs (2 x 150bp) | # of Libraries per sample | Epigenetic Analysis | Optional Add-on | Topological Feature |
|---|---|---|---|---|---|---|
| Whole Genome data | | | | Whole Genome Epigenetics | Differential Analysis | |
| Pan Promoter Capture | N/A | 150M-300M | 1 | Pan Promoter Capture Epigenetics | Differential Analysis | Loop calls |
| HiChIP Data | N/A | 300M | 2 | HiChIP Epigenetics | Differential Analysis | Loop calls |
Q: The IGV is not displaying any data. What should I do?
Make sure to zoom into a specific chromosome for IGV to load and display the data.