Welcome to the Dovetail Analysis Portal

The Dovetail® Analysis Portal is available to all Dovetail Genomics customers and enables the streamlined analysis of NGS FASTQ files generated from Dovetail® linked-read libraries. Supported workflows include (1) somatic variant calling (human only) and (2) epigenetic feature calling (human and mouse only). The portal delivers results as a summary report and a packaged, easily downloadable archive. In addition, it offers interactive data browsing directly through the portal. Each analysis requires Dovetail® Analysis Portal credits: one credit for files up to 60 GB in total, and two credits for files of 61-100 GB in total. If your files are larger than 100 GB, please contact support@cantatabio.com.

The analysis pipeline is fully supported and maintained by Dovetail Genomics. Should you have any inquiries, please refer to our FAQ below or contact support@cantatabio.com.

We support inter-AWS account file transfers for data upload and results delivery. Data delivered to your own AWS S3 bucket are not subject to the portal's 10-day storage limit. Please see our General FAQ for more information.

For general portal information, see FAQ below

Q: Is there a specific browser I should use for the Dovetail® Analysis Portal?

Yes, the Dovetail® Analysis Portal should be accessed using Chrome.

Q: Can I provide metadata information?

No. The Dovetail® Analysis Portal is not HIPAA-compliant. Please make sure your file (including filename) contains no protected health information or personally identifiable information.

Q: Can I use the Dovetail® Analysis Portal for diagnosis?

No. The Dovetail® Analysis Portal is for Research Use Only and not for use in diagnostic procedures on patients.

Q: I set up my account, but I'm not able to upload data or submit an analysis. Why?

Credits are required to upload or analyze your own data. However, you can explore the portal and review publicly available example datasets and results (WGS Variant Analysis: 30X NIST HG008T LinkPrep, or WGS Epigenetic Analysis: 80X GM12878 LinkPrep) without using any credits. Credits are added to your account once payment has been received. Please contact your sales representative or visit our webstore to place an order. If you've already purchased credits but don't see them reflected in your account, reach out to support@cantatabio.com.

Q: How many credits are required for one analytical run on the Dovetail® Analysis Portal?

A single credit covers one analytical run of up to 60 GB of total FASTQ data per sample (combined read 1 + read 2 files; typically ~30X genomic coverage for gzipped FASTQ files). Data sets between 60 and 100 GB (~80X genomic coverage) require two credits.
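
The credit rule above can be sketched as a small shell helper. This is an illustrative sketch only, not a portal tool: the function names `credits_for_bytes` and `total_bytes` are hypothetical, and it assumes that 1 GB means 10^9 bytes and that sizes are rounded up to whole GB, neither of which is stated by the portal.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the portal): estimate credits from FASTQ size.
# Assumptions: 1 GB = 10^9 bytes; sizes are rounded up to whole GB.

# Print the combined size in bytes of all files given as arguments.
total_bytes() {
  local total=0 f size
  for f in "$@"; do
    size=$(wc -c < "$f")
    total=$(( total + size ))
  done
  echo "$total"
}

# Map a total size in bytes to the number of credits described in the FAQ.
credits_for_bytes() {
  local bytes=$1
  local gb=$(( (bytes + 999999999) / 1000000000 ))   # round up to whole GB
  if (( gb <= 60 )); then
    echo 1
  elif (( gb <= 100 )); then
    echo 2
  else
    echo "contact support"                           # above 100 GB
  fi
}

# Example usage (file names are placeholders):
#   credits_for_bytes "$(total_bytes sample.R1.fastq.gz sample.R2.fastq.gz)"
```

The actual number of credits charged is determined by the portal; treat this only as a quick estimate before upload.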

Q: Are credits refundable?

No, credits are non-refundable.

Q: Do I need bioinformatics training to use the portal?

No. The portal is designed to be user-friendly and does not require command-line knowledge. Analysis can be performed through a few simple button clicks.

Q: What input files are needed to run the analysis?

You will need the R1 and R2 FASTQ files generated from sequencing a Dovetail LinkPrep, Micro-C, HiChIP, or promoter capture library. The portal currently supports human samples only for somatic variant calling, and human and mouse samples only for epigenetic feature calling.

Q: My sample has four R1 and R2 FASTQ files. How can I upload these to the portal?

If your 30X sequencing sample is split across multiple sequencing lanes or runs and results in multiple FASTQ files, there is no need to combine them before upload. You can upload the files separately—just be sure to select all relevant files when submitting the FASTQs for analysis.
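
Before selecting the files for a multi-lane sample, it can help to confirm that every R1 file has a matching R2 file. The sketch below is one way to do that locally. The function name `missing_mates` is hypothetical, and it assumes the `.R1.fastq.gz` / `.R2.fastq.gz` naming used in the examples in this FAQ; adjust the suffixes if your sequencer names files differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list R2 files that are missing for the R1 files in a directory.
# Assumes files are named <prefix>.R1.fastq.gz and <prefix>.R2.fastq.gz.
missing_mates() {
  local dir=$1 r1 r2
  for r1 in "$dir"/*.R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%.R1.fastq.gz}.R2.fastq.gz"      # expected mate file name
    [ -e "$r2" ] || echo "$r2"
  done
}

# Example usage (directory name is a placeholder):
#   missing_mates ./fastq    # prints nothing when every R1 has its R2
```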

Q: My FASTQ pairs are larger than 60 GB even though they are supposed to be ~30X coverage. What do I do?

Make sure your files are gzipped before upload to reduce their size. Gzipped FASTQ files typically have the extension .fastq.gz. You can use the following commands to gzip the files:

gzip file.R1.fastq
gzip file.R2.fastq
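
If a sample has many uncompressed FASTQ files, the same step can be wrapped in a loop. This is a minimal sketch; the function name `gzip_fastqs` is hypothetical, and it assumes the uncompressed files end in `.fastq`. Note that `gzip` replaces each input file with its `.gz` version, and `gzip -t` tests the integrity of the compressed result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gzip every uncompressed .fastq file in a directory
# and verify each compressed file.
gzip_fastqs() {
  local dir=$1 f
  for f in "$dir"/*.fastq; do
    [ -e "$f" ] || continue      # glob matched nothing
    gzip "$f"                    # replaces file.fastq with file.fastq.gz
    gzip -t "$f.gz" || return 1  # stop if the compressed file fails the integrity test
  done
}

# Example usage (directory name is a placeholder):
#   gzip_fastqs ./fastq
```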

Q: How do I use my own S3 bucket for storing result files?

  • Select the S3 bucket to which the policy will be applied
  • Navigate to the "Permissions" tab for the selected bucket
  • Locate the "Bucket policy" section and click "Edit"
  • Add the JSON policy document below, making sure to replace "BUCKET_NAME" with the name of your bucket
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "arn:aws:iam::100977841808:root"
                    ]
                },
                "Action": [
                    "s3:PutObject",
                    "s3:ListBucket",
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::BUCKET_NAME/*",
                    "arn:aws:s3:::BUCKET_NAME"
                ]
            }
        ]
    }
                        
  • Locate "Cross-origin resource sharing (CORS)", click "Edit", and add the JSON configuration below
    [
        {
            "AllowedHeaders": [
                "Authorization",
                "x-amz-date",
                "x-amz-content-sha256",
                "content-type"
            ],
            "AllowedMethods": [
                "GET"
            ],
            "AllowedOrigins": [
                "https://portal.cantatabio.com"
            ],
            "ExposeHeaders": [
                "ETag",
                "Location"
            ],
            "MaxAgeSeconds": 3000
        }
    ]
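
If you prefer the command line to the AWS console, the bucket policy above can be generated with your bucket name filled in and then applied with the AWS CLI. This is a hedged sketch: the function name `render_bucket_policy` and the bucket name `my-results-bucket` are placeholders, and it assumes the AWS CLI is installed and configured with credentials that are allowed to edit the bucket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bucket policy from this FAQ for a given bucket name.
render_bucket_policy() {
  local bucket=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": ["arn:aws:iam::100977841808:root"] },
      "Action": ["s3:PutObject", "s3:ListBucket", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::${bucket}/*", "arn:aws:s3:::${bucket}"]
    }
  ]
}
EOF
}

# Example usage (bucket name is a placeholder):
#   render_bucket_policy my-results-bucket > policy.json
#   aws s3api put-bucket-policy --bucket my-results-bucket --policy file://policy.json
#
# The CORS rules can be applied in a similar way. Note that the CLI expects the
# rule array wrapped in an object, i.e. cors.json = {"CORSRules": [ ...rules above... ]}:
#   aws s3api put-bucket-cors --bucket my-results-bucket --cors-configuration file://cors.json
```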

Q: How do I cite the analysis in my publication?

Please refer to the Appendix section at the bottom of the report that is included in the deliverables for method and tool citation.

For genetic variation specific inquiries, see FAQ below

Q: How do I submit an analysis through the portal?

Submitting an analysis through the Dovetail Analysis Portal is quick and easy. Just follow these three steps:

1. Upload your FASTQ files to your Dovetail Analysis Portal account.

2. Select the FASTQ files and submit them for analysis.

3. Once the analysis is complete, download the result files to your computer and explore the interactive features.

For a step-by-step walkthrough, please refer to our How-To video included above.

Q: How do I upload data to the Dovetail® Analysis Portal?

Data can be uploaded directly from your computer to the portal. For datasets larger than 60 GB, an SFTP transfer is recommended; contact support@cantatabio.com to arrange one. If you encounter upload issues, contact support@cantatabio.com.

If your FASTQ files are stored on your AWS S3 bucket, we can facilitate a transfer directly from your AWS S3 bucket to our portal. For assistance with AWS transfers, please contact support@cantatabio.com.

Q: My sequencing file is larger than 60 GB. Can I down-sample (sub-sample) it to run it through the 30X analysis workflow?

Yes, you can down-sample your FASTQ files before upload to perform 30X analysis. FASTQ pairs of 60-100 GB will be charged 2 credits. However, keep in mind that certain samples (i.e., highly heterogeneous or low purity) and/or low-VAF variant detection will benefit from higher sequencing depth.

You can use the following approach to subsample the data:

First, install seqtk by following the instructions here: https://github.com/lh3/seqtk

After seqtk is installed, run the following commands to subsample 400M read pairs from the full FASTQ files. Use the same seed (-s100) for both files so that the R1 and R2 reads stay paired:

seqtk sample -s100 full.R1.fastq.gz 400000000 | gzip -c > subsample.R1.fastq.gz
seqtk sample -s100 full.R2.fastq.gz 400000000 | gzip -c > subsample.R2.fastq.gz
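
After subsampling, it is worth confirming that both output files contain the same number of reads. The sketch below counts reads in a gzipped FASTQ file. The function name `count_reads` is hypothetical, and it assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes every record spans exactly four lines.
count_reads() {
  local lines
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Example usage (file names follow the commands above):
#   [ "$(count_reads subsample.R1.fastq.gz)" = "$(count_reads subsample.R2.fastq.gz)" ] \
#     && echo "R1 and R2 read counts match"
```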

Q: How long is my data stored on the portal?

FASTQ files are stored for 10 days from the upload date. Analysis result files are available for 10 days after the run is completed. Download all output files within that 10-day window if you need them for future analyses or for viewing outside the portal.

Q: When are credits deducted from my account?

Credits are deducted only upon the successful completion of an analysis run. If incorrect input files are selected and the run completes successfully, a credit will still be deducted. Please ensure you are selecting the correct input files before submitting your analysis.

Q: What happens if my run fails?

If a run fails, you can retry it up to two times using the RETRY button. If the run is still unsuccessful after two attempts, please contact support@cantatabio.com for assistance. Retrying a failed run does not deduct additional credits from your account.

Q: When is it appropriate to use the RETRY button?

The RETRY button can be used if a run times out. Analysis timeouts can happen during computationally heavy processes or during data transfer after the analysis. If your run fails to complete and display results, you can try to resume it with the RETRY button. The RETRY button does not allow you to reconfigure your run.

Q: I did not use the Dovetail® LinkPrep Kit to generate my libraries, and I’m interested in somatic variation detection. Can I still use the Dovetail® Portal for the analysis?

The Dovetail® Analysis Portal can be used for somatic variation detection with data not generated using the LinkPrep Kit, but this is not recommended: we cannot guarantee the success of the analysis run or provide support for results generated from non-LinkPrep data. To find out more about the unique features of LinkPrep data, please see this technote.

Q: What deliverables will I receive from a run?

Deliverables from a Dovetail® Analysis run include:

  • Alignment data and index files (.bam/.bai)
  • Structural variant calls (.bedpe)
  • SNV/InDel calls (.vcf)
  • CNVs (.tsv)
  • DNA co-localization matrices (.mcool/.hic)
  • Summary report of detected variants (.html), including:
    • SV Breakpoint Location
    • SV Variant Allele Frequency (VAF) scores
    • SV Read support
    • Annotated SNV/InDel calls
    • Cancer genes in altered copy number states
  • Interactive sessions using IGV, Juicebox, and SV interpretation tools

Q: How do I prioritize SV call quality?

High-quality SV calls tend to (1) have high read support at high resolution, (2) generate refined breakpoints, and (3) contain features that can be confirmed in IGV or Linked Read Density Plots. Be suspicious of low-resolution calls (i.e., 1 Mb and sometimes 100 kb). False positive (FP) calls of this kind tend to have lower scores and should be checked manually, either in IGV or via Linked Read Density Plots.

Q: The purity solution output by the analysis portal does not match my pathology report. What should I do?

While modest differences are to be expected between any estimate of tumor purity, if the pathology-based purity estimate is substantially higher (e.g., more than 20%) than the bioinformatics-based estimate provided on the report, you can increase the minimum purity setting to align with the estimate from pathology. For instance, if the Dovetail analysis estimates a sample’s tumor purity to be 30%, but its estimate from pathology is 80%, an increased minimum purity threshold of 0.5 (50%) or even 0.7 (70%) may help guide the tool to identify an improved purity solution. Furthermore, if a sample is known a priori to have very high purity (e.g., cell line), then it is suggested that all analyses are run with a minimum purity threshold of at least 0.8 (80%).

For epigenetic analysis specific inquiries, see FAQ below

Q: I purchased Dovetail Analysis Portal Credits for epigenetics analysis. How do I get my analysis started?

Once your credits are purchased, one of our project managers will contact you to coordinate the data transfer and ask you to fill in an analysis sample submission form. They will then run the analysis and you will receive a notification once it is complete. At that point, you will be instructed to create an account on the portal, where you can log in to view your results and use the interactive features.

Q: How do I view my analysis results?

Once the analysis is completed, you will receive an email notification from one of our project managers. Log in to your account on the portal, then click on the "Analyze" tab in the top right corner of the page to access the results and explore the interactive features.

Q: How do I download my analysis result files once the run is completed?

  • Navigate to the "Analysis" page, click on a specific run, then locate the "All Results" tab
  • Download each file individually by clicking on the name of the file
  • Download all results:
    • Click on the "Download all analysis results" button to download a bash script (download_all.sh)
    • Run "bash download_all.sh" to download all result files to your local machine

Q: How long is my data stored on the portal?

Analysis result files are available for 10 days after the run is completed. All output files are available for download within those 10 days for future analyses and viewing outside the portal.

Q: What deliverables will I receive from a run?

Whole Genome Epigenetics

| File description | File format(s) |
|---|---|
| Matrix files | .hic, .cool, .mcool |
| Alignment files | .bam, .bai |
| Valid pairs files | .gz, .px2 |
| Pairtools stats file | .csv |
| Fan-C AB compartment calls | .ab, .bed |
| Indexed files for IGV | .bed.gz, .bed.gz.tbi, .bedgraph.gz, .bedgraph.gz.tbi |
| Arrowhead TAD calls (5, 10, 25 kb) | .txt |
| Indexed files for IGV | .bed.gz, .bed.gz.tbi |
| Mustache loop calls (5, 10 kb) | .tsv |
| Indexed files for IGV | .bedpe.gz, .bedpe.gz.tbi |
| HiCCUPS loop calls (5, 10 kb) | .bedpe |
| Whole genome epigenetics report | .html |

Pan Promoter Capture Epigenetics

| File description | File format(s) |
|---|---|
| Matrix files | .hic, .cool, .mcool |
| Alignment files | .bam, .bai |
| Pairtools stats file | .csv |
| Valid pairs files | .gz, .px2 |
| Enrichment stats file | .txt |
| CHiCAGO analysis files (10, 20 kb) | .ibed, .txt, .png |
| Capture epigenetics report | .html |

HiChIP Epigenetics

| File description | File format(s) |
|---|---|
| Matrix files | .hic, .cool, .mcool |
| Alignment files | .bam, .bai |
| Pairtools stats file | .csv |
| Valid pairs files | .gz, .px2 |
| FitHiChIP loop calls | .ibed, .txt, .png (packed in .tar.gz) |
| HiChIP epigenetics report | .html |

Whole Genome: Differential analysis

| File description | File format(s) |
|---|---|
| WGS differential analysis results | .txt |
| GO Pathway analysis results | .txt |

Pan Promoter Capture: Differential analysis

| File description | File format(s) |
|---|---|
| Chicdiff differential analysis results | .txt, .png |
| GO Pathway analysis results | .txt |

HiChIP: Differential analysis

| File description | File format(s) |
|---|---|
| FitHiChIP differential analysis results | .bed |
| GO Pathway analysis results | .txt |

Q: What epigenetic analyses does my data enable?

| Data Type | Genomic Coverage | # Read Pairs (2 x 150 bp) | # of Libraries per Sample | Epigenetic Analysis | Optional Add-on | Topological Features |
|---|---|---|---|---|---|---|
| Whole Genome data | 30X | 350M | 1-2 | Whole Genome Epigenetics | Differential Analysis | A/B compartments, TADs, limited loop calling |
| Whole Genome data | 80X | 950M | 3-4 | Whole Genome Epigenetics | Differential Analysis | A/B compartments, TADs, loops |
| Pan Promoter Capture | N/A | 150M-300M | 1 | Pan Promoter Capture Epigenetics | Differential Analysis | Loop calls |
| HiChIP Data | N/A | 300M | 2 | HiChIP Epigenetics | Differential Analysis | Loop calls |

Q: The IGV is not displaying any data. What should I do?

Make sure to zoom into a specific chromosome for IGV to load and display the data.