DRAGEN™ – 2021 – A YEAR IN REVIEW

It’s hard to believe we are wrapping up the second year with COVID still active around us.

At Illumina and within the DRAGEN team, we are committed to supporting efforts to fight COVID-19 and to enabling the global research community to combat the pandemic by providing a host of NGS analysis tools at no cost.

Our mission to improve human health requires investments across the entire workflow. The DRAGEN team’s continued investment has seen even more gains in secondary analysis accuracy over the last year. This includes developing targeted callers for difficult-to-map regions and carrier screening, increasing accuracy with graph reference and machine learning tools, and expanding our cloud presence, all with the single goal of enabling you to get insights from your sequencing data faster.

Highlights from 2021

DRAGEN supports the fight against SARS-CoV-2

More than 1.25 million samples processed year to date.

The Illumina SARS-CoV-2 NGS Data Toolkit leverages the speed and accuracy of DRAGEN to accelerate infectious disease surveillance and outbreak response. The toolkit includes a DRAGEN Metagenomics Pipeline for identifying novel infectious disease pathogens, a DRAGEN RNA Pathogen Detection Pipeline for detecting viral pathogens, and the DRAGEN Lineage app for detecting SARS-CoV-2 mutations for epidemiology. You can access the DRAGEN coronavirus tools on BaseSpace Sequence Hub.

Figure 1: Illumina SARS-CoV-2 NGS Data Toolkit

These tools make it simpler and easier to detect mutations, identify the SARS-CoV-2 viral sequence in your samples, examine host immune responses, and contribute your findings to critical public databases. More than 1.25 million samples were processed year to date (as of December 9, 2021). The core tools are available free of charge, and the DRAGEN RNA Pathogen Detection Pipeline and DRAGEN Metagenomics Pipeline are free of charge until December 31, 2022.

Read more about the Illumina SARS-CoV-2 NGS Data Toolkit and the recent COVID-19 Apps update.

Accelerating Genomic Analysis

Driving advancements in accuracy and comprehensiveness

As we continue to unlock the power of the genome with new and advanced applications, the amount of data generated from next-generation sequencing (NGS) is rapidly expanding [1]. In 2021, we made significant improvements to our platforms and pipelines to provide ultra-rapid, highly accurate, comprehensive, and scalable secondary analysis.

  • We made major advances in accuracy by refining our graph-based read mapping and introducing a machine learning algorithm.
    • DRAGEN v3.7 introduced graph-based read mapping. DRAGEN v3.9 further improved read mapping and SNV/indel variant calling (VC) accuracy in the Major Histocompatibility Complex (MHC), with an accuracy increase of up to 80% [2]. DRAGEN v3.9 also included updates to alternative (ALT) contig handling and to the graph reference, which lead to improved mapping and variant calling accuracy for hg19/hg38 genomes. With the v3.9 release, we also provided access to an alpha version of a powerful machine learning tool that refines small variant calls, leading to a ~33% improvement in germline SNV accuracy [2].
  • We introduced several new features and pipelines, including RNA variant calling, DNA amplicon, and single-cell updates such as genotype demultiplexing and cell hashing.
  • We’ve extended the comprehensiveness of our oncology pipeline by adding new biomarkers. With DRAGEN 3.8, we added Tumor Mutational Burden (TMB) and Microsatellite Instability (MSI), and with DRAGEN 3.9, we added a new biomarker for Homologous Recombination Deficiency (HRD). 
  • You can now operationalize secondary analysis pipelines with DRAGEN on Illumina Connected Analytics (ICA), where DRAGEN is available as prepackaged tools to incorporate into custom pipelines.
  • DRAGEN is now commercially available on a bring-your-own-license basis on Azure, in addition to AWS. Our goal is to give our customers more choices to use DRAGEN in their preferred cloud environment.
  • ORA Compression – DRAGEN Original Read Archive (ORA) compression is now available with the on-prem server, on AWS and Azure, and onboard the NextSeq 1000/2000. ORA reduces the burden of data storage with lossless compression of FASTQ files, shrinking them by up to 5x.

Figure 2: File sizes and compression ratios for human data generated on NextSeq 2000 [2].
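
To make the "up to 5x" figure concrete, here is a minimal arithmetic sketch of what a given compression ratio means for storage. The function names and the 60 GB input size are hypothetical, chosen only for illustration, and the ratio is assumed to be measured relative to the size of the input file. This is not a DRAGEN API and it does not reproduce the measurements behind Figure 2.

```python
def compressed_size_gb(input_size_gb: float, ratio: float) -> float:
    """Return the file size after compression at the given ratio.

    A ratio of 5.0 means the output is one fifth the size of the input.
    """
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return input_size_gb / ratio


def storage_saved_fraction(ratio: float) -> float:
    """Return the fraction of storage freed at the given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Hypothetical example: a 60 GB input file at the best-case 5x ratio.
    size = compressed_size_gb(60.0, 5.0)
    saved = storage_saved_fraction(5.0)
    print(f"compressed size: {size:.1f} GB, storage saved: {saved:.0%}")
```

Because "up to 5x" is a best case, actual savings for a given run depend on the data and may be lower.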

Learn more about the DRAGEN v3.9 release and v3.8 release.

DRAGEN processed One Million Genomes in 2021

Accelerating discoveries with exponential usage

DRAGEN processed its one millionth genome in 2021, with peak data usage of 1.5 PB in one day. The data processed on DRAGEN has increased exponentially every year since 2016. Our customers continue to unlock the genome with the power of DRAGEN.

Figure 3: Comprehensive Whole Genome Equivalents Processed by DRAGEN

2022 for DRAGEN will continue to be exciting with new product developments coming soon. We look forward to sharing these with you – stay tuned!  

For Research Use Only.  Not for use in diagnostic procedures.

References

  1. https://sapac.illumina.com/company/news-center/blog/solving-for-the-information-gap-in-genomics-breakthroughs.html
  2. Data on file, Illumina Inc. 2021
