First Data Set from FastTrack Long Reads Early Access Service

At Illumina we’re continually fine-tuning and diversifying our sample preparation, sequencing, and informatics products to deliver the right sequencing workflow for the many interesting applications you’re working on. But the short-read nature of Illumina sequencing (currently 2 x 150 bp reads on the HiSeq 2500) limits the options for certain experimental methods that require contiguous reading of longer stretches of genomic DNA.

With the acquisition of Moleculo last year, we’re bringing new library prep and data analysis methods to address this gap. As a first step towards commercialization, we recently announced the Early Access launch of a new Services product: FastTrack Long Reads. This service will be useful for a myriad of innovative applications, including assembly of complex genomes (polyploid, containing extensive long repeat regions, etc.), accurate transcript assembly, metagenomics of complex communities, and phasing of long haplotype blocks.

In conjunction with the Early Access launch, we’re sharing an example long reads data set (delivered to Dr. Dmitri Petrov’s group at Stanford, and made available with his kind permission). This data set comprises two libraries of Drosophila melanogaster, each generated on a single HiSeq lane and producing ~30 Gb of raw sequence data. Analysis was done using a specialized pipeline that performs graph-based contig generation and post-processing. The final output is a set of synthetic long reads in FASTQ format, along with a text file (scaffold.txt) that lists known linkages between the long reads. As the following figure from the accompanying .pdf report shows, a large fraction of the synthetic long reads have lengths clustered around 8,500 base pairs.

Read length distribution of synthetic long reads for a D. melanogaster library

The data set, available as a single project in BaseSpace, can be accessed here.

FASTQ files are available under the “Samples” section; the .pdf report summarizing key analysis metrics and the scaffold.txt file are available under the “App Sessions” section.
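If you would like to check the read length distribution yourself after downloading the FASTQ files, a minimal Python sketch along the following lines may help. It is not part of the FastTrack Long Reads pipeline. It assumes each FASTQ record occupies exactly four lines (sequences are not wrapped), and the function names and the 500 bp bin size are illustrative choices of ours, not anything defined by the service.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, transparently handling .gz compression."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_lengths(handle: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record.

    Assumption: four lines per record (header, bases, '+', qualities).
    """
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record holds the bases
            yield len(line.strip())


def length_histogram(lengths: Iterable[int], bin_size: int = 500) -> Counter:
    """Count reads per length bin; each key is the lower edge of its bin."""
    return Counter((n // bin_size) * bin_size for n in lengths)


# Example usage (the file name is hypothetical):
#
#     with open_fastq("long_reads.fastq.gz") as fh:
#         hist = length_histogram(read_lengths(fh))
#     for start in sorted(hist):
#         print(f"{start}-{start + 499} bp: {hist[start]} reads")
```

Binning by the lower edge keeps the output easy to sort and compare against the figure in the report; a smaller bin_size gives a finer view of the distribution.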

We hope this data set will help you see for yourself the nature and quality of the synthetic long reads, and will inspire you to explore new and exciting applications enabled by long contiguous reads of DNA.


  1. Would it be possible to post information about the nature of the sample used to generate these data? Specifically, which strain was used, where was the strain obtained, how many flies was the DNA prepped from, was the DNA prepped from adults or embryos, and were both males and females included in the DNA prep?

    Many thanks!
