We are looking forward to 2015 as we will continue to launch new Apps and support additional applications, but we are excited to close out 2014 with the release of three new Illumina Core Apps in BaseSpace:
The Amplicon-DS App enables analysis of the Illumina TruSight Tumor library prep kit. This solution is specifically designed for analysis of all tumor samples, including FFPE. Using targeted TruSeq Amplicon chemistry and a unique, mirrored dual strand (“DS”) assay, researchers can easily detect low-frequency somatic mutations. Amplicon-DS also leverages the mirrored dual strand design to reconcile variant calls and capture deamination events due to FFPE, providing confident measurements even in degraded samples.
The Isaac and BWA Enrichment v2.0 Apps add significant functionality over the Enrichment v1.0 Apps. Both Isaac and BWA can now analyze Nextera Rapid Capture Custom panels built in Illumina’s DesignStudio. Isaac Enrichment v2.0 includes Illumina’s own Isaac pipeline for alignment and variant calling. BWA Enrichment v2.0 incorporates the latest aligner, BWA-MEM, which provides improved accuracy (especially when calling structural variants) and increased speed. Both the Isaac and BWA Enrichment v1.0 Apps are available concurrently with v2.0 Apps in BaseSpace Cloud.
In addition to the above Illumina Core Apps, we are also launching a BaseSpace Labs App called FASTQ Toolkit v1.0.
This App gives users enhanced control over their data, allowing manipulation of FASTQ files including adapter trimming, quality trimming, length filtering, and down-sampling. Users can now down-sample or quality-trim their data and determine what effect that has on their variants, gene expression results, or bacterial classifications. Users could also assess their sample data with the FastQC App and then use that information to optimize their samples with the FASTQ Toolkit v1.0.
Specs for the FASTQ Toolkit v1.0 are as follows:
Input- BaseSpace samples (max=200GB per analysis) and user-specified parameters that define how the input sample(s) should be processed.
Output- Samples that can be accessed on the “Samples” page of the selected output project. In addition, the App generates a statistics summary file in JSON format that is used to generate the BaseSpace report.
Adapter Trimming- performed using the approximate matching approach described in TagCleaner. The adapter sequence can be specified separately for the 5′- and 3′-end. Poly-A/T tails are considered repeats of As or Ts at the sequence ends. Trimming them can reduce the number of false positives during database searches, as long tails tend to align well to sequences with low complexity or sequences with tails (e.g. viral sequences) in the database.
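The poly-A/T tail handling described above can be sketched in a few lines of Python. This is only an illustration, not the App's implementation: the function names and the min_tail parameter are hypothetical, and the sketch uses exact matching of a single repeated base rather than the approximate matching used for adapters.

```python
def tail_length(seq, from_end):
    """Length of the run of a single repeated A or T at one end of seq."""
    if not seq:
        return 0
    ordered = seq[::-1] if from_end else seq
    first = ordered[0]
    if first not in "AT":
        return 0
    count = 0
    for base in ordered:
        if base != first:
            break
        count += 1
    return count


def trim_poly_at(seq, qual, min_tail=5):
    """Trim poly-A/T runs of at least min_tail bases from both read ends.

    min_tail is a hypothetical parameter; the quality string is trimmed
    in step with the sequence so the two stay the same length.
    """
    right = tail_length(seq, from_end=True)
    if right >= min_tail:
        seq, qual = seq[:len(seq) - right], qual[:len(qual) - right]
    left = tail_length(seq, from_end=False)
    if left >= min_tail:
        seq, qual = seq[left:], qual[left:]
    return seq, qual
```

For example, trim_poly_at("ACGTACGGAAAAAAA", "I" * 15) drops the seven trailing As and keeps the first eight bases with their qualities.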
Bases can be trimmed from either the 5′- or 3′-end. Alternatively, reads can be trimmed to a maximum read length. Quality trimming on the 3′-end is also available. Note: Aligners such as BWA and Isaac perform trimming internally during alignment. The trimming logic was adapted from BWA.
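As an illustration of 3′-end quality trimming in the BWA style, the sketch below walks from the 3′-end, accumulates (threshold minus base quality), and cuts where that running sum peaks. It is a simplified rendering of the general idea, not the App's code or an exact copy of BWA's: the function names are hypothetical, Phred+33 encoding is assumed, and no minimum read length is enforced.

```python
def trim_point_3prime(quals, threshold):
    """Return how many bases to keep after 3'-end quality trimming.

    quals is a list of Phred scores. Scanning from the 3'-end, the cut is
    placed where the running sum of (threshold - quality) is largest; the
    scan stops as soon as that sum drops below zero.
    """
    running = 0
    best = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_read(seq, qual, threshold, offset=33):
    """Trim a read given its sequence and quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in qual]
    keep = trim_point_3prime(scores, threshold)
    return seq[:keep], qual[:keep]
```

With a threshold of 30, a read whose last two bases have quality 10 and whose other bases have quality 40 loses exactly those two bases, while a read with uniformly high quality is returned unchanged.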
Down-sampling is performed when only a subset of the sample is needed for an application, such as de novo assembly with memory constraints, or when it is not necessary to process a full sample, like validating an approach at varying levels of genomic coverage.
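The post does not say how the App selects the subset, so the following is only one plausible approach: reservoir sampling, which picks a uniform random subset of read pairs in a single pass without knowing the total count in advance. The function name, the seed parameter, and the choice to preserve the original pair order are all assumptions made for this sketch.

```python
import random


def downsample_pairs(pairs, target, seed=0):
    """Keep a uniform random subset of `target` read pairs (reservoir sampling).

    pairs is any iterable of read pairs; seed makes the selection repeatable.
    The selected pairs are returned in their original order.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, pair in enumerate(pairs):
        if index < target:
            reservoir.append((index, pair))
        else:
            # Replace an existing entry with probability target / (index + 1).
            slot = rng.randint(0, index)
            if slot < target:
                reservoir[slot] = (index, pair)
    reservoir.sort(key=lambda item: item[0])
    return [pair for _, pair in reservoir]
```

Sampling whole pairs, rather than Read 1 and Read 2 independently, keeps mates together in the output.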
Filtering- Paired-end reads are only filtered (and removed from the sample) if both reads are filtered out. Otherwise, the filtered mate is replaced by a sequence of Ns (number of Ns will be the minimum read length) to keep the order of pairs in the FASTQ files, which is necessary for many secondary analysis tools.
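The pairing rule above can be sketched as follows. Reads are modeled as (sequence, quality) tuples and a simple length check stands in for whatever filter is active; the 32 bp default is an illustrative assumption, as are the function names and the quality character given to the placeholder Ns.

```python
MIN_READ_LENGTH = 32  # illustrative default; treat as an assumption


def passes_length_filter(read, min_length=MIN_READ_LENGTH):
    """A read passes if its sequence is at least min_length bases long."""
    seq, _qual = read
    return len(seq) >= min_length


def filter_pair(read1, read2, min_length=MIN_READ_LENGTH, placeholder_qual="!"):
    """Apply the paired-end rule: drop the pair only if both mates fail.

    If exactly one mate fails, it is replaced by min_length Ns so that
    Read 1 and Read 2 files stay in the same order. placeholder_qual is a
    hypothetical choice of quality character for the N bases.
    """
    keep1 = passes_length_filter(read1, min_length)
    keep2 = passes_length_filter(read2, min_length)
    if not keep1 and not keep2:
        return None  # the whole pair is removed from the sample
    placeholder = ("N" * min_length, placeholder_qual * min_length)
    return (read1 if keep1 else placeholder,
            read2 if keep2 else placeholder)
```

A pair with one 50 bp mate and one 10 bp mate therefore stays in the output, with the short mate replaced by 32 Ns.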
Nextera Mate-pair conversion- The App supports conversion of Nextera Mate-Pair oriented reads to paired-end oriented reads.
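The post does not describe how the conversion is carried out. A common assumption is that mate-pair reads face outward and that reverse-complementing both mates yields the inward-facing orientation of a standard paired-end library; the sketch below shows only that operation, with hypothetical function names, and is not the App's implementation.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mate_pair_to_paired_end(read1, read2):
    """Reverse-complement both mates of a pair.

    Each read is a (sequence, quality) tuple. The quality string is
    reversed so each score stays attached to its base.
    """
    seq1, qual1 = read1
    seq2, qual2 = read2
    return ((reverse_complement(seq1), qual1[::-1]),
            (reverse_complement(seq2), qual2[::-1]))
```

Applying the function twice returns the original pair, which is a quick sanity check on the orientation flip.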
The output of the App contains a set of before and after metrics so you can quickly see the properties of your new data. The table below is an example of the results of down-sampling 2,957,468 read pairs to 500,000 read pairs and at the same time performing quality trimming (< Q30) from the 3′-end of the reads.
A read length distribution is also provided as shown below for Read 1. It shows the read lengths in your data before and after trimming and allows the user to quickly assess what effect the trimming had on their data.
Finally, a read filtering summary is provided as shown below. The read filtering summary will only contain numbers if an option that turns on read filtering, such as quality trimming (which filters reads shorter than 32 bp), is selected.
We are very proud of the hard work our team has put into providing these Apps for the NGS community and look forward to an even more exciting 2015.