With the launch of the TruSight RNA Pan-Cancer panel, a targeted enrichment panel for the detection of variants, fusions, and gene expression profiles in 1385 cancer-associated genes, Illumina is pleased to provide an intuitive BaseSpace App, RNA-Seq Alignment v1.0, that supports simple push-button analysis of the data.
The RNA-Seq Alignment App is easy to use – just stream your data to BaseSpace, then open the App, select your samples and analysis options, and go!
In addition to supporting the TruSight RNA Pan-Cancer panel, the App also enables many other RNA-Seq analyses and supports many common organisms as shown below.
- Homo sapiens UCSC hg19 (RefSeq & Gencode gene annotations)
- Homo sapiens UCSC hg38 (RefSeq & Gencode gene annotations)
- Mus musculus UCSC mm10 (RefSeq gene annotation)
- Mus musculus UCSC mm9 (RefSeq gene annotation)
- Rattus norvegicus UCSC rn5 (RefSeq gene annotation)
- Drosophila melanogaster UCSC dm3 (RefSeq gene annotation)
- Bos taurus UCSC bosTau6 (RefSeq gene annotation)
- Sus scrofa UCSC susScr3 (RefSeq gene annotation)
- Gallus gallus UCSC galGal4 (RefSeq gene annotation)
- Danio rerio UCSC danRer7 (RefSeq gene annotation)
- Caenorhabditis elegans UCSC ce10 (RefSeq gene annotation)
- Zea mays Ensembl AGPv3 (Ensembl gene annotation)
- Arabidopsis thaliana Ensembl TAIR10 (Ensembl gene annotation)
- Oryza sativa japonica Ensembl IRGSP-1.0 (Ensembl gene annotation)
- Saccharomyces cerevisiae Ensembl R64-1-1 (Ensembl gene annotation)
The RNA-Seq Alignment App also allows users to choose from multiple aligners for their RNA-Seq data.
- STAR
- TopHat (Bowtie)
- TopHat (Bowtie2)
Common analysis options include the ability to add ERCC spike-in controls.
Users can also choose to call fusions and to perform novel transcript assembly. Fusion calling is available only with the STAR or Bowtie aligners, not with Bowtie2. If STAR is selected as the aligner, fusion calling is performed with the Manta fusion caller developed by Illumina; if Bowtie is selected, fusion calling is performed with TopHat Fusion. Illumina recommends selecting STAR for analysis of the TruSight RNA Pan-Cancer panel. The App also contains an advanced parameter section that allows users to modify many of the parameters available for the different tools in the pipeline.
The RNA-Seq Alignment App presents the analyzed data in an easy-to-understand format, as shown in the MiSeq: TruSight RNA Pan-Cancer (Multiple samples) dataset. This analysis of TruSight RNA Pan-Cancer data was performed using the STAR aligner, fusion calling with Manta, and RefSeq annotation.
The Summary page shows an aggregate of the data across all samples, including standard RNA-Seq metrics such as Number of Reads, % Total Aligned reads and % Abundant reads, which help you assess the quality of your libraries. Scrolling down, you will see insert length and alignment distributions, as well as transcript coverage. The Pan-Cancer panel enriches for the coding regions plus 160 bp of the 3’ and 5’ UTR.
To learn more about each sample, click the sample name in the Analysis Reports section of the left-hand column at the top of the page, for example UHR1, prepared from Universal Human Reference RNA (Agilent). Metrics similar to those on the Summary page are displayed at the top of the page, but they are specific to this sample. % Duplicate Reads indicates the proportion of non-unique library fragments sequenced and is calculated from a sub-sampled set of 500,000 reads. Because the panel is designed for high coverage of a targeted set of 1385 genes, % Duplicates will be higher than for whole-transcriptome RNA-Seq libraries. % Duplicates is typically <20% for a control sample like UHR but may be higher, for example, with poor-quality FFPE samples.
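To make the % Duplicate Reads idea concrete, here is a minimal sketch that sub-samples read pairs and counts repeated ones. It assumes, for illustration only, that a read pair counts as a duplicate when its key (chromosome, Read 1 start, Read 2 start, strand) matches a pair seen earlier, and that sub-sampling is uniform at random; the App's actual duplicate definition is not described here, and the function names and toy coordinates are hypothetical.

```python
import random
from typing import Iterable, List, Tuple

# Hypothetical aligned read-pair key: (chromosome, Read 1 start, Read 2 start, strand).
ReadPair = Tuple[str, int, int, str]


def subsample(pairs: List[ReadPair], n: int, seed: int = 0) -> List[ReadPair]:
    """Draw up to n read pairs uniformly at random, without replacement."""
    if len(pairs) <= n:
        return list(pairs)
    return random.Random(seed).sample(pairs, n)


def percent_duplicates(pairs: Iterable[ReadPair]) -> float:
    """Percentage of read pairs whose key matches a pair seen earlier."""
    seen = set()
    total = duplicates = 0
    for pair in pairs:
        total += 1
        if pair in seen:
            duplicates += 1
        else:
            seen.add(pair)
    return 100.0 * duplicates / total if total else 0.0


# Toy data: the second pair has the same coordinates as the first.
pairs = [
    ("chr1", 1_000, 1_150, "+"),
    ("chr1", 1_000, 1_150, "+"),
    ("chr2", 5_000, 5_180, "-"),
    ("chr3", 7_200, 7_340, "+"),
]
# 500,000 mirrors the sub-sample size quoted above.
print(percent_duplicates(subsample(pairs, 500_000)))  # 25.0
```

Sub-sampling to a fixed number of reads keeps the metric comparable between libraries sequenced to different depths, since duplicate rates tend to rise with depth.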
Further down the page, the alignment data are broken out into Coding, UTR, Intron and Non-targeted regions. For high-quality libraries, the majority of reads will fall in Coding and UTR regions, as the Pan-Cancer panel enriches for the coding regions plus ~150 bp of the 3’ and 5’ UTR. The Gene-Level Coverage data indicate the number of expressed transcripts in the sample and the sequencing coverage from 1x to 100x, while the Variant Calls section summarizes the differences observed from the reference genome used. These data are displayed graphically below, followed by a Small Variants table that provides details on the variants found.
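The Coding/UTR/Intron/Non-targeted breakdown can be pictured as an interval lookup. This is a simplified sketch, assuming a hypothetical annotation for one gene (sorted, half-open intervals), one representative position per read, and a precedence of Coding over UTR over Intron; the App's actual annotation handling and tie-breaking rules are not documented here.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)

# Hypothetical annotation for a single targeted gene; intervals sorted by start.
ANNOTATION: Dict[str, List[Interval]] = {
    "Coding": [(1_000, 1_200), (1_500, 1_700)],
    "UTR": [(850, 1_000), (1_700, 1_850)],
    "Intron": [(1_200, 1_500)],
}
PRECEDENCE = ["Coding", "UTR", "Intron"]  # assumed tie-breaking order


def _contains(intervals: List[Interval], pos: int) -> bool:
    """True if pos falls inside any of the sorted, non-overlapping intervals."""
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < intervals[i][1]


def classify(pos: int) -> str:
    for region in PRECEDENCE:
        if _contains(ANNOTATION[region], pos):
            return region
    return "Non-targeted"


def region_fractions(positions: List[int]) -> Dict[str, float]:
    """Fraction of reads assigned to each region category."""
    counts = Counter(classify(p) for p in positions)
    total = len(positions)
    return {r: counts[r] / total for r in PRECEDENCE + ["Non-targeted"]}


print(region_fractions([1_050, 1_600, 900, 1_300, 5_000]))
```

In this toy example most reads land in Coding and UTR intervals, which is the pattern described above for a well-enriched library.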
This table is searchable, or it can be sorted by any of the headers, for example, by Gene name, Read depth, Alternate frequency (the fraction of reads with the alternate rather than reference sequence), or entries in the COSMIC or ClinVar databases.
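Alternate frequency, as defined above, is simply alternate-supporting reads divided by read depth. A minimal sketch, assuming hypothetical per-variant reference and alternate read counts (not any specific App output format) and treating depth as reference plus alternate reads, computes the value and sorts a small table by it, much like clicking the column header.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SmallVariant:
    gene: str
    ref_reads: int
    alt_reads: int

    @property
    def read_depth(self) -> int:
        # Simplification: ignores reads supporting any third allele.
        return self.ref_reads + self.alt_reads

    @property
    def alt_frequency(self) -> float:
        # Fraction of reads with the alternate rather than the reference sequence.
        return self.alt_reads / self.read_depth if self.read_depth else 0.0


def sort_by_alt_frequency(variants: List[SmallVariant]) -> List[SmallVariant]:
    return sorted(variants, key=lambda v: v.alt_frequency, reverse=True)


# Hypothetical gene names and read counts, for illustration only.
table = [
    SmallVariant("GENE_A", ref_reads=180, alt_reads=20),
    SmallVariant("GENE_B", ref_reads=50, alt_reads=50),
    SmallVariant("GENE_C", ref_reads=0, alt_reads=40),
]
for v in sort_by_alt_frequency(table):
    print(v.gene, v.read_depth, round(v.alt_frequency, 3))
```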
The next table shows the Fusion calls made by the analysis software.
Genes highlighted in yellow are targeted by the panel; genes not highlighted were identified via pull-down of the targeted fusion partner. The enrichment approach used in this assay allows paired-end sequencing reads to detect fusions in two ways: Paired Reads or Split Reads. A Paired Read means that the fusion junction lies within the library fragment but between the sequenced regions (for example, with Read 1 aligning to Gene 1 and Read 2 aligning to Gene 2). A Split Read means that either Read 1 or Read 2 contains the actual fusion breakpoint.
The table shows the number of Paired and Split reads that supported each fusion call. This information, along with the number of non-fusion-supporting reads and additional quality metrics, is used to calculate the Fusion score, which indicates the confidence in the call (0 being lowest confidence, 1 being highest confidence). We have seen that fusion scores greater than 0.6 typically represent calls from abundant or highly expressed fusions. Scores lower than 0.6 may represent fusions expressed at a low level or with very few supporting reads, or calls where the aligner has lower confidence (such as genes in close proximity and on the same strand, which may result from transcription read-through rather than translocation).
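When reviewing exported calls, the 0.6 guideline can serve as a simple triage step. The sketch below is illustrative only: the record fields and fusion names are hypothetical (this is not the Manta output format), and it merely bins calls by the threshold quoted above; it does not reproduce how the Fusion score itself is computed. A call in the "review" bin is not necessarily false, as the low-expression examples in the text show.

```python
from dataclasses import dataclass
from typing import Dict, List

HIGH_CONFIDENCE_SCORE = 0.6  # guideline quoted in the text


@dataclass
class FusionCall:
    name: str
    paired_reads: int  # junction lies between Read 1 and Read 2
    split_reads: int   # Read 1 or Read 2 spans the breakpoint itself
    score: float       # 0 = lowest confidence, 1 = highest

    @property
    def supporting_reads(self) -> int:
        return self.paired_reads + self.split_reads


def triage(calls: List[FusionCall]) -> Dict[str, List[str]]:
    """Bin calls into 'high' (score > 0.6) and 'review' (everything else)."""
    bins: Dict[str, List[str]] = {"high": [], "review": []}
    for call in calls:
        key = "high" if call.score > HIGH_CONFIDENCE_SCORE else "review"
        bins[key].append(call.name)
    return bins


# Hypothetical calls with made-up read counts and scores.
calls = [
    FusionCall("GENE1-GENE2", paired_reads=8, split_reads=10, score=0.91),
    FusionCall("GENE3-GENE4", paired_reads=1, split_reads=2, score=0.45),
]
print(triage(calls))  # {'high': ['GENE1-GENE2'], 'review': ['GENE3-GENE4']}
```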
In the UHR1 sample, four fusions are called: BCR-ABL1, BCAS4-BCAS3, NUP214-XKR3 and BCAS3-ATXN7. The BCR-ABL1 and BCAS4-BCAS3 calls each have ≥12 fusion-supporting reads and fusion scores >0.8, representing high-confidence fusion calls. The NUP214-XKR3 and BCAS3-ATXN7 calls have fusion scores <0.6 with <5 fusion-supporting reads. Since these are expected fusions, the lower Fusion score may reflect low-expression fusion transcripts. Comparing these calls with those from the duplicate sample UHR2, the BCAS3-ATXN7 fusion is not called in the second replicate, so it likely represents an event occurring at extremely low levels that may require an orthogonal approach to verify. In contrast, the BCR-ABL1, BCAS4-BCAS3 and NUP214-XKR3 fusions are called in UHR2, with Fusion scores of 0.914, 0.688 and 0.614, respectively. Since UHR RNA is a mixture of RNA from 10 different cell lines, each fusion transcript is present at roughly one-tenth of its relative expression level in the source cell line. Indeed, we have observed that the BCAS3-ATXN7 fusion appears to be close to the limit of detection in UHR libraries, but is detected in both replicates of its source cell line, MCF7, with Fusion scores >0.7.
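The replicate comparison described above amounts to set operations on the called fusion names. A minimal sketch, using only the fusion names reported in this section for UHR1 and UHR2; the helper name `compare_replicates` is hypothetical.

```python
from typing import Dict, Set


def compare_replicates(rep1: Set[str], rep2: Set[str]) -> Dict[str, Set[str]]:
    """Split fusion calls into those shared by both replicates and those unique to one."""
    return {
        "both": rep1 & rep2,
        "rep1_only": rep1 - rep2,
        "rep2_only": rep2 - rep1,
    }


uhr1 = {"BCR-ABL1", "BCAS4-BCAS3", "NUP214-XKR3", "BCAS3-ATXN7"}
uhr2 = {"BCR-ABL1", "BCAS4-BCAS3", "NUP214-XKR3"}

result = compare_replicates(uhr1, uhr2)
print(sorted(result["both"]))
print(sorted(result["rep1_only"]))  # ['BCAS3-ATXN7']
```

Calls that appear in only one replicate, like BCAS3-ATXN7 here, are natural candidates for orthogonal confirmation.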
The Brain samples (Brain1 and Brain2) in this run represent negative controls for fusion calling as these samples do not contain any fusions.
At the bottom of the page for each sample are additional files for download. These include the Targeted reference FPKM values, which provide gene expression information for all of the targeted genes in the panel, and the Reference FPKM values, which provide FPKM values for the complete transcriptome. VCF files are also available, with variant calls for the targeted genes and for the complete transcriptome. Lastly, fusion call output is available for download in the Manta fusion output file, which provides additional details on the fusions displayed in the Fusion Calls table and also lists all candidate fusions that were identified by the pipeline but filtered from the final table. This file can be helpful for identifying candidate fusions when no high-confidence fusions are called.
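FPKM stands for fragments per kilobase of transcript per million mapped fragments. The sketch below applies that textbook definition to made-up numbers; it does not describe how the App counts fragments or determines effective transcript length, which can differ from the raw annotated length.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2,000 bp transcript,
# in a library with 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

Normalizing by both length and library size is what lets FPKM values be compared across genes within a sample and, roughly, across samples.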
In addition to the RNA-Seq Alignment App, we are also launching Cufflinks Assembly & DE v2.0.
Cufflinks Assembly and DE v2.0
This App takes the output of the RNA-Seq Alignment App and performs differential expression analysis. One of its new features is support for multiple groups of samples. After selecting the samples and labeling the groups, users can choose pair-wise differential expression across all groups or select which sample groups to compare.
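With several labeled groups, a pair-wise mode implies one comparison per unordered pair of groups, so k groups give k(k-1)/2 comparisons. A minimal sketch with hypothetical group labels; how the App enumerates comparisons internally is not documented here.

```python
from itertools import combinations
from typing import List, Tuple


def pairwise_comparisons(groups: List[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of groups, in input order."""
    return list(combinations(groups, 2))


groups = ["GroupA", "GroupB", "GroupC"]  # hypothetical labels
for first, second in pairwise_comparisons(groups):
    print(f"{first} vs {second}")
```

Because the number of comparisons grows quadratically, selecting only the group pairs of interest keeps the results manageable once more than a handful of groups are labeled.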
Like the RNA-Seq Alignment App, the new Cufflinks App also supports advanced parameters for the tools in the workflow. Example App results for Brain vs UHRR and Total vs mRNA are available in a sample project containing 64 samples.
The output of the new Cufflinks App is highly interactive. The summary page, which aggregates results across all groups, contains a sample correlation matrix that allows users to zoom in and out on correlations of interest.
The output also includes a 2-D and a 3-D PCA plot across all the samples.
In addition to the summary page there are also analysis results for each differential expression group. The output contains standard metrics such as the number of differentially expressed genes and transcripts. The differential expression group reports also contain interactive charts such as a differential expression heat map.
The differential expression heat map allows users to select a gene of interest and quickly jump to the region of the heat map containing the gene.
The report also contains an interactive gene browser containing an FPKM Log/Log plot and an interactive table of differentially expressed genes.
The log/log plot and table can be used to intuitively explore the data.
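Each point on an FPKM log/log plot pairs a gene's expression in two groups, and its vertical distance from the diagonal corresponds to a log fold change. The sketch below assumes log2 axes and a pseudocount of 1 to handle zero FPKM, both illustrative choices. Cufflinks reports its own log2 fold change from its own estimates, which this simple calculation does not reproduce. The FPKM values are hypothetical.

```python
import math
from typing import Dict, Tuple

PSEUDOCOUNT = 1.0  # assumption: avoids taking the log of zero for unexpressed genes


def log_log_point(fpkm_a: float, fpkm_b: float) -> Tuple[float, float, float]:
    """Return (x, y, log2 fold change of B over A) for one gene."""
    x = math.log2(fpkm_a + PSEUDOCOUNT)
    y = math.log2(fpkm_b + PSEUDOCOUNT)
    return x, y, y - x


genes: Dict[str, Tuple[float, float]] = {  # hypothetical FPKM (group A, group B)
    "GENE_A": (15.0, 15.0),
    "GENE_B": (3.0, 63.0),
    "GENE_C": (31.0, 7.0),
}
for gene, (a, b) in genes.items():
    x, y, lfc = log_log_point(a, b)
    print(f"{gene}: x={x:.2f} y={y:.2f} log2FC={lfc:+.2f}")
```

Genes on the diagonal (like GENE_A) are unchanged, points above it are higher in group B, and points below it are higher in group A.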
We are excited to bring these new Apps and features to the NGS community and look forward to delivering more high-quality, intuitive Apps.