Shotgun metagenomics data can now be analyzed on the BaseSpace platform.

We are happy to announce the release of the Kraken Metagenomics App as part of BaseSpace Apps.


With this BaseSpace Labs App, researchers will be able to detect and classify viruses and bacteria in their next-generation sequencing (NGS) samples. Kraken was developed by Derrick Wood in Steven Salzberg’s lab at Johns Hopkins University. Unlike alignment-based classification methods, Kraken uses exact k-mer matching and a novel classification algorithm to perform taxonomic classification of NGS reads. The methods and performance have been described in detail in Genome Biology. The Kraken Metagenomics App limits taxonomic classification to the bacteria and viruses available in the MiniKraken 20140330 database.
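To give a feel for what exact k-mer matching means, here is a minimal Python sketch. It is a toy illustration and not Kraken's actual algorithm: it uses a simple vote over k-mers unique to one taxon, whereas Kraken's classification step is described in the Genome Biology paper. The k-mer length, the reference sequences, and all function names below are assumptions made up for this example.

```python
from collections import Counter

K = 5  # toy k-mer length, chosen only for illustration

def kmers(seq, k=K):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k=K):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index

def classify(read, index, k=K):
    """Return the taxon with the most exact k-mer hits, or None.

    Simplification: k-mers shared by more than one taxon are ignored
    rather than being resolved through a taxonomy tree.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer)
        if taxa and len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    if not votes:
        return None  # no unique exact match: read stays unclassified
    return votes.most_common(1)[0][0]

# Hypothetical toy references, not real genomes
REFERENCES = {
    "taxon_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTATTCGGCATCCGATTGA",
}
INDEX = build_index(REFERENCES)
```

Calling `classify("CCGATAGCTAG", INDEX)` assigns the read to `taxon_A`, while a read with no matching k-mers returns `None`, which mirrors how reads from organisms missing from the database remain unclassified.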

In addition to using Kraken for classification, the app provides a host removal feature that uses the SNAP aligner to remove human reads prior to classification. SNAP is an aligner developed by a team from the UC Berkeley AMP Lab, Microsoft, and UCSF. The output of the SNAP host removal step is an anonymized BAM file containing the host-filtered reads. If host removal is selected, only the host-filtered reads are used by Kraken for taxonomic classification.

To demonstrate the performance of the App, we tested it on data described in a recent publication by Wilson et al., 2014. The authors were able to identify the presence of Leptospira in a CSF sample using Illumina’s MiSeq desktop sequencer. This was in contrast to other methods, such as qPCR, which failed to identify the Leptospira. The data were downloaded from the SRA (Accession Number SRR1145846) and analyzed using the App. The SRA data contained the 52,621 paired-end reads that remained after host filtering of the original 3,063,784 paired-end reads. Reads were trimmed to remove sequencing adapters prior to analysis. The analysis completed in 30 minutes and generated results that are consistent with Wilson et al. This analysis demonstrates that the App is able to produce publication-quality results with sensitivity equivalent to that of the published analysis.

The tables below show some of the basic metrics obtained from the App. Because the host removal option was selected for this analysis, 1,232 of the 52,621 reads were additionally identified as host. Of the remaining 51,389 reads, only 792 were classified as virus or bacteria. The remaining 50,597 reads, which were not assigned a bacterial or viral taxonomy, may have come from contamination by other organisms not found in the MiniKraken database or from host reads not found in the human reference.
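The read accounting above can be checked with a few lines of Python. This is only a sketch of the arithmetic using the counts reported in this post; the function and key names are made up for illustration and are not part of the App's output.

```python
def summarize_reads(total_reads, host_reads, classified_reads):
    """Derive the non-host and unclassified read counts from reported totals."""
    non_host = total_reads - host_reads
    unclassified = non_host - classified_reads
    return {
        "non_host": non_host,
        "unclassified": unclassified,
        # Share of non-host reads that received a bacterial or viral taxonomy
        "classified_fraction": classified_reads / non_host,
    }

# Counts reported for the SRR1145846 analysis in this post
summary = summarize_reads(total_reads=52_621, host_reads=1_232, classified_reads=792)
print(summary)
```

With these inputs, the sketch reproduces the 51,389 non-host reads and 50,597 unclassified reads quoted above, and shows that roughly 1.5% of the non-host reads were assigned a taxonomy.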


The Krona plot obtained from the Kraken Metagenomics App result is shown below. Krona plots allow hierarchical data to be visualized with zoomable pie charts. For metagenomics data, Krona plots display the taxonomic hierarchy of a sample. The taxonomic levels are represented in the radial direction, and the organisms within each taxonomic level in the angular direction. More than 20% of the 792 reads that were assigned a taxonomy were identified as Leptospira, which is consistent with the results of Wilson et al.



With this App, researchers now have access to a high-performing, sensitive, and interactive tool for analyzing their metagenomics data in BaseSpace. Researchers can use this App to perform hypothesis-free studies of the structure of bacterial and viral communities present in environmental, industrial, and biological samples. We are excited to provide this App to the BaseSpace community and look forward to feedback and suggestions for improving later versions.

