As of about an hour ago, all MiSeq resequencing applications in BaseSpace can run the BWA+GATK secondary analysis pipeline. The pipeline can be initiated directly on the instrument when setting up a MiSeq run; once the real-time data transfer completes, the analysis is performed natively in BaseSpace.
According to the Broad Institute, GATK best practices basically boil down to the following steps:
1. Duplicate removal
2. Base quality (BQ) recalibration
3. INDEL cleaning
4. Variant calling (UnifiedGenotyper)
5. Manual variant curation / variant recalibration / variant filtration
The current version of BWA+GATK implemented in BaseSpace comprises:
1. Duplicate removal/marking (modified samtools)
2. Variant calling (UnifiedGenotyper)
3. Variant filtering (vcf annotator)
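To make the first and last stages of that list concrete, here is a toy Python sketch of duplicate marking and threshold-based variant filtering. It is not the BaseSpace implementation: the `Read` and `Variant` records, the "keep the highest-quality read per position" rule, the `LowQualOrDepth` label, and the `min_qual` / `min_depth` thresholds are all assumptions made up for illustration. The modified samtools and vcf annotator mentioned above may behave quite differently, and variant calling itself (UnifiedGenotyper) is deliberately left out.

```python
from collections import defaultdict, namedtuple

# Hypothetical, heavily simplified records (not a real BAM/VCF model).
Read = namedtuple("Read", "name chrom pos strand qual_sum")
Variant = namedtuple("Variant", "chrom pos ref alt qual depth")


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing (chrom, pos, strand) form one duplicate group; the read
    with the highest summed base quality is kept, the rest are flagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


def filter_variants(variants, min_qual=30.0, min_depth=10):
    """Pair each variant with a filter label.

    The thresholds are illustrative assumptions, not BaseSpace defaults.
    """
    labelled = []
    for v in variants:
        passed = v.qual >= min_qual and v.depth >= min_depth
        labelled.append((v, "PASS" if passed else "LowQualOrDepth"))
    return labelled
```

In a real pipeline these decisions are written back into the BAM flag field and the VCF FILTER column rather than returned as Python objects.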
Various experiments during the past year have indicated that base quality recalibration is not necessary with our current base quality tables in RTA.
INDEL cleaning and variant recalibration are inherently good ideas, but they are impractical for broad deployment on BaseSpace given the significant compute resources they require. We will keep working on this and hope to include both steps in the pipeline down the road.
Importantly, BWA+GATK will not be available for the on-instrument MiSeq software for a few months. This reflects how quickly we can deploy features, fix bugs, and incorporate user feedback in our BaseSpace implementation before rolling it out to the instrument install base. Going forward, expect to see new things in BaseSpace well before they are deployed on the instrument.
While we are working on an app store that delivers all sorts of commercial downstream analysis tools, we understand that most of our customers use select open-source academic tools, and we want to make those as accessible as possible.