Interpreting structural variation in cancer genomes

A user story from the Genomics England 100,000 Genomes Project cancer programme¹

By Jawahar Swaminathan, Ph.D., Program Manager – Population Genomics

In 2016, Illumina and Genomics England announced a Bioinformatics and Clinical Interpretation Partnership (BCIP) to “develop a platform and knowledge base that can be used to improve and automate genome interpretation.” In 2017, after months of directed development and rigorous testing, Genomics England adopted BaseSpace™ Variant Interpreter (BSVI) as the default interpretation platform for cancer cases in the 100,000 Genomes Project. A previous blog article took a light-hearted look at the Illumina team’s on-boarding and outreach activities across the 13 Genomic Medicine Centres (GMCs) that make up the 100,000 Genomes Project. In this article, we show how Dr. Patrick Tarpey and Jamie Trotman from the Cancer Genetics Group at Addenbrooke’s Hospital, Cambridge, used the BCIP tools to interpret biologically relevant structural variants in cancer cases from the East of England GMC.

Case 1

An 11-year-old boy presented with a complex brain tumor, initially diagnosed as a biphasic neuroepithelial tumor with a low-grade component resembling a desmoplastic infantile ganglioglioma and a high-grade astrocytic component. This case was recruited into the 100,000 Genomes Project, where whole-genome sequencing of the tumor and matched normal samples was performed. The static report returned to the GMC, based on analysis from the Genomics England standard pipeline, initially identified no variants of interest. However, on deeper analysis with the BCIP tools, Patrick and Jamie identified a novel ZNF394-BRAF gene fusion by visualizing the structural variants, and confirmed it by filtering and reviewing the variant call metrics. This fusion closely resembles the activating KIAA1549-BRAF fusion (Faulkner et al, 2015, PMID: 26222501), a leading cause of pilocytic astrocytoma. The fusion product was predicted to contain an unregulated BRAF kinase domain (exons 10-18). BRAF helps transmit chemical signals from outside the cell to the nucleus and is part of the RAS/MAPK signaling pathway, which regulates cell differentiation, migration and apoptosis.

Figure 1: Data visualization showing ZNF394-BRAF fusion
Figure 2: ZNF394-BRAF gene fusion as shown in variant grid

The ZNF394-BRAF fusion was subsequently confirmed by an orthogonal test, which led the clinician to treat the patient with MEK inhibitors. MEK lies downstream of BRAF in the growth factor activation pathway (Figure 3), so inhibiting MEK is expected to block signaling in cells driven by activated BRAF.

Figure 3: Growth factor activation pathway (Picture courtesy: J. Trotman)

Case 2

This case had an ambiguous diagnosis: the tumor presented as a glioblastoma but, on histological review two years after treatment, was rediagnosed as a pilocytic astrocytoma. Whilst the static reports from Genomics England did not find any variants of interest, a review of the case in BSVI revealed a large 26.1 Mb deletion on chromosome 2, suggesting a CCDC88A-ALK fusion. ALK is a neuronal receptor tyrosine kinase that plays a critical role in the development of the nervous system and is selectively expressed in the peripheral and central nervous systems. Its domain architecture includes two MAM domains (MAM1 and MAM2) and a tyrosine kinase domain. The MAM (meprin, A-5 protein, and receptor protein-tyrosine phosphatase mu) domains are predicted to play a role in homodimerization of the receptor kinase and to regulate the enzyme’s function (Marchand et al, 1996, PMID: 8798668).

Figure 4: Domain organization of the ALK protein

The fusion seen in this case indicates that the breakpoints are intronic and produce a chimeric protein that lacks the MAM domains of ALK, leading to an activated kinase. Structural variants with intronic breakpoints make a strong case for whole-genome sequencing in cancer, since such events are unlikely to be detected by targeted hybrid-capture methods such as whole-exome sequencing, whose baits are designed around exons rather than introns.
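The reasoning above, that a deletion joining two genes can create a fusion with intronic breakpoints, can be illustrated with a short sketch. This is not how BSVI or the Genomics England pipeline calls fusions; it is a minimal Python example assuming a deletion given in 1-based inclusive coordinates and a toy annotation in which GENE_X, GENE_Y and all coordinates are hypothetical (a real analysis would use a reference annotation such as Ensembl or GENCODE). It looks up the genes containing the last base before and the first base after the deletion, and uses strand to decide which gene contributes the 5′ end.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"
    exons: tuple  # ((start, end), ...) in genomic coordinates


def gene_at(genes, chrom, pos) -> Optional[Gene]:
    """Return the first gene whose span contains chrom:pos, or None."""
    for g in genes:
        if g.chrom == chrom and g.start <= pos <= g.end:
            return g
    return None


def is_intronic(gene, pos):
    """True if pos lies inside the gene but outside every annotated exon."""
    in_gene = gene.start <= pos <= gene.end
    return in_gene and not any(s <= pos <= e for s, e in gene.exons)


def deletion_fusion(genes, chrom, del_start, del_end):
    """Infer the 5'/3' partners created by deleting chrom:del_start..del_end."""
    left_pos, right_pos = del_start - 1, del_end + 1  # bases joined by the deletion
    left = gene_at(genes, chrom, left_pos)
    right = gene_at(genes, chrom, right_pos)
    if left is None or right is None or left is right:
        return None  # not a gene-to-gene junction
    if left.strand != right.strand:
        return None  # opposite orientation: no simple sense-sense fusion
    # '+' genes are transcribed left to right, so the left gene supplies the 5' end;
    # for '-' genes transcription runs right to left and the roles swap.
    five, three = (left, right) if left.strand == "+" else (right, left)
    return {
        "five_prime": five.name,
        "three_prime": three.name,
        "intronic_breakpoints": is_intronic(left, left_pos) and is_intronic(right, right_pos),
    }


# Hypothetical toy annotation: two minus-strand genes on the same chromosome.
GENES = [
    Gene("GENE_Y", "chr2", 1_000, 5_000, "-", ((1_000, 1_200), (2_000, 2_200), (4_800, 5_000))),
    Gene("GENE_X", "chr2", 30_000, 40_000, "-", ((30_000, 30_500), (35_000, 35_500), (39_500, 40_000))),
]

print(deletion_fusion(GENES, "chr2", 3_000, 36_000))
# {'five_prime': 'GENE_X', 'three_prime': 'GENE_Y', 'intronic_breakpoints': True}
```

The sketch ignores reading frame, exon phase and opposite-strand rearrangements such as inversions, all of which matter when predicting whether an intact chimeric protein is actually produced.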

Figure 5: Visualization and variant grid showing the CCDC88A-ALK gene fusion (Courtesy: J. Trotman)

Prior studies have identified the CCDC88A-ALK fusion in ependymoma-like gliomas, which are characterized by both ependymal and astrocytic features and in which ALK is a recurrent fusion partner (Olsen et al, 2015, PMID: 25795305). This is a clinically important finding because selective ALK inhibitors are available that could be used in this case. The fusion was subsequently confirmed in the tumor by orthogonal methods (PCR and Sanger sequencing).


These two examples show how visualizing cases, combined with appropriate use of filters (MGE10KB) and gene lists and a manual check of the variant calls, led to the identification of biologically relevant variants and insight into disease mechanisms. As a power user of the BCIP tools deployed to support the 100,000 Genomes Project, Dr. Tarpey says: “BSVI analysis of cancer genomes is invaluable to access all variants (regardless of vcf filter status), and to visualise variants (particularly SVs) to inform validity. The numerous opportunities for triage facilitate appropriate analysis strategies across the diverse array of cancer types.”
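Dr. Tarpey’s point about accessing variants regardless of VCF filter status can be illustrated with a minimal triage sketch in Python. It assumes SV records with symbolic alleles carrying SVTYPE and END keys in the INFO column, as in VCF 4.x; breakend (BND) partner coordinates live in the ALT field and are not parsed here. The gene coordinates, gene list and the FILTER label LowQ are all hypothetical, and this is not BSVI’s filtering logic. Each breakpoint that falls within a gene of interest is reported together with its FILTER value, so non-PASS calls stay visible for manual review instead of being silently dropped.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def sv_records(vcf_lines):
    """Yield SV records from tab-separated VCF data lines, skipping headers."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, sv_id, filt = cols[0], cols[1], cols[2], cols[6]
        info = parse_info(cols[7])
        start = int(pos)
        end = int(info.get("END", start))  # records without END are treated as a single breakpoint
        yield {"id": sv_id, "chrom": chrom, "start": start, "end": end,
               "svtype": info.get("SVTYPE", "."), "filter": filt}


def triage(vcf_lines, genes_of_interest, gene_coords):
    """Report breakpoints inside genes of interest, keeping every FILTER status."""
    hits = []
    for rec in sv_records(vcf_lines):
        for bp in sorted({rec["start"], rec["end"]}):
            for name, (chrom, start, end) in gene_coords.items():
                if name in genes_of_interest and chrom == rec["chrom"] and start <= bp <= end:
                    hits.append((rec["id"], rec["svtype"], name, bp, rec["filter"]))
    return hits


# Hypothetical gene coordinates and VCF lines; "LowQ" is an invented FILTER label.
GENE_COORDS = {"GENE_X": ("chr2", 30_000, 40_000),
               "GENE_Y": ("chr2", 1_000, 5_000),
               "GENE_Z": ("chr7", 100, 900)}
VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr2\t3000\tsv1\tN\t<DEL>\t.\tLowQ\tSVTYPE=DEL;END=36000",
    "chr7\t500\tsv2\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=800",
]

for hit in triage(VCF, {"GENE_X", "GENE_Y"}, GENE_COORDS):
    print(hit)
```

In this toy data, keeping only PASS records would discard sv1, even though both of its breakpoints fall within genes of interest.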


Jamie Trotman is a pre-registered Clinical Scientist. His role covers the analysis and interpretation of 100,000 Genomes Project cancer programme data and report writing for the East of England GMC and the East Midlands and East of England Genomics Laboratory Hub.

Patrick Tarpey is a group leader in the Department of Clinical Genetics at the Cambridge University Hospitals NHS Trust. After a brief period in clinical diagnostics, Patrick moved to Mike Stratton’s team at the Sanger Institute to pursue a project on hereditary X-linked disease, sequencing all genes on the X chromosome in a cohort of 100 probands with X-linked disease. This work identified multiple new disease genes that have since been incorporated into routine diagnostics.

He later moved to the Cancer Genome Project and pursued multiple projects aimed at unravelling the landscape of somatically acquired variation in breast, bone and other cancer types. This work led to the discovery of multiple novel cancer genes, including some with clinical potential. Patrick has a lead role in developing and expanding cancer genome services (familial and acquired) in the recently formed East Anglia and East Midlands Genomic Laboratory Hub (GLH).

For Research Use Only. Not for use in diagnostic procedures.  

¹ This version of BaseSpace Variant Interpreter, co-developed with Genomics England as part of the BCIP, contains extensive customizations for their use cases and is not publicly accessible. Please contact your Illumina sales representative for guidance on using the publicly available version of Variant Interpreter.

