BaseSpace Security

Security is obviously a key concern when deciding whether to move to cloud-based genomic storage and analysis. It’s difficult (if not impossible) to quantify security in an absolute sense. But since most researchers currently use their institutional IT infrastructure for storage and analysis, it’s possible to assess current institutional IT security relative to that provided by BaseSpace.

BaseSpace has been built by Illumina on Amazon’s AWS cloud infrastructure. AWS hosts cloud-based services such as Netflix, Quora, Reddit and Foursquare, and also provides customer-facing services for government departments including Treasury, DOE, and State. Amazon’s security webpage can be found here. I’ve found the most useful security overview to be this white paper: “Amazon Web Services: Overview of Security Processes”. Another useful resource is the AWS blog. Here are some key points to note about AWS:

  • Standards and accreditation: SOC 1/SSAE 16/ISAE 3402 (auditing), FISMA moderate (US Federal Government), PCI DSS Level 1 (electronic payments), ISO 27001 (international security standard), and FIPS 140-2 (encryption). (For reference, the NIH’s own data centers are rated FISMA moderate.)
  • Data centers are protected by security staff and controlled access procedures. Staff with system access undergo background checks.
  • All hardware is located behind firewalls which are configured by default to block all traffic.
  • Operating system security patches are automatically applied.

The BaseSpace team is of course writing large amounts of code that runs on the AWS infrastructure. To verify end-to-end security we have had a third-party computer security firm assess our architecture for security risks, and they are also tasked with running “penetration tests” to identify potential vulnerabilities. In addition we encrypt all uploaded data using the AES-256 standard, so that even if all other security precautions were circumvented, the stolen data could not be read without the encryption keys.

We believe the combination of Amazon’s comprehensive and well-tested approach to platform security, overlaid with our own security precautions, ensures that BaseSpace meets or exceeds the security provided by many institutional IT infrastructures.

Let’s also examine a few of the more general questions that are sometimes raised about the security implications of the cloud:

Isn’t a big public cloud provider a huge target, and so inevitably vulnerable to attack?

  • It’s safe to assume that the size of the prize means that AWS is under constant attack. One advantage of this is that security researchers are always (a) working to identify vulnerabilities as any discovery will be high profile and (b) informing the operator of the problem so as to be seen as one of the good guys. A recent example of this was a security issue identified in October by researchers at Germany’s Ruhr University. The vulnerability, which has not been tied to any actual attacks, was immediately addressed by AWS. And it got a lot of press for Ruhr!
  • Obviously a criminal attacker who finds a vulnerability isn’t going to tell AWS about it. But in the words of the famous cartoon, “I don’t have to outrun that bear, I only have to outrun you”: if someone breaks into Amazon, their target will almost certainly be easily monetized data such as credit card numbers, not genomic data.

If my data is in the cloud, then it’s “on” the internet, and that must be risky, right?

In reality virtually all the world’s computers are connected to the internet. A computer in isolation is rare, and not terribly useful. So it’s highly likely that any existing computer that you use for storing genomic data is already connected to the internet, or at the very least on an intranet that is in turn connected to the internet. Secure isolation from the internet is typically provided by a firewall device configured to protect the internal network from outside attack. AWS computers are protected in the same way by firewalls – and AWS actively monitors its firewalls to check for vulnerabilities (a service beyond the resources of most institutions). And we also encrypt your data, something else that’s rarely done in the institutional IT setting.
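To make the "block all traffic by default" idea concrete, here is a minimal sketch of a default-deny policy check. It is purely illustrative: the `AllowRule` structure, the `is_allowed` function and the example addresses are made up for this post and are not how AWS or BaseSpace firewalls are actually configured.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    """One explicit exception to the default-deny policy (hypothetical structure)."""
    source: str  # CIDR block allowed to connect, e.g. "0.0.0.0/0"
    port: int    # destination TCP port that the rule opens


def is_allowed(rules, source_ip, port):
    """Return True only if some rule explicitly permits the connection."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if port == rule.port and addr in ipaddress.ip_network(rule.source):
            return True
    return False  # nothing matched, so the traffic is blocked by default


# With no rules at all, every connection is refused.
EMPTY_POLICY = []

# Illustrative policy: HTTPS from anywhere, SSH only from a made-up admin subnet
# (203.0.113.0/24 is a range reserved for documentation).
EXAMPLE_POLICY = [
    AllowRule("0.0.0.0/0", 443),
    AllowRule("203.0.113.0/24", 22),
]
```

The point of the design is that forgetting to write a rule leaves a service closed rather than open: access has to be granted deliberately, one exception at a time.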

My data has to travel to and from the cloud over the internet – isn’t that a big risk?

E-commerce has been with us since web retailers such as Amazon began to emerge. SSL (Secure Sockets Layer) is an internet standard developed to encrypt sensitive communications as they pass over the internet. The protocol is regularly updated to allow for new technologies and new threats (its current versions are known as TLS, Transport Layer Security). Every day millions of people and institutions rely on SSL to protect financial transactions. We use SSL to protect BaseSpace data uploads and downloads. Think of it this way: most of us now access our bank accounts over the internet, so just because something is accessible over the internet doesn’t mean it’s inherently insecure – it’s all about the quality of the security being implemented.
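For readers who script their own uploads and downloads, the "quality of the security" point can be checked on the client side too. The sketch below uses Python’s standard `ssl` module to build a connection context that insists on a verified certificate and a modern protocol version. The helper names `make_client_context` and `describe` are invented for this example, and the TLS 1.2 floor is an assumption chosen for illustration, not a statement about what BaseSpace servers require.

```python
import ssl


def make_client_context():
    """Build a client context that refuses unverified or outdated connections."""
    # Loads the system's trusted CAs and turns on certificate and hostname checks.
    context = ssl.create_default_context()
    # Assumed floor for this example: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def describe(context):
    """Summarise the security-relevant settings of a context."""
    return {
        "verifies_certificate": context.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": context.check_hostname,
        "minimum_version": context.minimum_version.name,
    }
```

A context built this way can be passed to standard library clients, for example `urllib.request.urlopen(url, context=make_client_context())`, so that a connection fails outright rather than silently falling back to a weaker setting.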

The entire subject of genomic data storage and analysis in the cloud is undergoing constant change, and we’d really like to get your input in the comments below. Let us know your experiences and concerns – we want to learn!

– Alex.


  1. To what extent can Illumina (or Amazon) access individual data sets stored in BaseSpace? Can you provide an overview of Illumina’s policy in this area?

    • Brian, thanks for your question. With regard to Illumina access to data sets, this is an excerpt from our privacy policy:

      “Security of Collected Information. Illumina is committed to protecting the security of the information collected, and we take reasonable physical, electronic, and administrative safeguards to help protect the information from unauthorized or inappropriate access. When we transmit information over the Internet, we protect it through the use of encryption, such as the Secure Socket Layer (SSL) protocol. We also restrict access to information to individuals at Illumina who need to know the information in accordance with their job responsibilities, and such access is only made from secured locations.”

      In other words, our policy is to not look at your data, and our procedures implement that policy. The only exception to this is if you explicitly choose to share run data with Illumina in order to help with a support issue.

      With regard to Amazon, customers such as Illumina run virtual instances of operating systems inside real operating system instances. As noted in the AWS security overview: “AWS administrators do not have access to customer instances, and cannot log into the guest OS”.

      Additionally, Illumina encrypts your data to the 256-bit AES standard, providing yet another level of protection.

      For more information please see the BaseSpace Terms of Use and BaseSpace Privacy Policy.
