Broad Institute speeds scientific research with Cloud SQL

16/05/2022

Editor’s note: The Broad Institute of MIT and Harvard, a nonprofit biomedical research organization that develops genomics software, needs to keep pace with the latest scientific discoveries. Here’s how they use managed database services from Google Cloud to move fast and stay on the cutting edge.

The Broad Institute of MIT and Harvard is a nonprofit biomedical research organization that focuses on advancing the understanding and treatment of human disease. One of our major initiatives is developing genomics tools and making them available across the scientific ecosystem. The rapid pace of discovery means our data sciences team has to keep up so that our software products enable the best research. Our ability to move fast is critical. And when we decided to pivot our focus during the pandemic to develop and process tens of millions of COVID-19 tests, speed was a driving factor. Fully managed database services and analytics solutions from Google Cloud helped us accelerate our pace of development.

Accelerating genomics insights with Cloud SQL

One of our main products that uses Google Cloud services is Terra, a secure, scalable, open source platform for biomedical research. We co-developed it with Microsoft and Verily to help researchers access public data files, manage private data, organize research, and collaborate with others. Having worked with Google Cloud for a long time, we naturally chose Google Cloud Platform services for Terra's dashboard.

For the backend, we use a number of cloud services, including Cloud SQL for PostgreSQL and MySQL, to allow users to track their different data assets, methods, and research results, and to power the Terra control plane. Cloud SQL helps us accelerate development in two key areas. First, our developers can get these database services up and running quickly, without going through a centralized system that might become a bottleneck. Second, using Cloud SQL lowers our operational burden. We can keep managed services running and performing well with fewer of our own developers, so these teams can focus on developing new features for users.

Optimizing cloud spend with BigQuery analytics

For much of our genomics analysis, we use BigQuery, Compute Engine and Dataproc (https://cloud.google.com/dataproc), but understanding the detailed costs of that research has been challenging. Billing data can be exported into BigQuery, but on its own it does not attribute costs to the specific analyses being performed. However, by adding billing labels to each cloud resource used and joining that information with detailed metadata in our relational Cloud SQL databases, we can provide extremely fine-grained cost information. As a result, for example, we’re able to tell a researcher that their virtual machine cost 17 cents as part of a certain analysis, research project, or sample. With these insights, our researchers have visibility into their costs, and are able to decide where to focus their optimizations.
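The label-and-join idea can be illustrated with a small Python sketch. This is not our production pipeline: the row shapes, the `analysis_id` label key, the metadata fields, and the `attribute_costs` helper are all hypothetical, and in-memory data stands in for the billing export and the relational metadata tables.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows shaped like exported billing data: each row carries a cost
# and the labels that were attached to the cloud resource.
billing_rows = [
    {"cost": Decimal("0.17"), "labels": {"analysis_id": "a-1"}},
    {"cost": Decimal("0.05"), "labels": {"analysis_id": "a-1"}},
    {"cost": Decimal("1.20"), "labels": {"analysis_id": "a-2"}},
    {"cost": Decimal("0.40"), "labels": {}},  # resource that was never labeled
]

# Hypothetical metadata, standing in for rows in a relational database.
analysis_metadata = {
    "a-1": {"project": "project-x", "sample": "sample-001"},
    "a-2": {"project": "project-x", "sample": "sample-002"},
}


def attribute_costs(rows, metadata, label_key="analysis_id"):
    """Sum costs per (project, sample) by joining label values to metadata."""
    totals = defaultdict(Decimal)
    for row in rows:
        analysis_id = row["labels"].get(label_key)
        info = metadata.get(analysis_id)
        if info is None:
            # Keep unlabeled or unknown spend visible instead of dropping it.
            totals[("unattributed", "unattributed")] += row["cost"]
        else:
            totals[(info["project"], info["sample"])] += row["cost"]
    return dict(totals)


costs = attribute_costs(billing_rows, analysis_metadata)
for (project, sample), cost in sorted(costs.items()):
    print(f"{project} / {sample}: ${cost}")
```

Keeping an explicit "unattributed" bucket is a design choice in this sketch: it makes gaps in labeling visible rather than silently understating total spend.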

Pivoting to process COVID-19 tests

When the global pandemic hit, the Broad Institute volunteered to make our clinical testing and diagnostics facilities available to serve public health needs. We created a novel automation system for COVID-19 test processing that is scalable, modular, and high-throughput, in service of the public health needs of the Commonwealth of Massachusetts and surrounding areas. In the first several months of the pandemic, Broad processed more than 10 percent of all PCR tests in the United States, and today has processed more than 30 million tests, with turnaround times of less than 24 hours. Using serverless components with a Cloud SQL for PostgreSQL database at its core, we built a testing solution, going from an idea to launching our large-scale COVID-19 operation in just two weeks. On our first day, we delivered 140 tests. A year later, we were delivering up to 150,000 tests a day. That’s in part because our database solution was able to scale up quickly.

With a few CLI commands, we enabled high availability and read replicas for our database, while backups and maintenance upgrades were handled automatically. This scalability made a big difference to us, considering we were a small team working on very tight timelines.

Source: Gimasys
