A look into Microsoft’s cloud-based genomics

How often do you thank your genes for your good looks, social skills, or intelligence? Or for that matter, blame your genes for your maladies or diseases? Have you wondered what these genes really are and why they impact you so much?

Welcome to the world of genomics — a branch of molecular biology that deals in depth with the creation, evolution, functionality, and mapping of different genes in all living organisms. And just as you as an IT pro probably find that Azure can help you do your job better, those who work in the field of genomics are also finding that Microsoft’s cloud-based platform makes their research better.

Let’s take a few minutes to get a basic understanding of genes and their role in humans.

DNA and genomics

Deoxyribonucleic acid (DNA) is the foundation for all living organisms. It is a chemical compound that contains the instructions needed to direct the activities and physical structure of a living creature. In humans, DNA is comprised of two twisted and paired strands, often called the double helix.

Everyone’s DNA is made up of four chemical units that carry the instructions needed for making a specific protein within an organism. Since proteins make up our body structures like organs and tissues, and also control the internal chemical reactions, this DNA determines our predisposition to certain diseases. A complete set of DNA is called genome, and in human beings, this DNA is located on 23 pairs of chromosomes. It’s these 23 pairs that give us our unique look and personality. Astonishing, right?

This is not all. Researchers all over the world are combining their in-depth knowledge with advanced technology to find solutions for many of the complex diseases that are plaguing us today, and more importantly, to explore new possibilities for treating them.

One of the technologies that has been at the forefront in furthering the study of genomics is the cloud. Leading cloud companies like Microsoft are helping researchers store and analyze petabytes of data to gain meaningful insights about genes. In fact, Microsoft has come up with a few projects that provide critical actionable information by reducing the time it takes for DNA sequencing.

Here’s an in-depth look into Microsoft’s efforts in the field of genomics.


Factored Spectrally Transformed Linear Mixed Models (FaST-LMM) is a set of tools designed by Microsoft to perform genome-wide association studies (GWAS) most efficiently.

Specifically, this tool helps researchers perform both single SNP and SNP-set GWAS on huge volumes of data sets. SNP stands for single-nucleotide polymorphism, and it refers to the possibility of a particular base in a position. For example, let’s say base C occurs at a specific position in the human genome for most individuals, and base A occurs in rare circumstances. This can be interpreted as nucleotide variations that can occur at that specific position. This variation in SNP is what causes us to be more susceptible to some diseases than others, so understanding this variation can help health-care professionals identify the right treatment for each individual.

But identifying this variation is anything but easy, as computation can take years. With the cloud, what once took years can be completed in a day. For example, analyzing 500,000 SNPs for 15,000 people can amount to a whopping 62 billion SNP sets, and this can be analyzed in a day on Azure. That’s the power of technology!

[tg_youtube video_id=”lZ48LXsXjTo”]


Rapid advances in medical sciences, especially in the field of DNA sequencing and genomics, makes it difficult for researchers to stay on top of growing scientific knowledge. At the same time, understanding these developments is important, as they can identify and treat diseases better and faster than before.

To help the research community stay abreast of developments, Microsoft has come up with a curation system called Literome, which extracts genomics-based information from PubMed, and makes them available in the cloud for easy browsing and searching. This repository contains all the latest developments in the areas of directed interactions and genotype-phenotype associations, both of which deal with SNPs, gene interaction, and more.

Partnerships and collaboration

Besides developing its own tools, Microsoft is also partnering and collaborating with genomics companies DNAnexus and Codigo, and Hamburg Eppendorf University.

DNAnexus provides a collaborative genome informatics and data-management platform to enhance genome applications for health care and research. This platform now runs on Azure to provide faster processing and better data-handling capabilities. Already, organizations like the Stanford Center for Genomics and Personalized Medicine (SCGPM) have started using DNAnexus on Azure.

Similarly, BC Platforms and Microsoft have come together to build a cloud-based genomic data-management solution for Codigo, a commercial laboratory in Mexico that aims to have the largest bio-bank of genetic information in the Latin American region.

Such partnerships help Microsoft to extend its Azure platform to companies involved in genomic research, and through it, to thousands of researchers who are working to identify and treat complex diseases like cancer and sickle cell.

Cloud-based genomics in the real world

Microsoft’s cloud-based genomics tools have had an enormous impact in the real world, and much of it can be attributed to the fact that the idea for these tools come from biologists who developed an interest in computer science because analysis and computation was taking up so much of their valuable time. These ideas were augmented and implemented by computer scientists who have a keen interest in biological sciences.

Such a combination of biologists and computer scientists is what gives Microsoft an edge over other cloud providers operating in the same space. Currently, Microsoft’s tools ensure that running the Genome Analysis Toolkit (GATK) is seven times faster in Azure when compared to other cloud systems. As a result, it allows doctors to diagnose and treat rare and dangerous medical conditions much faster, thereby increasing the survival rate for patients. The cost of running these genome tests is affordable, too.

A genomics revolution

Because of the above factors, requests from hospitals and research institutions to process genomics tests are increasing rapidly. In this sense, Microsoft is fueling a genomics revolution that can change the way we understand our bodies. In addition, it can also hasten research in plants and animals, too, as the genomics sequencing can be done in a much shorter time than before. These developments, in turn, can address some of the pressing issues we face such as food shortages, renewable energy, global warming, threat from new virus strands, and more.

In short, cloud-based genomics is likely to be the next big wave in the medical and health-care sector. Led by companies like Microsoft, this technology can help us better understand living organisms in general and humans in particular, so that rare disorders can be predicted ahead of time, and appropriate treatment can be administered at the earliest. And this all means we are one step closer to a healthier life and a longer lifespan.

About The Author

Leave a Comment

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Scroll to Top