3 questions: Using computation to study the world's best single-cell chemists | MIT News

Today, of the estimated 1 trillion species living on Earth, 99.999 percent are microbes: bacteria, archaea, viruses, and single-celled eukaryotes. For most of our planet's history, microbes ruled the Earth, able to live and thrive in the most extreme environments. Only in the past few decades have researchers begun to grapple with microbial diversity – it is estimated that fewer than 1 percent of known genes have laboratory-confirmed functions. Computational approaches give researchers a way to strategically analyze this truly astonishing amount of information.

An environmental microbiologist and computer scientist by training, new MIT faculty member Yunha Hwang is interested in the novel biology revealed by the most diverse and prolific form of life on Earth. In a shared faculty position as the Samuel A. Goldblith Career Development Professor in the Department of Biology, as well as an assistant professor in the Department of Electrical Engineering and Computer Science and the MIT Schwarzman College of Computing, Hwang studies the intersection of computation and biology.

Q: What led you to study microbes in extreme environments, and what are the challenges of studying them?

A: Extreme environments are great places to look for interesting biology. Growing up, I wanted to be an astronaut, and the closest thing to astrobiology is the study of extreme environments on Earth. And the only things that live in these extreme environments are microbes. On a sampling expedition off the coast of Mexico, we discovered a colorful microbial mat about 2 km underwater that was thriving because the bacteria were breathing sulfur instead of oxygen – but none of the microbes I had hoped to study grew in the lab.

The biggest challenge in studying microbes is that most of them cannot be cultured, which means the only way to study their biology is through a method called metagenomics. My latest work is genomic language modeling. We hope to develop a computational system that will allow us to study an organism as precisely as possible “in silico”, using only sequence data. A genomic language model is technically a large language model, except that the language is DNA rather than human language. It is trained in a similar way, only on a biological language as opposed to English or French. If our goal is to learn the language of biology, we should take advantage of the diversity of microbial genomes. Even though we have a lot of data, and more samples keep becoming available, we have only scratched the surface of microbial diversity.
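To make the "DNA as a language" idea concrete, here is a minimal sketch of how a sequence might be turned into tokens that a language model could be trained on. The interview does not say how Hwang's models tokenize their input, so the overlapping k-mer scheme, the `[MASK]` and `[UNK]` tokens, and every function name below are illustrative assumptions rather than a description of her method.

```python
from collections import Counter


def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a DNA string into overlapping k-mers (an assumed tokenization)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences: list[str], k: int = 3) -> dict[str, int]:
    """Map each observed k-mer to an integer id, most frequent first."""
    counts = Counter(tok for s in sequences for tok in kmer_tokens(s, k))
    # Ties are broken alphabetically so the ids are deterministic.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    vocab = {"[MASK]": 0, "[UNK]": 1}  # hypothetical special tokens
    for tok in ordered:
        vocab[tok] = len(vocab)
    return vocab


def encode(sequence: str, vocab: dict[str, int], k: int = 3) -> list[int]:
    """Convert a DNA string to token ids; unseen k-mers map to [UNK]."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in kmer_tokens(sequence, k)]


def mask_position(ids: list[int], position: int, vocab: dict[str, int]):
    """Hide one token, returning the masked input and the id to predict."""
    masked = list(ids)
    masked[position] = vocab["[MASK]"]
    return masked, ids[position]


if __name__ == "__main__":
    vocab = build_vocab(["ATGATG"])
    ids = encode("ATGATG", vocab)
    masked, target = mask_position(ids, 1, vocab)
    print(ids, masked, target)
```

A model trained on inputs like `masked` to recover `target` would be learning statistical regularities of the sequence, in the same way a text model learns from sentences with hidden words. Real genomic language models differ in tokenization, scale, and training objective.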

Q: Given how diverse microbes are and how little we know about them, how can studying microbes in silico using genomic language modeling advance our understanding of the microbial genome?

A: A genome consists of many millions of letters. A human cannot simply look at it and understand it. However, we can program a machine to divide the data into useful parts. This is how bioinformatics works with a single genome. But if you look at a gram of soil, which can contain thousands of unique genomes, that's simply too much data to work with – it takes a human and a computer working together to make sense of that data.

During my PhD and master's degrees, we were just discovering new genomes and new lineages that were very different from anything that had been characterized or grown in the lab. These were things we simply called “microbial dark matter.” When there are a lot of uncharacterized things, that's when machine learning can be really useful, because we're just looking for patterns – but that's not the ultimate goal. We hope to map these patterns to the evolutionary connections between every genome, every microbe, and every instance of life.

Previously, we thought of proteins as standalone entities – this gives us a decent level of information because proteins are related by homology, so proteins that are evolutionarily related can have a similar function.

It is known in microbiology that proteins are encoded in genomes, and that the context surrounding a protein's gene – which regions come before and after it – is evolutionarily conserved, especially where there is functional coupling. This makes total sense: when you have three proteins that need to be expressed together because they form a unit, you might want them right next to each other.

I want to incorporate more genomic context into the way we search for and annotate proteins and understand protein function, so that we can go beyond sequence or structural similarity and add contextual information to how we understand proteins and hypothesize about their function.
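The genomic-context idea described above can be illustrated with a toy calculation: if two gene families sit next to each other in most genomes, that conserved neighborhood is a hint that they may work together. This is only a sketch under stated assumptions. The family labels (`famA`, `famB`, and so on), the genome names, and the 75 percent threshold are invented for illustration, and the approach ignores strand, gene distance, and phylogenetic relatedness, all of which a real analysis would need to consider. It is not the method used in Hwang's research.

```python
from collections import Counter


def adjacent_pairs(gene_order: list[str]) -> set[frozenset]:
    """Unordered pairs of neighboring gene families along one contig."""
    return {
        frozenset((a, b))
        for a, b in zip(gene_order, gene_order[1:])
        if a != b  # skip tandem duplicates of the same family
    }


def conserved_neighbors(
    genomes: dict[str, list[str]], min_fraction: float = 0.75
) -> dict[tuple[str, str], float]:
    """Return neighbor pairs found in at least min_fraction of the genomes."""
    counts = Counter()
    for gene_order in genomes.values():
        # Each genome contributes at most one vote per pair.
        counts.update(adjacent_pairs(gene_order))
    total = len(genomes)
    return {
        tuple(sorted(pair)): count / total
        for pair, count in counts.items()
        if count / total >= min_fraction
    }


if __name__ == "__main__":
    # Hypothetical gene orders; each label stands for a protein family.
    toy_genomes = {
        "g1": ["famA", "famB", "famC", "famX"],
        "g2": ["famY", "famA", "famB", "famC"],
        "g3": ["famC", "famB", "famA", "famZ"],
        "g4": ["famA", "famQ", "famB", "famC"],
    }
    for pair, fraction in sorted(conserved_neighbors(toy_genomes).items()):
        print(pair, fraction)
```

In this toy data, `famB` and `famC` are adjacent in all four genomes and `famA` and `famB` in three of them, so both pairs pass the threshold, while one-off neighbors such as `famC` and `famX` do not.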

Q: How can your research be applied to harness the functional potential of microorganisms?

A: Microbes are probably the best chemists in the world. Harnessing microbial metabolism and biochemistry will lead to more sustainable and efficient methods of producing new materials, new therapeutics, and new types of polymers.

But it's not just about efficiency – microorganisms perform chemical processes that we don't even know how to think about. Understanding how microbes work, and being able to interpret their genomes and functional capabilities, will also be very important as we think about how our world and climate are changing. Microorganisms are responsible for most carbon sequestration and nutrient cycling; if we do not understand how a given microbe is able to fix nitrogen or carbon, then we will have difficulty modeling nutrient flows on Earth.

On the more therapeutic side, infectious diseases pose a real and growing threat. Understanding how microbes behave in diverse environments, compared to the rest of our microbiome, is really important as we think about the future and about combating microbial pathogens.
