Helping scientists perform complex data analyses without writing code | MIT News

As the costs of diagnostic and sequencing technologies have fallen dramatically in recent years, researchers have amassed an unprecedented amount of data on disease and biology. Unfortunately, scientists looking to move from data to new drugs often need the help of someone with software engineering experience.

Now Watershed Bio helps scientists and bioinformaticians conduct experiments and gain insights with a platform that enables users to analyze complex data sets regardless of their computational skills. The cloud-based platform provides workflow templates and a customizable interface that helps users explore and share data of all types, including whole-genome sequencing, transcriptomics, proteomics, metabolomics, high-content imaging, protein folding, and more.

“Scientists want to learn the software and data science part of this field, but they don't want to become software engineers writing code just to understand their data,” says co-founder and CEO Jonathan Wang '13, SM '15. “Thanks to Watershed, they don't have to.”

Watershed is used by research teams large and small in industry and academia to support discovery and decision-making. When new advanced analytical techniques are described in scientific journals, they can be instantly added to the Watershed platform as templates, making cutting-edge tools more accessible and collaborative for researchers of all backgrounds.

“Data in biology is growing exponentially, and the sequencing technologies that generate this data are getting better and cheaper,” Wang says. “Coming from MIT, this problem was right in my wheelhouse: It's a hard technical problem. It's also a big problem because these people are working on treating diseases. They know all this data has value, but they have a hard time using it. We want to help them get more information faster.”

No-code discovery

Wang expected to major in biology at MIT, but he quickly became fascinated by the possibilities of building solutions that could reach millions of people through computer science. He earned his bachelor's and master's degrees from the Department of Electrical Engineering and Computer Science (EECS). Wang also interned in a biology lab at MIT, where he was surprised by how slow and labor-intensive the experiments were.

“I saw the difference between biology and computer science, where there is a dynamic environment [in computer science] that allows for immediate feedback,” Wang says. “Even as a single person writing code, you have so much at your fingertips.”

While working on machine learning and high-performance computing at MIT, Wang also started a high-frequency trading company with some of his classmates. His team hired researchers with PhDs in fields such as mathematics and physics to develop new trading strategies, but they quickly noticed a bottleneck in their process.

“Things moved slowly because researchers were used to building prototypes,” Wang says. “These were small approximations of models that you could run locally on your machines. To implement these approaches in production, they needed engineers to make them work at high throughput on a compute cluster. But the engineers didn't understand the nature of research, so you had to go back and forth. This meant that ideas you thought could be implemented in a day took weeks.”

To solve this problem, Wang's team developed a software layer that made building production-ready models as easy as building prototypes on a laptop. Then, a few years after graduating from MIT, Wang noticed that technologies like DNA sequencing had become cheap and ubiquitous.

“The bottleneck was no longer sequencing, so people said, 'Let's sequence everything,'” Wang recalls. “Computation became the limiting factor. People didn't know what to do with all the data they were generating. Biologists were waiting for help from data scientists and bioinformaticians, but these people didn't always understand biology at a deep enough level.”

The situation seemed familiar to Wang.

“It was exactly like what we saw in finance, where researchers tried to work with engineers, but the engineers never fully understood it, and all this inefficiency was related to people waiting for engineers,” Wang says. “Meanwhile, I learned that biologists were hungry to do these experiments, but there was such a big gap that they felt they needed to become software engineers or just focus on science.”

Wang officially founded Watershed in 2019 with physician Mark Kalinich '13, a former MIT classmate who is no longer involved in the company's day-to-day operations.

Since then, Wang has heard from biotech and pharmaceutical company executives about the increasing complexity of biological research. Unlocking new insights increasingly involves analysis of whole-genome data, population studies, RNA sequencing, mass spectrometry, and more. Developing personalized treatments or selecting patient populations for clinical trials can also require huge datasets, and new ways of analyzing data are constantly emerging in scientific journals.

Today, companies can perform large-scale analytics on Watershed without having to set up their own servers or cloud accounts. To speed up their work, researchers can use ready-made templates that work with all the most popular data types. Popular AI-powered tools like AlphaFold and Geneformer are also available, and the Watershed platform makes it easy to share workflows and drill down into results.

“The platform is ideal in terms of usability and adaptability for people from all walks of life,” says Wang. “No science is ever the same. I avoid the word product because it means you implement something and then you just run it on a large scale forever. Research is not like that. Research is about coming up with an idea, testing it, and using the results to develop another idea. The faster you can design, implement, and run experiments, the faster you can move on to the next one.”

Accelerating biology

Wang believes Watershed helps biologists keep up with the latest advances in biology while accelerating scientific discovery.

“If you can help scientists discover new insights not just a little bit faster, but 10 to 20 times faster, that can really make a difference,” Wang says.

Watershed is used by researchers in academia and by companies of all sizes. Executives at biotech and pharmaceutical companies also use Watershed to make decisions about new experiments and drug candidates.

“We've had success in all of these areas, and the common thread is that people understand research, but they're not experts in computer science or software engineering,” Wang says. “It's exciting to see this industry grow. For me, it's great to have worked at MIT and now to be back in Kendall Square, where Watershed is based. This is where so much progress is being made. We're trying to do our part to enable the future of biology.”
