Mengdi Wang makes a play at decoding disease

Mengdi Wang in her Princeton office. Photo by Sameer A. Khan/Fotobuddy

On the windowsill of Mengdi Wang’s Princeton office sits a box marked Civilization, a board game of strategy and conquest that starts in the Stone Age and ends with intergalactic flight.

“I love games,” said Wang, an associate professor of electrical and computer engineering and an expert in artificial intelligence.

Wang spent a sabbatical year at Google DeepMind in 2020, studying techniques to improve reinforcement learning, the AI technology behind the world’s most advanced Go-playing system, AlphaGo. While there, she noticed the beginnings of an explosive trend. From fusion energy to neuroscience, researchers of every stripe were turning to reinforcement learning and language models to solve hard, interesting problems.

Although language models were not her main specialty, she had studied them. In 2018, she wrote a paper with collaborators at Microsoft Research on reinforcement learning for natural language processing, the branch of machine learning that gave rise to today’s large language models (think ChatGPT). But while she had been interested in the topic then, the rapid development of the field Wang saw in 2020 left her stunned. “The models had become so powerful,” she said, “the set of solvable problems had become huge.”

Two years later, generative AI thundered into public consciousness. And Wang turned her growing expertise in language models toward advancing another emerging technology: mRNA vaccines, like those that protect against COVID-19.

Working with the biotech startup RVAC Medicines, Wang developed a language model that treats biological sequences like text. Just as natural languages have syntax and grammar, the sequences of mRNA molecules dictate the forms and functions of proteins. And those proteins are the key ingredient in vaccine development.

Her model looks at a key region of mRNA molecules that modulates protein synthesis, generates variations in the genome sequences of that region, and identifies the most efficient ones, much like rearranging words to create a more resonant sentence. The result, validated in wet-lab experiments, was a 33% improvement in protein-production efficiency, a huge gain in this field.

It’s a foundational step toward what Wang called the life sciences’ GPT moment, when AI breaks through to transform our relationship to health and disease. That’s probably a long way off. It will require systematically unifying disparate branches of knowledge, a civilization-level feat. But it’s there, over the horizon, like a perfect sentence waiting to be read.