Cutting-Edge Computational Study Provides ‘Gut Check’ to How We Thought Proteins Function
From growing our cells, tissues and organs to repairing our bodies as we age, proteins are responsible for nearly every task of cellular life. A central tenet in biology has long been that protein sequence dictates protein structure, and that structure dictates its function. But now, an international study led by scientists at the Flatiron Institute in New York City has revealed striking examples in which this tenet does not hold true. By applying robust prediction methods powered by artificial intelligence to proteins from the microbiome of the human gut, the researchers have uncovered many cases in which a protein’s sequence, structure and function do not line up as expected. These findings, published on April 26 in Nature Communications, hint at new mechanisms driving the unparalleled biodiversity of the gut microbiome. They also lay critical groundwork for deeper investigations into how our gut microbiome may impact our health.
“This work changes how we think about protein structure and function,” says Julia Koehler Leman, a project leader and research scientist at the Flatiron Institute’s Center for Computational Biology (CCB) and one of the study’s lead authors. “We see a broader range of sequences and functions associated with a smaller set of structures, which suggests more complex factors behind protein function.”
Growing out of a collaboration between the Flatiron Institute, the Broad Institute, the University of California, San Diego, and Jagiellonian University in Poland, the study had 16 authors and relied on a citizen-science approach in which volunteers could donate computing power to the study using the World Community Grid, formerly hosted by IBM.
Proteins are composed of a variety of amino acids that have linked up to form long chains. As these chains grow, they fold in on themselves, creating intricate and dynamic three-dimensional structures. Until recently, determining these protein structures, not to mention their functions, required painstaking laboratory work and resource-heavy computer algorithms. Most traditional studies seemed to confirm the notion that similar sequences lead to similar structures, and thus to similar functions.
In recent years, powerful AI-based tools have made it easier to predict protein structure and, to a lesser extent, function, enabling scientists to set their sights on larger and more complex protein populations. One such population lies within our own gut. Known collectively as the human gut microbiome, it contains more genetic diversity than the entirety of the human genome, but most of its function remains a mystery. Some early studies have shown tantalizing hints of its power — for example, microbiome imbalance is linked to Type 1 diabetes susceptibility in infants. But scientists believe we are at the tip of the iceberg in terms of fully understanding the microbiome’s impact on health.
While the team behind today’s paper originally set out to determine the structures of human gut microbiome proteins, their work soon expanded to include function prediction. The team took gene sequences from the human gut microbiome and used two computational methods called DMPFold and Rosetta (both developed partly at the Flatiron Institute) to predict the structures of the approximately 200,000 proteins that these sequences encoded. Next, they employed the computational tool DeepFRI, also developed at the Flatiron Institute, to predict these proteins’ functions. The result was a repository of protein sequence, structure and functional information unlike any other in scope: a microbial tree of life. And it yielded several surprises.
First, the team noticed that a plethora of dissimilar sequences folded into highly similar protein structures, an observation that went against the long-held notion that protein sequence largely dictates its structure. Subsequently, when the researchers examined the relationship between protein structure and function, they found fascinating examples of structure-function mismatch, identifying cases in which different protein structures resulted in similar functions, and other cases in which similar protein structures yielded different functions.
The former case could be a backup survival strategy, ensuring that more than one kind of protein can carry out the same critical function. But the latter case is especially interesting, says the research team, because it raises questions about why proteins with similar structures would play different roles. “It could be that among similar structures, something is causing an amino acid in one protein to play a more important functional role than it does in another protein,” says Douglas Renfrew, a research scientist at the CCB and another of the study’s lead authors.
Taken together, these findings offer critical insight across several areas. In harnessing powerful computational methods, the team identified hundreds of new protein structures representing a total of 148 novel protein folds. Some of these novel structures are predicted to bind carbohydrates, potentially suggesting a new mechanism for carbohydrate binding. Because the gut microbiome has been known to play a role in the breakdown of carbohydrates, discoveries like these could impact our understanding of how the microbiome influences human health.
“This work pushes the frontier in terms of protein structure and function,” says Jamie Morton, an investigator in the biostatistics and bioinformatics branch of the National Institute of Child Health and Human Development who was not involved in the study. “When you see the same structure and all the different roles it can play, this upends our intuition on the structure-function link.”
Going forward, the challenge will be to apply this new data repository to the inner workings of the microbiome, says Tomasz Kosciolek, a group leader in bioinformatics at the Małopolska Centre of Biotechnology at Jagiellonian University and a study co-author. “Until now, we have been talking about the microbiome in the same language one would use to describe the biodiversity of a rainforest,” Kosciolek says. “We hope to start talking about it in more mechanistic terms, like what molecules might amplify, inhibit or change certain biological processes.”
This work also offers a “sequence-structure-function computational pipeline that can get us closer to understanding how the microbiome influences human health,” says Tommi Vatanen, a study co-author and a former postdoctoral fellow at the Broad Institute who is now an Academy research fellow at the University of Helsinki.
“We hope the scientific community will come together and start to test, both experimentally and computationally, some of the structure-function relationships unearthed here,” Koehler Leman adds. “This will bring us closer to understanding which aspects of structure are most relevant for function and will broaden our view of this fundamental relationship in biology.”