563 Publications

Specific viral RNA drives the SARS CoV-2 nucleocapsid to phase separate

C. Iserman, C. Roden, M. Boerneke, R. Sealfon, G. McLaughlin, I. Jungreis, C. Park, A. Boppana, E. Fritch, Y. Hou, C. Theesfeld, O. Troyanskaya, R. Baric, T. Sheahan, K. Weeks, A. Gladfelter

A mechanistic understanding of the SARS-CoV-2 viral replication cycle is essential to develop new therapies for the COVID-19 global health crisis. In this study, we show that the SARS-CoV-2 nucleocapsid protein (N-protein) undergoes liquid-liquid phase separation (LLPS) with the viral genome, and propose a model of viral packaging through LLPS. N-protein condenses with specific RNA sequences in the first 1000 nts (5’-End) under physiological conditions, and this condensation is enhanced at human upper airway temperatures. N-protein condensates exclude non-packaged RNA sequences. We comprehensively map sites bound by N-protein in the 5’-End and find preferences for single-stranded RNA flanked by stable structured elements. Liquid-like N-protein condensates form in mammalian cells in a concentration-dependent manner and can be altered by small molecules. Condensation of N-protein is sequence and structure specific, sensitive to human body temperature, and manipulatable with small molecules, thus presenting screenable processes for identifying antiviral compounds effective against SARS-CoV-2.


Structure-Based Protein Function Prediction using Graph Convolutional Networks

V. Gligorijevic, D. Renfrew, T Kosciolek, J. Koehler, D. Berenberg, T Vatanen, C Chandler, B Taylor, I. Fisk, H Vlamakis, R Xavier, R Knight, K Cho, R. Bonneau

The large number of available sequences and the diversity of protein functions challenge current experimental and computational approaches to determining and predicting protein function. We present a deep learning Graph Convolutional Network (GCN) for predicting protein functions and concurrently identifying functionally important residues. This model is initially trained using experimentally determined structures from the Protein Data Bank (PDB) but has significant de-noising capability, with only a minor drop in performance observed when structure predictions are used. We take advantage of this denoising property to train the model on > 200,000 protein structures, including many homology-predicted structures, greatly expanding the reach and applications of the method. Our model learns general structure-function relationships by robustly predicting functions of proteins with ≤ 40% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and previous competing methods. Using class activation mapping, we automatically identify structural regions at the residue-level that lead to each function prediction for every confidently predicted protein, advancing site-specific function prediction. We use our method to annotate PDB and SWISS-MODEL proteins, making several new confident function predictions spanning both fold and function classifications.

June 10, 2020

Inference of Bacterial Small RNA Regulatory Networks and Integration with Transcription Factor-Driven Regulatory Networks

M Arrieta Ortiz, C Hafemeister, B Shuster, N Baliga, R. Bonneau

Small noncoding RNAs (sRNAs) are key regulators of bacterial gene expression. Through complementary base pairing, sRNAs affect mRNA stability and translation efficiency. Here, we describe a network inference approach designed to identify sRNA-mediated regulation of transcript levels. We use existing transcriptional data sets and prior knowledge to infer sRNA regulons using our network inference tool, the Inferelator. This approach produces genome-wide gene regulatory networks that include contributions by both transcription factors and sRNAs. We show the benefits of estimating and incorporating sRNA activities into network inference pipelines using available experimental data. We also demonstrate how these estimated sRNA regulatory activities can be mined to identify the experimental conditions where sRNAs are most active. We uncover 45 novel experimentally supported sRNA-mRNA interactions in Escherichia coli, outperforming previous network-based efforts. Additionally, our pipeline complements sequence-based sRNA-mRNA interaction prediction methods by adding a data-driven filtering step. Finally, we show the general applicability of our approach by identifying 24 novel, experimentally supported, sRNA-mRNA interactions in Pseudomonas aeruginosa, Staphylococcus aureus, and Bacillus subtilis. Overall, our strategy generates novel insights into the functional context of sRNA regulation in multiple bacterial species.


Trapping, gliding, vaulting: transport of semiflexible polymers in periodic post arrays

B. Chakrabarti, C. Gaillard, D. Saintillan

The transport of deformable particles through porous media underlies a wealth of applications ranging from filtration to oil recovery to the transport and spreading of biological agents. Using direct numerical simulations, we analyze the dynamics of semiflexible polymers under the influence of an imposed flow in a structured two-dimensional lattice serving as an idealization of a porous medium. This problem has received much attention in the limit of reptation and for long-chain polymer molecules such as DNA that are transported through micropost arrays for electrophoretic chromatographic separation. In contrast to long entropic molecules, the dynamics of elastic polymers results from a combination of scattering with the obstacles and flow-induced buckling instabilities. We identify three dominant modes of transport that involve trapping, gliding and vaulting of the polymers around the obstacles, and we reveal their essential features using tools from dynamical systems theory. The interplay of these scattering dynamics with transport and deformations in the imposed flow results in the long-time asymptotic dispersion of the center of mass, which we quantify in terms of a hydrodynamic dispersion tensor. We then discuss a simple yet efficient chromatographic device that exploits the competition between different modes of transport to sort filaments in a dilute suspension according to their lengths.


Genome-wide landscape of RNA-binding protein dysregulation reveals a major impact on psychiatric disorder risk

C. Park, J Zhou, A. Wong, K. Chen, C Theesfeld, R Darnell, O. Troyanskaya

Despite the strong genetic basis of psychiatric disorders, the molecular origins of these diseases are still largely unmapped. RNA-binding proteins (RBPs) are responsible for most post-transcriptional regulation, from splicing to translation to localization. RBPs thus act as key gatekeepers of cellular homeostasis, especially in the brain. Here, we leverage a deep learning approach to interrogate variant effects genome-wide, and discover that the dysregulation of RBP target sites is a principal contributor to psychiatric disorder risk. We show that specific modes of RBP regulation are genetically linked to the heritability of psychiatric disorders, and demonstrate that diverse RBP regulatory functions are reflected in distinct genome-wide negative selection signatures. Notably, RBP dysregulation has a stronger impact on psychiatric disorders than common coding region variants and explains heritability not currently captured by large-scale molecular QTL studies (expression QTLs and splicing QTLs). We share genome-wide profiles of RBP target site dysregulation, which we used to identify DDHD2 as a candidate schizophrenia risk gene, in a public web server. This resource provides a novel analytical framework to connect the full range of RNA regulation to complex disease.


Excess dNTPs Trigger Oscillatory Surface Flow in the Early Drosophila Embryo

S. Dutta, N. Djabrayan, C. Smits, C. Rowley, S. Shvartsman

During the first 2 hours of Drosophila development, precisely orchestrated nuclear cleavages, cytoskeletal rearrangements, and directed membrane growth lead to the formation of an epithelial sheet around the yolk. The newly formed epithelium remains relatively quiescent during the next hour as it is patterned by maternal inductive signals and zygotic gene products. We discovered that this mechanically quiet period is disrupted in embryos with high levels of dNTPs, which have been recently shown to cause abnormally fast nuclear cleavages and interfere with zygotic transcription. High levels of dNTPs are associated with robust onset of oscillatory two-dimensional flows during the third hour of development. Tissue cartography, particle image velocimetry, and dimensionality reduction techniques reveal that these oscillatory flows are low dimensional and are characterized by the presence of spiral vortices. We speculate that these aberrant flows emerge through an instability triggered by deregulated mechanical coupling between the nascent epithelium and three-dimensional yolk. These results highlight an unexplored connection between a core metabolic process and large-scale mechanics in a rapidly developing embryo.


Microtubule re-organization during female meiosis in C. elegans

Ina Lantzsch, Che-Hang Yu, Hossein Yazdkhasti, Norbert Lindow, Erik Szentgyörgyi, Steffen Prohaska, Martin Srayko, S. Fürthauer, Stefanie Redmann

The female meiotic spindles of most animals are acentrosomal and undergo drastic morphological changes while transitioning from metaphase to anaphase. The ultra-structure of acentrosomal spindles, and how this enables such dramatic rearrangements, remain largely unknown. To address this, we applied light microscopy, large-scale electron tomography and mathematical modeling to female meiotic C. elegans spindles undergoing the transition from metaphase to anaphase. Combining these approaches, we find that meiotic spindles are dynamic arrays of short microtubules that turn over on second time scales. The results show that the transition from metaphase to anaphase correlates with an increase in the number of microtubules and a decrease of their average length. To understand the mechanisms that drive this transition, we developed a mathematical model for the microtubule length distribution that considers microtubule growth, catastrophe, and severing. Using Bayesian inference to compare model predictions and data, we find that microtubule turn-over is the major driver of the observed large-scale reorganizations. Our data suggest that cutting of microtubules occurs, but that most microtubules are not severed before undergoing catastrophe.


Better together: Elements of successful scientific software development in a distributed collaborative community

J. Koehler, B Weitzner, D. Renfrew, S Lewis, R Moretti, A Watkins, V. Mulligan, S Lyskov, J Adolf-Bryfogle, J Labonte, J Krys, Rosetta Commons Consortium, W Schief, D Gront, O Schueler-Furman, D Baker, J Gray, R Dunbrack, T Kortemme, A Leaver-Fay, C Strauss, J Meiler, B Kuhlman, J Gray, R. Bonneau

Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. Implementing these methods is often accomplished by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid 1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has provided us with more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate the functioning of this development community by addressing technical aspects (like version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.


DeepArk: modeling cis-regulatory codes of model species with deep learning

E Cofer, J Raimundo, A Tadych, Y Yamazaki, A. Wong, C Theesfeld, M Levine, O. Troyanskaya

To enable large-scale analyses of regulatory logic in model species, we developed DeepArk (https://DeepArk.princeton.edu), a set of deep learning models of the cis-regulatory codes of four widely studied species: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, and Mus musculus. DeepArk accurately predicts the presence of thousands of different context-specific regulatory features, including chromatin states, histone marks, and transcription factors. In vivo studies show that DeepArk can predict the regulatory impact of any genomic variant (including rare or not previously observed), and enables the regulatory annotation of understudied model species.

April 28, 2020