619 Publications

An automated framework for efficiently designing deep convolutional neural networks in genomics

Z. Zhang, C. Park, C. Theesfeld, O. Troyanskaya

Convolutional neural networks (CNNs) have become a standard for analysis of biological sequences. Tuning of network architectures is essential for CNN’s performance, yet it requires substantial knowledge of machine learning and commitment of time and effort. This process thus imposes a major barrier to broad and effective application of modern deep learning in genomics. Here, we present AMBER, a fully automated framework to efficiently design and apply CNNs for genomic sequences. AMBER designs optimal models for user-specified biological questions through the state-of-the-art Neural Architecture Search (NAS). We applied AMBER to the task of modelling genomic regulatory features and demonstrated that the predictions of the AMBER-designed model are significantly more accurate than the equivalent baseline non-NAS models and match or even exceed published expert-designed models. Interpretation of AMBER architecture search revealed its design principles of utilizing the full space of computational operations for accurately modelling genomic sequences. Furthermore, we illustrated the use of AMBER to accurately discover functional genomic variants in allele-specific binding and disease heritability enrichment. AMBER provides an efficient automated method for designing accurate deep learning models in genomics.

August 19, 2020

Microbe-metabolite associations linked to the rebounding murine gut microbiome post-colonization with vancomycin resistant Enterococcus faecium

A. Mu, G. Carter, L. Li, N. Isles, A. Vrbanac, J. Morton, A. Jarmusch, D. De Souza, V. Narayana, K. Kanojia, B. Nijagal, M. McConville, R. Knight, B. Howden, T. Stinear

Vancomycin-resistant Enterococcus faecium (VREfm) is an emerging antibiotic-resistant pathogen. Strain-level investigations are beginning to reveal the molecular mechanisms used by VREfm to colonize regions of the human bowel. However, the role of commensal bacteria during VREfm colonization, in particular following antibiotic treatment, remains largely unknown. We employed amplicon 16S rRNA gene sequencing and metabolomics in a murine model system to investigate functional roles of the gut microbiome during VREfm colonization. First-order taxonomic shifts between Bacteroidetes and Tenericutes within the gut microbial community composition were detected both in response to pretreatment using ceftriaxone and to subsequent VREfm challenge. Using neural networking approaches to find cooccurrence profiles of bacteria and metabolites, we detected key metabolome features associated with butyric acid during and after VREfm colonization. These metabolite features were associated with Bacteroides, indicative of a transition toward a preantibiotic naive microbiome. This study shows the impacts of antibiotics on the gut ecosystem and the progression of the microbiome in response to colonization with VREfm. Our results offer insights toward identifying potential nonantibiotic alternatives to eliminate VREfm through metabolic reengineering to preferentially select for Bacteroides.

August 18, 2020

Alternative Activation of Macrophages Is Accompanied by Chromatin Remodeling Associated with Lineage-Dependent DNA Shape Features Flanking PU.1 Motifs

M. Tang, E. Miraldi, N. Girgis, R. Bonneau, P. Loke

IL-4 activates macrophages to adopt distinct phenotypes associated with clearance of helminth infections and tissue repair, but the phenotype depends on the cellular lineage of these macrophages. The molecular basis of chromatin remodeling in response to IL-4 stimulation in tissue-resident and monocyte-derived macrophages is not understood. In this study, we find that IL-4 activation of different lineages of peritoneal macrophages in mice is accompanied by lineage-specific chromatin remodeling in regions enriched with binding motifs of the pioneer transcription factor PU.1. PU.1 motif is similarly associated with both tissue-resident and monocyte-derived IL-4-induced accessible regions but has different lineage-specific DNA shape features and predicted cofactors. Mutation studies based on natural genetic variation between C57BL/6 and BALB/c mouse strains indicate that accessibility of these IL-4-induced regions can be regulated through differences in DNA shape without direct disruption of PU.1 motifs. We propose a model whereby DNA shape features of stimulation-dependent genomic elements contribute to differences in the accessible chromatin landscape of alternatively activated macrophages on different genetic backgrounds that may contribute to phenotypic variations in immune responses.

CRISPR-Decryptr reveals cis-regulatory elements from noncoding perturbation screens

A. Rasmussen, T. Äijö, M. Gabitto, N. Carriero, N. Sanjana, J. Skok, R. Bonneau

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 genome editing methods provide the tools necessary to examine phenotypic impacts of targeted perturbations in high-throughput screens. While these technologies have the potential to reveal functional elements with direct therapeutic applications, statistical techniques to analyze noncoding screen data remain limited. We present CRISPR-Decryptr, a computational tool for the analysis of CRISPR noncoding screens. Our method leverages experimental design: accounting for multiple conditions, controls, and replicates to infer the regulatory landscape of noncoding genomic regions. We validate our method on a variety of mutagenesis, CRISPR activation, and CRISPR interference screens, extracting new insights from previously published data.

August 14, 2020

Presenilin 1 phosphorylation regulates amyloid-β degradation by microglia

J. Ledo, T. Liebman, R. Zhang, C. Chang, E. Azevedo, E. Wong, H. Silva, O. Troyanskaya, V. Bustos, P. Greengard

Amyloid-β peptide (Aβ) accumulation in the brain is a hallmark of Alzheimer’s disease. An important mechanism of Aβ clearance in the brain is uptake and degradation by microglia. Presenilin 1 (PS1) is the catalytic subunit of γ-secretase, an enzyme complex responsible for the maturation of multiple substrates, such as Aβ. Although PS1 has been extensively studied in neurons, the role of PS1 in microglia is incompletely understood. Here we report that microglia containing phospho-deficient mutant PS1 display a slower kinetic response to micro injury in the brain in vivo and the inability to degrade Aβ oligomers due to a phagolysosome dysfunction. An Alzheimer’s mouse model containing phospho-deficient PS1 shows severe Aβ accumulation in microglia as well as the postsynaptic protein PSD95. Our results demonstrate a novel mechanism by which PS1 modulates microglial function and contributes to Alzheimer’s-associated phenotypes.

August 13, 2020

Abstract 2504: Modeling molecular development of breast cancer in canine mammary tumors

K. Graim, D. Gorenshteyn, D. Robinson, N. Carriero, J. Cahill, R. Chakrabarti, M. Goldschmidt, A. Durham, J. Funk, J. Storey, V. Kristensen, C. Theesfeld, K. Sorenmo, O. Troyanskaya

Malignancy in cancer is a consequence of the progressive accumulation of mutations in a tumor, with profound implications for drug selection and treatment. However, in human studies, inter-patient variability obscures molecular signatures of tumor progression because patients usually present with a single mammary tumor. In contrast, dogs frequently exhibit multiple naturally occurring mammary tumors in the same individual. Moreover, canine mammary tumors (CMTs) and human breast cancer have similar histopathological profiles and clinical presentation. We leverage the CMT model to elucidate genome-wide molecular changes clinically relevant in human breast cancer, focusing on signals underlying tumor development. We develop a robust, generally applicable, computational analysis framework (FREYA) for analysis of CMTs for comparative oncology. Using FREYA, we RNA profile 89 samples from 16 dogs, and demonstrate that CMTs recapitulate human breast cancer subtypes. We then extract molecular profiles of breast cancer progression at three distinct stages (normal, pre-malignant and malignant) and identify signatures of gene expression reflective of tumor progression. Focusing on the transitions to malignancy, we identify transcriptional patterns and biological pathways specific to malignant tumors and distinct from those characterizing pre-malignant tumors or normal tissue. We find that human breast cancer patients whose tumors exhibit strong CMT malignancy signatures have significantly decreased survival, indicative of the importance of the tumor progression processes identified in CMTs to human breast cancer prognosis. Altogether, our comprehensive genomic characterization demonstrates that CMTs are a powerful translational model of breast cancer, providing insights that inform our understanding of tumor development in humans. 
To catalyze and support similar analyses and use of the CMT model by other biomedical researchers, we publicly share all of our data and provide FREYA, a robust data processing pipeline and statistical analysis framework, at freya.flatironinstitute.org.

NetQuilt: Deep Multispecies Network-based Protein Function Prediction using Homology-informed Network Similarity

M. Barot, V. Gligorijevic, K. Cho, R. Bonneau

Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to proteome and biological network functional annotation use sequence similarity to transfer knowledge between species. These similarity-based approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular or organismal context for meaningful function prediction. In order to supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, the majority of these methods are tied to a network for a single species, and many species lack biological networks. In this work, we integrate sequence and network information across multiple species by applying an IsoRank-derived network alignment algorithm to create a meta-network profile of the proteins of multiple species. We then use this integrated multispecies meta-network as input features to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and more diverse examples from multiple organisms, and consequently leads to significant improvements in function prediction performance. Further, we evaluate our approach in a setting in which an organism’s PPI network is left out, using other organisms’ network information and sequence homology in order to make predictions for the left-out organism, to simulate cases in which a newly sequenced species has no network information available.

Optogenetic Rescue of a Patterning Mutant

H. Johnson, N. Djabrayan, S. Shvartsman, J. Toettcher

Animal embryos are patterned by a handful of highly conserved inductive signals. Yet, in most cases, it is unknown which pattern features (i.e., spatial gradients or temporal dynamics) are required to support normal development. An ideal experiment to address this question would be to “paint” arbitrary synthetic signaling patterns on “blank canvas” embryos to dissect their requirements. Here, we demonstrate exactly this capability by combining optogenetic control of Ras/extracellular signal-regulated kinase (ERK) signaling with the genetic loss of the receptor tyrosine-kinase-driven terminal signaling patterning in early Drosophila embryos. Blue-light illumination at the embryonic termini for 90 min was sufficient to rescue normal development, generating viable larvae and fertile adults from an otherwise lethal terminal signaling mutant. Optogenetic rescue was possible even using a simple, all-or-none light input that reduced the gradient of ERK activity and eliminated spatiotemporal differences in terminal gap gene expression. Systematically varying illumination parameters further revealed that at least three distinct developmental programs are triggered at different signaling thresholds and that the morphogenetic movements of gastrulation are robust to a 3-fold variation in the posterior pattern width. These results open the door to controlling tissue organization with simple optical stimuli, providing new tools to probe natural developmental processes, create synthetic tissues with defined organization, or directly correct the patterning errors that underlie developmental defects.

Scaling law of Brownian rotation in dense hard-rod suspensions

S. Chen, W. Yan, T. Gao

Self-diffusion in dense rod suspensions is subject to strong geometric constraints because of steric interactions. This topological effect is essentially anisotropic when rods are nematically aligned with their neighbors, raising considerable challenges in understanding and analyzing its impact on the bulk physical properties. Via a classical Doi-Onsager kinetic model with the Maier-Saupe potential, we characterize the long-time rotational Brownian diffusivity for dense suspensions of hard rods of finite aspect ratios, based on quadratic orientation autocorrelation functions. Furthermore, we show that the computed nonmonotonic scalings of the diffusivity as a function of volume fraction can be accurately predicted by an alternative tube model in the nematic phase.

Evaluating the Simple Arrhenius Equation for the Temperature Dependence of Complex Developmental Processes

J. Crapse, N. Pappireddi, M. Gupta, S. Shvartsman, E. Wieschaus, M. Wühr

The famous Arrhenius equation is well motivated to describe the temperature dependence of chemical reactions but has also been used for complicated biological processes. Here, we evaluate how well the simple Arrhenius equation predicts complex multistep biological processes, using frog and fruit fly embryogenesis as two canonical models. We find the Arrhenius equation provides a good approximation for the temperature dependence of embryogenesis, even though individual developmental stages scale differently with temperature. At low and high temperatures, however, we observed significant departures from idealized Arrhenius Law behavior. When we model multistep reactions of idealized chemical networks, we are unable to generate comparable deviations from linearity. In contrast, we find the single enzyme GAPDH shows non-linearity in the Arrhenius plot similar to our observations of embryonic development. Thus, we find that complex embryonic development can be well approximated by the simple Arrhenius Law and propose that the observed departure from this law results primarily from non-idealized individual steps rather than the complexity of the system.
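
For readers unfamiliar with the "linearity" mentioned in this abstract, the following is a minimal sketch of the textbook Arrhenius relation, k = A exp(-Ea / (R T)), under which ln k plotted against 1/T is a straight line with slope -Ea/R. It is not the authors' analysis code. The prefactor, activation energy, and temperatures are hypothetical values chosen only for illustration.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def arrhenius_rate(temp_kelvin, prefactor, activation_energy):
    """Rate predicted by the simple Arrhenius equation k = A * exp(-Ea / (R T))."""
    return prefactor * math.exp(-activation_energy / (R * temp_kelvin))


def apparent_activation_energy(t1, k1, t2, k2):
    """Estimate Ea from two (T, k) points using the slope of ln k versus 1/T."""
    slope = (math.log(k2) - math.log(k1)) / (1.0 / t2 - 1.0 / t1)
    return -slope * R


# Hypothetical parameters, not taken from the paper.
A = 1.0e10   # prefactor, arbitrary rate units
EA = 6.0e4   # activation energy, J/mol

temps = [285.15, 290.15, 295.15, 300.15]  # kelvin
rates = [arrhenius_rate(t, A, EA) for t in temps]

# For ideal Arrhenius behavior the apparent Ea is the same in every
# temperature window; a window-dependent Ea signals a curved Arrhenius plot.
ea_low = apparent_activation_energy(temps[0], rates[0], temps[1], rates[1])
ea_high = apparent_activation_energy(temps[2], rates[2], temps[3], rates[3])
print(ea_low, ea_high)
```

Applied to measured rates rather than synthetic ones, a difference between the low- and high-temperature estimates would indicate the kind of departure from linearity the abstract describes.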
