619 Publications

ChIP-BIT2: a software tool to detect weak binding events using a Bayesian integration approach

X. Chen, A. Neuwald, L. Hilakivi-Clarke, R. Clarke, J. Xuan

Background
ChIP-seq combines chromatin immunoprecipitation assays with sequencing and identifies genome-wide binding sites for DNA binding proteins. While many binding sites have strong ChIP-seq ‘peak’ observations and are well captured, there are still regions bound by proteins weakly, with a relatively low ChIP-seq signal enrichment. These weak binding sites, especially those at promoters and enhancers, are functionally important because they also regulate nearby gene expression. Yet, it remains a challenge to accurately identify weak binding sites in ChIP-seq data due to the ambiguity in differentiating these weak binding sites from the amplified background DNAs.

Results
ChIP-BIT2 (http://sourceforge.net/projects/chipbitc/) is a software package for ChIP-seq peak detection. ChIP-BIT2 employs a mixture model integrating protein and control ChIP-seq data and predicts strong or weak protein binding sites at promoters, enhancers, or other genomic locations. For binding sites at gene promoters, ChIP-BIT2 simultaneously predicts their target genes. ChIP-BIT2 has been validated on benchmark regions and tested using large-scale ENCODE ChIP-seq data, demonstrating its high accuracy and wide applicability.

Conclusion
ChIP-BIT2 is an efficient ChIP-seq peak caller. It provides a better lens to examine weak binding sites and can refine or extend the existing binding site collection, providing additional regulatory regions for decoding the mechanism of gene expression regulation.

April 15, 2021

Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks

J. Koehler, S. Lyskov, S. Lewis, J. Adolf-Bryfogle, R. Alford, K. Barlow, Z. Ben-Aharon, D. Farrell, J. Fell, W. Hansen, A. Harmalkar, J. Jeliazkov, G. Kuenze, J. Krys, A. Ljubetič, A. Loshbaugh, J. Maguire, R. Moretti, V. Mulligan, R. Bonneau, et al.

Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.


Constrained non-negative matrix factorization enabling real-time insights of in situ and high-throughput experiments

M. Maffettone, A. Daly, D. Olds

Non-negative Matrix Factorization (NMF) methods offer an appealing unsupervised learning method for real-time analysis of streaming spectral data in time-sensitive data collection, such as in situ characterization of materials. However, canonical NMF methods are optimized to reconstruct a full dataset as closely as possible, with no underlying requirement that the reconstruction produces components or weights representative of the true physical processes. In this work, we demonstrate how constraining NMF weights or components, provided as known or assumed priors, can provide significant improvement in revealing true underlying phenomena. We present a PyTorch based method for efficiently applying constrained NMF and demonstrate this on several synthetic examples. When applied to streaming experimentally measured spectral data, an expert researcher-in-the-loop can provide and dynamically adjust the constraints. This set of interactive priors to the NMF model can, for example, contain known or identified independent components, as well as functional expectations about the mixing of components. We demonstrate this application on measured X-ray diffraction and pair distribution function data from in situ beamline experiments. Details of the method are described, and general guidance provided to employ constrained NMF in extraction of critical information and insights during in situ and high-throughput experiments.
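The idea of holding known components fixed while the remaining factors are fitted can be illustrated with a minimal sketch. This is not the authors' PyTorch implementation: it assumes plain NumPy, the classic multiplicative-update rule for the Frobenius-norm objective, and a hypothetical function name (`constrained_nmf`) and argument (`fixed_components`) chosen only for illustration.

```python
import numpy as np


def constrained_nmf(X, n_components, fixed_components=None, n_iter=500, seed=0, eps=1e-12):
    """Approximate X (n_samples x n_features) as W @ H with W, H >= 0.

    fixed_components is a hypothetical argument: a dict mapping a row index
    of H to a known non-negative component that is never updated (the prior).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components))
    H = rng.random((n_components, n_features))

    # Write the known components into H and remember which rows stay free.
    fixed = fixed_components or {}
    for k, component in fixed.items():
        H[k] = component
    free = np.array([k not in fixed for k in range(n_components)])

    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H_new = H * (W.T @ X) / (W.T @ W @ H + eps)
        # Only unconstrained rows of H are allowed to change.
        H[free] = H_new[free]
    return W, H
```

When every component is supplied as a prior, the loop reduces to fitting non-negative mixing weights for each incoming spectrum, which is the cheapest case for a streaming setting.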

April 2, 2021

Comparison of explicit and mean-field models of cytoskeletal filaments with crosslinking motors

A. Lamson, J. Moore, F. Fang, M. Glaser, M. Shelley, M. Betterton

In cells, cytoskeletal filament networks are responsible for cell movement, growth, and division. Filaments in the cytoskeleton are driven and organized by crosslinking molecular motors. In reconstituted cytoskeletal systems, motor activity is responsible for far-from-equilibrium phenomena such as active stress, self-organized flow, and spontaneous nematic defect generation. How microscopic interactions between motors and filaments lead to larger-scale dynamics remains incompletely understood. To build from motor–filament interactions to predict bulk behavior of cytoskeletal systems, more computationally efficient techniques for modeling motor–filament interactions are needed. Here, we derive a coarse-graining hierarchy of explicit and continuum models for crosslinking motors that bind to and walk on filament pairs. We compare the steady-state motor distribution and motor-induced filament motion for the different models and analyze their computational cost. All three models agree well in the limit of fast motor binding kinetics. Evolving a truncated moment expansion of motor density speeds the computation by a factor of 10³–10⁶ compared to the explicit or continuous-density simulations, suggesting an approach for more efficient simulation of large networks. These tools facilitate further study of motor–filament networks on micrometer to millimeter length scales.


The many behaviors of deformable active droplets

Y-N. Young, M. Shelley, D. Stein

Active fluids consume fuel at the microscopic scale, converting this energy into forces that can drive macroscopic motions over scales far larger than their microscopic constituents. In some cases, the mechanisms that give rise to this phenomenon have been well characterized, and can explain experimentally observed behaviors in both bulk fluids and those confined in simple stationary geometries. More recently, active fluids have been encapsulated in viscous drops or elastic shells so as to interact with an outer environment or a deformable boundary. Such systems are not as well understood. In this work, we examine the behavior of droplets of an active nematic fluid. We study their linear stability about the isotropic equilibrium over a wide range of parameters, identifying regions in which different modes of instability dominate. Simulations of their full dynamics are used to identify their nonlinear behavior within each region. When a single mode dominates, the droplets behave simply: as rotors, swimmers, or extensors. When parameters are tuned so that multiple modes have nearly the same growth rate, a pantheon of modes appears, including zigzaggers, washing machines, wanderers, and pulsators.


Coupled oscillators coordinate collective germline growth

C. Doherty, R. Diegmiller, M. Kapasiawala, E. Gavis, S. Shvartsman

Developing oocytes need large supplies of macromolecules and organelles. A conserved strategy for accumulating these products is to pool resources of oocyte-associated germline nurse cells. In Drosophila, these cells grow more than 100-fold to boost their biosynthetic capacity. No previously known mechanism explains how nurse cells coordinate growth collectively. Here, we report a cell cycle-regulating mechanism that depends on bidirectional communication between the oocyte and nurse cells, revealing the oocyte as a critical regulator of germline cyst growth. Transcripts encoding the cyclin-dependent kinase inhibitor, Dacapo, are synthesized by the nurse cells and actively localized to the oocyte. Retrograde movement of the oocyte-synthesized Dacapo protein to the nurse cells generates a network of coupled oscillators that controls the cell cycle of the nurse cells to regulate cyst growth. We propose that bidirectional nurse cell-oocyte communication establishes a growth-sensing feedback mechanism that regulates the quantity of maternal resources loaded into the oocyte.

Developmental Cell, 56(6): 860-870.e8
March 22, 2021

EMPress Enables Tree-Guided, Interactive, and Exploratory Analyses of Multi-omic Data Sets

K. Cantrell, M. Fedarko, G. Rahman, ..., J. Morton, et al

Standard workflows for analyzing microbiomes often include the creation and curation of phylogenetic trees. Here we present EMPress, an interactive web tool for visualizing trees in the context of microbiome, metabolome, and other community data scalable to trees with well over 500,000 nodes. EMPress provides novel functionality—including ordination integration and animations—alongside many standard tree visualization features and thus simplifies exploratory analyses of many forms of ‘omic data.


Bacterial activity hinders particle sedimentation

J. Singh, A. Patteson, B. Maldonado, P. Purohit, P. Arratia

Sedimentation in active fluids has come into focus due to the ubiquity of swimming micro-organisms in natural and industrial processes. Here, we investigate sedimentation dynamics of passive particles in a fluid as a function of the concentration of E. coli bacteria. Results show that the presence of swimming bacteria significantly reduces the speed of the sedimentation front even in the dilute regime, in which the sedimentation speed is expected to be independent of particle concentration. Furthermore, bacteria increase the dispersion of the passive particles, which determines the width of the sedimentation front. For short times, particle sedimentation speed has a linear dependence on bacterial concentration. Mean square displacement data show, however, that bacterial activity decays over long experimental (sedimentation) times. An advection-diffusion equation coupled to bacteria population dynamics seems to capture concentration profiles relatively well. A single parameter, the ratio of single-particle speed to the bacterial flow speed, can be used to predict front sedimentation speed.

March 15, 2021

An automated framework for efficiently designing deep convolutional neural networks in genomics

Z. Zhang, C. Park, C. Theesfeld, O. Troyanskaya

Convolutional neural networks (CNNs) have become a standard for analysis of biological sequences. Tuning of network architectures is essential for a CNN’s performance, yet it requires substantial knowledge of machine learning and commitment of time and effort. This process thus imposes a major barrier to broad and effective application of modern deep learning in genomics. Here we present Automated Modelling for Biological Evidence-based Research (AMBER), a fully automated framework to efficiently design and apply CNNs for genomic sequences. AMBER designs optimal models for user-specified biological questions through the state-of-the-art neural architecture search (NAS). We applied AMBER to the task of modelling genomic regulatory features and demonstrated that the predictions of the AMBER-designed model are significantly more accurate than the equivalent baseline non-NAS models and match or even exceed published expert-designed models. Interpretation of AMBER architecture search revealed its design principles of utilizing the full space of computational operations for accurately modelling genomic sequences. Furthermore, we illustrated the use of AMBER to accurately discover functional genomic variants in allele-specific binding and disease heritability enrichment. AMBER provides an efficient automated method for designing accurate deep learning models in genomics.


AMBIENT: Accelerated Convolutional Neural Network Architecture Search for Regulatory Genomics

Z. Zhang, E. Cofer, O. Troyanskaya

Convolutional neural networks (CNNs) have become a standard approach for modeling genomic sequences. CNNs can be effectively built by Neural Architecture Search (NAS) by trading computing power for accurate neural architectures. Yet, the consumption of immense computing power is a major practical, financial, and environmental issue for deep learning. Here, we present a novel NAS framework, AMBIENT, that generates highly accurate CNN architectures for biological sequences of diverse functions, while substantially reducing the computing cost of conventional NAS.

February 27, 2021