2086 Publications

Estimating the tails of the spectrum of the Hessian of the log-likelihood for ab initio single-particle reconstruction in electron cryomicroscopy

Electron cryomicroscopy (cryo-EM) is a technique in structural biology used to reconstruct accurate volumetric maps of molecules. One step of the cryo-EM pipeline involves solving an inverse-problem. This inverse-problem, referred to as ab initio single-particle reconstruction, takes as input a collection of 2d-images -- each a projection of a molecule from an unknown viewing-angle -- and attempts to reconstruct the 3d-volume representing the underlying molecular density.
Most methods for solving this inverse-problem search for a solution which optimizes a posterior likelihood of generating the observed image-data, given the reconstructed volume. Within this framework, it is natural to study the Hessian of the log-likelihood: the eigenvectors and eigenvalues of the Hessian determine how the likelihood changes with respect to perturbations in the solution, and can give insight into the sensitivity of the solution to aspects of the input.
In this paper we describe a simple strategy for estimating the smallest eigenvalues and eigenvectors (i.e., the 'softest modes') of the Hessian of the log-likelihood for the ab initio single-particle reconstruction problem. This strategy involves rewriting the log-likelihood as a 3d-integral. This interpretation holds in the low-noise limit, as well as in many practical scenarios which allow for noise-marginalization.
Once we have estimated the softest modes, we can use them to perform many kinds of sensitivity analysis. For example, we can determine which parts of the reconstructed volume are trustworthy, which are unreliable, and how this unreliability might depend on the data-set and the imaging parameters. We believe that this kind of analysis can be used alongside more traditional strategies for sensitivity analysis, as well as in other applications, such as free-energy estimation.
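The core computation described above, extracting the smallest eigenpairs ("softest modes") of a Hessian, can be sketched in a few lines. The example below is a toy illustration with a dense, hypothetical stand-in Hessian and NumPy's symmetric eigensolver; the paper's strategy targets much larger, matrix-free problems.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the (negative log-likelihood) Hessian: a symmetric
# positive-semidefinite matrix H = A^T A. This is illustrative data only.
A = rng.standard_normal((80, 40))
H = A.T @ A

# The "softest modes" are the eigenvectors with the smallest eigenvalues:
# perturbing the solution along them changes the likelihood the least.
eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
k = 3
soft_vals, soft_modes = eigvals[:k], eigvecs[:, :k]

# Sanity check: moving along a soft mode changes the quadratic form less
# than moving along the stiffest mode.
v_soft, v_stiff = soft_modes[:, 0], eigvecs[:, -1]
assert v_soft @ H @ v_soft < v_stiff @ H @ v_stiff
```

For realistic problem sizes one would use an iterative (Lanczos-type) eigensolver driven by Hessian-vector products rather than forming the Hessian densely.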

November 20, 2024

Computing whole embryo strain maps during gastrulation

David Denberg, Xiaoxuan Zhang, S. Shvartsman, et al.

Gastrulation is a critical process during embryonic development that transforms a single-layered blastula into a multilayered embryo with distinct germ layers, which eventually give rise to all the tissues and organs of the organism. Studies across species have uncovered the mechanisms underlying the building blocks of gastrulation movements, such as localized in-plane and out-of-plane epithelial deformations. The next challenge is to understand dynamics on the scale of the embryo: this requires quantifying strain tensors, which rigorously describe the differences between the deformed configurations taken on by local clusters of cells at time instants of observation and their reference configuration at an initial time. We present a systematic strategy for computing such tensors from the local dynamics of cell clusters, which are chosen across the embryo from several regions whose morphogenetic fate is central to viable gastrulation. As an application of our approach, we demonstrate a strategy of identifying distinct Drosophila morphological domains using strain tensors.
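As a rough illustration of the kind of quantity involved, the sketch below fits a local deformation gradient F to a cluster of cell positions by least squares and forms the Green-Lagrange strain E = ½(FᵀF − I). This is a generic textbook construction under simplifying assumptions, not the paper's exact pipeline.

```python
import numpy as np

def green_lagrange_strain(X_ref, X_def):
    """Estimate the Green-Lagrange strain tensor of a cell cluster.

    X_ref, X_def: (n_cells, 3) positions at the reference and observed
    time points. A local affine map x ~ F @ X is fit by least squares;
    E = 0.5 * (F^T F - I) then measures the deformation.
    (Illustrative sketch, not the paper's full strain-mapping method.)
    """
    Xc = X_ref - X_ref.mean(axis=0)
    xc = X_def - X_def.mean(axis=0)
    # Solve Xc @ F.T ~ xc for the deformation gradient F.
    F_T, *_ = np.linalg.lstsq(Xc, xc, rcond=None)
    F = F_T.T
    return 0.5 * (F.T @ F - np.eye(3))

# Toy check: a pure 10% uniaxial stretch along x.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
F_true = np.diag([1.1, 1.0, 1.0])
E = green_lagrange_strain(X, X @ F_true.T)
# E[0, 0] should be 0.5 * (1.1**2 - 1) = 0.105, off-diagonals near zero.
```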


Representational learning by optimization of neural manifolds in an olfactory memory network

Bo Hu, Nesibe Z. Temiz, C. Chou, Peter Rupprecht, Claire Meissner-Bernard, Benjamin Titze, S. Chung, Rainer W. Friedrich

Higher brain functions depend on experience-dependent representations of relevant information that may be organized by attractor dynamics or by geometrical modifications of continuous “neural manifolds”. To explore these scenarios we analyzed odor-evoked activity in telencephalic area pDp of juvenile and adult zebrafish, the homolog of piriform cortex. No obvious signatures of attractor dynamics were detected. Rather, olfactory discrimination training selectively enhanced the separation of neural manifolds representing task-relevant odors from other representations, consistent with predictions of autoassociative network models endowed with precise synaptic balance. Analytical approaches using the framework of manifold capacity revealed multiple geometrical modifications of representational manifolds that supported the classification of task-relevant sensory information. Manifold capacity predicted odor discrimination across individuals, indicating a close link between manifold geometry and behavior. Hence, pDp and possibly related recurrent networks store information in the geometry of representational manifolds, resulting in joint sensory and semantic maps that may support distributed learning processes.


A robust and versatile computational peptide design pipeline to inform wet-lab experiments

V. Mulligan, Tristan Zaborniak, Benjamin P. Brown, D. Renfrew

Since Merrifield’s development of solid-phase peptide synthesis, we have seen explosive growth in the number of synthetic building-blocks that can be incorporated into peptides. This has created a problem: the number of possible molecules that could be synthesized is many orders of magnitude greater than the largest conceivable combinatorial libraries. Computational design, based on combinatorial optimization algorithms, addresses this problem by proposing sequences likely to have desired folds and functions. These computational methods complement experiments by reducing astronomically large numbers of combinatorial possibilities to experimentally tractable shortlists. This presentation describes our robust, versatile methods, made available to peptide scientists in the Rosetta and Masala software suites, for designing peptides that fold into rigid conformations. Our physics-based methods generalize to exotic chemical building blocks poorly amenable to machine learning-based methods for want of training data. Our pipeline has produced experimentally-validated mixed-chirality peptides that bind to targets of therapeutic interest, and peptides that diffuse across cell membranes. Ongoing research is mapping the sequence optimization problem (which grows intractable even for supercomputers as the number of candidate chemical building blocks grows very large) to current and near-future quantum computers, allowing use of quantum algorithms in the context of the existing, widely-used design protocols.
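The combinatorial sequence optimization referred to above can be illustrated with a generic simulated-annealing loop. The alphabet, "mismatch" energy, and cooling schedule below are toy stand-ins invented for this sketch; real design energies (as in Rosetta or Masala) score full atomic models.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKTAYIAKQR"  # hypothetical "ideal" sequence defining the toy energy

def energy(seq):
    # Lower is better: count mismatches against the toy target.
    return sum(a != b for a, b in zip(seq, TARGET))

def anneal(length, steps=20000, t0=2.0, seed=0):
    """Simulated annealing over sequence space (illustrative only)."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    e = energy(seq)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-3   # linear cooling schedule
        i = rng.randrange(length)
        old = seq[i]
        seq[i] = rng.choice(ALPHABET)        # propose a point mutation
        e_new = energy(seq)
        # Metropolis criterion: accept improvements, sometimes accept worse.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
        else:
            seq[i] = old                     # reject: restore previous residue
    return "".join(seq), e

best, best_e = anneal(len(TARGET))
```

The point of the sketch is only the search structure: the number of candidate sequences (20^10 here) vastly exceeds what could be enumerated, yet local stochastic search finds low-energy solutions quickly.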


Nuclear instance segmentation and tracking for preimplantation mouse embryos

H. Nunley, Binglun Shao, Prateek Grover, A. Watters, S. Shvartsman, L. M. Brown, et al.

For investigations into fate specification and morphogenesis in time-lapse images of preimplantation embryos, automated 3D instance segmentation and tracking of nuclei are invaluable. Low signal-to-noise ratio, high voxel anisotropy, high nuclear density, and variable nuclear shapes can limit the performance of segmentation methods, while tracking is complicated by cell divisions, low frame rates, and sample movements. Supervised machine learning approaches can radically improve segmentation accuracy and enable easier tracking, but they often require large amounts of annotated 3D data. Here, we first report a new mouse line expressing the near-infrared nuclear reporter H2B-miRFP720. We then generate a dataset (termed BlastoSPIM) of 3D images of H2B-miRFP720-expressing embryos with ground truth for nuclear instances. Using BlastoSPIM, we benchmark seven convolutional neural networks and identify Stardist-3D as the most accurate instance segmentation method. With our BlastoSPIM-trained Stardist-3D models, we construct a complete pipeline for nuclear instance segmentation and lineage tracking from the eight-cell stage to the end of preimplantation development (>100 nuclei). Finally, we demonstrate the usefulness of BlastoSPIM as pre-training data for related problems, both for a different imaging modality and for different model systems.
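To make the tracking step concrete, the sketch below links nuclear centroids across consecutive frames by greedy nearest-neighbour matching. This is a deliberately minimal, hypothetical illustration; the BlastoSPIM pipeline handles divisions and sample movement explicitly.

```python
import numpy as np

def link_nuclei(prev_c, next_c, max_dist=5.0):
    """Greedy nearest-neighbour linking of nuclear centroids across frames.

    prev_c, next_c: (n, 3) centroid arrays from consecutive time points.
    Returns (prev_index, next_index) links. After a division, only one
    daughter gets linked here; a real lineage tracker assigns the other
    daughter to the same parent. (Illustrative sketch only.)
    """
    dists = np.linalg.norm(prev_c[:, None, :] - next_c[None, :, :], axis=2)
    # Visit candidate pairs from closest to farthest.
    order = np.dstack(np.unravel_index(np.argsort(dists, axis=None), dists.shape))[0]
    links, used_p, used_n = [], set(), set()
    for i, j in order:
        if i in used_p or j in used_n or dists[i, j] > max_dist:
            continue
        links.append((int(i), int(j)))
        used_p.add(i)
        used_n.add(j)
    return links

# Toy example: the second nucleus divides between frames.
prev_c = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
next_c = np.array([[0.5, 0.0, 0.0], [10.2, 0.0, 0.0], [9.5, 0.0, 0.0]])
links = link_nuclei(prev_c, next_c)
```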


Opening the Black Box inside Grover’s Algorithm

M. Stoudenmire, Xavier Waintal

Grover’s algorithm is one of the primary algorithms offered as evidence that quantum computers can provide an advantage over classical computers. It involves an “oracle” (external quantum subroutine), which must be specified for a given application and whose internal structure is not part of the formal scaling of the quadratic quantum speedup guaranteed by the algorithm. Grover’s algorithm also requires exponentially many calls to the quantum oracle (approximately √(2^n) calls, where n is the number of qubits) to succeed, raising the question of its implementation on both noisy and error-corrected quantum computers. In this work, we construct a quantum-inspired algorithm executable on a classical computer that performs Grover’s task in a linear number of calls to (simulations of) the oracle—an exponentially smaller number than Grover’s algorithm—and demonstrate this algorithm explicitly for Boolean satisfiability problems. The complexity of our algorithm depends on the cost to simulate the oracle once, which may or may not be exponential, depending on its internal structure. Indeed, Grover’s algorithm does not have an a priori quantum speedup as soon as one is given access to the “source code” of the oracle, which may reveal an internal structure of the problem. Our findings illustrate this point explicitly, as our algorithm exploits the structure of the quantum circuit used to program the quantum computer to speed up the search. There are still problems where Grover’s algorithm would provide an asymptotic speedup if it could be run accurately for large enough sizes. Our quantum-inspired algorithm provides lower bounds, in terms of the quantum-circuit complexity, for the quantum hardware to beat classical approaches for these problems.
These estimates, combined with the unfavorable scaling of the success probability of Grover’s algorithm, which in the presence of noise decays as the exponential of the exponential of the number of qubits, make a practical speedup unrealistic even under extremely optimistic assumptions about the evolution of both hardware quality and availability.
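The √(2^n) oracle-call count is easy to see in a small statevector simulation of textbook Grover search (this illustrates the standard algorithm, not the paper's quantum-inspired one):

```python
import math
import numpy as np

def grover(n, marked):
    """Statevector simulation of Grover search over N = 2**n items."""
    N = 2 ** n
    psi = np.full(N, 1 / math.sqrt(N))           # uniform superposition
    oracle = np.ones(N)
    oracle[marked] = -1.0                        # phase-flip the marked item
    iters = math.floor((math.pi / 4) * math.sqrt(N))
    for _ in range(iters):
        psi = oracle * psi                       # one oracle call
        psi = 2 * psi.mean() - psi               # diffusion: inversion about the mean
    return psi, iters

psi, iters = grover(10, marked=321)
p_success = psi[321] ** 2
# For n = 10 (N = 1024), about 25 oracle calls drive p_success close to 1;
# the call count grows as sqrt(2**n), i.e. exponentially in n.
```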


Dynamic allostery drives autocrine and paracrine TGF-β signaling

Mingliang Jin, Robert I. Seed, P. Cossio, et al.

TGF-β, essential for development and immunity, is expressed as a latent complex (L-TGF-β) non-covalently associated with its prodomain and presented on immune cell surfaces by covalent association with GARP. Binding to integrin αvβ8 activates L-TGF-β1/GARP. The dogma is that mature TGF-β must physically dissociate from L-TGF-β1 for signaling to occur. Our previous studies discovered that αvβ8-mediated TGF-β autocrine signaling can occur without TGF-β1 release from its latent form. Here, we show that mice engineered to express TGF-β1 that cannot release from L-TGF-β1 survive without early lethal tissue inflammation, unlike those with TGF-β1 deficiency. Combining cryogenic electron microscopy with cell-based assays, we reveal a dynamic allosteric mechanism of autocrine TGF-β1 signaling without release where αvβ8 binding redistributes the intrinsic flexibility of L-TGF-β1 to expose TGF-β1 to its receptors. Dynamic allostery explains the TGF-β3 latency/activation mechanism and why TGF-β3 functions distinctly from TGF-β1, suggesting that it broadly applies to other flexible cell surface receptor/ligand systems.


Video prediction using score-based conditional density estimation

Temporal prediction is inherently uncertain, but representing the ambiguity in natural image sequences is a challenging high-dimensional probabilistic inference problem. For natural scenes, the curse of dimensionality renders explicit density estimation statistically and computationally intractable. Here, we describe an implicit regression-based framework for learning and sampling the conditional density of the next frame in a video given previous observed frames. We show that sequence-to-image deep networks trained on a simple resilience-to-noise objective function extract adaptive representations for temporal prediction. Synthetic experiments demonstrate that this score-based framework can handle occlusion boundaries: unlike classical methods that average over bifurcating temporal trajectories, it chooses among likely trajectories, selecting more probable options with higher frequency. Furthermore, analysis of networks trained on natural image sequences reveals that the representation automatically weights predictive evidence by its reliability, which is a hallmark of statistical inference.
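The link between a "resilience-to-noise" (denoising) objective and the score of the density is the Miyasawa/Tweedie identity: for y = x + σ·noise, the least-squares denoiser satisfies E[x | y] = y + σ²·∇_y log p(y). The sketch below verifies this in closed form for a 1-D Gaussian prior, a toy case chosen because every quantity has an exact formula:

```python
import numpy as np

# Assumed toy setup: prior x ~ N(mu, tau^2), observation y = x + sigma * eps.
# Then p(y) = N(mu, tau^2 + sigma^2), so the score and the MMSE denoiser
# are both available in closed form.
mu, tau, sigma = 1.0, 2.0, 0.5
y = np.linspace(-4.0, 6.0, 101)

score = -(y - mu) / (tau**2 + sigma**2)            # d/dy log p(y)
denoised_tweedie = y + sigma**2 * score            # Tweedie/Miyasawa identity
denoised_mmse = mu + (tau**2 / (tau**2 + sigma**2)) * (y - mu)  # E[x | y]

# The two expressions agree exactly: a network trained to denoise
# therefore implicitly learns the score of the observation density.
assert np.allclose(denoised_tweedie, denoised_mmse)
```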


CryoLike: A Python package for cryo-electron microscopy image-to-structure likelihood calculations

Extracting conformational heterogeneity from cryo-electron microscopy (cryo-EM) images is particularly challenging for flexible biomolecules, where traditional 3D classification approaches often fail. Over the past few decades, experimental and computational advances have been made to tackle this challenge, notably Bayesian approaches that provide physically interpretable insights into cryo-EM heterogeneity. To reduce the computational cost of Bayesian approaches, we introduce CryoLike, a computationally efficient algorithm for evaluating image-to-structure (or image-to-volume) likelihoods across large image datasets, which is built on Fourier-Bessel representations of the images and packaged in a user-friendly Python workflow.
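The basic quantity being evaluated can be sketched simply: under a white-Gaussian-noise model, the image-to-template log-likelihood is a (negative, scaled) squared distance. The example below computes it for toy flattened images in real space; this is a simplified illustration of the quantity, not CryoLike's implementation, which works in a Fourier-Bessel basis.

```python
import numpy as np

def log_likelihoods(images, templates, sigma):
    """Image-to-template log-likelihoods under white Gaussian noise.

    images:    (n_img, n_pix) flattened experimental images
    templates: (n_tmp, n_pix) flattened projections of candidate structures
    Returns an (n_img, n_tmp) array of log p(I | T) = -||I - T||^2 / (2 sigma^2),
    up to an additive constant. (Toy real-space sketch.)
    """
    sq_i = np.sum(images**2, axis=1)[:, None]
    sq_t = np.sum(templates**2, axis=1)[None, :]
    cross = images @ templates.T        # all inner products at once
    return -(sq_i - 2 * cross + sq_t) / (2 * sigma**2)

# Toy data: two noisy images generated from templates 3 and 0.
rng = np.random.default_rng(2)
templates = rng.standard_normal((5, 64))
images = templates[[3, 0]] + 0.1 * rng.standard_normal((2, 64))
ll = log_likelihoods(images, templates, sigma=0.1)
best = np.argmax(ll, axis=1)   # each image matches its source template
```

Expanding ‖I − T‖² into ‖I‖² − 2⟨I, T⟩ + ‖T‖² is what makes the evaluation efficient over large image-by-template grids: the cross term is a single matrix multiplication.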

October 22, 2024

Multimodal Learning for Embryo Viability Prediction in Clinical IVF

Junsik Kim, Zhiyi Shi, D. Needleman

In clinical In-Vitro Fertilization (IVF), identifying the most viable embryo for transfer is important for increasing the likelihood of a successful pregnancy. Traditionally, this process involves embryologists manually assessing embryos’ static morphological features at specific intervals using light microscopy. This manual evaluation is not only time-intensive and costly, due to the need for expert analysis, but also inherently subjective, leading to variability in the selection process. To address these challenges, we develop a multimodal model that leverages both time-lapse video data and Electronic Health Records (EHRs) to predict embryo viability. A key challenge of our research is to effectively combine time-lapse video and EHR data, given their distinct modality characteristics. We comprehensively analyze our multimodal model with various modality inputs and integration approaches. Our approach will enable fast and automated embryo viability predictions at scale in clinical IVF.
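One common way to combine modalities with such different shapes is to summarize each separately and fuse the summaries. The sketch below shows that pattern with entirely hypothetical features and weights; the paper's model uses learned deep encoders rather than fixed pooling.

```python
import numpy as np

def late_fusion_score(video, ehr, w_video, w_ehr, b):
    """Minimal late-fusion sketch for combining two modalities.

    video: (frames, features) time-lapse embedding; ehr: (features_ehr,)
    record vector. Each modality is summarized separately, the summaries
    are concatenated, and a logistic head produces a viability score.
    (All inputs and weights here are hypothetical placeholders.)
    """
    video_summary = video.mean(axis=0)             # pool over time
    fused = np.concatenate([video_summary, ehr])   # concatenation fusion
    logit = fused @ np.concatenate([w_video, w_ehr]) + b
    return 1.0 / (1.0 + np.exp(-logit))            # probability in (0, 1)

rng = np.random.default_rng(3)
video = rng.standard_normal((16, 8))   # 16 frames, 8 features per frame
ehr = rng.standard_normal(4)           # 4 record features
p = late_fusion_score(video, ehr,
                      rng.standard_normal(8), rng.standard_normal(4), 0.0)
```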
