619 Publications

Disrupted developmental signaling induces novel transcriptional states

Aleena Patel, Vanessa Gonzalez, S. Shvartsman

Signaling pathways induce stereotyped transcriptional changes as stem cells progress into mature cell types during embryogenesis. Signaling perturbations are necessary to discover which genes are responsive or insensitive to pathway activity. However, gene regulation is additionally dependent on cell state-specific factors like chromatin modifications or transcription factor binding. Thus, transcriptional profiles need to be assayed in single cells to identify potentially multiple, distinct perturbation responses among heterogeneous cell states in an embryo. In perturbation studies, comparing heterogeneous transcriptional states among experimental conditions often requires samples to be collected over multiple independent experiments. Datasets produced in such complex experimental designs can be confounded by batch effects. We present Design-Aware Integration of Single Cell ExpEriments (DAISEE), a new algorithm that models perturbation responses in single-cell datasets with a complex experimental design. We demonstrate that DAISEE improves upon a previously available integrative non-negative matrix factorization framework, more efficiently separating perturbation responses from confounding variation. We use DAISEE to integrate newly collected single-cell RNA-sequencing datasets from 5-hour old zebrafish embryos expressing optimized photoswitchable MEK (psMEK), which globally activates the extracellular signal-regulated kinase (ERK), a signaling molecule involved in many cell specification events. psMEK drives some cells that are normally not exposed to ERK signals towards other wild type states and induces novel states expressing a mixture of transcriptional programs, including precociously activated endothelial genes. ERK signaling is therefore capable of introducing profoundly new gene expression states in developing embryos.

September 6, 2024

OpenRAND: A performance portable, reproducible random number generation library for parallel computations

Shihab Shahriar Khan, Bryce Palmer, C. Edelmaier, Hasan Metin Aktulga

We introduce OpenRAND, a C++17 library aimed at facilitating reproducible scientific research by generating statistically robust yet replicable random numbers in as little as two lines of code, overcoming some of the unnecessary complexities of existing RNG libraries. OpenRAND accommodates single and multi-threaded applications on CPUs and GPUs and offers a simplified, user-friendly API that complies with the C++ standard’s random number engine interface. It is lightweight: it is provided as a portable, header-only library. It is statistically robust: a suite of built-in tests ensures no pattern exists within single or multiple streams. Despite its simplicity and portability, it remains performant, matching and sometimes outperforming native libraries. Our tests, including a Brownian walk simulation, affirm its reproducibility and ease of use while highlighting its computational efficiency, outperforming CUDA’s cuRAND by up to 1.8 times.
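The abstract does not describe how the generator works, so the following is only a minimal sketch of one general idea behind reproducible parallel random numbers: each value is a pure function of a seed, a stream identifier, and a counter, so the result does not depend on the order in which streams are processed. It is not OpenRAND's API or algorithm. The function names are hypothetical, and SHA-256 is used purely as a slow, illustrative stand-in for a fast counter-based generator.

```python
import hashlib
import math


def counter_uniform(seed: int, stream: int, counter: int) -> float:
    """Return a float in [0, 1) that depends only on (seed, stream, counter)."""
    key = f"{seed}:{stream}:{counter}".encode()
    digest = hashlib.sha256(key).digest()
    # Keep the top 53 bits so the value maps exactly onto a double.
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


def brownian_walk(seed: int, particle_id: int, n_steps: int) -> float:
    """1D walk with Gaussian steps for one particle (Box-Muller transform)."""
    x = 0.0
    for step in range(n_steps):
        u1 = counter_uniform(seed, particle_id, 2 * step)
        u2 = counter_uniform(seed, particle_id, 2 * step + 1)
        # 1 - u1 lies in (0, 1], which keeps the logarithm finite.
        x += math.sqrt(-2.0 * math.log(1.0 - u1)) * math.cos(2.0 * math.pi * u2)
    return x


# Each particle owns its stream, so processing order does not change results.
forward = [brownian_walk(42, pid, 100) for pid in range(8)]
shuffled = {pid: brownian_walk(42, pid, 100) for pid in reversed(range(8))}
```

Because no generator state is shared between particles, `forward` and `shuffled` hold identical positions for each particle, which is the property a multi-threaded or GPU simulation needs in order to be replicable.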

September 1, 2024

Emergence of lobed wakes during the sedimentation of spheres in viscoelastic fluids

S. Varchanis, Eliane Younes

The motion of rigid particles in complex fluids is ubiquitous in natural and industrial processes. The most popular toy model for understanding the physics of such systems is the settling of a solid sphere in a viscoelastic fluid. There is general agreement that an elastic wake develops downstream of the sphere, causing the breakage of fore-and-aft symmetry, while the flow remains axisymmetric, independent of fluid viscoelasticity and flow conditions. Using a continuum mechanics model, we reveal that axisymmetry holds only for weak viscoelastic flows. Beyond a critical value of the settling velocity, steady, non-axisymmetric disturbances develop peripherally of the rear pole of the sphere, giving rise to a four-lobed fingering instability. The transition from axisymmetric to non-axisymmetric flow fields is characterized by a regular bifurcation and depends solely on the interplay between shear and extensional properties of the viscoelastic fluid under different flow regimes. At higher settling velocities, each lobe tip is split into two new lobes, resembling fractal fingering in interfacial flows. For the first time, we capture an elastic fingering instability under steady-state conditions, and provide the missing information for understanding and predicting such instabilities in the response of viscoelastic fluids and soft media.

Flow of wormlike micellar solutions over concavities

Fabian Hillebrand, S. Varchanis, Cameron C. Hopkins, et al.

We present a comprehensive investigation combining numerical simulations with experimental validation, focusing on the creeping flow behavior of a shear-banding, viscoelastic wormlike micellar (WLM) solution over concavities with various depths (D) and lengths (L). The fluid is modeled using the diffusive Giesekus model, with model parameters set to quantitatively describe the shear rheology of a 100 : 60 mM cetylpyridinium chloride:sodium salicylate aqueous WLM solution used for the experimental validation. We observe a transition from “cavity flow” to “expansion–contraction flow” as the length L exceeds the sum of depth D and channel width W. This transition is manifested by a change of vortical structures within the concavity. For L ≤ D + W, “cavity flow” is characterized by large scale recirculations spanning the concavity length. For L > D + W, the recirculations observed in “expansion–contraction flow” are confined to the salient corners downstream of the expansion plane and upstream of the contraction plane. Using the numerical dataset, we construct phase diagrams in L–D at various fixed Weissenberg numbers Wi, characterizing the transitions and describing the evolution of vortical structures influenced by viscoelastic effects.
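The geometric criterion stated in the abstract can be written as a small helper. This is only a sketch of the rule L ≤ D + W versus L > D + W; the function name and return strings are hypothetical, and any dependence of the boundary on the Weissenberg number is ignored here.

```python
def flow_regime(length: float, depth: float, width: float) -> str:
    """Classify the flow over a concavity from its geometry alone.

    length: concavity length L; depth: concavity depth D;
    width: channel width W (all in the same units).
    """
    if min(length, depth, width) <= 0:
        raise ValueError("L, D and W must all be positive")
    if length <= depth + width:
        return "cavity flow"  # large recirculation spanning the concavity
    return "expansion-contraction flow"  # vortices confined to salient corners


regime_short = flow_regime(length=2.0, depth=1.0, width=1.0)
regime_long = flow_regime(length=2.5, depth=1.0, width=1.0)
```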

Decomposition of phenotypic heterogeneity in autism reveals distinct and coherent genetic programs

Aviya Litman, N. Sauerwald, C. Park, Y. Hao, O. Troyanskaya, et al.

Unraveling the phenotypic and genetic complexity of autism is extremely challenging yet critical for understanding the biology, inheritance, trajectory, and clinical manifestations of the many forms of the condition. Here, we leveraged broad phenotypic data from a large cohort with matched genetics to characterize classes of autism and their patterns of core, associated, and co-occurring traits, ultimately demonstrating that phenotypic patterns are associated with distinct genetic and molecular programs. We used a generative mixture modeling approach to identify robust, clinically-relevant classes of autism which we validate and replicate in a large independent cohort. We link the phenotypic findings to distinct patterns of de novo and inherited variation which emerge from the deconvolution of these genetic signals, and demonstrate that class-specific common variant scores strongly align with clinical outcomes. We further provide insights into the distinct biological pathways and processes disrupted by the sets of mutations in each class. Remarkably, we discover class-specific differences in the developmental timing of genes that are dysregulated, and these temporal patterns correspond to clinical milestone and outcome differences between the classes. These analyses embrace the phenotypic complexity of children with autism, unraveling genetic and molecular programs underlying their heterogeneity and suggesting specific biological dysregulation patterns and mechanistic hypotheses.

August 16, 2024

CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM

Minkyu Jeon, M. Astore, S. Hanson, P. Cossio, et al.

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining high-resolution 3D biomolecular structures from imaging data. As this technique can capture dynamic biomolecular complexes, 3D reconstruction methods are increasingly being developed to resolve this intrinsic structural heterogeneity. However, the absence of standardized benchmarks with ground truth structures and validation metrics limits the advancement of the field. Here, we propose CryoBench, a suite of datasets, metrics, and performance benchmarks for heterogeneous reconstruction in cryo-EM. We propose five datasets representing different sources of heterogeneity and degrees of difficulty. These include conformational heterogeneity generated from simple motions and random configurations of antibody complexes and from tens of thousands of structures sampled from a molecular dynamics simulation. We also design datasets containing compositional heterogeneity from mixtures of ribosome assembly states and 100 common complexes present in cells. We then perform a comprehensive analysis of state-of-the-art heterogeneous reconstruction tools including neural and non-neural methods and their sensitivity to noise, and propose new metrics for quantitative comparison of methods. We hope that this benchmark will be a foundational resource for analyzing existing methods and new algorithmic development in both the cryo-EM and machine learning communities.

Unraveling the Molecular Complexity of N-Terminus Huntingtin Oligomers: Insights into Polymorphic Structures

Neha Nanajkar, A. Sahoo, Silvina Matysiak

Huntington’s disease (HD) is a fatal neurodegenerative disorder resulting from an abnormal expansion of polyglutamine (polyQ) repeats in the N-terminus of the huntingtin protein. When the polyQ tract surpasses 35 repeats, the mutated protein undergoes misfolding, culminating in the formation of intracellular aggregates. Research in mouse models suggests that HD pathogenesis involves the aggregation of N-terminal fragments of the huntingtin protein (htt). These early oligomeric assemblies of htt, exhibiting diverse characteristics during aggregation, are implicated as potential toxic entities in HD. However, a consensus on their specific structures remains elusive. Understanding the heterogeneous nature of htt oligomers provides crucial insights into disease mechanisms, emphasizing the need to identify various oligomeric conformations as potential therapeutic targets. Employing coarse-grained molecular dynamics, our study aims to elucidate the mechanisms governing the aggregation process and resultant aggregate architectures of htt. The polyQ tract within htt is flanked by two regions: an N-terminal domain (N17) and a short C-terminal proline-rich segment. We conducted self-assembly simulations involving five distinct N17 + polyQ systems with polyQ lengths ranging from 7 to 45, utilizing the ProMPT force field. Prolongation of the polyQ domain correlates with an increase in β-sheet-rich structures. Longer polyQ lengths favor intramolecular β-sheets over intermolecular interactions due to the folding of the elongated polyQ domain into hairpin-rich conformations. Importantly, variations in polyQ length significantly influence resulting oligomeric structures. Shorter polyQ domains lead to N17 domain aggregation, forming a hydrophobic core, while longer polyQ lengths introduce a competition between N17 hydrophobic interactions and polyQ polar interactions, resulting in densely packed polyQ cores with outwardly distributed N17 domains. Additionally, at extended polyQ lengths, we observe distinct oligomeric conformations with varying degrees of N17 bundling. These findings can help explain the toxic gain-of-function that htt with expanded polyQ acquires.
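As a small illustration of the 35-repeat threshold mentioned in the abstract, the sketch below measures the longest uninterrupted glutamine (Q) run in a one-letter amino acid sequence. The helper names and the toy sequences are hypothetical and are not taken from the study, which used coarse-grained simulations rather than sequence scanning.

```python
from itertools import groupby

# Threshold quoted in the abstract: tracts longer than 35 repeats misfold.
POLYQ_THRESHOLD = 35


def longest_polyq(sequence: str) -> int:
    """Length of the longest run of consecutive 'Q' residues."""
    runs = (
        sum(1 for _ in run)
        for residue, run in groupby(sequence.upper())
        if residue == "Q"
    )
    return max(runs, default=0)


def exceeds_threshold(sequence: str, threshold: int = POLYQ_THRESHOLD) -> bool:
    """True when the longest polyQ tract surpasses the threshold."""
    return longest_polyq(sequence) > threshold


# Toy sequences (placeholders, not real huntingtin fragments).
short_tract = "MAT" + "Q" * 7 + "PPQQ"
long_tract = "MAT" + "Q" * 45 + "PP"
```

With these toy inputs, `longest_polyq(short_tract)` is 7 and `exceeds_threshold(long_tract)` is true, matching the shortest and longest tract lengths simulated in the study.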

Computational tools for cellular scale biophysics

Mathematical models are indispensable for disentangling the interactions through which biological components work together to generate the forces and flows that position, mix, and distribute proteins, nutrients, and organelles within the cell. To illuminate the ever more specific questions studied at the edge of biological inquiry, such models inevitably become more complex. Solving, simulating, and learning from these more realistic models requires the development of new analytic techniques, numerical methods, and scalable software. In this review, we discuss some recent developments in tools for understanding how large numbers of cytoskeletal filaments, driven by molecular motors and interacting with the cytoplasm and other structures in their environment, generate fluid flows, instabilities, and material deformations which help drive crucial cellular processes.

Deciphering missense coding variants with AlphaMissense

Z. Pan, Chandra L. Theesfeld

Genetic diagnosis promises to guide treatment and manage expectations for patients and physicians. Yet even when a variant in a disease gene is identified, the assignment of pathogenic impact is not always possible [1]. Of the 215 million possible substitutions in approximately 19,900 genes, 71 million are missense mutations that result in an amino acid substitution rather than a stop codon or a frameshift [2]. Only 4 million missense variants have been observed, of which approximately 2% have been clinically classified as pathogenic or benign by testing companies and collected in the public ClinVar repository. The rest are classified as variants of uncertain significance (VUS) due to the dearth of information on the functional impact or pathogenic consequences of the mutation.
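The counts quoted above imply a few derived figures, worked out in the short sketch below. The inputs are the rounded numbers from the text, so the results are approximate, and the variable names are illustrative only.

```python
# Rounded figures quoted in the text.
possible_substitutions = 215_000_000
possible_missense = 71_000_000
observed_missense = 4_000_000
classified_fraction = 0.02  # "approximately 2%"

# Observed missense variants with a clinical classification (about 80,000).
classified = observed_missense * classified_fraction

# Observed missense variants left as VUS (about 3.92 million).
uncertain = observed_missense - classified

# Share of all possible substitutions that are missense (about 33%).
missense_share = possible_missense / possible_substitutions

# Share of possible missense variants observed so far (about 5.6%).
observed_share = observed_missense / possible_missense
```

In other words, only about 80,000 of the 71 million possible missense variants carry a clinical classification, which is the gap that computational predictors aim to fill.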
A key challenge is to understand how changes in protein sequence affect function and contribute to disease. While the development of mutational scanning assays enables scientists to test thousands of substitutions at a time in cell lines, it is not possible to experimentally test all mutations, let alone assess fitness in humans. To meet this challenge, computational approaches that integrate many types of information and can predict functional impacts are becoming increasingly sophisticated in their ability to accurately classify variants.
An early and powerful strategy for modeling the pathogenic impacts of variants employed evolutionary sequence information through multiple sequence alignments (MSA). This approach examines sequence conservation across species and within humans, as demonstrated in models like PolyPhen and SIFT [3]. The integration of functional insights related to protein domains and functions, coupled with artificial intelligence, further enhances these models [3]. Prediction of a correct 3-dimensional protein structure has long been a grail in research. Marks et al. [4] suggested a global statistical model to massively reduce the search space of protein conformations by linking the pairwise correlations from MSA to fold a protein into a correct 3-dimensional structure (directly from Marks et al. [4]). AlphaFold [5] marked a significant advancement in the field by using a large language model (LLM) to associate protein structure with MSA with unprecedented accuracy, effectively solving the “protein folding problem.” The ability of protein LLMs to learn not just amino acid relationships in linear sequences but also extremely rich relationships in any number of dimensions and contexts powers such models.
