CCM: Publications

Sharp error estimates for target measure diffusion maps with applications to the committor problem

Shashank Sule, L. Evans, Maria Cameron

We obtain asymptotically sharp error estimates for the consistency error of the Target Measure Diffusion map (TMDmap) (Banisch et al. 2020), a variant of diffusion maps featuring importance sampling and hence allowing input data drawn from an arbitrary density. The derived error estimates include the bias error and the variance error. The resulting convergence rates are consistent with the approximation theory of graph Laplacians. The key novelty of our results lies in the explicit quantification of all the prefactors on leading-order terms. We also prove an error estimate for solutions of Dirichlet BVPs obtained using TMDmap, showing that the solution error is controlled by the consistency error. We use these results to study an important application of TMDmap in the analysis of rare events in systems governed by overdamped Langevin dynamics using the framework of transition path theory (TPT). The cornerstone ingredient of TPT is the solution of the committor problem, a boundary value problem for the backward Kolmogorov PDE. Remarkably, we find that the TMDmap algorithm is particularly suited as a meshless solver for the committor problem due to the cancellation of several error terms in the prefactor formula. Furthermore, significant improvements in bias and variance errors occur when using a quasi-uniform sampling density. Our numerical experiments show that these improvements in accuracy are realizable in practice when using $\delta$-nets as spatially uniform inputs to the TMDmap algorithm.
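
For orientation, the committor problem mentioned in this abstract is usually written in the following standard form. The notation is an assumption of this sketch and does not come from the listing: $V$ is the potential, $\beta$ the inverse temperature, and $A$, $B$ the reactant and product sets in the domain $\Omega$. For overdamped Langevin dynamics $dX_t = -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t$, the committor $q$ solves

$$
\beta^{-1}\Delta q - \nabla V \cdot \nabla q = 0 \quad \text{in } \Omega \setminus (\overline{A} \cup \overline{B}), \qquad q = 0 \ \text{on } \partial A, \qquad q = 1 \ \text{on } \partial B .
$$

Here $q(x)$ is the probability that a trajectory started at $x$ reaches $B$ before $A$.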

Improving Convergence and Generalization Using Parameter Symmetries

Bo Zhao, R. M. Gower, Robin Walters, Rose Yu

In overparametrized models, different parameter values may result in the same loss. Parameter space symmetries are loss-invariant transformations that change the model parameters. Teleportation applies such transformations to accelerate optimization. However, the exact mechanism behind this algorithm's success is not well understood. In this paper, we prove that teleportation gives overall faster time to convergence. Additionally, teleporting to minima with different curvatures improves generalization, which suggests a connection between the curvature of the minima and generalization ability. Finally, we show that integrating teleportation into optimization-based meta-learning improves convergence over traditional algorithms that perform only local updates. Our results showcase the versatility of teleportation and demonstrate the potential of incorporating symmetry in optimization.
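
As a toy illustration of a loss-invariant parameter transformation, the sketch below uses a scalar two-layer linear model f(x) = w2 * w1 * x, whose loss is unchanged under (w1, w2) -> (g * w1, w2 / g). It is not the authors' implementation: the model, the data, the candidate values of g, and every function name are assumptions made for illustration. The "teleport" step simply picks, from a small grid, the g that gives the largest gradient norm at the same loss value.

```python
# Toy sketch: a parameter symmetry that leaves the loss unchanged
# but changes the gradient. All names and values are hypothetical.

def loss(w1, w2, data):
    # Mean squared error of the scalar model f(x) = w2 * w1 * x.
    return sum((w2 * w1 * x - y) ** 2 for x, y in data) / (2 * len(data))

def grad(w1, w2, data):
    # Analytic gradient of the loss with respect to (w1, w2).
    s = sum((w2 * w1 * x - y) * x for x, y in data) / len(data)
    return (w2 * s, w1 * s)

def grad_norm_sq(w1, w2, data):
    g1, g2 = grad(w1, w2, data)
    return g1 * g1 + g2 * g2

def teleport(w1, w2, g):
    # Symmetry action: the product w2 * w1 (hence the loss) is preserved.
    return (g * w1, w2 / g)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w1, w2 = 1.0, 0.5

# Pick the group element with the largest gradient norm from a small grid.
candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
best_g = max(candidates, key=lambda g: grad_norm_sq(*teleport(w1, w2, g), data))
tw1, tw2 = teleport(w1, w2, best_g)

print("loss before/after:", loss(w1, w2, data), loss(tw1, tw2, data))
print("grad norm^2 before/after:",
      grad_norm_sq(w1, w2, data), grad_norm_sq(tw1, tw2, data))
```

A gradient step taken from the teleported point starts at the same loss but with a larger gradient, which is the intuition behind using symmetries to speed up optimization.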

Microscopic Theory, Analysis, and Interpretation of Conductance Histograms in Molecular Junctions

Leopoldo Mejía, P. Cossio, Ignacio Franco

Molecular electronics break-junction experiments are widely used to investigate fundamental physics and chemistry at the nanoscale. Reproducibility in these experiments relies on measuring conductance on thousands of freshly formed molecular junctions, yielding a broad histogram of conductance events. Experiments typically focus on the most probable conductance, while the information content of the conductance histogram has remained unclear. Here we develop a microscopic theory for the conductance histogram by merging the theory of force-spectroscopy with molecular conductance. The procedure yields analytical equations that accurately fit the conductance histogram of a wide range of molecular junctions and augments the information content that can be extracted from them. Our formulation captures contributions to the conductance dispersion due to conductance changes during the mechanical elongation inherent to the experiments. In turn, the histogram shape is determined by the non-equilibrium stochastic features of junction rupture and formation. The microscopic parameters in the theory capture the junction’s electromechanical properties and can be isolated from separate conductance and rupture force (or junction-lifetime) measurements. The predicted behavior can be used to test the range of validity of the theory, understand the conductance histograms, and design both molecular junction experiments with enhanced resolution and molecular devices with more reproducible conductance properties.

Bayesian spatial modelling of localised SARS-CoV-2 transmission through mobility networks across England

Thomas Ward, Mitzi Morris, Andrew Gelman, B. Carpenter, William Ferguson, Christopher Overton, Martyn Fyles

In the early phases of growth, resurgent epidemic waves of SARS-CoV-2 incidence have been characterised by localised outbreaks. Therefore, understanding the geographic dispersion of emerging variants at the start of an outbreak is key for situational public health awareness. Using telecoms data, we derived mobility networks describing the movement patterns between local authorities in England, which we have used to inform the spatial structure of a Bayesian BYM2 model. Surge testing interventions can result in spatio-temporal sampling bias, and we account for this by extending the BYM2 model to include a random effect for each timepoint in a given area. Simulated-scenario modelling and real-world analyses of each variant that became dominant in England were conducted using our BYM2 model at local authority level in England. Simulated datasets were created using a stochastic metapopulation model, with the transmission rates between different areas parameterised using telecoms mobility data. Different scenarios were constructed to reproduce real-world spatial dispersion patterns that could prove challenging for inference, and we used these scenarios to understand the performance characteristics of the BYM2 model. The model performed better than unadjusted test positivity in all the simulation scenarios, and in particular when sample sizes were small or data were missing for geographical areas. Through the analyses of emerging variant transmission across England, we found a reduction in the early growth phase geographic clustering of later dominant variants as England became more interconnected from early 2022 and public health interventions were reduced. We have also shown the recent increased geographic spread and dominance of variants with similar mutations in the receptor binding domain, which may be indicative of convergent evolution of SARS-CoV-2 variants.

Birth of a Transformer: A Memory Viewpoint

A. Bietti, Vivien Cabannes, Diane Bouchacourt, Herve Jegou, Leon Bottou

Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We study how transformers balance these two types of knowledge by considering a synthetic setup where tokens are generated from either global or context-specific bigram distributions. By a careful empirical analysis of the training process on a simplified two-layer transformer, we illustrate the fast learning of global bigrams and the slower development of an “induction head” mechanism for the in-context bigrams. We highlight the role of weight matrices as associative memories, provide theoretical insights on how gradients enable their learning during training, and study the role of data-distributional properties.

On Learning Gaussian Multi-index Models with Gradient Flow

A. Bietti, Joan Bruna, L. Pillaud-Vivien

We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian population gradient flow dynamics, and provide a quantitative description of its associated "saddle-to-saddle" dynamics. Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function. In contrast with these positive results, we also show that the related

Direct stellarator coil design using global optimization: application to a comprehensive exploration of quasi-axisymmetric devices

A. Giuliani

Many stellarator coil design problems are plagued by multiple minima, where the locally optimal coil sets can sometimes vary substantially in performance. As a result, solving a coil design problem a single time with a local optimization algorithm is usually insufficient and better optima likely do exist. To address this problem, we propose a global optimization algorithm for the design of stellarator coils and outline how to apply box constraints to the physical positions of the coils. The algorithm has a global exploration phase that searches for interesting regions of design space and is followed by three local optimization algorithms that search in these interesting regions (a "global-to-local" approach). The first local algorithm (phase I), following the globalization phase, is based on near-axis expansions and finds stellarator coils that optimize for quasisymmetry in the neighborhood of a magnetic axis. The second local algorithm (phase II) takes these coil sets and optimizes them for nested flux surfaces and quasisymmetry on a toroidal volume. The final local algorithm (phase III) polishes these configurations for an accurate approximation of quasisymmetry. Using our global algorithm, we study the trade-off between coil length, aspect ratio, rotational transform, and quality of quasi-axisymmetry. The database of stellarators, which comprises almost 140,000 coil sets, is available online and is called QUASR, for "QUAsi-symmetric Stellarator Repository".

Discriminative calibration: Check Bayesian computation from simulations and flexible classifier

Y. Yao, Justin Domke

To check the accuracy of Bayesian computations, it is common to use rank-based simulation-based calibration (SBC). However, SBC has drawbacks: The test statistic is somewhat ad-hoc, interactions are difficult to examine, multiple testing is a challenge, and the resulting p-value is not a divergence metric. We propose to replace the marginal rank test with a flexible classification approach that learns test statistics from data. This measure typically has a higher statistical power than the SBC rank test and returns an interpretable divergence measure of miscalibration, computed from classification accuracy. This approach can be used with different data generating processes to address likelihood-free inference or traditional inference methods like Markov chain Monte Carlo or variational inference. We illustrate an automated implementation using neural networks and statistically-inspired features, and validate the method with numerical and real data experiments.
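
For context, the rank-based SBC baseline that this abstract contrasts against can be sketched as follows. This shows only the classical rank statistic, not the discriminative classifier approach the paper proposes. The conjugate normal model (prior N(0, 1), likelihood N(theta, 1), exact posterior N(y/2, 1/2)), the function name, and the simulation sizes are all assumptions chosen so that the "inference" is exact and the ranks should be close to uniform.

```python
import random

def sbc_ranks(num_sims=1000, num_draws=99, seed=0):
    """Rank statistics for a toy conjugate model (hypothetical example).

    Model assumed here: theta ~ N(0, 1), y | theta ~ N(theta, 1),
    so the exact posterior is N(y / 2, 1 / 2).
    """
    rng = random.Random(seed)
    post_sd = 0.5 ** 0.5
    ranks = []
    for _ in range(num_sims):
        theta = rng.gauss(0.0, 1.0)   # draw a parameter from the prior
        y = rng.gauss(theta, 1.0)     # simulate data given that parameter
        # Draw from the (here exact) posterior given the simulated data.
        draws = [rng.gauss(y / 2.0, post_sd) for _ in range(num_draws)]
        # Rank of the true parameter among the posterior draws.
        ranks.append(sum(d < theta for d in draws))
    return ranks

ranks = sbc_ranks()
mean_rank = sum(ranks) / len(ranks)
print("mean rank:", mean_rank, "(uniform ranks on 0..99 have mean 49.5)")
```

If the posterior sampler were biased or had the wrong spread, the rank histogram would deviate from uniform. The drawbacks listed in the abstract concern how such deviations are tested and interpreted.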

Trapped acoustic waves and raindrops: high-order accurate integral equation method for localized excitation of a periodic staircase

F. Agocs, A. Barnett

We present a high-order boundary integral equation (BIE) method for the frequency-domain acoustic scattering of a point source by a singly-periodic, infinite, corrugated boundary. We apply it to the accurate numerical study of acoustic radiation in the neighborhood of a sound-hard two-dimensional staircase modeled after the El Castillo pyramid. Such staircases support trapped waves which travel along the surface and decay exponentially away from it. We use the array scanning method (Floquet-Bloch transform) to recover the scattered field as an integral over the family of quasiperiodic solutions parameterized by their on-surface wavenumber. Each such BIE solution requires the quasiperiodic Green's function, which we evaluate using an efficient integral representation of lattice sum coefficients. We avoid the singularities and branch cuts present in the array scanning integral by complex contour deformation. For each frequency, this enables a solution accurate to around 10 digits in a couple of seconds. We propose a residue method to extract the limiting powers carried by trapped modes far from the source. Finally, by computing the trapped mode dispersion relation, we use a simple ray model to explain an observed acoustic "raindrop" effect (chirp-like time-domain response).

Stabilizing the calculation of the self-energy in dynamical mean-field theory using constrained residual minimization

Harrison LaBollita, J. Kaye, Alexander Hampel

We propose a simple and efficient method to calculate the electronic self-energy in dynamical mean-field theory (DMFT), addressing a numerical instability often encountered when solving the Dyson equation. Our approach formulates the Dyson equation as a constrained optimization problem with a simple quadratic objective. The constraints on the self-energy are obtained via direct measurement of the leading order terms of its asymptotic expansion within a continuous time quantum Monte Carlo framework, and the use of the compact discrete Lehmann representation of the self-energy yields an optimization problem in a modest number of unknowns. We benchmark our method for the non-interacting Bethe lattice, as well as DMFT calculations for both model systems and ab-initio applications.
