2022 Mathematical and Scientific Foundations of Deep Learning Annual Meeting
Conference Organizers:
Peter Bartlett, University of California, Berkeley
René Vidal, Johns Hopkins University
This meeting will bring together members of the NSF-Simons Research Collaborations on the Mathematical and Scientific Foundations of Deep Learning (MoDL) and of projects in the NSF program on Stimulating Collaborative Advances Leveraging Expertise in the Mathematical and Scientific Foundations of Deep Learning (SCALE MoDL). The focus of the meeting is the set of challenging theoretical questions posed by deep learning methods and the development of mathematical and statistical tools to understand their success and limitations, to guide the design of more effective methods, and to initiate the study of the mathematical problems that emerge. The meeting aims to report on progress in these directions and to stimulate discussions of future directions.
-
THURSDAY, SEPTEMBER 29
9:30 AM Edgar Dobriban | T-Cal: An Optimal Test for the Calibration of Predictive Models
11:00 AM Sasha Rakhlin | Decision-Making Without Confidence or Optimism: Beyond Linear Models
1:00 PM René Vidal | Semantic Information Pursuit for Explainable AI
2:30 PM Andrea Montanari | From Projection Pursuit to Interpolation Thresholds in Small Neural Networks
4:00 PM Chinmay Maheshwari | Human Automation Teams in Societal Scale Systems
FRIDAY, SEPTEMBER 30
9:30 AM Elisabetta Cornacchia | Learning with Neural Networks: Generalization, Unseen Data and Boolean Measures
11:00 AM Soledad Villar | Invariances and Equivariances in Machine Learning
1:00 PM Gal Vardi | Implications of the Implicit Bias in Neural Networks
-
Elisabetta Cornacchia
Learning with Neural Networks: Generalization, Unseen Data and Boolean Measures
We consider the learning of logical functions with gradient descent (GD) on neural networks. We introduce the notion of “Initial Alignment’’ (INAL) between a neural network at initialization and a target function and prove that if a network and target do not have a noticeable INAL, then noisy gradient descent on a fully connected network with i.i.d. initialization cannot learn in polynomial time. Moreover, we prove that on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise-stability of the target function, supporting a conjecture made in prior works. We then show that in the distribution shift setting, when the data withholding corresponds to freezing a single feature, the generalization error admits a tight characterization in terms of the Boolean influence for several relevant architectures. This is shown on linear models and supported experimentally on other models such as MLPs and Transformers. In particular, this puts forward the hypothesis that for such architectures and for learning logical functions, GD tends to have an implicit bias towards low-degree representations.
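For reference, the two Boolean measures mentioned above have standard definitions (stated here as background; the notation is not taken from the abstract). For a Boolean function f : \{-1,1\}^n \to \{-1,1\} and uniformly random x,

\[
  \mathrm{Inf}_i(f) = \Pr_x\big[ f(x) \neq f(x^{\oplus i}) \big],
  \qquad
  \mathrm{Stab}_\rho(f) = \mathbb{E}_{x,y}\big[ f(x)\, f(y) \big],
\]

where x^{\oplus i} is x with its i-th coordinate flipped and y is a \rho-correlated copy of x (each coordinate of y agrees with the corresponding coordinate of x independently with probability (1+\rho)/2). The noise-stability lower bound and the Boolean-influence characterization above are expressed in terms of these quantities.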
Based on two joint works with E. Abbé, S. Bengio, J. Hązła, J. Kleinberg, A. Lotfi, C. Marquis, M. Raghu, C. Zhang.
Alexander Rakhlin
Decision-Making Without Confidence or Optimism: Beyond Linear Models
A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient learning algorithms that achieve near-optimal regret. Characterizing the statistical complexity in this setting is challenging due to the interactive nature of the problem. The difficulty is compounded by the use of complex non-linear models, such as neural networks, in the decision-making context. We present a complexity measure that is proven to be both necessary and sufficient for interactive learning, as well as a unified algorithm design principle. This complexity measure is inherently information-theoretic, and it unifies a number of existing approaches, both Bayesian and frequentist.
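As background (generic notation, not taken from the talk), the regret that an interactive decision-making algorithm seeks to minimize over T rounds can be written as

\[
  \mathrm{Reg}_T = \sum_{t=1}^{T} \Big( f^{M^\star}(\pi^\star) - f^{M^\star}(\pi_t) \Big),
\]

where M^\star is the true (unknown) model, f^{M}(\pi) is the mean reward of decision \pi under model M, \pi_t is the decision selected at round t, and \pi^\star is an optimal decision under M^\star. The complexity measure presented in the talk characterizes, for a given model class, how small this regret can be made by any algorithm.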
Chinmay Maheshwari
Human Automation Teams in Societal Scale Systems
Opportunities abound for the transformation of societal systems using new automation technologies (most notably, the integration of IoT, AI/ML, and Cloud Computing) and business models to address some of the most pressing problems in diverse sectors such as energy, transportation, health care, manufacturing, financial systems, and online marketplaces. These new technologies have significantly facilitated independent and decentralized decision making for humans and led to the emergence of Human Automation Teams (HATs). Designing appropriate learning, adaptation, and incentive schemes for HATs is one of the most engaging problems in AI/ML systems today. If left unaddressed, HATs can lead to disastrous outcomes for society. In this talk, I will discuss fundamental challenges associated with developing decentralized and independent decision-making algorithms for HATs and present novel design methodologies to ensure convergence to a “stable” societal outcome. Specifically, I will focus on two different vignettes of HATs: (i) two-sided online matching markets and (ii) infinite-horizon discounted multi-agent reinforcement learning.
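As a concrete illustration of what a “stable” outcome means in the first vignette, here is a minimal Python sketch of the classical Gale-Shapley deferred-acceptance procedure for a two-sided matching market. This is standard background, not the decentralized learning algorithm from the talk (where agents must learn their preferences from feedback), and the rider/driver names in the example are hypothetical.

# Classical Gale-Shapley deferred acceptance for a two-sided market.
# It returns a matching with no "blocking pair": no proposer-receiver pair
# who would both prefer each other over their assigned partners.

def deferred_acceptance(proposer_prefs, receiver_prefs):
    # proposer_prefs[p] and receiver_prefs[r] are preference-ordered lists.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}   # lower rank = more preferred
    next_idx = {p: 0 for p in proposer_prefs}         # next receiver to propose to
    match = {}                                        # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_idx[p]]
        next_idx[p] += 1
        if r not in match:                            # receiver is unmatched
            match[r] = p
        elif rank[r][p] < rank[r][match[r]]:          # receiver prefers p
            free.append(match[r])
            match[r] = p
        else:                                         # receiver rejects p
            free.append(p)
    return {p: r for r, p in match.items()}

# Hypothetical example: two riders and two drivers with conflicting preferences.
riders = {"r1": ["d1", "d2"], "r2": ["d1", "d2"]}
drivers = {"d1": ["r2", "r1"], "d2": ["r2", "r1"]}
print(deferred_acceptance(riders, drivers))           # -> {'r2': 'd1', 'r1': 'd2'}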
This talk is based on joint work with Prof. Shankar Sastry (Berkeley), Prof. Manxi Wu (Cornell), Prof. Eric Mazumdar (Caltech), and Druv Pai (Berkeley).
Gal Vardi
Implications of the Implicit Bias in Neural Networks
When training large neural networks, there are generally many weight combinations that will perfectly fit the training data. However, gradient-based training methods somehow tend to reach those which generalize well, and understanding this “implicit bias” has been a subject of extensive research. In this talk, I will discuss recent works which show several implications of the implicit bias (in homogeneous neural networks trained with the logistic loss): (1) In shallow univariate neural networks, the implicit bias provably implies generalization; (2) By using a characterization of the implicit bias, it is possible to reconstruct a significant fraction of the training data from the parameters of a trained neural network, which might shed light on representation learning and memorization in deep learning, but might also have potentially negative implications for privacy; (3) In certain settings, the implicit bias provably implies convergence to non-robust networks, i.e., networks which are susceptible to adversarial examples.
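For context, the characterization of the implicit bias used in (2) presumably corresponds to the known result for homogeneous networks trained with the logistic loss: gradient flow converges in direction to a KKT point of the margin-maximization problem

\[
  \min_{\theta} \ \tfrac{1}{2}\|\theta\|^{2}
  \quad \text{subject to} \quad y_i\, f(\theta; x_i) \ge 1 \ \text{ for all } i,
\]

where f(\theta; x) is the network output and (x_i, y_i) are the training examples. The stationarity (KKT) condition writes the trained parameters as \theta = \sum_i \lambda_i\, y_i\, \nabla_\theta f(\theta; x_i) with \lambda_i \ge 0, which is what makes reconstructing a fraction of the training set from the parameters plausible.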
Based on joint works with Niv Haim, Itay Safran, Gilad Yehudai, Michal Irani, Jason D. Lee, and Ohad Shamir.