New Directions in Theoretical Machine Learning (2019)
Organizers:
Sanjeev Arora, Princeton University
Maria-Florina Balcan, Carnegie Mellon University
Sanjoy Dasgupta, University of California, San Diego
Sham Kakade, University of Washington
The Simons Symposium on New Directions in Theoretical Machine Learning brought together a select group of experts with diverse skills and backgrounds to discuss the following questions:
- How can we build on the recent success in supervised learning for perceptual and related tasks?
- What’s next for ML if perception gets solved?
- Is the current set of methods sufficient to take us to the next level of “intelligent” reasoning?
- If not, what is missing, and how can we rectify it?
- What role do classical ideas in Reasoning, Representation Learning, Reinforcement Learning, Interactive Learning, etc., have to play?
- What modes of analyses do we need to even conceptualize the next level of machine capabilities, and what will be good ways to test those capabilities?
Overview: The workshop brought together a select group of experts with different backgrounds to discuss the next set of theoretical challenges for advancing machine learning and artificial intelligence. There have been a number of breakthrough developments in the last few years: language models are rapidly getting better (in the last few months, there have been impressive advances in computer-generated text); image recognition is working far more accurately in new domains (relevant to scene parsing and navigation); and progress in robotics is rapidly accelerating (with much focus on self-driving cars). The workshop had leading experts in all of these areas and focused on formulating new foundational and algorithmic questions to accelerate the progress of the field.
Developments and Novel Questions:
What is the role of models and physics in robotics, reinforcement learning, interactive learning and self-driving cars?
A central question in robotics and interactive learning is how to make use of models of the environment (such as physical simulators). While much of machine learning does not utilize models of the world (ML is largely prediction based), robotics is an area where it is increasingly clear that models of the environment are important. D. Bagnell, a leading robotics theorist and one of the world’s foremost experts on self-driving cars, discussed the interplay of human demonstration with physical models; the focus was on how to obtain provably better planning algorithms (say, for self-driving cars) by utilizing models of the world along with human examples. E. Todorov is a robotics theorist who has also developed one of the most widely used physics simulators in robotics; he spoke about how ML methods for robotics are unlikely to succeed without incorporating physical models. E. Brunskill discussed the fundamental limits of how accurately we need to learn a model in order to use it for planning purposes.
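As a concrete, if stylized, illustration of how a model of the world enters planning, the following sketch implements random-shooting model-predictive control; it is a generic textbook construction, not any speaker’s specific method, and the toy double-integrator dynamics and quadratic cost are assumptions made purely for illustration.

```python
import numpy as np

# Random-shooting model-predictive control: simulate many candidate action
# sequences through a model of the dynamics, keep the cheapest, execute its
# first action, and replan. All constants below are toy choices.

def plan_first_action(dynamics, cost, state, horizon=10, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)  # candidate action sequence
        s, total = state, 0.0
        for a in actions:
            s = dynamics(s, a)        # roll the sequence out inside the model
            total += cost(s, a)
        if total < best_cost:
            best_action, best_cost = actions[0], total
    return best_action

# Toy example: a 1-D double integrator driven toward the origin.
dynamics = lambda s, a: np.array([s[0] + 0.1 * s[1], s[1] + 0.1 * a])
cost = lambda s, a: s[0] ** 2 + 0.1 * s[1] ** 2 + 0.01 * a ** 2
print(plan_first_action(dynamics, cost, np.array([1.0, 0.0])))
```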
How do our current set of methods need to be modified in order to take us to the next level of ‘intelligent’ reasoning?
There has been much remarkable recent success in natural language processing. A number of talks addressed key shortcomings of current language modeling approaches, notably their limited ability to capture reasoning and long-term dependencies. D. Roth gave a thought-provoking talk on how to incorporate reasoning and logic in language models through a notion of incidental supervision, where an agent learns a way to reason from co-occurrences in the data. G. van den Broeck talked about systems for probabilistic reasoning that can both be learned and used efficiently, overcoming complexity barriers that have traditionally hobbled the field. S. Kakade discussed the challenges with language models that utilize long-term memory, relevant to reasoning from information stored in the ‘deep past.’ On a related topic, C. Szegedy discussed the role of reasoning in the context of mathematics and theorem proving.
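To make the idea of efficiently usable probabilistic reasoning concrete, here is a toy sketch of a probabilistic circuit; it is a generic construction in the spirit of the talk, not the system presented, and the rain/sprinkler distribution is an invented example.

```python
# A toy probabilistic circuit: sum nodes are mixtures, product nodes are
# factorizations, leaves are Bernoulli distributions. Joint and marginal
# queries are each a single bottom-up pass, i.e., time linear in circuit size.

class Leaf:
    def __init__(self, var, p):
        self.var, self.p = var, p
    def value(self, evidence):
        if self.var not in evidence:          # unobserved variable sums out to 1
            return 1.0
        return self.p if evidence[self.var] else 1.0 - self.p

class Product:
    def __init__(self, children):
        self.children = children
    def value(self, evidence):
        result = 1.0
        for child in self.children:
            result *= child.value(evidence)
        return result

class Sum:
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children
    def value(self, evidence):
        return sum(w * child.value(evidence) for w, child in self.weighted_children)

# P(rain, sprinkler) as a two-component mixture of independent regimes.
circuit = Sum([
    (0.3, Product([Leaf("rain", 0.9), Leaf("sprinkler", 0.1)])),
    (0.7, Product([Leaf("rain", 0.2), Leaf("sprinkler", 0.5)])),
])
print(circuit.value({"rain": True, "sprinkler": False}))  # a joint probability
print(circuit.value({"rain": True}))                      # marginal P(rain) = 0.41
```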
How can representation learning and unsupervised learning be used for faster learning in new domains and for finding better perceptual representations?
For example, people learn to identify new objects with just a few examples. There is an increasing belief that such accelerated learning may be possible with machine learning approaches in the near future, where systems can also rapidly learn to identify new objects. P. Isola gave a talk, “Toward the Evolution and Development of Prior Knowledge,” which focused on insightful ideas concerning how to build systems that learn from implicit signals (e.g., how to incorporate video information when learning about object recognition). S. Arora presented a theory of representation learning that elucidates how perceptual representations in language (or other domains, including vision) can be learned in an unsupervised manner (without a teacher), allowing for more rapid learning in new contexts.
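One way to make “learning from implicit signals” concrete is contrastive learning from two views of the same data, sketched below; this is a generic InfoNCE-style construction, not the speakers’ specific proposal, and the synthetic ‘past’/‘future’ embeddings are stand-ins for features of adjacent video frames.

```python
import numpy as np

# An InfoNCE-style contrastive objective over paired "views": row i of z1
# and row i of z2 come from the same underlying scene (e.g., adjacent video
# frames), and the loss rewards representations that place matched pairs
# closer together than mismatched ones.

def contrastive_loss(z1, z2, temperature=0.5):
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # unit-normalize rows
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # all pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # matched pairs on diagonal

rng = np.random.default_rng(0)
past = rng.normal(size=(8, 16))                 # embeddings of "past" frames
future = past + 0.1 * rng.normal(size=(8, 16))  # noisy embeddings of the "future"
print(contrastive_loss(past, future))           # lower loss = views agree
```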
What are new models of learning, such as lifelong learning, learning with a teacher, etc.?
There is a growing need for systems that evolve over time and gradually improve from experience. T. Mitchell gave a talk about systems that continually learn from experience, drawing on insights from his Never-Ending Language Learner (NELL), a continuously running learning system. S. Dasgupta spoke about models in which a benign teacher chooses examples that enable a learner to quickly acquire an accurate model. D. Sadigh and Y. Yue discussed challenges in settings where a robot or other mechanical device must interact with and learn from a human.
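A standard toy case illustrates why a benign teacher helps (this is a generic example, not the construction from the talk): to teach a threshold classifier over n sorted points, two well-chosen examples suffice, whereas a learner selecting label queries on its own needs on the order of log2(n) of them.

```python
import math

# Teaching a threshold classifier on n sorted points: a helpful teacher
# needs only the two examples that straddle the decision boundary, while a
# learner querying labels on its own needs roughly log2(n) of them.

def teaching_set(points, threshold):
    """Return the two labeled examples that pin down the threshold."""
    below = max(p for p in points if p < threshold)
    above = min(p for p in points if p >= threshold)
    return [(below, 0), (above, 1)]

n = 1024
points = range(n)
print(teaching_set(points, threshold=700))  # 2 examples suffice: [(699, 0), (700, 1)]
print(math.ceil(math.log2(n)))              # vs. ~10 label queries via binary search
```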
Other topics on the frontiers of machine learning
- To what extent, and in what ways, can causality be inferred from data? (B. Schölkopf)
- How can data be used to guide the design of algorithms? (N. Balcan)
- What is the new theoretical landscape of privacy models, in the wake of recent EU regulation? (K. Chaudhuri)
Another important direction discussed was data-driven algorithm design for combinatorial problems, an important aspect of modern data science and algorithm design. Rather than using off-the-shelf algorithms that have only worst-case performance guarantees, practitioners typically optimize over large families of parameterized algorithms, tuning the parameters on a training set of problem instances from their domain to find a configuration with high expected performance on future instances. So far, however, most of this work has come with no performance guarantees. Nina Balcan presented recent provable guarantees for these settings, covering both the batch case, where a collection of typical problem instances from the given application is presented all at once, and the online case, where instances arrive one at a time. The challenge is that for many combinatorial problems of significant importance to machine learning, including partitioning and subset-selection problems, a small tweak to the parameters can cause a cascade of changes in the algorithm’s behavior, so the algorithm’s performance is a discontinuous function of its parameters; this leads to very interesting learning-theory questions as well.
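The discontinuity phenomenon is easy to see in a small simulation; the sketch below uses a toy setup in the spirit of this line of work (not Balcan’s actual analysis): a greedy knapsack heuristic parameterized by an exponent rho, tuned on synthetic training instances.

```python
import numpy as np

# A parameterized greedy knapsack heuristic: score item i by value_i / size_i**rho
# and pack greedily. Tuning rho on training instances is data-driven algorithm
# design; because the packing only changes when rho crosses an ordering
# boundary, average performance is a piecewise-constant function of rho.

def greedy_knapsack_value(values, sizes, capacity, rho):
    order = np.argsort(-(values / sizes ** rho))   # greedy order under score rho
    total_value, used = 0.0, 0.0
    for i in order:
        if used + sizes[i] <= capacity:
            used += sizes[i]
            total_value += values[i]
    return total_value

rng = np.random.default_rng(0)
train = [(rng.uniform(1, 10, 20), rng.uniform(1, 5, 20), 15.0) for _ in range(50)]

for rho in np.linspace(0.0, 2.0, 9):               # sweep the tunable parameter
    avg = np.mean([greedy_knapsack_value(v, s, c, rho) for v, s, c in train])
    print(f"rho={rho:.2f}  avg value={avg:.2f}")   # note flat plateaus and jumps
```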
Future collaborations:
The interactions between the researchers were excellent, with active discussion and potential future follow-ups in a number of areas. These include:
- Tengyu Ma and Sham Kakade plan to examine the question of implicit regularization in deep neural networks used as language models.
- Drew Bagnell, Sham Kakade, Emo Todorov and Elad Hazan discussed questions of error feedback learning (and iterative learning control) as robust control methods, where we could obtain provable guarantees.
- Nina Balcan and Yisong Yue discussed providing fast branch and bound algorithms with provable guarantees for solving mixed-integer programs by using a data-driven algorithm design approach.
- Nina Balcan, Sanjoy Dasgupta, Guy van den Broeck and Dan Roth talked about label-efficient learning in large-scale multitask scenarios where one can exploit known or learned logical constraints among tasks.
- Phillip Isola and Sham Kakade discussed questions regarding a theory and algorithm for learning from ‘two views’ (e.g., learn from a video stream, where one view is the past and the other is the ‘future’).
- Phillip Isola and Sanjeev Arora plan to further discuss ideas for representation learning.
MONDAY
10:00 - 11:00 AM Christian Szegedy | Is Math Only for Humans?
11:30 - 12:30 PM Tom Mitchell | What Questions Should a Theory of Learning Agent Answer?
5:00 - 6:00 PM Dan Roth | Incidental Supervision and Reasoning Challenges
6:15 - 7:15 PM Sanjeev Arora | Discussion: Thoughts on a Theory for Unsupervised Learning, with Applications to RL, Learning to Learn, etc.
TUESDAY
10:00 - 11:00 AM Drew Bagnell | Imitation, Feedback and Games
11:30 - 12:30 PM Emo Todorov | Model-based Control
Andreas Krause | Towards Safe Reinforcement Learning
5:00 - 6:00 PM Yisong Yue | Blending Learning & Control via Functional Regularization
6:15 - 7:15 PM Emma Brunskill | Machine Learning Challenges from Computers that Learn to Help People
WEDNESDAY
9:45 - 2:00 PM Guided Hike to Partnach Gorge
5:00 - 6:00 PM Elad Hazan | New Algorithms and Directions in Reinforcement Learning
6:15 - 7:15 PM Phillip Isola | From Amoebas to Infants: Toward the Evolution and Development of Prior Knowledge
THURSDAY
10:00 - 11:00 AM Dorsa Sadigh | Machine Learning for Human-Robot Systems
11:30 - 12:30 PM Nina Balcan | Data-driven/Machine-learning Augmented Algorithm Design
5:00 - 6:00 PM Bernhard Schölkopf | Toward Causal Learning
6:15 - 7:15 PM Sanjoy Dasgupta | Using Interaction to Overcome Basic Hurdles in Learning
Daniel Hsu | Interactive Learning via Reductions
FRIDAY
10:00 - 11:00 AM Ulrike von Luxburg | Comparison-based Machine Learning
Sham Kakade | Learning, Memory, and Entropy
11:30 - 12:30 PM Guy Van den Broeck | Circuit Languages as a Synthesis of Learning and Reasoning
5:00 - 6:00 PM Kamalika Chaudhuri | New Directions in Privacy-preserving Data Analysis
Moritz Hardt | The Sociotechnical Forces Against Overfitting
6:15 - 7:15 PM Tengyu Ma | Data-dependent Regularization and Sample Complexity Bounds of Deep Neural Networks
Sanjeev Arora, Princeton University
Drew Bagnell, Aurora Innovation
Maria-Florina Balcan, Carnegie Mellon University
Emma Brunskill, Stanford University
Kamalika Chaudhuri, University of California, San Diego
Sanjoy Dasgupta, University of California, San Diego
Moritz Hardt, University of California, Berkeley
Elad Hazan, Princeton University
Daniel Hsu, Columbia University
Phillip Isola, Massachusetts Institute of Technology
Sham Kakade, University of Washington
Andreas Krause, ETH Zürich
Tengyu Ma, Stanford University
Tom Mitchell, Carnegie Mellon University
Dan Roth, University of Pennsylvania
Dorsa Sadigh, Stanford University
Bernhard Schölkopf, Max Planck Institute for Intelligent Systems
Christian Szegedy, Google
Emo Todorov, University of Washington
Guy van den Broeck, University of California, Los Angeles
Ulrike von Luxburg, University of Tübingen
Yisong Yue, California Institute of Technology