
2025 Simons Collaboration on the Theory of Algorithmic Fairness Annual Meeting

Date: February 6 - 7, 2025


Meeting Goals:

The Simons Collaboration on the Theory of Algorithmic Fairness aims to establish firm mathematical foundations, through the lens of computer science theory, for the burgeoning area of algorithmic fairness.

The 2025 annual meeting explored recent advances in the field from a multidisciplinary perspective, featuring theoretical work from within the collaboration as well as research from outside it, with the aim of inspiring new research directions and connections.

In addition to more traditional challenges in algorithmic fairness, the annual meeting spotlighted the revolutionary technologies reshaping the world, with a special emphasis on generative AI. These technologies raise new concerns about potential discrimination, which should be rigorously formalized and addressed.

Speakers:

Boaz Barak, Harvard University
Constantinos Daskalakis, MIT
Kate Donahue, MIT
Hoda Heidari, CMU
Manish Raghavan, MIT
Adam Tauman Kalai, OpenAI
Richard Zemel, Columbia University
Annette Zimmermann, University of Wisconsin-Madison

Previous Meetings:

2022 Annual Meeting
2023 Annual Meeting
2024 Annual Meeting

  • The fourth annual meeting of the Simons Collaboration on the Theory of Algorithmic Fairness explored recent advances in the field from a multidisciplinary perspective, incorporating both theoretical work and research from outside the collaboration to inspire new directions and connections. This year, we focused on revolutionary technologies reshaping the world, with a particular emphasis on generative AI—specifically, Large Language Models (LLMs). These technologies introduce new concerns about potential discrimination, which must be rigorously formalized and addressed. While the past fifteen years of research on AI fairness remains highly relevant to fairness in generative AI, the increased autonomy and unstructured interactions of these models with a wide range of individuals in their daily lives present novel theoretical and practical challenges that demand careful examination.

    The first talk was by philosopher Annette Zimmermann on democratizing AI. The talk discussed the concentration of decision-making regarding the development, release, and oversight of advanced models in the hands of a small group of individuals. Zimmermann drew on conceptual resources in analytic political philosophy and democratic theory to analyze the processes of agenda-setting and of choosing the values AI models should align with. This thought-provoking talk left the community with questions about the possible roles it can play in shaping relevant policies.

    The following three talks focused on systems composed of humans collaborating with AI. Kate Donahue examined the role theoretical modeling can play in understanding when socially desirable outcomes are feasible, accounting for human cognitive biases and for the way the human and the AI interact. A major factor shaping both the impossibilities and the benefits of human-AI collaboration is the respective accuracy levels of the AI and the humans, as well as the extent to which their information and expertise complement each other. Hoda Heidari focused on measuring potential discrimination in the very complex settings of generative AI, extending a classification of measures for bias and unfairness from the more established literature on prediction and classification to general-purpose generative AI. The theme of evaluating LLMs was echoed on the second day of the meeting by Richard Zemel, with a focus on the diversity of the models. Manish Raghavan also explored the impact of differences in information between humans and AI, as well as how homogeneity in information among humans using the same AI can create conditions of monoculture.

    PI Daskalakis presented methods that extend a major theme of learning from corrupted data to deep generative models. Given the large quantities of data needed to train such models, it is often infeasible to collect a sufficiently large dataset of high-quality data. Daskalakis described methods for denoising diffusion models, one of the most prominent classes of deep generative models, and illustrated their performance on various datasets and in various contexts.

    The final two talks of the meeting provided a unique opportunity to investigate the thinking of theoreticians who are currently working at OpenAI, a leader in the development of LLMs. Adam Tauman Kalai considered fairness toward the user who is interacting with a chatbot, which he termed first-person fairness. Kalai presented a methodology for analyzing first-person fairness, a task that is non-trivial given the complicated and open-ended outputs of an LLM (echoing a theme explored in previous talks). Kalai discussed how to define harmful biases (where the LLM personalizes its answers in an unwelcome way) and showed that post-training reinforcement learning significantly reduces such harmful biases in ChatGPT. Boaz Barak discussed the challenges of alignment in LLMs, arguing that spending more computation at inference time may be needed to make AI models reliable even in situations outside their training distribution. Barak presented deliberative alignment, an alignment method that simultaneously increases robustness to jailbreaks, decreases over-refusal rates, and improves out-of-distribution generalization.

  • Thursday, February 6

    9:30 - 10:30 AM  Annette Zimmermann | Democratizing AI: Why and How
    11:00 AM - 12:00 PM  Kate Donahue | Opportunities and Challenges in Human-AI Systems
    1:00 - 2:00 PM  Hoda Heidari | Reflections on Fairness Measurement: From Predictive to Generative AI
    2:30 - 3:30 PM  Manish Raghavan | The Role of Information in Human-AI Systems
    4:00 - 5:00 PM  Constantinos Daskalakis | Training Deep Generative Models Using Corrupted Data

    Friday, February 7

    9:30 - 10:30 AM  Richard Zemel | Improving the Diversity and Evaluation of Large Language Models
    11:00 AM - 12:00 PM  Adam Tauman Kalai | First-Person Fairness in Chatbots
    1:00 - 2:00 PM  Boaz Barak | AI Safety via Inference-Time Compute
  • Boaz Barak
    Harvard University and OpenAI

    AI Safety via Inference-Time Compute
    View Slides (PDF)

    Ensuring AI models reliably follow human intent, even in situations outside their training distribution, is a challenging problem. In this talk, Boaz Barak will discuss how spending more computation at inference time can be used to improve robust adherence to human-specified policies, specifically using reasoning AI models such as OpenAI’s o1-preview, o1-mini, and o1.

    In particular, Barak will present Deliberative Alignment: A new safety training paradigm that directly teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. Deliberative alignment simultaneously increases robustness to jailbreaks while decreasing over-refusal rates, as well as improving out-of-distribution generalization.
    Constantinos Daskalakis
    MIT

    Training Deep Generative Models Using Corrupted Data

    Deep generative models have found a plethora of applications in machine learning and various other scientific and applied fields, where they are used to sample complex, high-dimensional distributions and are leveraged in downstream analyses involving such distributions. It is well understood that the quality of these models depends on the size and quality of the data they are trained on; however, creating large-scale, high-quality datasets is often expensive and sometimes impossible or even undesirable. Indeed, in many scientific domains we have no access to high-quality data due to physical or instrumentation constraints. In other applications, curating high-quality datasets is resource-intensive: for example, finding the three-dimensional structure of proteins is a slow and expensive process, and producing high-quality MRI scans requires keeping subjects in MRI machines for longer.

    Finally, some of the most prominent deep generative models have been shown to memorize their training data, which they can be manipulated to reveal. Is it possible to train deep generative models to generate high-quality samples given a training set of samples that have been corrupted due to some underlying physical process or to protect sensitive information? And is it possible to train models given a training set of samples with heterogeneous quality? We develop such methods for denoising diffusion models, one of the most prominent classes of deep generative models. We illustrate the performance of our method on various datasets.

    This is based on joint works with Yeshwanth Cherapanamjeri, Giannis Daras, Alex Dimakis, and other collaborators.
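
    As a purely illustrative aside (a simplified sketch, not the specific algorithms presented in this talk), the snippet below shows one way Gaussian-corrupted observations can be folded into diffusion training: data corrupted at a known noise level can be treated as if it were generated partway along the forward diffusion process, so a denoiser would only ever be trained on timesteps at least that noisy. The helper to_timestep, the noise schedule, and the toy one-dimensional dataset are all hypothetical.

        # Toy sketch: embedding Gaussian-corrupted data into a diffusion forward process.
        # Assumption: each observation is y = x0 + sigma0 * eps with a *known* sigma0;
        # this illustrates the general idea only, not the speakers' method.
        import numpy as np

        rng = np.random.default_rng(0)

        T = 100
        betas = np.linspace(1e-4, 0.05, T)          # forward-process noise schedule
        alpha_bar = np.cumprod(1.0 - betas)         # \bar{alpha}_t

        sigma0 = 0.2                                # known corruption level of the data
        # noise standard deviation per unit of clean signal at each timestep t
        rel_std = np.sqrt((1.0 - alpha_bar) / alpha_bar)
        t_min = int(np.argmax(rel_std >= sigma0))   # earliest timestep at least as noisy as the data

        def to_timestep(y, t):
            """Turn corrupted samples y into samples of x_t for any t >= t_min,
            by adding the missing amount of Gaussian noise and rescaling."""
            extra_var = rel_std[t] ** 2 - sigma0 ** 2
            extra = rng.normal(0.0, np.sqrt(max(extra_var, 0.0)), size=y.shape)
            return np.sqrt(alpha_bar[t]) * (y + extra)

        # toy 1-D "dataset": the clean values are never observed directly
        clean = rng.normal(3.0, 1.0, size=1000)
        corrupted = clean + sigma0 * rng.normal(size=clean.shape)
        x_t = to_timestep(corrupted, t=max(t_min, 50))   # usable training input for a denoiser
        print(t_min, round(x_t.mean(), 3), round(x_t.std(), 3))

    In the talk's actual setting the corruption processes are more general and the training objective itself is adapted; the point of the sketch is only that known corruption can sometimes be reinterpreted as part of the generative model's own noising process.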
    Kate Donahue
    MIT

    Opportunities and Challenges in Human-AI Systems
    View Slides (PDF)

    In this talk, Kate Donahue will describe prior work and current research in human-AI collaboration, specifically focusing on the role theoretical modeling can play in better understanding when socially desirable outcomes are feasible. Donahue will discuss the influence that different factors can have, such as human cognitive biases, the way that the human and AI interact, and different levels of accuracy.

    First, Donahue will review the goal of strict benefits from human-algorithm collaboration (complementarity; Bansal et al., 2021), presenting several impossibility results and conditions under which complementarity may be in tension with other goals, such as fairness. These results help us understand when we can (and cannot) achieve certain accuracy goals and give insight into how we should design AI tools. Next, she will present a stylized model of a strategic decision made with an algorithmic tool: a firm using such a tool to select candidates. Donahue will show that when the firm has access to side information (e.g., the employment status of candidates), counter-intuitive results can occur, such as increased accuracy of the AI tool leading to worse social outcomes. Finally, she will conclude by discussing several directions in human-LLM interaction and the ways in which generative AI poses unique challenges (and benefits).
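
    For background, the complementarity criterion of Bansal et al. (2021) referenced above is commonly stated as the requirement that the human-AI team strictly outperform both the human alone and the AI alone; a schematic statement (notation ours, not taken from the talk) is:

        % Schematic complementarity condition: the team must beat either party acting alone.
        \[
          \operatorname{acc}(H \oplus A) \;>\; \max\bigl\{ \operatorname{acc}(H),\ \operatorname{acc}(A) \bigr\}
        \]

    Impossibility results of the kind mentioned above characterize regimes in which no collaboration protocol can satisfy this strict inequality.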
    Hoda Heidari
    Carnegie Mellon University

    Reflections on Fairness Measurement: From Predictive to Generative AI

    The algorithmic fairness literature has historically focused on predictive AI models designed to automate or assist specific decisions in high-stakes domains. Hoda Heidari will contrast that line of work with the recently growing set of benchmarks, metrics, and measures for bias and unfairness in general-purpose generative AI. Heidari will offer some reflections on conceptualizing and measuring GenAI unfairness, drawing on past work that mapped mathematical fairness notions to measures of equality of opportunity. Heidari will conclude with the implications of this framework for the valid measurement of unfairness, and more broadly of societal risks, in general-purpose AI.
    Manish Raghavan
    MIT

    The Role of Information in Human-AI Systems
    View Slides (PDF)

    Human-AI systems seek to leverage complementary strengths and cover for weaknesses. In this talk, Manish Raghavan will discuss two settings in which information plays a key role: clinical decision-making and content production. At a high level, we study the impact of (1) differences in information between humans and AI, and (2) homogeneity in information between humans using the same AI.
    Adam Tauman Kalai
    OpenAI

    First-Person Fairness in Chatbots
    View Slides (PDF)

    Much research on fairness has focused on institutional decision-making tasks, such as resume screening. Meanwhile, hundreds of millions of people use chatbots like ChatGPT for very different purposes, ranging from resume writing and technical support to entertainment. We study “first-person fairness,” which means fairness toward the user who is interacting with a chatbot. The main challenge in analyzing first-person fairness is that chatbots generate open-ended text for a variety of tasks, hence existing fairness notions such as equalized odds do not necessarily apply. We present a methodology which can be applied to future chatbots as well as experiments demonstrating its effectiveness on ChatGPT. We find that post-training reinforcement learning significantly reduces harmful biases in ChatGPT.

    This is joint work with Tyna Eloundou, Alex Beutel, David G. Robinson, Keren Gu-Lemberg, Anna-Luisa Brakman, Pamela Mishkin, Meghan Shah, Johannes Heidecke, and Lilian Weng.
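
    For context on why standard criteria do not transfer directly, recall that equalized odds (Hardt et al., 2016) is defined for a predictor of a ground-truth label relative to a group attribute; the schematic statement below is a background note, not part of the abstract.

        % Equalized odds: the prediction \hat{Y} is independent of the group A given the true label Y.
        \[
          \Pr\bigl[\hat{Y} = \hat{y} \mid Y = y,\ A = a\bigr]
          \;=\;
          \Pr\bigl[\hat{Y} = \hat{y} \mid Y = y,\ A = b\bigr]
          \quad \text{for all } \hat{y},\ y \text{ and all groups } a, b.
        \]

    Since a chatbot's response is open-ended text with no single ground-truth label, a criterion of this form has no direct analogue, which is what motivates the separate methodology described above.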
    Richard Zemel
    Columbia University

    Improving the Diversity and Evaluation of Large Language Models
    View Slides (PDF)
    Annette Zimmermann
    University of Wisconsin–Madison

    Democratizing AI: Why and How
    View Slides (PDF)

    Calls for ‘democratizing AI’ have become increasingly ubiquitous, particularly during the latest generative AI deployment wave. While this wave has prompted renewed and highly publicized fears about AI possibly posing an existential threat to humanity, there has also been increased enthusiasm about the positive potential of a large number of people now starting to integrate generative AI tools into their daily lives. Surely, the thinking goes, this will ‘democratize AI’ by making it much more accessible to ordinary citizens without specialized expertise. But it would be misguided to jump to that conclusion. Crucially, just because generative AI is at our fingertips, that does not mean it is truly in our hands.

    As a democratic constituency, we currently find ourselves in a choice environment characterized by an asymmetrical concentration of power, wealth, and information in the hands of a small group of AI deployers, coupled with a widespread dispersion of AI-related risks across the demos at large. In this talk, I argue that this asymmetry creates a particular type of democratic legitimacy deficit that warrants a particular type of collective response. More specifically, I argue that AI deployers, most of whom are oligopolistic corporate actors not subject to direct democratic authorization, are currently able to unilaterally set AI regulation agendas by the mere act of developing new tools and then granting public access to them, while simultaneously offloading responsibilities for ex post harm mitigation onto government actors and—by extension—democratic constituencies. Once democratic constituencies relinquish the power to effectively set agendas, and find themselves locked in a dynamic of reacting to a (possibly unduly narrow and risky) cluster of options presented to them by an unrepresentative subset of the democratic constituency, they forfeit an essential feature of democracy itself: the right to exercise free and equal anticipatory control over the nature and scope of decisions shaping their shared future. Rather than uncritically accepting AI exceptionalism, the view that AI is a qualitatively unusual policy domain that cannot possibly be subjected to democratic contestation and control, we—as democratic constituencies—must identify ways of taking charge, inspired by the ways in which other complex policy domains have been democratized.

    This talk mobilizes previously untapped conceptual resources in analytic political philosophy and democratic theory to systematically analyze how contemporary processes of AI deployment may align with, and differ from, other instances of (un-)democratic agenda-setting; to articulate which specific democratic values ought to guide efforts to subject AI deployment to democratic control; and to specify which concrete political practices and institutional transformations can best approximate such values.

Videos

  • February 6, 2025

  • February 7, 2025
