Reaching Across the Aisle to Find the Algorithms of Vision
Much of the foundational work on the visual system approached it in a very simple way: Show an animal an image, measure how its neurons respond, show another and repeat. The assumption — stated or not — was that visual processing could be understood as a rote input-output transformation. Scientists studied cells as though they respond based simply on the visual features present in the image; these responses could then be used to discriminate between different images.
While this basic understanding of the visual system has been fruitful in many ways, it has always left some researchers doubtful. These researchers believe that the anatomy and dynamics of the visual system suggest it is not simply responding in a ‘bottom-up’ way. Rather, it may be generating some of its responses based on a model of how the world works. This debate between ‘discriminative’ and ‘generative’ approaches to vision has gone on for decades. Though both aim to explain visual processing, the two approaches stem from different philosophical and mathematical traditions. The result is that researchers tend to use only their preferred method rather than working together, creating a sharper distinction between the two approaches than the brain’s behavior may warrant.
In recent years, progress in both computer vision and computational neuroscience has shown the limits of this dichotomy and encouraged more expansive modeling of visual processing. But doing so requires that representatives of both sides come together to sort out what they each believe, where they agree, and where their predictions diverge. This is just what happened in September at the virtual Cognitive Computational Neuroscience (CCN) conference during the kickoff event for a ‘generative adversarial collaboration’ (GAC). A GAC is a process developed by CCN in 2020 to make scientific disagreements explicit and productive. Researchers submit a proposal for a controversial topic to CCN, and a handful of proposals are selected for GAC events at the conference. The following year, GAC organizers submit a position paper laying out plans for progress on their topic area and present that progress at that year’s conference.
This year’s GAC on generative and discriminative models in the visual system was organized by a team of 11 researchers. Some use discriminative approaches, some generative, but all were interested in exploring the intersection between the two. According to their proposal, they aimed to determine if “our intellectual heritage unduly polarizes our intuitions about the algorithm of vision, holding us hostage in a false dichotomy.”
Simple and fast versus flexible and slow
To frame the debate, it’s necessary to know what counts as a discriminative versus a generative system. But this is perhaps the first point of disagreement, or at least confusion. In the field of statistics, discriminative and generative models have simple definitions. Discriminative models are those that compute the probability of a latent variable, or underlying cause, given an observation. In terms of visual processing, such latent variables would be the objects out there in the world, and an observation would be the light that hits the retina. A model would do some calculation on, say, pixels from an image of the world to determine which objects are most likely to be present. Generative models instead compute the joint probability of both the latent variables and the observations. This requires knowing how probable certain objects are in the world in general, not just how probable they are in a given image.
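The statistical definitions above can be made concrete with a toy example. The sketch below is an illustration only: the two objects, the two observations and every probability in it are invented, and none of it comes from the GAC discussion. It stores a joint distribution over objects and observations, as a generative model would, and derives from it the probability of an object given an observation, which is the only quantity a discriminative model needs to represent.

```python
# Toy joint distribution P(object, observation).
# All objects, observations and numbers are hypothetical.
JOINT = {
    ("cat", "pointed_ears"): 0.45,
    ("cat", "floppy_ears"): 0.15,
    ("dog", "pointed_ears"): 0.10,
    ("dog", "floppy_ears"): 0.30,
}


def prior(obj):
    """P(object): how common the object is in the world in general."""
    return sum(p for (o, _), p in JOINT.items() if o == obj)


def evidence(obs):
    """P(observation): how common the observation is, whatever caused it."""
    return sum(p for (_, x), p in JOINT.items() if x == obs)


def posterior(obj, obs):
    """P(object | observation): the quantity a discriminative model computes."""
    return JOINT[(obj, obs)] / evidence(obs)


p_cat = prior("cat")
p_cat_given_pointed = posterior("cat", "pointed_ears")
```

With these made-up numbers, cats make up 60 percent of the world, and seeing pointed ears raises the probability of a cat to about 0.82. A purely discriminative model could output that 0.82 without ever representing the 60 percent figure; the joint table contains both.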
While the calculations of these different probability distributions are technically quite different, when these computations are mapped onto the brain, the line between the two starts to blur. “If you look in more detail, everything crumbles,” says Niko Kriegeskorte, a neuroscientist at Columbia University and a GAC speaker. The field lacks strict definitions for what counts as generative versus discriminative models, and what has cropped up in the neuroscience research literature is better described as a set of loose associations.
Models representing the discriminative side tend to be feedforward, simple and fast. Deep feedforward convolutional neural networks, for instance, are exemplars of discriminative processing. These models are usually trained in a supervised way: They learn to map images to labels, for example learning to categorize images of cats and dogs. The resulting model can take in a new image and quickly label it. It’s common for discriminative systems like these networks to work in a bottom-up way, forming a simple response to their immediate inputs. They are also thought of as being specialized to a particular task, such as object recognition, because of how they are trained.
Generative models, on the other hand, tend to be slow, but they are also more expressive, flexible and rigorous. They usually rely on unsupervised methods of training where the aim is to capture a basic understanding of the statistics and structures of the world, which can then be used for predictions. For example, in a world where cats are more common than dogs, a generative model may use the sight of paws to predict that long whiskers are also present, and eventually conclude that there is a cat in the image. Structurally, these models are more likely to have recurrent connections, particularly top-down connections from higher visual areas or the frontal cortex that carry predictive signals to the visual system. They are also more likely to represent information with a probability distribution, which allows for a full picture of the uncertainty associated with any given visual perception.
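The paws-and-whiskers example can be worked through numerically. The sketch below is a hypothetical illustration, not a model anyone presented at the event: the probabilities are invented, and it assumes that paws and long whiskers are independent of each other once the animal is known (a naive Bayes simplification). It shows how a prior over animals produces a prediction about an unseen feature, and how the model's belief remains a full probability distribution at every step.

```python
# Hypothetical world statistics: cats are more common than dogs.
PRIOR = {"cat": 0.7, "dog": 0.3}
# Hypothetical P(feature is visible | animal).
P_PAWS = {"cat": 0.9, "dog": 0.9}
P_LONG_WHISKERS = {"cat": 0.8, "dog": 0.2}


def normalize(scores):
    """Rescale non-negative scores so they sum to one."""
    total = sum(scores.values())
    return {animal: s / total for animal, s in scores.items()}


def belief_after(observed_likelihoods):
    """Posterior over animals after seeing each listed feature.

    Assumes the features are conditionally independent given the animal.
    """
    scores = dict(PRIOR)
    for likelihood in observed_likelihoods:
        scores = {a: scores[a] * likelihood[a] for a in scores}
    return normalize(scores)


# Step 1: paws are seen. Paws are equally likely for both animals here,
# so the belief stays at the prior.
after_paws = belief_after([P_PAWS])

# Step 2: predict a feature that has not been seen yet.
p_whiskers_predicted = sum(after_paws[a] * P_LONG_WHISKERS[a] for a in after_paws)

# Step 3: long whiskers are then seen, and the belief sharpens toward "cat".
after_both = belief_after([P_PAWS, P_LONG_WHISKERS])
```

With these numbers, the model predicts long whiskers with probability 0.62 after seeing only paws, and its belief in a cat rises from 0.7 to roughly 0.90 once the whiskers are confirmed. The remaining 0.10 is the kind of explicit uncertainty the paragraph above describes.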
Scientists have reason to believe that either type of process may be at play in the brain. Supporters of the generative approach point to its intuitive appeal and alignment with introspection. We can, after all, generate visual perception in the form of mental imagery and dreams; such phenomena wouldn’t be possible without any top-down influences or internal world models. Learning general principles about how the world works also makes generative systems more adaptable to new circumstances. During the GAC event, Josh Tenenbaum, a neuroscientist at the Massachusetts Institute of Technology and an investigator with the Simons Collaboration on the Global Brain (SCGB), applied image filters to the video of his talk to make this point: Because our visual systems understand that videos can be filtered with different visual effects, such as color and contrast changes, we are still able to recognize the content of images with such effects applied, even if they are new to us. Supporters of the discriminative approach, however, point to its concrete successes in explaining neural data. Deep convolutional neural networks trained to classify images provide some of the best models for predicting real neural activity in response to complex visual inputs. We also know that the feedforward pathway of the visual system can implement object categorization very quickly, which is consistent with a discriminative model.
The two types of models are at different stages of development, making it difficult to compare their benefits. Current discriminative models can operate directly on real images, giving them an edge over generative models. (This may reflect more of what researchers can do on a computer than what the brain can do, however.) At present, generative models are difficult to build and train and can only really be run on small toy problems, not the real-world challenges the visual system faces. Without models that are as good at image processing as today’s discriminative models, generative approaches don’t stand a chance of beating them on quantitative predictions of neural activity. Trying to compare them is a bit like comparing today’s cars to self-driving cars; self-driving cars may have some good features, but if you need to get around today they won’t be much help.
“At the end of the day, you’re going to have to have a model to test,” says Jim DiCarlo, a neuroscientist at MIT and an SCGB investigator. DiCarlo, who represented the discriminative side in the GAC, has shown the powerful ability of discriminative models trained on object recognition to predict neural activity. “Once someone builds a new image-computable model, only then can experimental data from my lab and others be used to adjudicate the accuracy of that model relative to other models.”
In a way, this reduces the debate over generative versus discriminative approaches to an engineering race. Even if generative approaches make a lot of intuitive sense, researchers still need to make them work in practice in order to make large-scale comparisons to brain activity. At present, they can’t. But generative models may not always be the underdog. Given their beneficial properties — notably, their ability to train without much labeled data — machine learning researchers hope they will become useful in the future. “It’s important that we do not confuse what we find easy or can do right now with what the brain can do,” Ralf Haefner, a neuroscientist at the University of Rochester, said at the event.
Exploring the intersection
Many models don’t fit neatly into one category or the other, as the GAC panelists were quick to note. Recurrent discriminative models exist, some generative models can be fast, and so on. Forcing the brain into boxes defined by statisticians and engineers has risks, said Benjamin Peters, a neuroscientist at Columbia, during the discussion. “We shouldn’t be too literal but rather just take inspiration from the algorithms.”
The visual system could, for example, be using a discriminative component for quick and easy visual perception but still contain generative elements for more deliberative functions. Or a built-in generative model could use its predictions about the world to help provide training data for the brain’s discriminative portion. Talia Konkle, a neuroscientist at Harvard University, argued in her talk for an acknowledgment of the separation between perception and cognition, with perception as a discriminative process and cognition as a more generative one.
Some hybrid approaches are already popular in the field of machine learning. Contrastive learning, for example, is a style of training in which a network learns to group similar things, such as different crops of the same image, and distinguish dissimilar things. This approach has generative components — the training doesn’t require explicit object labels, and it creates representations that capture a lot of the relevant statistics in the data. At the same time, it can work well with feedforward architectures, which are typical of discriminative models. And it does learn to discriminate between similar and different images.
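To make the idea of contrastive training concrete, here is a minimal sketch of one common form of contrastive objective, an InfoNCE-style loss, computed for a single image. The article does not say which loss any particular group uses, so this choice is an assumption, as are the function names, the temperature value and the hand-written vectors that stand in for a network's outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor embedding.

    The loss is small when the anchor is more similar to its positive
    (for example, another crop of the same image) than to the negatives
    (crops of other images). No object labels are involved.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    # Negative log-probability assigned to the true positive.
    return log_sum - logits[0]


# Hypothetical embeddings: two crops of one image, plus two other images.
crop_a = [1.0, 0.0]
crop_b = [0.9, 0.1]
others = [[0.0, 1.0], [-1.0, 0.2]]

loss_good = contrastive_loss(crop_a, crop_b, others)
# Same vectors, but the "positive" is wrongly taken to be a different image.
loss_bad = contrastive_loss(crop_a, others[0], [crop_b, others[1]])
```

Here `loss_good` is far smaller than `loss_bad`, because the two crops of the same image already sit close together. Training a network to lower this loss pulls matching crops together and pushes other images apart, which is the discriminative part of the method, while the absence of labels is what it shares with generative approaches.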
Given that these models can fall on a spectrum, some researchers question whether it makes sense to focus on a binary divide. “Are these even really the terms we want to converge on?” says Kim Stachenfeld of DeepMind. Scientists and engineers acknowledge that a clear distinction between generative and discriminative processing isn’t necessary to build a system that works. Nor is this distinction necessary to understand the brain. “If you think it’s one or the other, you’re missing the point,” Kriegeskorte says. “I’m no longer sure if in 10 to 20 years we will think of it in this dichotomy.”
Part of the purpose of the GAC was to explore the discriminative-versus-generative debate as a means of moving the field forward. Stachenfeld felt that it was useful to try to organize approaches to the visual system into these two camps and then “see what’s left over” — the leftovers illustrate what kind of new terminology and ideas are needed. Others agreed the discussions helped clarify which features are truly essential to each type of modeling approach and how to think through the evidence for each in the brain. Kriegeskorte noted that he is now “avoiding stupid mistakes I used to make” around the use of terminology for these models.
The real test of whether these conceptual advances matter will be the extent to which they impact experiments. Experimental design is an area where real progress is harder to come by, Kriegeskorte says.
Doris Tsao, a neuroscientist at the California Institute of Technology and an SCGB investigator, proposed one avenue for experiments: isolating the generative component of the nervous system and studying its impact on neural activity in the absence of feedforward input about the current state of the world. Previous research in people with corpus callosum lesions offers some hints. In cases where a portion of this pathway between the two hemispheres had been cut, researchers showed a word such as ‘knight’ to the right hemisphere (via the left visual field); this caused patients to describe (via the influence of feedback connections in their left hemisphere) a visual scene of knights even without any visual stimuli presented or conscious awareness of the word. Tsao believes similar experiments in animals could help identify the top-down generative pathways responsible for conjuring this imagery. However, GAC participants were divided on whether artificial isolation of the generative system will help elucidate its function under normal circumstances.
Most participants agreed on the need for experiments that focus more on the generative abilities of the brain. Nicole Rust, a neuroscientist at the University of Pennsylvania and an SCGB investigator, made an argument for studying visual prediction, such as the ability to predict what will happen next in a video. DiCarlo said that discussions surrounding the GAC have increased his desire to do more experiments inspired by the benefits of generative processing.
Over the next year, the group will continue to discuss concrete steps for advancing the research and share their progress with the broader community through publications and events. It was fun to “have all these people who think different things in the same room,” Stachenfeld says. And hopefully the effort this diverse group of researchers put into working through their different beliefs and assumptions will help them clarify these concepts and solidify the landscape of future research for both neuroscience and machine learning.