Advancing Astronomy With Statistics
When Dan Foreman-Mackey started his career as an astronomer, he knew he wanted to take a different approach to research. He was interested in the technical side of astronomy, particularly statistics and the role it could play in the field.
Today, as a research scientist at the Flatiron Institute’s Center for Computational Astrophysics (CCA), he uses statistical techniques to help astronomers make new discoveries. With one foot in each field, he serves as a liaison between computational science and astronomy, translating fundamental statistical methods into code that can be used to study exoplanets — planets that orbit stars other than our sun.
Foreman-Mackey was previously an associate research scientist at the CCA. Prior to joining the Flatiron Institute, he worked as a Sagan postdoctoral fellow at the University of Washington. He earned a doctorate in physics at New York University, a master’s in physics at Queen’s University in Canada, and a bachelor’s in physics at McGill University.
Foreman-Mackey recently spoke with the Simons Foundation about his work and its applications. The conversation has been edited for clarity.
What is the focus of your work at the CCA?
I develop software infrastructure to support astronomy research. A lot of my time is spent thinking about how to build tools that can make astronomers’ work easier. These tools are generally for data analysis. My goal is to help astronomers make the best use of the observational data they’ve collected, so they can extract the most information from it in a statistically rigorous way.
Lately, I’ve been focusing on applications for exoplanet data, such as ways to discover new exoplanets or ways to characterize the atmospheres of the ones we know about. Astronomers have been trying to understand the compositions of the atmospheres of other planets, particularly through identifying individual molecules, such as ones that might indicate whether the planet could be hospitable for life. Traditionally, that’s been an extremely hard measurement to make because exoplanets are far away, small and very faint. Observations are getting better, but even with extremely high-quality data, the signals still tend to be quite small. This means it’s hard to tell whether a signal is just some instrumental effect or actually an observation of a specific molecule.
I’ve been excited about trying to bring some of my statistical expertise to this problem. In particular, I have been focusing on ways to make robust statistical claims about how believable a certain result is — essentially creating tools that allow astronomers to reliably distinguish between what is a signal and what is noise.
How are you able to distinguish between the two?
We’re working with datasets that record the light from a star, across many wavelengths, as a planet passes in front of it. When the planet crosses in front of the star, there’s a dip in light since the planet is blocking part of the starlight, and scientists can track that dip. With these datasets, astronomers can measure the effective size of the planet at different wavelengths. If the atmosphere is full of, say, nitrogen, there will be less light detected at the wavelengths nitrogen absorbs. In that case, we would expect the planet to look effectively larger at those wavelengths. We’re trying to measure these minute changes in the visible size of these planets as they transit to identify the molecules that might be present in an atmosphere.
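To make the geometry concrete: the fraction of starlight blocked during a transit is roughly the squared ratio of the planet’s radius to the star’s radius, so a planet that “looks” larger at absorbed wavelengths produces a deeper dip at those wavelengths. Here is a toy illustration of that relationship in Python; the radii, the wavelength grid and the Gaussian “absorption bump” are all made-up numbers, not real data or Foreman-Mackey’s actual model.

```python
# Toy illustration: wavelength-dependent transit depth.
# Depth ~ (R_planet / R_star)^2, so a larger effective radius
# at absorbed wavelengths means a deeper dip there.
import numpy as np

R_star = 1.0     # stellar radius (arbitrary units, hypothetical)
R_planet = 0.10  # baseline planet radius (same units, hypothetical)

wavelengths = np.linspace(0.6, 5.0, 5)  # microns (made-up grid)
# Pretend an atmospheric absorber puffs up the effective radius near 2 microns.
effective_radius = R_planet * (1 + 0.01 * np.exp(-((wavelengths - 2.0) / 0.3) ** 2))

transit_depth = (effective_radius / R_star) ** 2  # fraction of light blocked
for wl, depth in zip(wavelengths, transit_depth):
    print(f"{wl:4.1f} micron: depth = {depth:.5f}")
```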
However, there are lots of different things that can affect the effective size. For example, there can be instrumental effects, such as changes in the detector’s temperature or the focus and alignment of the instrument. Different viewing factors — like the orbital period or the angle at which you’re viewing the planetary system — can also affect what you’re seeing. This makes data analysis a high-dimensional problem, meaning there are lots of parameters you have to sort out to extract the information you’re interested in. Some of these parameters are correlated, which makes understanding them an even greater challenge.
Working through all the parameters can be extremely computationally expensive, so I’m interested in building tools that make those computations easier. This typically involves writing code that helps tease out these differences using statistical analysis methods, such as Bayesian inference and Markov chain Monte Carlo methods. I try to write computationally efficient codes that can be used with any dataset. This way, they can be useful to astronomers working with data from different telescopes. Interestingly, this work often ends up having applications in other areas outside of exoplanet research.
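As a minimal sketch of the MCMC approach he describes, the snippet below fits a wavelength-dependent transit depth using emcee, the ensemble-sampling package Foreman-Mackey authored. The “baseline plus Gaussian bump” model, the simulated data and the noise level are assumptions for illustration only, not his actual analysis pipeline.

```python
# Minimal sketch: Bayesian fit of a transit-depth spectrum with MCMC,
# using the emcee ensemble sampler. Data and model are made up.
import numpy as np
import emcee

rng = np.random.default_rng(42)

# Hypothetical measurements: transit depth at 20 wavelengths.
wavelengths = np.linspace(0.6, 5.0, 20)  # microns
true_depth = 0.0100 + 0.0002 * np.exp(-0.5 * ((wavelengths - 2.0) / 0.3) ** 2)
sigma = 0.0001  # assumed measurement uncertainty
depths = true_depth + sigma * rng.normal(size=wavelengths.size)

def log_prob(theta):
    """Log posterior: flat baseline depth plus one Gaussian absorption bump."""
    base, amp, center, width = theta
    # Simple box priors keep the parameters in sensible ranges.
    if not (0 < base < 0.05 and 0 <= amp < 0.01 and
            0.6 < center < 5.0 and 0.05 < width < 2.0):
        return -np.inf
    model = base + amp * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)
    return -0.5 * np.sum(((depths - model) / sigma) ** 2)  # Gaussian likelihood

ndim, nwalkers = 4, 32
p0 = np.array([0.01, 0.0002, 2.0, 0.3]) + 1e-5 * rng.normal(size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 5000)

# Posterior summary: is the bump amplitude distinguishable from zero (noise)?
samples = sampler.get_chain(discard=1000, flat=True)
amp_samples = samples[:, 1]
print(f"bump amplitude: {amp_samples.mean():.2e} +/- {amp_samples.std():.2e}")
```

The posterior spread on the bump amplitude is what lets you make the kind of robust claim he mentions: if the amplitude is consistent with zero, the “detection” may just be noise.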
What other applications has your work had?
One area that really benefits from the progress that we’ve made in exoplanets is the study of stellar astrophysics. If you want to study exoplanets, you also have to understand the stars they orbit. My work is useful for characterizing stellar variability, stellar rotation and different types of stellar activity. Asteroseismology, an approach that uses internal waves to probe the interiors of stars, can also be done with these tools. I’ve even heard about my work being used in agriculture for micro-weather forecasting.
As tools and datasets get better, we can ask harder and more complex questions. Making computational tools faster and more reliable will help us toward this goal. I hope my research enables astronomers to be ambitious with the kinds of questions that they’re asking about the makeup of the cosmos.