Gaia Data Releases Spark Open-Access Success
The dome of the Hayden Planetarium in New York City burst into a confetti-like field of colorful puffs of light. Below, an audience of scientists and non-scientists alike oohed and aahed at the multicolored light show.
“It looks like cotton candy falling down,” observed astrophysicist Jackie Faherty, a senior scientist and education manager at the American Museum of Natural History in New York City. “It’s pretty, right? But it’s also science.”
The globs of ‘cotton candy’ projected onto the planetarium’s ceiling represented stars scattered throughout the Milky Way galaxy. The vibrant colors signified the abundance of metal in each star, as recently gleaned from measurements taken by the European Space Agency’s Gaia spacecraft.
As the after-hours planetarium show continued, more spectacles from Gaia appeared above the captivated audience. As stars circled the galaxy like leaves caught in a whirlpool, the cosmic dance of co-moving stars revealed itself, and previously unseen wonders of the Milky Way came into focus. The data powering the cosmic displays, released by the Gaia team just a month prior, had already profoundly changed the way astrophysicists look at and think about our home galaxy.
“We can see both the bulk motion of the stars and the peculiar motions of the stars with respect to the bulk motion,” Faherty said. “And in those motions, you can see all sorts of irregularities and complexity in the Milky Way disk. [Gaia] has turned the Milky Way into a visibly dynamical and time-dependent object.”
The planetarium show capped off the second of two ‘hack’ sessions at the Flatiron Institute’s Center for Computational Astrophysics (CCA). Held in late April and early June, the two events gathered astronomers from around the world to collaborate on unlocking the full scientific potential of the newly released Gaia data. Attendees gleaned new insights into old puzzles, including the structure of the Milky Way’s dark matter and the origin of jumbo-size, Jupiter-like objects.
The Gaia team “changed the world of astrophysics overnight,” says CCA group leader David Hogg, who organized the events. “So much diverse science was happening. We had people working on the solar system. We had people working on the universe as a whole. We had people working on dark matter. We had people working on supernovae. It was incredible.”
The Gaia spacecraft, launched in 2013, is creating a 3-D map of the Milky Way by measuring the positions and distances of stars with unprecedented accuracy. Gaia gauges the distances to distant objects using the parallax effect. As the Earth orbits the sun, stars appear to shift in the sky. The extent of this shift depends on the distance to the star. The shorter the distance, the greater the shift.
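As a rough illustration of the parallax relationship, the short Python sketch below converts a measured parallax angle into a distance. It assumes the standard textbook relation that a star with a parallax of one arcsecond lies one parsec away, so distance in parsecs is the reciprocal of the parallax in arcseconds. The function names and sample values are hypothetical and are not taken from the Gaia team's software. Simple inversion of this kind is only reasonable when the parallax is well measured.

```python
# Minimal sketch of the parallax-distance relation (illustrative only).
# Assumption: distance [parsec] = 1 / parallax [arcsecond],
# i.e. 1000 / parallax when the parallax is given in milliarcseconds (mas).

PC_TO_LY = 3.2616  # approximate number of light-years in one parsec


def parallax_to_distance_pc(parallax_mas: float) -> float:
    """Return the distance in parsecs for a parallax in milliarcseconds."""
    if parallax_mas <= 0:
        # A zero or negative parallax cannot be inverted into a distance.
        raise ValueError("parallax must be positive")
    return 1000.0 / parallax_mas


def parallax_to_distance_ly(parallax_mas: float) -> float:
    """Return the distance in light-years for a parallax in milliarcseconds."""
    return parallax_to_distance_pc(parallax_mas) * PC_TO_LY


if __name__ == "__main__":
    # Hypothetical parallaxes: the larger the shift, the nearer the star.
    for p in (100.0, 25.0, 1.0):
        print(p, "mas ->", parallax_to_distance_pc(p), "pc,",
              round(parallax_to_distance_ly(p), 1), "ly")
```

In this sketch a parallax of 25 milliarcseconds corresponds to 40 parsecs, or about 130 light-years, while a parallax of 1 milliarcsecond corresponds to 1,000 parsecs.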
Over a five-year period, Gaia will precisely measure the parallax of around one percent, or roughly 1 billion, of the galaxy’s stars. The spacecraft will also observe distant quasars and exoplanets as well as nearby objects such as asteroids and comets. Such precise measurements are impossible using ground-based telescopes because of the distortion caused by Earth’s atmosphere.
In September 2016, the Gaia team released the first batch of data from the mission, encompassing the distances and motions of around 2 million stars. Those data prompted discoveries, including 18 scientific papers that resulted from a five-day research marathon hosted at the Flatiron Institute. The Gaia dataset that was released on April 25, 2018, completely dwarfed that initial release, upping the number of stars to around 1.7 billion.
Fortunately, aware of the sheer scale and importance of the April release, Hogg had started preparing early. About a year earlier, he began organizing weekly prep workshops at the Flatiron Institute for local astronomers. Hogg and other attendees discussed plans for how best to hit the ground running from day one and developed computer code that could probe the dataset for discoveries.
Initially, Hogg planned to have a small gathering of around a dozen astronomers on release day. As word spread of the research jam session, more and more researchers asked to join in. Ultimately the guest list topped out at 75 attendees, including five journalists, all itching for a fresh look at the Milky Way.
What made the Gaia release particularly exciting, Hogg says, was a selfless decision made by the spacecraft’s scientific team. With large scientific missions, the scientists behind the endeavor typically get first dibs on the data. This head start allows the team to make discoveries and write research papers months before anyone else can even touch the data. The Gaia team instead opted to give the data to everyone right away.
“When we were looking at the Gaia data, we were seeing it for the first time, but so were the people who worked on the mission,” Hogg says. “They literally released it to themselves and the world simultaneously. It was truly brand new, and nobody knew what was in there.”
The decision to release Gaia’s data without a proprietary period was made “right from the start” in the early 2000s, says Gaia support scientist Jos de Bruijne. Gaia’s predecessor, the Hipparcos spacecraft, had a one-year proprietary period. Divvying up the data was a nightmare, de Bruijne recalls, with many sponsor countries hankering for prime early access.
“For Gaia, we would have been complexifying the issue by an order of magnitude because it will have many more objects, many more things you can do with the data, and a lot more people interested to get involved,” he says. “We said let’s instead concentrate our efforts on getting the data right, getting it out as quickly as possible and letting the politics rest.”
De Bruijne admits that Gaia’s open-data decision may not be the best course for every mission. Postdoctoral researchers rely on publications for career advancement, for instance, and may feel dissuaded from committing to long-term projects without a proprietary period. Funding sources may also require that their scientists get a head start digging into the data.
For Gaia, the decision was ultimately the right one, de Bruijne says. “I’ve seen a lot of positive reactions about the data being public immediately. To see that the community was eager to get their hands on the data was rewarding.”
Before dawn on April 25, release day, Hogg and a handful of others were already caffeinated and gathered at the Flatiron Institute in New York City watching the Gaia team’s press conference and waiting for the 6 a.m. ET data release. “Hi everybody, it’s Gaia day!” said astrophysicist Kathryn Johnston of Columbia University with a surprising amount of chipperness for so early in the morning.
For an hour or two after the Gaia team officially released their data, attendees encountered connection issues while trying to download the files. “The servers are not able to handle all the enthusiasm,” Faherty joked.
With the files eventually in hand and a roomful of astronomers eagerly sifting through the data, shouts of discovery occasionally pierced the babble of clacking laptop keys. Attendees would crowd around computers to look at a never-before-seen feature of the Milky Way’s structure or the surprising choreography of a band of co-moving stars.
“Even though we spent the last year preparing, it was chaos for the first few hours,” Hogg recalls. “We were still completely swamped by the quantity of the data. We were still completely surprised by what was in the data. People, in the end, didn’t end up doing the projects they prepared to do because other things caught their attention. That’s the way data analysis really is: you spend a lot of time getting ready for a dataset, but what you end up doing is often quite different.”
The event was an ‘un-conference,’ as Hogg describes it. The hackathon had no predetermined schedule, no invited speakers, no strict structure. “Everybody was here to work,” he says. “The idea is to get people together to benefit from each other’s working expertise.” Following the lead of the Gaia team, attendees agreed to openly share their ideas, expertise, code and interim results with other participants and the greater scientific community.
Just hours after the data release, astrophysicists Adrian Price-Whelan of Princeton University and Ana Bonaca of Harvard University discovered something surprising in the Gaia dataset that could help illuminate the distribution of dark matter around our galaxy.
They studied a long, thin string of stars that move together in the relatively barren, though potentially dark matter–rich, halo that surrounds our galaxy. These stars were probably once part of a cluster that was stretched and pulled like taffy by the Milky Way’s gravity. Thanks to Gaia, Price-Whelan and Bonaca could examine the conga line of stars with unprecedented clarity. The clearest look yet at the river of stars revealed a surprise: a big gap in the string where something knocked some stars out of alignment.
The gap was probably caused by the string passing too close to a massive object, such as a giant interstellar cloud. However, given the location of the stars, a more exciting culprit could be to blame: an invisible clump of dark matter. The existence of such clumps, Bonaca says, “has been a big question mark in dark matter theory” — predicted but never observed. The yawning chasm between the stars could be one of the first clear signs that such clumps exist. “It’s still highly, highly speculative,” Price-Whelan says, “but when we looked at this, our first reaction was ‘oh my God.’”
The discovery wouldn’t have happened without Gaia, Price-Whelan says. “None of this work was possible before Gaia because the differences in the stars’ motions in the sky are so small,” he explains. “In past surveys where you could measure these proper motions, you wouldn’t be able to distinguish the stars in the stream from stars in the Milky Way.”
Faherty also found new clarity thanks to Gaia, discovering a testbed for learning more about the origin of astronomical objects that straddle the boundary of planets and stars. These brown dwarfs, also called “super Jupiters,” are more massive than the gas giants in our solar system but not so massive that they undergo the nuclear fusion that powers a star. Astronomers aren’t entirely sure where or how brown dwarfs form.
For three years, Faherty traveled back and forth to South America to observe brown dwarfs via telescope. She braved lousy weather, high winds and a car accident winding her way to the mountaintop observatory. “It felt like a war to get to the numbers,” she recalls. “All of it produced 75 brown dwarf parallaxes, which at the time was huge. Gaia just destroyed that. It has now redone my whole thesis and then some, which I’ve decided I’m OK with.”
In the Gaia data, Faherty found high-precision parallax estimates for hundreds of brown dwarfs. One group of stars, in particular, caught her attention. About 130 light-years from Earth, the star system includes a brown dwarf in a distant orbit from a small red dwarf star. Other brown dwarfs live in the star system as well. Some tightly orbit their stars while others float untethered through space. The single star system hosts brown dwarfs in every configuration.
“It’s like a family picture,” Faherty says. “Now I’ve got them alone, I’ve got them far, and I’ve got them close in. So how did they form? Now we can start looking for the smoking gun signatures. Maybe something in their atmospheres, of how they formed.”
The ‘hackathon’ prompted many more scientific discoveries and collaborations. By the Monday following the hackathon, two scientific papers based on work done at the Flatiron Institute event had already been submitted to scientific research journals.
The same focus on fostering collaborations and facilitating the completion of scientific papers continued around five weeks later at the second Gaia event, called the Gaia sprint. Ninety participants gathered at the Flatiron Institute, this time with a bit more familiarity with the recently released data.
“Even with more than a month’s more time with the Gaia dataset, people were still just starting to discover what’s in the data,” Hogg says. “When you have 1.7 billion objects, it’s really hard to understand what’s in the data. People were still totally in exploratory mode at the Gaia sprint, which was fine with me. It was a really exciting environment, lots of exciting things happened, and people made real discoveries during the week.”
And the Gaia discoveries will keep rolling in. Researchers from the New York City area gather each Wednesday at the Flatiron Institute to work on the dataset and swap ideas. Next year researchers will gather for another Gaia sprint, this time at the Kavli Institute for Theoretical Physics at the University of California, Santa Barbara, and the astronomy community is already looking forward to the next batch of Gaia data slated for release in late 2020.
“There’s so much to do with the data that we’re going to stay excited about this for a decade,” Hogg says.
Gaia’s open-data policy heightens the excitement, he says. “It shows that scientists are motivated to do their best possible. They’re not motivated by the trappings of success. They’re motivated by principles of truth and care and correctness. That was exciting. People all over the astronomical community are discussing this point now, which is how do we do these projects. We’ve always thought that having a one-year proprietary period was a requirement to make a project work. Gaia shows that you can instead work in an amazingly open way.”