New Institute Pushes the Boundaries of Big Data

The Simons Foundation launches Flatiron Institute, a group of computational centers exploring galaxy formation, genomics, neuroscience and other fields

At Flatiron Institute, researchers will have ample space to meet and discuss their work. © Perkins Eastman

Each year thousands of genomes are sequenced, millions of neuronal activity traces are recorded, and light from hundreds of millions of galaxies is captured by our newest telescopes, all creating datasets of staggering size. These complex datasets are then stored for analysis.

Ongoing analysis of these information streams has illuminated a problem, however: Scientists’ standard methodologies are inadequate to the task of analyzing massive quantities of data. The development of new methods and software to learn from data and to model — at sufficient resolution — the complex processes they reflect is now a pressing concern in the scientific community.

To address these challenges, the Simons Foundation has launched a substantial new internal research group called the Flatiron Institute (FI). The FI is the first multidisciplinary institute focused entirely on computation. It is also the first center of its kind to be wholly supported by private philanthropy. It will provide a permanent home for up to 250 scientists and expert programmers, all working together to create, deploy and support new state-of-the-art computational methods. Few existing institutions employ scientists and professional programmers side by side. Most instead leave programming to graduate students and postdoctoral fellows, who typically stay only a few years. None has brought the two together at the scale of the Flatiron Institute, with such a broad scope, at a single location.

The institute will hold conferences and meetings and serve as a focal point for computational science around the world.

Software developed at the Flatiron Institute that models organic molecules such as DNA, RNA and proteins successfully predicted this interaction between a synthetic antibody (shown in pink, yellow and blue regions) and a cancer-cell protein. If the antibody disrupts the cancer cell's function, the result could have implications for treating human cancers. Courtesy of Richard A. Bonneau, Center for Computational Biology, Flatiron Institute

The FI is led by Leslie Greengard, Silver Professor of Mathematics and Computer Science at New York University’s Courant Institute of Mathematical Sciences, and David Spergel, Charles A. Young Professor of Astronomy at Princeton University.

Greengard directs the FI’s Center for Computational Biology (CCB). Launched in 2013 as the Simons Center for Data Analysis, CCB now has a staff of 32 and is expected to grow to approximately 55 in the next few years. CCB comprises five subgroups working on biophysical modeling, genomics, neuroscience, numerical algorithms and systems biology. The methodologies being developed at CCB are aimed at advancing basic scientific knowledge, with long-term potential for understanding and treating human disease, as well as understanding broader aspects of Earth’s ecosystem. As an example, the genomics group, led by deputy director Olga Troyanskaya, has made advances in predicting autism-relevant genes by merging and analyzing massive public genetic datasets.

David Spergel directs the institute’s Center for Computational Astrophysics (CCA), where researchers ask statistical questions of data in novel ways and create large-scale, complex simulations of events in deep space. CCA boasts subgroups in galaxy formation and statistical astronomy.

“I am excited to be at an institute where we are co-located with biologists and people from other fields of science, who are all using similar tools to address very different problems,” Spergel says. CCA’s work may help to answer fundamental questions in astrophysics, such as: What is dark energy? What is dark matter? What can we learn about black holes, neutron stars and general relativity from the recent detection of gravitational waves? What is the history of our galaxy’s formation? What sets the properties of planetary systems?

When stars die, the more massive ones explode as supernovae, sometimes blowing the gas that comprises them entirely out of their home galaxies. Scientists use supercomputer simulations to model these explosions and how they influence their surroundings. This image shows dense streamers of gas in one region of the universe where many galaxies are embedded; the orange and yellow regions are hotter gas that has been blown out of nearby galaxies by exploding stars. Credit: Chris Hayward, Center for Computational Astrophysics, Flatiron Institute; and Phil Hopkins, California Institute of Technology; and the FIRE Collaboration, fire.northwestern.edu.

In the next few years, the foundation plans to launch two more computational centers in distinct areas of study within the FI, creating a fertile climate for discovery.

“My background in mathematics, coupled with working as a code cracker and founding a quantitative financial firm, taught me the power of large-scale data analysis,” says foundation chair Jim Simons. “It seems clear that data is going to play an increasingly important role in all aspects of science and it is very exciting to create an organization to exploit that realization. My hopes are high that Flatiron will make important advances in basic science, as well as in translating many of their findings to applications of direct benefit to humanity.”

In addition to the computational centers, the FI hosts a scientific computing core, led by co-directors Nick Carriero and Ian Fisk. This core develops and deploys the computing infrastructure — including new computational and statistical methods and storage and data-handling systems — necessary for carrying out the centers’ research. It also creates and disseminates software tools for the use of various scientific communities.

“An important decision we made,” says Marilyn H. Simons, president of the Simons Foundation, “was to not constrain these scientists with the customary academic metrics or timescales. We understand the complexity of their task and the risk involved in this new area of science. In other words, there is no ‘publish or perish’ here. And since all the work will be supported by the foundation, scientists needn’t be distracted by applying for grants and can spend all their time doing research, a benefit rarely enjoyed in academia.”

“What we can do at the Flatiron Institute — hard to do in an academic environment — is work on projects that might take some years to produce results,” Greengard adds. “Many of the problems we are seeking to solve require a substantial amount of tool building and don’t lend themselves to regular patterns of publication. We have the luxury here of working on such projects that will ultimately lead to powerful methods for analysis.”

Algorithms developed by Flatiron Institute researchers enable automatic detection of neurons and neuronal processes contained in the data streams generated by the brains of test subjects. Here, the pink and yellow objects are active neurons or neuronal processes detected in the brain of a mouse engaged in a behavioral task. Courtesy of Eftychios A. Pnevmatikakis and Andrea Giovannucci, Center for Computational Biology, Flatiron Institute; data from Sue Ann Koay, Tank Lab, Princeton Neuroscience Institute, Princeton University.

“Computational tools are reshaping the way we do science,” says Spergel. “The next step is to shift from individuals working alone to large teams of scientists to advance these approaches. Having an institute like Flatiron really enables us to take that step, to collaborate on a large scale and do transformative science.”

The Simons Foundation, located at 160 Fifth Ave., has leased 162 Fifth Ave. in its entirety to house the Flatiron Institute. Perkins Eastman, a nearby planning, design and consulting firm, led the redesign of the building’s interior to create the collaborative atmosphere that will be the hallmark of the institute. The interior’s design draws inspiration from the legendary Flatiron Building located just across Fifth Avenue.

Spergel concludes, “The conversations we can have here at lunch or at coffee are really special. There aren’t other places in which those conversations could happen. New ideas are bound to come out and drive research forward.”
