UW News

October 30, 2008

Danger, massive amounts of data ahead; eScience Institute can help

News and Information

The UW’s new eScience Institute will help keep the UW competitive as research projects become ever more reliant on computation and on the analysis of massive amounts of data.

The Institute is being launched with a special event from 3:30 to 5 p.m. Wednesday, Nov. 5, in the Microsoft Atrium of the Paul G. Allen Center for Computer Science & Engineering. The event will include presentations by Provost Phyllis Wise; Ed Lazowska, interim eScience director and Bill & Melinda Gates Chair, Computer Science & Engineering; Dan Fay of Microsoft Research; David Baker, professor of biochemistry; Martin Savage, professor of physics; and Andy Connolly, professor of astronomy.

In the past few decades, high-performance computing (simulation) has revolutionized many fields of science and engineering. Today, an exponentially growing volume of data from cheap sensors is creating a second, and even more pervasive, revolution. The Sloan Digital Sky Survey, the most ambitious astronomical survey in human history, has collected 15 terabytes (15 trillion bytes) of information over the past decade. In contrast, the Large Synoptic Survey Telescope, which is now under construction, will amass 30 terabytes of data every day. A single table-top gene sequencer produces 1 terabyte per day. Projects such as the Large Hadron Collider, the world’s largest high-energy particle accelerator, will generate 60 terabytes a day.

Locally, the UW is the lead institution on the deployment of the Regional Scale Nodes of the Ocean Observatories Initiative funded by the National Science Foundation. This ambitious project will connect thousands of chemical, physical and biological sensors through a network of 2,000 kilometers of fiber-optic cable. The volume of data created by this network will be immense.

“In the future, data-driven science will be everywhere,” Lazowska says. “Sensors will be ubiquitous. This data-centric approach will be true of the social sciences and engineering as well as physical and natural sciences.” The ability to gather mountains of data is made possible by a number of other factors, including the continuing decline in prices for mass digital storage and powerful computing hardware, as well as the development of ingenious software to manage not just single computers but vast networks. Extracting knowledge from these massive data sets will be at the heart of 21st century discovery, Lazowska says.

Individual researchers, and often teams of researchers, face formidable challenges in analyzing such volumes of data. The eScience Institute will offer a University-wide capability with expertise in high-performance computing, data management and data discovery.

The Institute received $1 million from the State Legislature earlier this year, which is permitting the creation of a small team of research scientists and faculty. The research scientists will do much of the hands-on work and consulting as a shared campus resource, while faculty will ensure that the University remains at the forefront as the technology rapidly evolves and will offer classes to students who will need eScience skills in their careers. The Institute is advised by a steering committee of faculty who are working in eScience fields.

To guide the effort, UW Technology recently interviewed about 150 of the premier principal investigators at the UW. Many expressed concerns about a lack of coordination in high-performance computing (e.g., server space and software technologies), as well as an explosion of research data and a lack of knowledge and guidance on how to manage and analyze this data. “It’s clear there are hundreds of investigators across campus who are in need of support,” Lazowska says. “The Institute will be creating a network of support and will facilitate communication among investigators over data management and computing issues. The demand for this is huge.”

So great is the demand that Lazowska can easily foresee a future in which eScience specialists will be a routine part of many grant requests. The investment in a free-standing institute will not necessarily be large, “but it will be a highly focused initiative with institution-wide leverage,” he says. “Research-intensive institutions will either develop eScience capabilities that are broadly available, or they will cease to be competitive.”

Lazowska sees the Institute as an important way of building connections among researchers who share similar concerns. “We have a lot of great people who are working in isolation,” he says. “We have world-class computer networking. But an essential piece to maintain our competitiveness will be providing an eScience capability across the campus.”