GNN - Genome News Network  
Mass-computing effort brings public into genome fold
By Adam Marcus

It could be your neighbor, your friend, that woman on the subway with the purple hair. Somewhere out there, more than 30,000 people every day are yielding their home computers to the service of science.

More specifically, these people are volunteers in a vast and democratic experiment in so-called distributed computing, with their individual machines working together to answer questions about human and animal genomes and the way proteins fold.

The two projects, Genome@home and Folding@home, are the twin visions of Vijay Pande, a physical chemist at Stanford University in California. Folding@home is an effort to analyze the folding properties of proteins, while Genome@home is digging for the final structure of virtual proteins and the genes that code for them.

"We're trying to reverse-engineer genomes by taking all existing folds and designing new sequences for those folds," Pande says. "That will give us a genome that could have existed but may or may not have turned up yet." The end result might be new, therapeutic proteins for diseases in which the molecules go awry, such as Alzheimer's.

Thanks to Folding@home, protein expert Kevin Plaxco, of the University of California, Santa Barbara, and his colleagues can get real-time simulations of the way proteins fold. They can then use these simulations to vet their own theories and the experiments they conduct using real molecules.

"Our experiments give us only a very limited view of what's happening, whereas in simulation all of the details are there," says Plaxco.

So far, Pande's group has farmed out small proteins consisting of between 40 and 50 amino acids. Their goal is to reach 100 amino acids, about the number needed to study human diseases that involve folding problems. That may take another 10-fold increase in computing power, or another 270,000 volunteers willing to turn over their processors to the greater good. Pande believes the goal is attainable.
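
The 270,000 figure follows from the article's own numbers: about 30,000 machines contribute each day, so a 10-fold increase implies roughly 300,000. The sketch below spells out that arithmetic. It assumes computing power grows in direct proportion to the number of volunteers, which is a simplification made here for illustration; the function name is hypothetical.

```python
def additional_volunteers(current: int, scale_factor: int) -> int:
    """Extra volunteers needed to multiply capacity by scale_factor.

    Assumption (not from the project): capacity scales linearly
    with the number of participating machines.
    """
    return current * scale_factor - current


# 30,000 daily volunteers and a 10-fold target, as quoted in the article
print(additional_volunteers(30_000, 10))
```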

As it happens, there appears to be a limited catalogue of protein shapes. Many different sequences of amino acids can produce the same fold, and chains that are only 50 percent identical can have similar structures—making the job of identifying shapes somewhat easier for scientists. "Evolution just does what works," says Pande.

Pande [pronounced paan-DAY] came up with the idea for the projects as a new faculty member at Stanford in 1999: Parcel out small parts of big problems to thousands of personal computers, borrowing idle memory and processing time to create a whole bigger than the sum of the parts. The application takes only about half a megabyte—and two to three megabytes when running—and shouldn't sap speed or performance even when users are working in other programs.
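
The parceling-out idea can be sketched in a few lines. This is a toy illustration only, not the Folding@home software: local threads stand in for volunteer machines, a sum of squares stands in for the real computation, and every function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(items, unit_size):
    """Cut one big job into small, independent work units."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def process_unit(unit):
    """Stand-in for the work one volunteer machine would do."""
    return sum(x * x for x in unit)


def run_distributed(items, unit_size, workers=4):
    """Farm the units out to a pool of workers and combine the results."""
    units = split_into_work_units(items, unit_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_unit, units))
    return sum(partial_results)


print(run_distributed(list(range(10)), unit_size=3))
```

The sketch only works because each unit can be computed without waiting for the others. As the article goes on to explain, that independence is exactly what is hard to arrange for a folding simulation.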

Since October 2000, over 100,000 users throughout the world have participated in Folding@home.

Conventional wisdom would dictate that researchers who are trying to understand protein folding or looking for new genes would need supercomputers like those used by large federal or private research laboratories. These machines are fast and powerful, but they are also in great demand—securing large blocks of time for crunching data can be difficult. And as fast as they are, supercomputers take months to simulate even a few fractions of a second of folding.

"It was clear that distributed computing offered a large computational resource," says Pande, who received his doctorate in physics from the Massachusetts Institute of Technology in Cambridge. But was it doable? "It's one thing to say that it's possible. It's another to put your money where your mouth is."

The answer clearly has been yes. Pande's group recently became the first to simulate folding starting with an amino acid sequence and working up from the level of atoms. They have also simulated the most prolonged fold, on the time scale of milliseconds—about 1,000 times longer than the previous record attained with supercomputers.

The project has been working on several different protein fragments, including such colorfully named structures as beta-hairpins, villin headpieces and zinc fingers. The last is a common feature of many proteins that helps them bind to DNA. Pande can't discuss his latest findings, which are set to appear in press.

Distributed computing isn't novel. The SETI@home project, run from the University of California, Berkeley, has used the strategy for several years to search for radio transmissions from alien life forms, letting anyone with a computer and an Internet connection participate in the search for extraterrestrial intelligence. Other projects employing the same principle include evolution@home and FightAIDS@Home.

Unlike, say, number crunching, most of the problems in proteomics and genomics aren't easy to do in parallel—a fact that initially earned Pande fairly hefty skepticism. "If you want to study the motion of a protein and its function, it's like following someone when they're walking home. One step has to follow the last and you can't skip ahead," he says.

Many critics believed it would be impossible to cajole a loosely knit fabric of individual computers, each working on a separate aspect of a problem, into churning out good data. And there was some doubt the project would 'scale' well—in other words, would it in fact be slower, not faster, than doing it one step at a time?

Pande and his colleagues wrote the original software in 1999. The code is nothing terribly cutting edge, he says, although the networking and security aspects are strong. "In principle it sounds trivial, but in practice getting it to work without bugs is harder." For most participants the program runs fine, though it has been dyspeptic on the now-obsolete Windows 95.

The main expense has been server equipment, which chip giant Intel Corp., of Santa Clara, California, provided with a sizeable grant. In all, Pande says, the project has cost about $100,000—a bit more than $1 per participant.

Simulation of a folding protein from Folding@home.

Proteins are nature's version of an Erector set, one that can assemble itself in anywhere from a few microseconds to several minutes. Composed of amino acids, the chains fold into intricate shapes that help give them specific functions. That they can fold so quickly is one of nature's more perplexing mysteries.

For instance, if a protein had to find its correct configuration by trying every possible shape at random, the search should take longer than the age of the universe (about 14 billion years). Clearly it does not, and researchers don't understand why. That puzzle, called Levinthal's paradox, has prompted a temporary shrug that has lasted nearly forty years. "We haven't worked ourselves out of a job yet," says Plaxco.
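
A back-of-the-envelope count shows the scale of Levinthal's paradox. The inputs below are illustrative assumptions, not figures from the article: a chain of 100 amino acids, three possible conformations per amino acid, and a protein that tries 10^13 conformations per second. Only the age of the universe is the number quoted above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 14e9  # figure quoted in the article


def exhaustive_search_years(residues, states_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to try every conformation once (illustrative inputs)."""
    conformations = states_per_residue ** residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(100)
print(f"{years:.1e} years to search every conformation")
print(f"{years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Under these assumptions the search would take on the order of 10^27 years, which is why a protein that folds in microseconds cannot be searching at random.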

Plaxco says that although he saw the value in detailed simulations of the process, he was initially skeptical Pande's idea would work. But Pande's early successes have convinced him the approach is viable. "I was wrong there," he admits.

Martin Gruebele, a biophysicist at the University of Illinois, collaborates with Pande to understand the folding dynamics of a protein called BBAW. This 'mini'-protein has the distinction of being the fastest folder yet discovered, pulling its act together in a few microseconds (precisely how speedy will be revealed in an upcoming Nature article).

This pace—a few microseconds—is excruciatingly slow for computer simulators, but blindingly fast for experimentalists. "It turns out that their simulations compare quite nicely with our experiments," notes Gruebele.

BBAW is also among the proteins Pande's group has been trying to redesign through Genome@home. While that might simply mean coming up with a faster folding sequence, it could also lead to improvements in function by uncovering an optimal combination of amino acids. This could in theory help catalytic proteins speed up chemical reactions or make protein-based drugs more durable in the face of high temperatures or acidic environments, according to Gruebele.

In addition to its role as timekeeper, the folding project performs another valuable role that hasn't been possible before. It can tell researchers a great deal about the stability of proteins. This property reflects the strength of the many, exquisitely weak interactions that tie the molecule's backbone together. The trouble with trying to simulate it has been that the margin of error for each bond is so high relative to the strength of the connection that even tiny errors, taken together, can destroy an estimate.

Until now, Gruebele says, the stability problem had vexed computational researchers. But Pande's folding simulation has proven remarkably accurate at predicting figures that jibe well with his group's experimental values for the stability constant—the ratio of folded to unfolded proteins at a given temperature. "It sounds innocuous but it's really very important if we want to learn how to design stable proteins," he says.
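
The stability constant described above is simple to compute once the folded and unfolded populations are known. The sketch below also converts it to a folding free energy using the standard thermodynamic relation ΔG = -RT ln K. That relation is textbook chemistry rather than something stated in the article, and the sample populations are invented for illustration.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def stability_constant(folded_fraction):
    """Ratio of folded to unfolded protein at a given temperature."""
    return folded_fraction / (1.0 - folded_fraction)


def folding_free_energy_kj(k, temperature_kelvin):
    """Free energy of folding in kJ/mol, from delta G = -R T ln K."""
    return -GAS_CONSTANT * temperature_kelvin * math.log(k) / 1000.0


# Hypothetical sample: 95 percent of molecules folded at 298 K
k = stability_constant(0.95)
print(f"K = {k:.1f}, delta G = {folding_free_energy_kj(k, 298.0):.1f} kJ/mol")
```

Because the free energy depends on the logarithm of the ratio, the net stability is a small number built from many weak interactions, which is why small errors in each one can swamp the estimate.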

More information on Genome@home and Folding@home is available from the two projects' websites.