GNN - Genome News Network  
Yeast Proteomics
Mapping protein interactions and complexes on a genomic scale
Edward R. Winstead

Life is about networking. At a cellular level, networking is what we are all about. Genes make proteins but proteins seldom work alone. Instead, they bind to each other and interact, often as parts of complex structures. Complete genome sequencing has enabled scientists to generate lists of all the proteins that sequenced organisms are likely to make. And now, they have begun isolating protein complexes and mapping protein interactions to understand how cells function. Three new studies, done in yeast, show just how much information is involved.

Detail from yeast SH3 domain protein-protein interaction network.

In one study, nothing happened for a year because the project was stuck. Three laboratories—in Italy, Canada and the United States—had generated an unexpectedly large amount of diverse data on protein interactions in yeast. They were stalled because the masses of raw data seemed meaningless. With no obvious remedy, Charles Boone, one of the project's leaders, stored the files on his computers at the University of Toronto.

Several months later, Christopher W.V. Hogue, whose bioinformatics team was setting up a database of molecular interactions, joined the project when Boone moved to a laboratory near Hogue's. They translated the spreadsheet information into data that could be analyzed, organized, and represented with computational tools.

"Charlie Boone was used to drawing these networks out on paper but you couldn't do that here," says Gary D. Bader, one of the bioinformaticists at the University of Toronto. "The collaboration with him was a perfect combination of skills. We were able to apply our computational methods to all the amazing data he had and create the pretty pictures that make it easy to understand the data."

The pictures were protein-interaction maps that let the biologists finally view their own work. The project then had four leaders: Boone, Hogue, Stanley Fields of the University of Washington in Seattle, and Gianni Cesareni of the University of Rome. Their findings have just been published in Science.

"The whole experience underscored for me the value of computational biology," says Boone. "Until we hooked up with the creative bioinformatics people, we could not interpret the data in a way that let us communicate the story to others in a clear and interesting manner."

This is the story. Four laboratories working together identified 59 interactions among proteins in yeast by combining two sets of information on protein interactions, one derived computationally and the other experimentally. The two data sets, each with several hundred possible interactions, had 59 interactions in common.
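
The comparison described above amounts to intersecting two lists of protein pairs. Here is a minimal sketch in Python, assuming each interaction is recorded as an unordered pair of protein names. The names and pairs are invented placeholders, not data from the study, and the function names are illustrative only.

```python
def normalize(pairs):
    """Store each interaction as an unordered pair, so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def common_interactions(predicted, observed):
    """Return the interactions that appear in both data sets."""
    return normalize(predicted) & normalize(observed)


# Hypothetical toy data: placeholder protein names, not real results.
predicted = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
observed = [("P2", "P1"), ("P4", "P5"), ("P6", "P7")]

shared = common_interactions(predicted, observed)
print(len(shared))  # 2
```

Because each pair is stored as a frozenset, an interaction listed as (A, B) in one set and (B, A) in the other still counts as a match.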

HMS-PCI Dataset.

The study was designed to address a data problem that has emerged in recent years with the increase in genomic studies. Large data sets by their very nature have limitations. For instance, any single technology or method can generate significant numbers of false positives or miss real results.

One solution is to combine data sets derived from different sources. This can help in the same way that reading news stories from independent sources can lead to a better understanding of what actually happened. Combining genomic data sets is a challenge, however, because of the volume and diversity of the information.

As with genomics, the first stage of any proteomics study is in large measure about data processing. Analyzing every protein in any organism presents a special challenge because proteins function by working with other proteins. To truly understand human biology and how cells work, one has to identify every human protein, its structure and the complexes it forms with other proteins.

"The idea was to strengthen the impact of what we got by using complementary strategies," says Stanley Fields. He credits Boone with recognizing the potential of the collaboration and making it happen.

"All four labs were major contributors of both data and intellectual creativity," says Boone. "I don't think we could have generated this paper if any one of the labs dropped out. The fun part was that everybody got along, so we sat back and had a good time."

Boone and Cesareni had both been studying the same protein sequences, or domains, that mediate interaction between proteins.

This led to the first approach, which is computational. The process, known as phage display, starts by using viruses to identify structural motifs in proteins that are likely to bind the SH3 domain. The whole yeast proteome is then screened for proteins with this structure. Finally, computational tools are used to assemble candidates into a predicted network of protein-protein interactions.
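
The screening and assembly steps can be pictured as a pattern search over protein sequences. Here is a rough sketch in Python, assuming each binding motif can be written as a regular expression. The domain names, motif patterns, and sequences are all hypothetical placeholders, and the scoring actually used in the study is not described here.

```python
import re


def predict_network(motifs, proteome):
    """Link each domain-containing protein to every protein whose sequence matches its motif."""
    edges = set()
    for domain_protein, pattern in motifs.items():
        regex = re.compile(pattern)
        for name, sequence in proteome.items():
            if regex.search(sequence):
                edges.add((domain_protein, name))
    return edges


# Hypothetical motifs: stand-ins for what a phage display experiment might yield.
motifs = {"DomA": r"P..P", "DomB": r"R..P..P"}

# Hypothetical sequences: invented for illustration, not real yeast proteins.
proteome = {
    "Prot1": "MSTPAAPLK",
    "Prot2": "MKRAAPLLPG",
    "Prot3": "MGGSSTK",
}

network = predict_network(motifs, proteome)
for edge in sorted(network):
    print(edge)
```

Each edge in the result is a predicted interaction, which is why such a network still needs experimental confirmation.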

The Italian team used phage display to identify 394 interactions among 206 proteins.

The second approach is experimental and tests specific proteins in yeast for interactions with the SH3 binding domain. The high-throughput process, called two-hybrid testing, yielded 233 interactions among 145 proteins. These data came from the laboratory of Stanley Fields, who developed the widely used test.

Analysing protein interactions.

Interrelating the data sets was the tricky part. "The importance of the computational side of things is one lesson that emerges from all this," says Fields. "Everything is underpinned by computational methods that have been vastly improved in recent years. And clearly many new methods will be needed to move forward."

Bader and Hogue created the database for storing all types of molecular interactions and aim for it to be the central repository for such data. "The genome has provided a list of parts, and right now everyone is trying to figure out how all the parts fit together," says Bader. "We're storing the information about these interactions." The online database is called BIND, for Biomolecular Interaction Network Database.

Two other new large-scale protein studies are in the process of depositing their results in BIND. The groups, one in Canada and one in Germany, isolated and characterized hundreds of protein complexes. Both sets of findings are published in the same issue of Nature.

The studies used a special technique to fish out protein complexes and analyze them. Tagged individual proteins were used as 'bait' to pull out other proteins from yeast cells. The clusters were analyzed with mass spectrometry, and computer algorithms identified the component proteins based on their structures.

The German group, led by Giulio Superti-Furga, of Cellzome AG in Heidelberg, identified 1,440 distinct proteins, which they grouped into about 200 complexes.

The Canadian researchers, using about 500 bait proteins, detected 3,617 interactions, representing 25 percent of the yeast proteome. Daniel Figeys, of MDS Proteomics in Toronto, led the team.

"Our goal was to demonstrate that mapping the interactions of a proteome was tractable," says Michael F. Moran, of MDS Proteomics. "This study was a test-drive of the high-throughput platform that was built to analyze human proteins as part of our drug discovery."

"The novelty of these studies is the scale," says Anuj Kumar, a researcher at Yale University in New Haven, Connecticut, who co-authored a commentary accompanying the papers in Nature.

No single approach will accurately and comprehensively resolve all protein complexes on a large scale, says Kumar. "But this particular type of mass spectrometry-based approach identifies a large number of proteins for further detailed study." He adds: "The really interesting results and surprises will come when people look at the complexes in detail."

Detail from the protein complex network.

Kumar's co-author, Michael Snyder, led the Yale team that created protein microarrays containing nearly the entire yeast proteome. They used the microarrays—glass slides with 5,800 yeast proteins—to screen the proteome for biochemical interactions. This was another example of cutting-edge technology developed in yeast.

Why yeast? Yeast is way out in front as a testing ground for genomic and proteomic tools that are later used to study other model organisms and humans, says Fields. The organism, Saccharomyces cerevisiae, was sequenced in 1996.

"When researchers want to establish a new technology like the mass spectrometry protein analysis, the place to go is yeast," says Fields. "The more yeast is used for cutting-edge approaches, the more amenable it becomes to others."

. . .

Gavin, A.C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141-147 (January 10, 2002).
Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180-183 (January 10, 2002).
Kumar, A. & Snyder, M. Protein complexes take the bait. Nature 415, 123-124 (January 10, 2002).
Tong, A.H. et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295, 321-324 (January 11, 2002).
Gerstein, M., Lan, N. & Jansen, R. Proteomics: Integrating interactomes. Science 295, 284-287 (January 11, 2002).
Bader, G.D. et al. BIND—The Biomolecular Interaction Network Database. Nucleic Acids Res 29, 242-245 (January 1, 2001).
