Making predictions about the proteome
Using fly and worm data, computers classify human proteins by function
Edward R. Winstead
February 12, 2001
The molecule of the moment is DNA, but proteins, some might say, are the real deal. Proteins carry out the functions of a cell, and scientists from drug developers to cell biologists want to know every human protein by name, structure, and function. Realizing this goal is no small accomplishment: the human 'proteome' is likely to be significantly larger than the genome. But for researchers seeking the real deal, the human genome sequence is the place to start.
Human sequence data are already being used to advance proteomics. This week's Science reports an attempt to classify more than 26,000 human proteins by their predicted function. The proteins represent 26,383 human genes identified by J. Craig Venter of Celera Genomics, in Rockville, Maryland, and colleagues during their sequencing of the human genome. Venter's team grouped 60 percent of the proteins in broad functional categories like 'immune response,' 'signal transduction' and 'neural structure, function, development.'
The computers, using sophisticated bioinformatics tools, assigned proteins to broad families based on similarities in DNA sequence, matching those of unknown function with those of known function, when possible. The process yielded concrete if generalized data. The paper includes tables of functionally related proteins from the sequenced genomes of five species: human, fly (D. melanogaster), soil worm (C. elegans), yeast, and the plant Arabidopsis.
The protein analysis focused on the animal genomes. The researchers compared the sequences against each other: fly-worm; human-fly; human-worm. This generated 'orthologs': genes similar in structure that can be traced by descent to a common ancestor of the species. Counting only proteins with unambiguous one-to-one relationships as orthologs, the researchers found 2758 human-fly orthologs and 2031 human-worm orthologs.
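To make the "one-to-one" criterion concrete, here is a minimal sketch in Python. It is not the method used in the Science paper, whose matching procedure is not described here. It only assumes that some earlier comparison step has produced a list of candidate matches between two species, and it keeps the pairs in which each protein appears exactly once. The function name and all protein identifiers are hypothetical.

```python
from collections import Counter


def one_to_one_pairs(matches):
    """Keep only pairs in which each protein has exactly one partner.

    `matches` is an iterable of (protein_a, protein_b) candidate matches
    from some upstream sequence comparison (assumed, not shown here).
    """
    unique_matches = set(matches)  # ignore duplicate reports of the same pair
    left_counts = Counter(a for a, _ in unique_matches)
    right_counts = Counter(b for _, b in unique_matches)
    return {
        (a, b)
        for a, b in unique_matches
        if left_counts[a] == 1 and right_counts[b] == 1
    }


# Hypothetical identifiers, for illustration only.
candidate_matches = [
    ("human_A", "fly_1"),  # unambiguous: kept
    ("human_B", "fly_2"),  # human_B matches two fly proteins: dropped
    ("human_B", "fly_3"),
    ("human_C", "fly_4"),  # fly_4 matches two human proteins: dropped
    ("human_D", "fly_4"),
]

strict = one_to_one_pairs(candidate_matches)
print(strict)
```

In this toy input only the human_A and fly_1 pair survives, because every other protein has more than one candidate partner.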
The two sets have 1523 human proteins in common. "We define the evolutionary conserved set as those 1523 human proteins that have strict orthologs in both D. melanogaster and C. elegans," the researchers write in Science.
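The conserved set described in the quotation is, in effect, a set intersection: the human proteins that appear in both the human-fly and the human-worm ortholog lists. The short Python sketch below illustrates the idea under that reading. The function name, the dictionaries, and all identifiers are hypothetical and are not data from the paper.

```python
def conserved_set(human_fly, human_worm):
    """Return the human proteins that have an ortholog in both mappings.

    Each argument maps a human protein ID to its single ortholog
    in the other species.
    """
    return set(human_fly) & set(human_worm)


# Hypothetical ortholog tables, for illustration only.
human_fly = {"human_A": "fly_1", "human_B": "fly_2", "human_C": "fly_3"}
human_worm = {"human_B": "worm_1", "human_C": "worm_2", "human_D": "worm_3"}

conserved = conserved_set(human_fly, human_worm)
print(sorted(conserved))
```

Applied to the real tables, the same operation would take the 2758 human-fly and 2031 human-worm orthologs as input and yield the 1523 proteins the researchers report.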
The overview includes biological differences between the species that are apparent in the genome. For example, there are striking differences between the human genome and the genomes of the fly or worm when it comes to genes involved in acquired immunity. "This is expected, because the acquired immune response is a defense system that only occurs in vertebrates," the researchers write.
The most common molecular functions are transcription factors and proteins involved in nucleic acid metabolism. Other highly represented functions in the human genome, the researchers found, are receptors, kinases, and hydrolases. The analysis found many proteins that are members of proto-oncogene families, as well as families of 'select-regulatory' molecules, such as proteins involved in signal transduction.
In order to classify as many proteins as possible, the researchers focused on molecular function rather than a higher function. Because the automatic classification methods treat only relatively large protein families, the researchers say, there are a number of 'unclassified' sequences that do in fact have a known or predicted function.
. . .