How many human genes?
By Bijal P. Trivedi
May 26, 2000
Ladies and Gentlemen, PLACE YOUR BETS!
Got a hunch about how many genes are contained in the human genome? Well, if you are passing by the Cold Spring Harbor Laboratory, in New York, you may want to drop by and place a bet. Genome scientists did just that at the Genome Sequencing and Biology meeting that took place from May 10-14 at Cold Spring Harbor Laboratory.
The meeting coincided with the publication of three reports in the June issue of Nature Genetics, each describing a different approach to calculating the number of human genes; the estimates ranged from 28,000 to 120,000.
In one report, Jean Weissenbach and his team at Genoscope, in France, suggest that the number of human genes lies somewhere between 28,000 and 34,000. The team used the gene-rich genome of the pufferfish, Tetraodon nigroviridis, to pluck similar human genes out of the raw sequence data.
Using a different strategy, Phil Green and Brent Ewing, of the University of Washington, in Seattle, arrived in the same ballpark with 35,000 genes. They combined a collection of expressed sequence tags (ESTs), which are short sequence fragments read from expressed genes, with the total number of genes on chromosome 22 to calculate the gene count.
John Quackenbush's team from The Institute for Genomic Research, in Rockville, MD, also used ESTs but proposes a total of about 120,000 genes. Quackenbush says the high estimate made him a little nervous until he heard that DoubleTwist, a California-based company, had come up with a similar estimate: 105,000 genes.
Assuming that every EST comes from a distinct gene can lead to gross overestimates of gene number, according to Samuel Aparicio, of the Wellcome Trust Centre in Cambridge, UK, who wrote a commentary that accompanied the reports in Nature Genetics. Each EST is made from a messenger RNA (mRNA) molecule, which is an intermediate between a gene and the final protein. Because a single gene can produce many different mRNAs, and thus many ESTs, researchers must do a lot of sequence crunching to make sure they count only one EST per gene. The choice of ESTs and the number used can easily inflate the gene tally.
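The overcounting problem Aparicio describes can be illustrated with a toy sketch. It assumes a deliberately simple approach that is not taken from any of the three reports: ESTs that share any stretch of k bases (here a hypothetical k = 8) are merged into one cluster using a union-find structure, and each cluster is counted as one gene. The sequences are invented; est1, est2 and est4 are imagined to come from the same gene, but est4 does not overlap the other two, so it still lands in a cluster of its own.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: list[str], k: int = 8) -> int:
    """Count clusters of ESTs that share at least one k-mer (toy method)."""
    parent = list(range(len(ests)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    kmer_sets = [kmers(e, k) for e in ests]
    for i, j in combinations(range(len(ests)), 2):
        if kmer_sets[i] & kmer_sets[j]:  # any shared k-mer => same cluster
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(ests))})


# Hypothetical ESTs: est1, est2 and est4 are imagined to come from one gene,
# est3 from another gene. est4 does not overlap est1 or est2.
ests = [
    "ATGGCGTACGTTAGCC",   # est1
    "ACGTTAGCCTAGGATCC",  # est2, overlaps est1
    "TTTTCCCCGGGGAAAA",   # est3, a different gene
    "CATGCATGCATGCATG",   # est4, same gene as est1/est2 but no overlap
]
print(f"raw EST count: {len(ests)}")           # 4
print(f"EST clusters:  {cluster_ests(ests)}")  # 3
```

Counting raw ESTs gives 4 "genes" and clustering gives 3, while the imagined true number is 2. Real pipelines use far more careful alignment, but the same kind of gap between fragments and genes is what can inflate the tally.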
Other problems arise because pseudogenes, which resemble real genes but do not produce a complete, functional protein, can still produce intermediates that clutter the cell and create "biological noise," says Aparicio.
"When we get the whole genome sequenced, it is going to take a while to sort out what's a gene and what isn't," said Green.
From a biological point of view, estimates of around 35,000 may sound small when compared to the number of genes in the genomes of Drosophila melanogaster (fruit fly) and Caenorhabditis elegans (worm): 13,601 and 18,424, respectively. But according to Aparicio, the number is not too far off.
Since diverging from invertebrates, the mammalian genome has doubled in size twice, says Aparicio. Two doublings would multiply an invertebrate-sized gene set roughly fourfold, which places the number of genes at about 60,000. But this is countered by frequent gene loss, which would place the final gene count between 40,000 and 50,000, he says.
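Aparicio's back-of-the-envelope argument can be written out as simple arithmetic. This sketch assumes a hypothetical ancestral count of 15,000 genes, a round number between the fly and worm totals quoted above, chosen so that two doublings reproduce his figure of about 60,000. It then asks what fraction of the duplicated genes would have to be lost to land at 50,000 or 40,000.

```python
# Hypothetical ancestral (invertebrate-like) gene count: an assumption,
# picked between the fly (13,601) and worm (18,424) totals.
ANCESTRAL_GENES = 15_000
DOUBLINGS = 2  # "doubled in size twice"


def after_duplication(ancestral: int, doublings: int) -> int:
    """Each whole-genome doubling multiplies the gene count by two."""
    return ancestral * 2 ** doublings


def implied_loss(duplicated: int, final: int) -> float:
    """Fraction of duplicated genes that must be lost to reach `final`."""
    return 1 - final / duplicated


duplicated = after_duplication(ANCESTRAL_GENES, DOUBLINGS)
print(f"after two doublings: {duplicated:,} genes")  # 60,000
for final in (50_000, 40_000):
    loss = implied_loss(duplicated, final)
    print(f"reaching {final:,} genes implies {loss:.0%} loss")  # 17%, then 33%
```

Under this assumed starting point, Aparicio's range of 40,000 to 50,000 corresponds to losing roughly one-sixth to one-third of the duplicated genes.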
A complex organism such as a human does not need a large number of genes. Using a smaller number of genes, but increasing the number of interactions between them and using more sophisticated switches to control each gene, can also produce a complex organism.
For more information about the Genesweep competition at CSHL, go to http://www.ensembl.org/genesweep.html
. . .