GNN - Genome News Network  
Don't despair. You have at least twice as many genes as a fruit fly.
  
By Bijal P. Trivedi


We humans have beaten the fly, the worm and the weed. Two simultaneous publications of the human genome sequence, one in this week's issue of Science and the other in Nature, each conclude that the number of human genes lies somewhere around 30,000. Flies have 13,600, worms have 19,000, and close on our heels is a weed with 25,500 genes. Fortunately, the complexity of an organism is not determined solely by the number of genes; otherwise we might have ended up as a bug or a small, spindly plant.

The number of human genes has been the subject of hot scientific debate and hefty wagers, and is the focus of the Genesweep competition—basically lotto for the genome community. Bets have ranged from 28,000 to more than 142,000 genes. Now, with almost all of the genome sequence in-house, scientists are able to determine the complete set of protein-coding genes carried by a person.

GNN scientists fed 2.91 billion bases of sequence through three gene prediction programs (GRAIL, Genscan and FgenesH) that harness different algorithms to fish out genes. The sequence was also threaded through "Otto", a program that mimics the human curators who weigh and compare various types of experimental evidence to determine whether something is a gene.

Otto and its three companion programs identified 39,114 genes, each of which was supported by at least one form of experimental evidence: similarity to human or mouse genomic DNA, similarity to expressed human or rodent genes, or similarity to proteins. When the standards were raised to require that each gene prediction have the support of two experimental observations, the number of validated genes fell to 26,383, which marks GNN's minimum gene count. GNN's results are published in Science.
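The effect of raising the evidence threshold can be sketched in a few lines of Python. This is a toy illustration, not the Otto pipeline itself: the prediction names, the evidence labels and the function count_supported are all hypothetical, invented here only to show why a stricter cutoff shrinks the count.

```python
# Toy illustration only: hypothetical gene predictions, each tagged with the
# kinds of supporting evidence found for it. None of this is data from the paper.
predictions = {
    "pred_A": {"mouse_genomic", "expressed_human", "protein"},
    "pred_B": {"expressed_human"},
    "pred_C": {"mouse_genomic", "protein"},
    "pred_D": set(),  # predicted by a program but with no supporting evidence
}

def count_supported(preds, min_evidence):
    """Count predictions backed by at least `min_evidence` kinds of evidence."""
    return sum(1 for evidence in preds.values() if len(evidence) >= min_evidence)

print(count_supported(predictions, 1))  # looser standard
print(count_supported(predictions, 2))  # stricter standard
```

With these made-up records, three predictions pass the one-observation standard and two pass the two-observation standard, mirroring in miniature the drop from 39,114 to 26,383.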

The International Human Genome Sequencing Consortium used sequence data available in the public sequence database GenBank and different gene prediction programs, and still arrived at a gene count similar to that of GNN scientists. The consortium used the gene prediction programs Ensembl and Genie, and merged the predicted genes with known genes found in the RefSeq, SWISSPROT and TrEMBL databases to come up with an initial gene count of around 31,000. These results are published in Nature.

GNN scientists address the surprisingly low gene number, stating in this week's issue of Science, "the modest number of human genes means that we must look elsewhere for the mechanisms that generate the complexities inherent in human development and the sophisticated signaling systems that maintain homeostasis."

At the level of the genome, looking elsewhere means analyzing when and where genes are turned on and off, and for how long. It involves understanding how a single gene can produce a diverse range of proteins depending on how its segments are cut and pasted together, a phenomenon known as alternative splicing. And it requires an appreciation of how the 99 percent of the genome that does not code for proteins can influence and regulate the genes.
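To see how alternative splicing lets a modest number of genes yield many more products, consider a rough sketch in Python. It assumes a hypothetical gene with four segments (exons) named E1 to E4, and it assumes, as a simplification, that the first and last segments are always kept while any middle segment may be skipped. Real splicing follows more complicated rules, so treat this as a counting exercise only.

```python
from itertools import combinations

# Hypothetical gene: the exon names are made up for illustration.
exons = ["E1", "E2", "E3", "E4"]

def isoforms(exons):
    """Yield every transcript that keeps the first and last exon and
    includes any subset of the middle exons, in their original order."""
    first, *middle, last = exons
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):
            yield [first, *kept, last]

for transcript in isoforms(exons):
    print("-".join(transcript))
```

Under these assumptions, a gene with two optional segments gives four different transcripts, and each additional optional segment doubles that number.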

In the future, more emphasis will be placed on RNA, the intermediate molecule between DNA and protein. After a gene is transcribed into RNA, the RNA can be edited before it is translated into a protein. RNA editing can therefore affect the function of the protein.

Finally, identifying the modifications that occur at the level of the protein will show how those changes affect a protein's interactions with other proteins in the cell and, in turn, cell physiology.

Even with the majority of the genome in hand, the gene count is not final. Many predictions still need to be verified in the lab. As the sequences and gene counts from other organisms roll in, comparisons with other genomes will help us figure out which genes are real and what each of our genes does.

. . .

 
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (February 15, 2001).
 
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304-1351 (February 16, 2001).
 
