GNN - Genome News Network  
   
The landscape of the genome: deserts, islands and oases
  
By Bijal P. Trivedi

A genome is a work in progress: each change arises at random, and whether it stays or goes depends on whether it is beneficial, or at least neutral, to the organism. The result, according to reports appearing this week in the journals Science and Nature, is that the genome is a haphazard collage of vast genetic deserts, gene-rich oases, parallel universes of gene duplications, and oceans where the genome stutters the same genetic phrases over and over.

One of the most striking features of the genome, observable only with the entire sequence in hand, is the huge regions that have arisen through duplication. Large chunks of DNA sequence from all over the genome have been copied and pasted from one chromosome to another. An illustration in Science shows 1,077 blocks of genes that have been duplicated. One of the largest duplications was a block of sequence from chromosome 2 containing 33 genes that was found to resemble a stretch of sequence on chromosome 14.

Researchers estimated that these duplications are ancient, predating the divergence of mammals, which places the rearrangements at anywhere from 100 to 600 million years old.

An important observation with immediate medical implications is that in some cases both copies of a gene, the original and the duplicate (also called a paralog), are functional and are linked to similar diseases. Scientists from Celera Genomics found that some paralogs of clotting factors, transcription factors, and potassium channel genes are associated with similar bleeding disorders, developmental defects and cardiovascular problems, respectively. The researchers suggest that studying both the original gene and its twin may provide new insights into what causes disease.

Genes are the pay dirt of the genome, yet the new sequence shows that they occupy just slightly more than one percent of the entire DNA sequence. However, genes are modular in design, which allows them to be spread over huge distances: exons, short segments that code for a protein, are separated by stretches of non-coding sequence called introns, which occupy 24 percent of the genome. It is not known whether introns are important for gene function.

What is particularly surprising to researchers is the extent to which genes are clustered densely into small islands in desert-like expanses containing few or no genes. Why this happens is not known. What percentage of these gene-free deserts is important for gene function is also unknown. Whether the genome could be streamlined, removing 50 percent of the gene-free regions, for example, remains to be determined.

Introns and exons aside, the function of the other 75 percent of the genome remains largely a mystery.
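The 75 percent figure follows from simple subtraction, which can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 1.1 to 1.4 percent range for protein-coding sequence is taken from the table at the end of this article, and treating that range as the exon share is an assumption.

```python
# Rough breakdown of the genome by sequence category.
# Figures come from the article; the function names are illustrative only.

GENOME_SIZE_BP = 3_000_000_000   # 3.0 billion base pairs
INTRON_PCT = 24.0                # share of the genome occupied by introns
EXON_PCT_RANGE = (1.1, 1.4)      # assumed to be the exon (coding) share


def remainder_pct(exon_pct, intron_pct):
    """Percentage of the genome that is neither exon nor intron."""
    return 100.0 - exon_pct - intron_pct


def bases_for_pct(genome_size_bp, pct):
    """Approximate number of base pairs covered by a given percentage."""
    return round(genome_size_bp * pct / 100)


if __name__ == "__main__":
    for exon_pct in EXON_PCT_RANGE:
        rest = remainder_pct(exon_pct, INTRON_PCT)
        print(f"exons {exon_pct}% + introns {INTRON_PCT}% leaves {rest:.1f}%")
    print("intron base pairs:", bases_for_pct(GENOME_SIZE_BP, INTRON_PCT))
```

Either end of the range leaves a little under 75 percent of the sequence unaccounted for, which matches the rounded figure quoted above.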

About half the genome is cluttered with repetitive sequences, some thousands of bases in length, that do not produce any protein. Although much of this sequence was once classified as "junk DNA", scientists are now more cautious, wondering whether these regions do in fact exert some effect on the regulation of certain genes.

Looking at certain repetitive sequences reveals that our genome has been the playground of retroviruses. These viruses capture processed genes, in which the introns have been removed, and insert them back into the genome at some random location. Scientists know that this is the work of retroviruses because they leave a signature sequence on either side of the inserted gene, sort of a genetic calling card.

Not only have retroviruses had fun hopping in and out of our genome, but researchers from the International Human Genome Sequencing Consortium have found that bacteria have also left their mark. The researchers have identified 223 proteins that are very similar to proteins from bacteria. Because these genes don't seem to have counterparts in the fruit fly, the worm or the weed, scientists believe that they entered our genome by horizontal transfer: genes from invading bacteria were incorporated directly into our genome. One of the most intriguing genes we seem to have acquired produces monoamine oxidase, which is the target of one class of anti-depressant drugs.

A casualty of the evolutionary process is the size of the genome, which rarely reflects the complexity of the organism, and the size of a chromosome, which rarely reflects the importance or number of genes it carries. Illustrating this point is chromosome 2, which is the largest chromosome but doesn't carry the highest number of genes. One reason for the tremendous size of chromosome 2 is that at some point during evolution a huge region of the chromosome was duplicated and incorporated. The Y chromosome is an exception because it is the smallest human chromosome and contains the fewest genes.

Much of the genome resembles a genetic graveyard, littered with the partially decomposed remains of genes deposited by retroviruses and massive duplications, many of which no longer function. Whether the human genome is unable to unload its evolutionary baggage or whether the vast gene-free regions have a use remains to be seen.

Structural Features of the Genome

Whose DNA: 2 men, 3 women
– 1 African-American
– 1 Asian-American
– 1 Hispanic-Mexican
– 2 Caucasians

Size of genome: 3.0 billion base pairs

Total number of genes: 39,114
– Annotated genes supported by 2 pieces of evidence: 26,383
– Hypothetical genes supported by 1 piece of evidence: 12,731

Percentage of genome occupied by genes: 1.1 - 1.4%
Percentage of annotated genes whose molecular function is not known: 42%
Largest chromosome: Chromosome 2
Smallest chromosome: Y Chromosome
Most gene-rich chromosome: Chromosome 19
Least gene-rich chromosome: Chromosome 13 and Y
Rate of variation between two genomes: 1 base pair/1250
Number of SNPs: 2.1 million
Percentage of SNPs that alter protein structure: <1%
Gene with the largest number of exons: Titin has 234 exons
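
For readers who want to check the arithmetic behind these figures, here is a minimal Python sketch. All numbers are taken from the table above; the function names are hypothetical, and the calculation of expected differences assumes, purely for illustration, that variation is spread evenly along the genome.

```python
# Sanity checks on the summary figures in the table.
# Values are from the article; names and the even-spacing
# assumption for variation are illustrative only.

GENOME_SIZE_BP = 3_000_000_000    # 3.0 billion base pairs
ANNOTATED_GENES = 26_383          # supported by 2 pieces of evidence
HYPOTHETICAL_GENES = 12_731       # supported by 1 piece of evidence
REPORTED_TOTAL_GENES = 39_114
UNKNOWN_FUNCTION_PCT = 42         # percent of annotated genes
VARIATION_INTERVAL_BP = 1_250     # one differing base pair per 1,250


def total_genes(annotated, hypothetical):
    """Sum of annotated and hypothetical gene counts."""
    return annotated + hypothetical


def genes_of_unknown_function(annotated, pct):
    """Annotated genes whose molecular function is not known."""
    return round(annotated * pct / 100)


def expected_differences(genome_size_bp, interval_bp):
    """Base pairs expected to differ between two genomes at this rate."""
    return genome_size_bp // interval_bp


if __name__ == "__main__":
    print("total genes:", total_genes(ANNOTATED_GENES, HYPOTHETICAL_GENES))
    print("unknown function:",
          genes_of_unknown_function(ANNOTATED_GENES, UNKNOWN_FUNCTION_PCT))
    print("expected differences:",
          expected_differences(GENOME_SIZE_BP, VARIATION_INTERVAL_BP))
```

The two gene categories add up to the reported total of 39,114. A rate of one difference per 1,250 base pairs works out to about 2.4 million differences between any two genomes, the same order of magnitude as the 2.1 million SNPs catalogued, although the two figures measure different things.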

. . .

 
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (February 15, 2001).
 
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304-1351 (February 16, 2001).
 
