Double trouble: Real duplications in the human genome
By Kate Dalke
August 16, 2002
By using novel software to compare the two draft human genome sequences, scientists have identified nearly 170 regions that include duplicated segments of the genome. These hotspots of genomic instability may help researchers pinpoint the underlying mechanisms of some genetic diseases and understand how new genes form.
These regions, which are stretches of genetic code flanked by copied or rearranged DNA, can be either beneficial or harmful. They can give rise to new genes with novel functions, or they can disrupt a cell and cause disease.
There are balancing forces at work in the genome, according to Jeffrey A. Bailey of Case Western Reserve University School of Medicine in Cleveland, Ohio, who was a member of the research team. "Too much duplication can be a bad thing," says Bailey. "But the dangers of instability are balanced by the benefits of having new genes form."
The dangers of duplication are well documented in genetic developmental diseases such as Williams-Beuren and DiGeorge syndromes. These disorders occur because sections of the genome are deleted near these duplicated regions.
Duplication can also be beneficial. It allows one copy of a gene to move to another region of the genome and perhaps take on a new, needed function. An example of this might be new human genes that evolve to fight infectious agents that have developed resistance to other genes.
The scientists pinpointed the duplications by comparing the public and private human genome sequences. The challenge was to distinguish actual duplications from apparent ones caused by sequencing errors. Unfortunately for the researchers, such errors look just like actual duplications.
In the study, Evan E. Eichler of Case Western and colleagues used data generated by Celera Genomics in Rockville, Maryland, and the publicly funded Human Genome Project. Both groups published draft human genome sequences in February 2001.
To remove overlap errors, the researchers lined up short, random pieces of DNA that Celera had generated through its whole-genome shotgun method to the longer pieces assembled by the Human Genome Project. They could then differentiate between duplications and overlap errors.
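The article does not spell out the criterion the team used to tell the two cases apart. As a rough illustration only, the sketch below assumes one plausible signal: if a segment really exists in several copies, shotgun pieces from every copy pile up on the same stretch of assembled sequence, so that stretch collects more aligned pieces than its neighbors. The function names, the window size, and the 1.5x cutoff are all hypothetical and are not taken from the study.

```python
from statistics import median


def window_depths(region_length, read_starts, window=1000):
    """Count aligned shotgun pieces per fixed-size window of assembled sequence.

    read_starts holds the start coordinate of each aligned piece
    (hypothetical input; each value must be smaller than region_length).
    """
    n_windows = (region_length + window - 1) // window
    depths = [0] * n_windows
    for start in read_starts:
        depths[start // window] += 1
    return depths


def flag_excess_depth(depths, factor=1.5):
    """Return indices of windows whose count exceeds factor x the median count.

    The cutoff is an assumption made for illustration, not a published value.
    """
    typical = median(depths)
    return [i for i, depth in enumerate(depths) if depth > factor * typical]


# Toy data: five 1,000-base windows, with extra pieces landing in the third.
starts = [100, 600, 1200, 1900, 2000, 2100, 2300, 2500,
          2700, 2900, 3100, 3800, 4200, 4900]
depths = window_depths(5000, starts)
print(depths)                     # [2, 2, 6, 2, 2]
print(flag_excess_depth(depths))  # [2]
```

In this toy case only the third window stands out, which is the pattern a candidate duplication would show under the stated assumption. An apparent duplication created by a missed overlap would not be expected to attract extra pieces in the same way.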
The scientists discovered that about five percent of the human genome contains duplications. This figure is lower than previous estimates in part because the new study eliminated overlap errors. The majority of previously identified duplications were "missed overlaps that should be joined together," says Bailey.
The new findings, published in Science, can now be used to improve the draft sequences by correcting overlap errors.
The discovery of duplications in regions containing lots of genes may surprise some researchers. It has been suggested that duplications are more likely to occur in regions that lack genes, perhaps because such changes are less likely to harm the cell. In fact, the new data suggest that duplications often occur among genes.
"The gene-rich regions rearrange like crazy," says Knut Reinert of Celera, who helped design the computer program to identify duplicated regions.
At Case Western, Eichler and colleagues will use new data to investigate idiopathic mental retardation, one of the reasons the researchers first began to study genomic instability. They can now go back and look at how the gain or loss of DNA is related to the development of disease.
. . .