1995
Sequencing the genome of Haemophilus influenzae Rd
Early proponents of the Human Genome Project recognized both the importance of innovation and the promise of sequencing the DNA of various model organisms besides human beings. By the mid-1990s, however, the principal strategies had produced complete genomes of only a few viruses. Demonstrating the value of a new "shotgun" sequencing strategy, J. Craig Venter and colleagues reported, in May 1995, the first completely sequenced genome of a self-replicating, free-living organism: the bacterium Haemophilus influenzae Rd.
Haemophilus influenzae (H. flu, for short) is a bacterium that can cause ear and respiratory infections, as well as meningitis in children. With 1.8 million base pairs, its genome was of fairly typical size for a bacterium, but about ten times longer than that of any virus sequenced up to that point. "Whole-genome random sequencing," as used with H. flu, was a stepwise process that, in simplest terms, assembled a complete genome sequence from partly sequenced DNA fragments with the help of computer analysis. This approach dispensed with the need for a preliminary physical map of the genome. Copies of H. flu DNA were broken into random pieces between 1,600 and 2,000 base pairs long to create a library of plasmid clones. The clones were then partly sequenced at both ends on automated sequencing machines, yielding reads several hundred base pairs long. These sequences, with their many overlaps, became the raw data entered into the computer. Smaller libraries of longer fragments (15,000 to 20,000 base pairs) were also developed.

Using a software tool, the TIGR Assembler, the many thousands of fragments were compared, clustered, and matched to assemble the genome. The most informative, nonrepeating sequences were assembled first, and repeating fragments were compared next. The longer fragments helped to order some of the highly repetitive, nearly identical sequences. Small physical gaps that remained after the TIGR Assembler had done its work were closed with several auxiliary strategies. Assembling the H. flu genome from 24,304 DNA fragments was a considerable achievement and, to some observers, a surprise. The genome contains 1,830,137 base pairs, in which 1,749 genes are embedded. Once the sequence was assembled, the coding regions were located and compared with known genes, and a detailed map was developed.
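The core idea of building a genome from overlapping reads can be sketched in a few lines. The following is a toy greedy overlap assembler in Python, not the TIGR Assembler itself: it repeatedly merges the pair of sequences with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. The function names, the minimum-overlap threshold, and the short example reads are hypothetical illustrations. Real assemblers must also cope with sequencing errors, reads from both DNA strands, repeats, and paired end-reads, none of which this sketch handles. Reads that never overlap stay as separate contigs, which is the kind of gap that had to be closed by other means.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix
    of `b`, provided it is at least `min_len` bases; otherwise return 0."""
    start = 0
    while True:
        # Find the next place where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept it only if the rest of a from here is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler (hypothetical): merge the best-overlapping pair
    of sequences until no overlap of at least min_len bases remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No overlaps left: any remaining contigs are separated by gaps.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads sampled from the made-up sequence "GATTACAGGCTTAACCGT".
reads = ["TTAACCGT", "GATTACAG", "CAGGCTTA"]
print(greedy_assemble(reads))  # ['GATTACAGGCTTAACCGT']
```

Greedy merging is simple but can join copies of a repeated sequence in the wrong place, which is one reason information from longer fragments mattered for ordering repeats.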
Success in sequencing Haemophilus influenzae Rd (the project took about a year) demonstrated that random shotgun sequencing could be applied to whole genomes quickly and accurately. Within months the same method was applied to another bacterium, Mycoplasma genitalium, and the genomes of other organisms soon followed.