GNN - Genome News Network  
SEQUENCING THE GENOME
  
By Bijal P. Trivedi

There are essentially two ways to sequence a genome. The BAC-to-BAC method, the first to be employed in human genome studies, is slow but sure. Also referred to as the map-based method, it evolved from procedures developed by a number of researchers during the late 1980s and 1990s, and it continues to develop and change.*

The other technique, known as whole genome shotgun sequencing, brings speed into the picture, enabling researchers to do the job in months to a year. The shotgun method was developed by GNN president J. Craig Venter in 1996 when he was at the Institute for Genomic Research (TIGR).*

Now that the human genome sequence is nearing completion, the next phase—understanding the meaning and function of genes—can begin.

A primer on the two approaches to sequencing follows.

* References to the scientific literature are at the end of this primer.



The human body has about 100 trillion cells. Inside nearly every one of those cells is a nucleus that contains the genome (46 human chromosomes), which governs human development.


Each chromosome is one long string of DNA that is tightly coiled in a compact bundle. Chromosomes are composed of millions of copies of the four letters of the genetic code, the DNA bases A, C, G, and T, which are arranged into genes and non-coding sections. Finding the order, or sequence, of these four letters is the goal of genomics. The entire human genome is made up of about 3 billion bases.


To read the DNA, the chromosomes are cut into tiny pieces, each of which is read individually. When all the segments have been read, they are assembled in the correct order.

Two approaches have been used to sequence the genome. They differ in how they cut up the DNA, how they assemble it in the correct order, and whether they map the chromosomes before decoding the sequence. The first was the BAC to BAC approach. A second, newer method is called whole genome shotgun sequencing.

BAC to BAC Sequencing

The BAC to BAC approach first creates a crude physical map of the whole genome before sequencing the DNA. Constructing a map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.

Whole Genome Shotgun Sequencing

The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Therefore, it is much faster.

BAC to BAC: Several copies of the genome are randomly cut into pieces that are about 150,000 base pairs (bp) long.

Shotgun: Multiple copies of the genome are randomly shredded into pieces that are 2,000 bp long by squeezing the DNA through a pressurized syringe. This is done a second time to generate pieces that are 10,000 bp long.
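The random shredding step can be imitated on a toy sequence. The sketch below is only an illustration: the function name `shear`, the 50,000-letter random "genome", the three copies, and the plus-or-minus 20% spread in piece size are all assumptions made for the example, not details of the laboratory procedure.

```python
import random

def shear(genome: str, target_len: int, copies: int, rng: random.Random) -> list[str]:
    """Cut several copies of `genome` into pieces of roughly `target_len` letters."""
    fragments = []
    for _ in range(copies):
        pos = 0
        # Start each copy with a random-length first piece so that the break
        # points fall in different places from one copy to the next.
        size = rng.randint(1, target_len)
        while pos < len(genome):
            fragments.append(genome[pos:pos + size])
            pos += size
            # Assumed spread: later pieces vary +/-20% around the target size.
            size = max(1, round(rng.uniform(0.8, 1.2) * target_len))
    return fragments

rng = random.Random(0)  # fixed seed so the toy run is repeatable
toy_genome = "".join(rng.choice("ACGT") for _ in range(50_000))

small_pieces = shear(toy_genome, 2_000, copies=3, rng=rng)    # first pass
large_pieces = shear(toy_genome, 10_000, copies=3, rng=rng)   # second pass
print(len(small_pieces), "small pieces;", len(large_pieces), "large pieces")
```

Because every copy is broken at different points, pieces from different copies overlap one another, and those overlaps are what later allow the reads to be stitched back together.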


BAC to BAC: Each of these 150,000 bp fragments is inserted into a BAC, a bacterial artificial chromosome. A BAC is a man-made piece of DNA that can replicate inside a bacterial cell. The whole collection of BACs containing the entire human genome is called a BAC library, because each BAC is like a book in a library that can be accessed and copied.

Shotgun: Each 2,000 and 10,000 bp fragment is inserted into a plasmid, which is a piece of DNA that can replicate in bacteria. The two collections of plasmids containing 2,000 and 10,000 bp chunks of human DNA are known as plasmid libraries.
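A little arithmetic gives a feel for the size of these libraries. The sketch below counts the minimum number of clones needed to hold a single copy of the genome at each fragment size; a round 3 billion bases is assumed for the genome size, and the function name is made up for the example. Since several copies of the genome are cut up, real libraries would hold a multiple of these figures.

```python
def clones_per_genome_copy(genome_bp: int, insert_bp: int) -> int:
    """Minimum number of clones needed to hold one copy of the genome."""
    return (genome_bp + insert_bp - 1) // insert_bp  # integer ceiling division

GENOME_BP = 3_000_000_000  # assumed round figure for the human genome

for name, insert_bp in [("BAC", 150_000),
                        ("small plasmid", 2_000),
                        ("large plasmid", 10_000)]:
    count = clones_per_genome_copy(GENOME_BP, insert_bp)
    print(f"{name}: {count:,} clones per genome copy")
# BAC: 20,000 clones per genome copy
# small plasmid: 1,500,000 clones per genome copy
# large plasmid: 300,000 clones per genome copy
```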


BAC to BAC: Each BAC is fingerprinted to give it a unique identification tag that helps determine the order of the fragments. Fingerprinting involves cutting each BAC fragment with a single enzyme and finding common sequence landmarks in overlapping fragments; these landmarks determine the location of each BAC along the chromosome. The overlapping BACs, with markers every 100,000 bp, then form a map of each chromosome.

Shotgun: This step is not needed in shotgun sequencing.
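The idea behind fingerprinting in the BAC to BAC approach can be sketched in a few lines. In this toy version a clone's fingerprint is the list of fragment lengths left after cutting at every occurrence of one enzyme's recognition site, and two clones that share several fragment lengths probably overlap. The choice of the EcoRI site (GAATTC, cut after the G), the function names, and the tiny hand-built "chromosome" are assumptions for illustration; real fingerprints are read from gels and compared with some tolerance rather than exactly.

```python
from collections import Counter

def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths produced by cutting `seq` at every occurrence of `site`."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

def shared_fragments(fp_a: list[int], fp_b: list[int]) -> int:
    """Number of fragment lengths that two fingerprints have in common."""
    return sum((Counter(fp_a) & Counter(fp_b)).values())

# Toy chromosome: stretches of A of distinct lengths separated by the cut site.
chromosome = "GAATTC".join("A" * n for n in [300, 500, 700, 900, 1100, 1300])

clone_a = chromosome[0:2600]
clone_b = chromosome[700:4000]    # overlaps clone_a
clone_c = chromosome[4000:4830]   # does not overlap clone_a

print(shared_fragments(digest(clone_a), digest(clone_b)))  # 2
print(shared_fragments(digest(clone_a), digest(clone_c)))  # 0
```

Only fragments that lie entirely inside the overlap are shared, which is why overlapping clones need to have several cut sites in common before the match is convincing.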

BAC to BAC: Each BAC is then broken randomly into 1,500 bp pieces, and each piece is placed in another artificial piece of DNA called M13. This collection is known as an M13 library.

Shotgun: This step is not needed in shotgun sequencing.

BAC to BAC: All the M13 libraries are sequenced. From each fragment, 500 bp at one end are sequenced, generating millions of sequences.

Shotgun: Both the 2,000 and the 10,000 bp plasmid libraries are sequenced. From each fragment, 500 bp at each end are decoded, generating millions of sequences. Sequencing both ends of each insert is critical for assembling the entire chromosome.
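Reading both ends of an insert yields a pair of sequences whose distance apart is roughly known, because the insert is about 2,000 or 10,000 bp long. A minimal sketch of taking such a pair follows; the function names and the random toy insert are assumptions for the example. The second read is reported as a reverse complement because it is read from the opposite strand.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def end_reads(insert: str, read_len: int = 500) -> tuple[str, str]:
    """Read `read_len` bases inward from each end of a cloned insert."""
    forward = insert[:read_len]
    reverse = reverse_complement(insert[-read_len:])
    return forward, reverse

rng = random.Random(1)  # fixed seed so the toy run is repeatable
toy_insert = "".join(rng.choice("ACGT") for _ in range(2_000))

forward, reverse = end_reads(toy_insert)
unread_middle = len(toy_insert) - len(forward) - len(reverse)
print(len(forward), len(reverse), unread_middle)  # 500 500 1000
```

When an assembler later finds the two reads of a pair in different stretches of assembled sequence, the known insert length tells it approximately how far apart, and in which orientation, those stretches must lie.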


BAC to BAC: These sequences are fed into a computer program called PHRAP that looks for common sequences that join two fragments together.

Shotgun: Computer algorithms assemble the millions of sequenced fragments into a continuous stretch resembling each chromosome.
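The core idea of joining fragments by their common sequences can be shown with a simple greedy merge: repeatedly find the two reads with the longest end-to-start overlap and fuse them. This is a teaching sketch, not the method used by PHRAP or by any production assembler; the function names, the short toy reads, and the minimum overlap of four letters are assumptions for the example, and it ignores sequencing errors, repeats, and reads from the opposite strand.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest end of `a` that matches the start of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)

target = "ACGTTGCAAGGCTTAACCGGATCT"
toy_reads = [target[12:24], target[0:10], target[6:16]]  # scrambled order

print(greedy_assemble(toy_reads))  # ['ACGTTGCAAGGCTTAACCGGATCT']
```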


(All images created by Mary S. Gibbs (GNN))

. . .

 
Burke, D.T. et al. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosomal vectors. Science 236, 806-812 (1987).
 
Smith, L.M. et al. Fluorescence detection in automated DNA sequencing analysis. Nature 321, 674-679 (1986).
 
Shizuya, H. et al. Cloning and stable integration of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc Natl Acad Sci USA 89, 8794-8797 (1992).

Venter, J.C. et al. A new strategy for genome sequencing. Nature 381, 364-366 (1996).

Venter, J.C. et al. Shotgun sequencing of the human genome. Science 280, 1540-1542 (1998).
 
