June 2, 2000
There are essentially two ways to sequence a genome. The BAC-to-BAC method,
the first to be employed in human genome studies, is slow but sure. The
BAC-to-BAC approach, also referred to as the map-based method, evolved
from procedures developed by a number of researchers during the late 1980s
and 1990s, and it continues to develop and change.*
The other technique, known as whole genome shotgun sequencing, brings
speed into the picture, enabling researchers to do the job in months to
a year. The shotgun method was developed by GNN president J. Craig Venter
in 1996 when he was at the Institute for Genomic Research (TIGR).*
Now that the human genome sequence is nearing completion, the next phase,
understanding the meaning and function of genes, can begin.
A primer on the two approaches to sequencing follows.
* References to the
scientific literature are at the end of this primer.
The human body has about 100 trillion cells. Inside each of those cells
is the nucleus that contains the genome, the 46 human chromosomes, which
govern human development.
Each chromosome is one long string of DNA that is tightly coiled in a
compact bundle. Chromosomes are composed of millions of copies of the
four letters of the genetic code, the DNA bases A, C, G and T, that
are arranged into genes and non-coding sections. Finding the order, or
sequence, of these four letters is the goal of genomics. The entire human
genome is made up of about 3.5 billion bases.
To read the DNA, the chromosomes are cut into tiny pieces, each of which
is read individually. When all the segments have been read they are assembled
in the correct order.
Two approaches have been used to sequence the genome. They differ in how
they cut up the DNA, how they assemble it in the correct order, and
whether they map the chromosomes before decoding the sequence. First
there was the BAC to BAC approach. A second, newer method is called whole
genome shotgun sequencing.
BAC to BAC Sequencing
The BAC to BAC approach first creates a crude physical map of the
whole genome before sequencing the DNA. Constructing a map requires
cutting the chromosomes into large pieces and figuring out the order
of these big chunks of DNA before taking a closer look and sequencing
all the fragments.
|
Whole Genome Shotgun Sequencing
The shotgun sequencing method goes straight to the job of decoding,
bypassing the need for a physical map. Therefore, it is much faster.
|
In the BAC to BAC method, several copies of the genome are randomly
cut into pieces that are about 150,000 base pairs (bp) long.
|
In the shotgun method, multiple copies of the genome are randomly
shredded into pieces that are 2,000 base pairs (bp) long by squeezing
the DNA through a pressurized syringe. This is done a second time to
generate pieces that are 10,000 bp long.
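The random cutting described here can be pictured with a toy simulation. This is only a sketch under stated assumptions: the function name `shear`, the uniformly random cut points, and the tiny made-up genome are all illustrative and are not taken from the laboratory procedure.

```python
import random

def shear(genome: str, mean_len: int, copies: int, rng: random.Random) -> list[str]:
    """Cut `copies` copies of `genome` at random positions into fragments."""
    fragments = []
    for _ in range(copies):
        # Assumption: cut points are uniform; enough cuts to average `mean_len`.
        n_cuts = max(1, len(genome) // mean_len - 1)
        cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
        bounds = [0] + cuts + [len(genome)]
        fragments.extend(genome[a:b] for a, b in zip(bounds, bounds[1:]))
    return fragments

rng = random.Random(0)  # fixed seed so the toy run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(20_000))  # made-up sequence
frags = shear(genome, mean_len=2_000, copies=5, rng=rng)
```

Because each copy is cut at different random places, fragments from different copies overlap one another, and those overlaps are what later make assembly possible.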
|
Each of these 150,000 bp fragments is inserted into a BAC, a bacterial
artificial chromosome. A BAC is a man-made piece of DNA that can
replicate inside a bacterial cell. The whole collection of BACs
containing the entire human genome is called a BAC library, because
each BAC is like a book in a library that can be accessed and copied.
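As a rough sense of scale, the size of such a library can be estimated from the numbers quoted in this primer. The sketch below is only arithmetic: the function name `clones_needed` is hypothetical, and the coverage values passed to it are assumptions, since the primer says only that "several copies" of the genome are used.

```python
import math

GENOME_BP = 3_500_000_000   # genome size quoted earlier in this primer
BAC_INSERT_BP = 150_000     # approximate BAC insert size from this primer

def clones_needed(coverage: float) -> int:
    """BACs required to hold `coverage` copies of the genome (assumed value)."""
    return math.ceil(GENOME_BP * coverage / BAC_INSERT_BP)

one_copy = clones_needed(1)    # a single copy of the genome
ten_copies = clones_needed(10) # hypothetical ten-fold redundancy
```

Even a single copy of the genome needs tens of thousands of BACs at this insert size, which is why the ordering step that follows matters.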
|
Each 2,000 and 10,000 bp fragment is inserted into a plasmid, which
is a piece of DNA that can replicate in bacteria. The two collections
of plasmids containing 2,000 and 10,000 bp chunks of human DNA are
known as plasmid libraries.
|
These BAC fragments are fingerprinted to give each piece a unique identification
tag that determines the order of the fragments. Fingerprinting involves
cutting each BAC fragment with a single enzyme and finding common
sequence landmarks in overlapping fragments that determine the location
of each BAC along the chromosome. Then overlapping BACs with markers
every 100,000 bp form a map of each chromosome.
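The idea behind fingerprinting can be sketched in a few lines. This is a simplified illustration, not the mapping software used by any sequencing center: the choice of EcoRI, the helper names `digest` and `shared_fragments`, and the short made-up sequences are all assumptions. Two clones that overlap on the chromosome should share at least some identical digest fragments.

```python
from collections import Counter

SITE = "GAATTC"   # EcoRI recognition site, chosen only for illustration
CUT_OFFSET = 1    # EcoRI cuts after the first base of its site (G^AATTC)

def digest(seq: str) -> list[int]:
    """Return the fragment lengths from cutting `seq` at every SITE."""
    cuts, pos = [], seq.find(SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def shared_fragments(fp_a: list[int], fp_b: list[int]) -> int:
    """Count fragment lengths that two fingerprints have in common."""
    return sum((Counter(fp_a) & Counter(fp_b)).values())

# A made-up stretch of chromosome with four enzyme sites.
region = ("AAAA" + SITE + "C" * 7 + SITE + "G" * 10 + SITE
          + "TTT" + SITE + "AC" * 7)
clone_a = region[:40]   # covers the left part of the region
clone_b = region[10:]   # covers the right part, overlapping clone_a

fp_a, fp_b = digest(clone_a), digest(clone_b)
common = shared_fragments(fp_a, fp_b)
```

In this toy case the one shared fragment lies entirely inside the overlap between the two clones; the fragments at the clone ends differ because each clone is cut off at a different place.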
|
This step is not needed in shotgun sequencing.
|
Each BAC is then broken
randomly into 1,500 bp pieces and placed in another artificial piece
of DNA called M13. This collection is known as an M13 library.
|
This step is not needed in shotgun sequencing.
|
All the M13 libraries are sequenced. From one end of each fragment,
500 bp are sequenced, generating millions of sequences.
|
Both the 2,000 and the 10,000 bp plasmid libraries are sequenced. From
each end of each fragment, 500 bp are decoded, generating millions of
sequences. Sequencing both ends of each insert is critical for assembling
the entire chromosome.
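Reading both ends of an insert can be sketched as follows. The names `reverse_complement` and `end_reads` are hypothetical, and the ten-base insert is a made-up example. The sketch assumes the second read comes off the opposite strand, so it appears as the reverse complement of the insert's last bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def end_reads(insert: str, read_len: int = 500) -> tuple[str, str]:
    """Return one read from each end of a cloned insert."""
    forward = insert[:read_len]
    # The far end is read from the other strand, pointing back into the insert.
    reverse = reverse_complement(insert[-read_len:])
    return forward, reverse

# Tiny example with 3-base reads instead of 500-base reads.
fwd, rev = end_reads("AACCGGTTAC", read_len=3)
```

Because the two reads come from the same insert, they are known to sit roughly one insert length apart (about 2,000 or 10,000 bp here) and to point toward each other, which gives an assembler a check on the order and spacing of the pieces it joins.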
|
In the BAC to BAC method, these sequences are fed into a computer
program called PHRAP that looks for common sequences that join two
fragments together.
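The basic notion of a common sequence joining two fragments can be shown with a minimal sketch. This is not PHRAP's actual algorithm, which also has to cope with sequencing errors and read quality; the function name `overlap`, the exact-match rule, and the `min_len` threshold are assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

k = overlap("ACGTTGCA", "TGCAGGT")          # the reads share "TGCA"
joined = "ACGTTGCA" + "TGCAGGT"[k:]         # merge without repeating the overlap
```

Two reads with no shared end sequence of at least `min_len` bases return 0 and are left unjoined.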
|
In the shotgun method, computer algorithms assemble the millions of
sequenced fragments into a continuous stretch resembling each chromosome.
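A toy version of assembly can be written as a greedy merge: repeatedly join the two pieces with the longest exact overlap until nothing more can be joined. This is a sketch under strong assumptions, not the assembler used for the human genome. Real assemblers must also handle sequencing errors, repeated sequences, reads from either strand, and the paired-end distance information. The names `overlap` and `greedy_assemble` and the short reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads pairwise by best overlap; return the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining pair overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads taken from the made-up sequence ATGCGTACGTTAGC.
contigs = greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"])
```

With these three reads the merge recovers the original fourteen-base sequence as a single contig. On real data, a purely greedy approach can be misled by repeated sequences, which is one reason the paired-end reads are so valuable.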
|
(All images created by Mary S. Gibbs (GNN))
Burke, D.T. et al. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosomal vectors. Science 236, 806-812 (1987).

Smith, L.M. et al. Fluorescence detection in automated DNA sequencing analysis. Nature 321, 674-679 (1986).

Shizuya, H. et al. Cloning and stable integration of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc Natl Acad Sci USA 89, 8794-8797 (September 1992).

Venter, J.C. et al. A new strategy for genome sequencing. Nature 381, 364-366 (May 30, 1996).

Venter, J.C. et al. Shotgun sequencing of the human genome. Science 280, 1540-1542 (June 5, 1998).