What is genome sequencing?
Genome sequencing is figuring out the order of DNA nucleotides, or
bases, in a genome: the order of As, Cs, Gs, and Ts that make up
an organism's DNA. The human genome is made up of over 3 billion of
these genetic letters.
Today, DNA sequencing on a large scale (the scale necessary for
ambitious projects such as sequencing an entire genome) is mostly
done by high-tech machines. Much as your eye scans a sequence of letters
to read a sentence, these machines "read" a sequence of DNA bases.
A DNA sequence that has been translated from life's chemical alphabet
into our alphabet of written letters might look like this:
AGTCC...
That is, in this particular piece of DNA, an adenine (A) is followed
by a guanine (G), which is followed by a thymine (T), which in turn
is followed by a cytosine (C), another cytosine (C), and so on.
What does a genome sequence tell us?
By itself, not a whole lot. Genome sequencing is often compared to
"decoding," but a sequence is still very much in code. In a sense, a
genome sequence is simply a very long string of letters in a mysterious
language.
When you read a sentence, the meaning is not just in the sequence of
the letters. It is also in the words those letters make and in the grammar
of the language. Similarly, the human genome is more than just its sequence.
Imagine the genome as a book written without capitalization or punctuation,
without breaks between words, sentences, or paragraphs, and with strings
of nonsense letters scattered between and even within sentences. In English,
a passage from such a book would be one long, unbroken run of letters, with
the real words buried among the nonsense.
Even in a familiar language it is difficult to pick out the meaning
of the passage: The quick brown fox jumped over the lazy dog. The
dog lay quietly dreaming of dinner. And the genome is "written"
in a far less familiar language, multiplying the difficulties involved
in reading it.
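The book analogy can be made concrete with a short Python sketch. The function `genome_style` is a hypothetical helper written only for this illustration: it lowercases a passage, drops spaces and punctuation, and scatters random nonsense letters through it. The junk rate and the random seed are arbitrary choices, not properties of any real genome.

```python
import random
import string


def genome_style(text: str, junk_rate: float = 0.3, seed: int = 0) -> str:
    """Rewrite text the way the analogy describes a genome: no capitals,
    no spaces or punctuation, and nonsense letters mixed in."""
    rng = random.Random(seed)  # fixed seed so the output is repeatable
    out = []
    for c in text.lower():
        if not c.isalpha():
            continue  # drop spaces, punctuation and digits
        # Before some real letters, insert one to three random nonsense letters.
        if rng.random() < junk_rate:
            out.extend(rng.choice(string.ascii_lowercase)
                       for _ in range(rng.randint(1, 3)))
        out.append(c)
    return "".join(out)


passage = ("The quick brown fox jumped over the lazy dog. "
           "The dog lay quietly dreaming of dinner.")
print(genome_style(passage))
```

Setting `junk_rate=0.0` returns just the unbroken run of real letters; even that is hard to read, which is the point of the analogy.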
So sequencing the genome doesn't immediately lay open the genetic secrets
of an entire species. Even with a rough draft of the human genome sequence
in hand, much work remains to be done. Scientists still have to translate
those strings of letters into an understanding of how the genome works:
what the various genes that make up the genome do, how different genes
are related, and how the various parts of the genome are coordinated.
That is, they have to figure out what those letters of the genome sequence
mean.
Why is genome sequencing so important?
Sequencing the genome is an important step towards understanding it.
At the very least, the genome sequence will represent a valuable shortcut,
helping scientists find genes much more easily and quickly. A genome
sequence does contain some clues about where genes are, even though
scientists are just learning to interpret these clues.
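One of the simplest such clues is an open reading frame: a stretch that begins with the start codon ATG and runs, three bases at a time, to a stop codon (TAA, TAG or TGA). The Python sketch below uses a hypothetical `find_orfs` helper that scans only one strand and ignores much of what real gene-finding programs must handle, such as genes that are split into separate pieces. Treat it as an illustration that a raw sequence carries hints about where genes are, not as a gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of open reading frames on one strand.

    Each ORF runs from an ATG start codon to the next in-frame stop codon;
    `end` is exclusive and includes the stop codon. `min_codons` is the
    minimum number of codons (counting ATG) before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

In the example, the frame ATG AAA TTT TAG starts at position 2 and ends just after the stop codon at position 14.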
Scientists also hope that being able to study the entire genome sequence
will help them understand how the genome as a whole works: how genes
work together to direct the growth, development and maintenance of an
organism.
Finally, genes account for less than 25 percent of the DNA in the genome,
and so knowing the entire genome sequence will help scientists study the
parts of the genome outside the genes. This includes the regulatory regions
that control how genes are turned on and off, as well as long stretches
of "nonsense" or "junk" DNA, so called because we don't yet know what,
if anything, it does.
How do you sequence a genome?
[Image: Lab technician working with sequencing. Courtesy of Celera Genomics]
The quick answer to this question is: in pieces. The whole genome can't
be sequenced all at once because available methods of DNA sequencing
can only handle short stretches of DNA at a time.
So instead, scientists must break the genome into small pieces, sequence
the pieces, and then reassemble them in the proper order to arrive at
the sequence of the whole genome. Much of the work involved in sequencing
lies in putting together this giant biological jigsaw puzzle.
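The break, sequence and reassemble idea can be sketched in Python. The helpers `overlapping_pieces` and `reassemble` are hypothetical, and the setup is deliberately simplified: pieces are cut at evenly spaced positions with a known overlap, whereas real methods break DNA at random points, so the order and overlaps of the pieces have to be worked out rather than assumed. The short example sequence is made up.

```python
def overlapping_pieces(sequence: str, piece_len: int, overlap: int) -> list[str]:
    """Cut a sequence into pieces of piece_len bases; neighbours share `overlap` bases."""
    if not 0 <= overlap < piece_len:
        raise ValueError("need 0 <= overlap < piece_len")
    step = piece_len - overlap
    pieces = []
    for start in range(0, len(sequence), step):
        pieces.append(sequence[start:start + piece_len])
        if start + piece_len >= len(sequence):
            break  # this piece already reaches the end of the sequence
    return pieces


def reassemble(pieces: list[str], overlap: int) -> str:
    """Join pieces that are already in order, using each shared overlap once."""
    if not pieces:
        return ""
    result = pieces[0]
    for piece in pieces[1:]:
        # The start of each piece should repeat the end of what came before.
        if not result.endswith(piece[:overlap]):
            raise ValueError("pieces do not overlap as expected")
        result += piece[overlap:]
    return result


pieces = overlapping_pieces("ATGCCGTAAGCTTGA", piece_len=7, overlap=3)
print(pieces)                 # ['ATGCCGT', 'CGTAAGC', 'AGCTTGA']
print(reassemble(pieces, 3))  # ATGCCGTAAGCTTGA
```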
There are two approaches to the task of cutting up the genome and putting
it back together again. One strategy, known as the "clone-by-clone" approach,
involves first breaking the genome up into relatively large chunks, called
clones, about 150,000 base pairs (bp) long. Scientists use genome mapping
techniques (discussed in further detail later) to figure out where in
the genome each clone belongs. Next they cut each clone into smaller,
overlapping pieces the right size for sequencing, about 500 bp each.
Finally, they sequence the pieces and use the overlaps to reconstruct
the sequence of the whole clone.
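The clone-by-clone approach can be pictured as two steps: each clone's sequence is first put together from its own pieces, then the finished clones are laid out at the positions the map gives them. The Python sketch below shows only the second step, using a hypothetical `stitch_clones` helper. It assumes, unrealistically, that mapping gives each clone's exact starting position, and its tiny "clones" are made-up strings rather than 150,000 bp stretches.

```python
def stitch_clones(mapped_clones: list[tuple[int, str]]) -> str:
    """Lay out clone sequences at their mapped start positions.

    Where clones overlap, their bases must agree; a position that no
    clone covers is reported as a gap.
    """
    genome: list[str] = []
    for start, clone in sorted(mapped_clones):
        if start > len(genome):
            raise ValueError(f"gap before position {start}: no clone covers it")
        for offset, base in enumerate(clone):
            pos = start + offset
            if pos < len(genome):
                if genome[pos] != base:  # overlapping clones must match
                    raise ValueError(f"clones disagree at position {pos}")
            else:
                genome.append(base)  # extend the sequence past its current end
    return "".join(genome)


print(stitch_clones([(6, "TAAGCTTGA"), (0, "ATGCCGTAA")]))  # ATGCCGTAAGCTTGA
```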
The other strategy, called the "whole-genome shotgun" method, involves
breaking the genome up into small pieces, sequencing the pieces, and
reassembling the pieces into the full genome sequence.
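With the shotgun method there is no map, so the pieces have to be ordered by their overlaps alone. A common textbook way to illustrate this is greedy assembly: repeatedly merge the two pieces with the longest overlap. The Python below (`overlap_len` and `greedy_assemble` are hypothetical names) is a sketch of that textbook technique, not the method of any particular sequencing project, and it assumes the pieces contain no sequencing errors.

```python
from itertools import permutations


def overlap_len(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if n > 0 and a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Merge pieces greedily by longest overlap; return the resulting contigs."""
    unique = sorted(set(reads))
    # Drop pieces that lie entirely inside another piece.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            n = overlap_len(contigs[i], contigs[j], min_overlap)
            if n > best_len:
                best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left overlaps: keep the pieces as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["CGTAAGC", "AGCTTGA", "ATGCCGT"]  # made-up pieces, in shuffled order
print(greedy_assemble(reads))  # ['ATGCCGTAAGCTTGA']
```

Greedy merging can join the wrong pieces when a stretch of sequence is repeated and the repeat is longer than the overlaps, which is one reason assembling millions of shotgun pieces at once is so hard.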
[Image: Room filled with sequencing machines. Courtesy of Celera Genomics]
Each of these approaches has advantages and disadvantages. The clone-by-clone
method is reliable but slow, and the mapping step can be especially
time-consuming. By contrast, the whole-genome shotgun method is potentially
very fast, but it can be extremely difficult to put together so many
tiny pieces of sequence all at once.
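A little arithmetic with the figures given above shows the difference in scale. This sketch assumes a genome of 3 billion bases, clones of about 150,000 bp and pieces of about 500 bp, and it ignores the overlaps between pieces, which in practice make these counts considerably larger. Even so, the clone-by-clone method assembles a few hundred pieces per clone, while the shotgun method must assemble millions at once.

```python
GENOME_BP = 3_000_000_000  # "over 3 billion" bases, rounded to 3 billion
CLONE_BP = 150_000         # approximate clone length (clone-by-clone)
PIECE_BP = 500             # approximate length of a piece that can be sequenced

clones = GENOME_BP // CLONE_BP            # clones needed to span the genome once
pieces_per_clone = CLONE_BP // PIECE_BP   # pieces to assemble within each clone
pieces_in_genome = GENOME_BP // PIECE_BP  # pieces to assemble at once (shotgun)

print(f"clones: {clones:,}")                                # clones: 20,000
print(f"pieces per clone: {pieces_per_clone}")              # pieces per clone: 300
print(f"pieces in the whole genome: {pieces_in_genome:,}")  # pieces in the whole genome: 6,000,000
```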
Both approaches have already been used to sequence whole genomes. The
whole-genome shotgun method was used to sequence the genome of the bacterium
Haemophilus influenzae, while the genome of baker's yeast, Saccharomyces
cerevisiae, was sequenced with a clone-by-clone method. Sequencing
the human genome was done using both approaches.
Updated on January 15, 2003