GNN - Genome News Network  

How does DNA sequencing work?

Regardless of the approach to the genome as a whole, the actual process of DNA sequencing is the same. Sequencing employs a technique known as electrophoresis to separate pieces of DNA that differ in length by only one base.


Lab with sequencing machines

In electrophoresis, DNA to be sequenced is placed at one end of a gel—a slab of a gelatin-like substance. (A major part of DNA sequencing simply comes down to making a bunch of Jell-O.) Electrodes are placed at either end of the gel and an electrical current is applied, causing the DNA molecules to move through the gel. Smaller molecules move through the gel more rapidly, so the DNA molecules become separated into different bands according to their size. The catch is that electrophoresis can only separate about 500 bases into clear bands—hence the need for chopping DNA up into small pieces in order to sequence it.

Until the late 1980s, electrophoresis gels were always read by a person. Each piece of DNA was attached to a radioactive label, and an X-ray picture was made of the gel to make the positions of the DNA bands visible. Painstakingly analyzing the rows and columns of bands on the gel, a person could determine the sequence of the DNA.

But this process was slow, tedious, and fraught with error. Today's large-scale sequencing projects would be impossible without automatic sequencing machines, which became commercially available in the late 1980s and have made DNA sequencing much quicker and more reliable. In one year, a person can produce a finished sequence of 20,000 to 50,000 bases; a machine can produce a rough draft of a sequence that long in just a few hours.

Most automatic sequencing machines have a design based closely on the original, manual sequencing process. To run the machine, a technician pours gel into the space between two glass plates set less than half a millimeter (two-hundredths of an inch) apart. After the gel sets, DNA is loaded into each of the 96 lanes—just like the lanes on a highway or in a pool—that run the length of the 30-cm (about 1 foot) gel. As the DNA pieces move through the gel, the sequencing machine reads the order of DNA bases and stores this information in its computer memory.
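As a back-of-the-envelope check on what one run of such a machine yields, the arithmetic can be sketched as follows. The sketch assumes that every one of the 96 lanes produces a single read of about 500 bases (roughly the limit that electrophoresis can resolve); real read lengths vary, and the names used here are illustrative only.

```python
# Rough estimate of the raw bases read in one run of a 96-lane machine.
# ASSUMPTION: every lane yields one read of about 500 bases.
LANES_PER_RUN = 96
BASES_PER_READ = 500  # approximate separation limit, not a measured value


def bases_per_run(lanes=LANES_PER_RUN, read_length=BASES_PER_READ):
    """Return the approximate number of raw bases read in one run."""
    return lanes * read_length


print(bases_per_run())  # 48000
```

Under these assumptions a single run gives roughly 48,000 raw bases, the same order of magnitude as the 20,000 to 50,000 bases a person could finish in a year.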

In some newer machines, known as capillary sequencers, DNA is run through an array of 96 gel-filled capillaries—glass tubes about the width of a human hair—rather than through a slab of gel. But just like the slab-gel machines, capillary machines read the base sequence as DNA moves through the gel.


Close up of capillaries from a capillary sequencing machine

Capillary sequencers can sequence each piece of DNA about twice as fast as slab-gel machines. Moreover, they are fully automated—a robotic arm places the DNA into the top of the capillaries. The machine automatically fills the capillaries with gel and cleans them between runs, so only a minimum of human attention—about 15 minutes a day—is necessary to refill the containers of gel, water, and other solutions located in the machine's "guts." On the other hand, sequencing machines are expensive and capillary sequencers are so new that some labs have had trouble getting them to work at top efficiency. Most large-scale sequencing projects use a combination of slab-gel and capillary machines.

How does the sequencing machine know whether a base is an A, C, G, or T?

Sequencing machines can't "see" DNA directly, so scientists must use a complex set of procedures to prepare DNA for sequencing. When DNA is finally in a form that the machines can read, it has been chopped up, copied, chemically modified, and tagged with fluorescent dyes corresponding to the four different DNA bases, or genetic letters.

Before it is sequenced, a piece of DNA is copied many times, then divided into four batches in preparation for another round of copying. In this second round, a small amount of a chemically modified base is added to each batch: modified T to one batch, modified A to another, and so on. When one of these modified bases is incorporated into a growing DNA molecule, the chain of bases stops growing. Because only a small amount of the modified base is present, the chain can stop at any position where that base occurs, so each batch ends up with pieces of many different lengths. As a result, one batch of DNA will contain only pieces that end in T, another only pieces that end in A, a third only pieces that end in G, and the fourth only pieces that end in C.

In the second round of copying, a different fluorescent dye is also added to each batch of DNA. Thus, for example, every piece of DNA that ends in T carries a blue dye tag, those that end in A a red dye tag, those that end in G a yellow dye tag, and those that end in C a green dye tag.

Suppose you apply that procedure to this sequence of DNA:

TAGACT

At the end of the second round of copying, each batch will contain the following pieces of DNA:

1: blue-T, blue-TAGACT
2: red-TA, red-TAGA
3: yellow-TAG
4: green-TAGAC
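The four batches listed above can be reproduced with a short sketch. It treats every prefix of the sequence as a copy whose growth stopped at that position and groups the prefixes by their last base. The function name and the dye table are illustrative; the colors simply follow this example and are not a claim about any particular dye chemistry.

```python
# Dye colors follow the example in the text (an assumption for illustration).
DYE_FOR_BASE = {"T": "blue", "A": "red", "G": "yellow", "C": "green"}


def terminated_fragments(sequence):
    """Group every prefix of `sequence` by the base it ends in.

    Each prefix stands for a copy whose growth stopped when a modified
    base was incorporated at that position.
    """
    batches = {base: [] for base in DYE_FOR_BASE}
    for end in range(1, len(sequence) + 1):
        fragment = sequence[:end]
        batches[fragment[-1]].append(fragment)
    return batches


for base, fragments in terminated_fragments("TAGACT").items():
    labeled = [f"{DYE_FOR_BASE[base]}-{fragment}" for fragment in fragments]
    print(base, labeled)
```

For TAGACT this prints the T batch with blue-T and blue-TAGACT, the A batch with red-TA and red-TAGA, the G batch with yellow-TAG, and the C batch with green-TAGAC.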

Into one lane or capillary of a sequencing machine goes a mixture of DNA from all four batches. Because smaller molecules move through the gel faster, the DNA pieces come through the gel in increasing order of size—each piece one base longer than the last.

Thus, in this example, the first piece to make it all the way through the gel is a T attached to a blue dye tag; the next piece is TA with a red dye tag; next is TAG attached to a yellow dye tag; and so on.
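The order in which the pooled pieces leave the gel can be sketched by sorting them by length. This is only a model of the outcome (shortest piece first), not of the physics of the gel, and the names used are illustrative.

```python
# (dye color, fragment) pairs from the four batches of the TAGACT example.
pooled = [
    ("blue", "T"), ("blue", "TAGACT"),
    ("red", "TA"), ("red", "TAGA"),
    ("yellow", "TAG"),
    ("green", "TAGAC"),
]


def emergence_order(fragments):
    """Return fragments in the order they leave the gel: shortest first."""
    return sorted(fragments, key=lambda pair: len(pair[1]))


for color, fragment in emergence_order(pooled):
    print(f"{len(fragment)} bases: {fragment} ({color})")
```

Each line of output is one base longer than the one before it, which is what lets the machine read the sequence one base at a time.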

As the pieces emerge from the gel, they move past a laser that causes the dye molecules to fluoresce. A detector reads the color of the fluorescence—blue, red, yellow…—and a software program matches the color to the corresponding base—T, A, G…. In this way, the sequence grows base by base. Each sequence of 500 bases or so that a sequencing machine generates is known as a "read."
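The color-matching step can be sketched as a simple lookup. The color table follows the example in this article, and reporting an unrecognized color as "N" is an assumption made here for illustration; real base-calling software works from fluorescence intensities rather than clean color names.

```python
# Color-to-base table from the example in the text.
BASE_FOR_DYE = {"blue": "T", "red": "A", "yellow": "G", "green": "C"}


def call_bases(colors):
    """Translate detected colors into bases; unknown colors become 'N'."""
    return "".join(BASE_FOR_DYE.get(color, "N") for color in colors)


# Colors in the order the pieces pass the detector, shortest piece first.
detected = ["blue", "red", "yellow", "red", "green", "blue"]
print(call_bases(detected))  # TAGACT
```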

What happens after DNA sequences come out of the sequencing machines?

An automatic sequencing machine spits out what genome scientists call "raw" sequence. In raw sequence, the reads, or short DNA sequences, are all jumbled together, like the pieces of a jigsaw puzzle in a just-opened box. Inevitably, raw sequence also contains a few gaps, mistakes, and ambiguities.

The process of polishing that raw sequence—transforming the fragmented rough draft into a long, continuous final product without breaks or errors—is called finishing. Finishing involves both assembly, in which individual reads are hooked together in the proper order, and a laborious process of double-checking and refining the sequence to eliminate mistakes and close gaps. Finishing often takes longer than the sequencing itself.
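The assembly step can be illustrated with a toy sketch that repeatedly joins the two reads sharing the longest overlap. This greedy approach is a simplification chosen for illustration, not the method used by any particular genome project; it ignores sequencing errors, repeated regions, and reads from the opposite strand. The reads and function names below are made up for the example.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no overlaps left: the remaining pieces are separated by gaps
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical short reads; real reads are hundreds of bases long.
reads = ["GACTTCGA", "TAGACT", "TCGATTGC"]
print(greedy_assemble(reads))  # ['TAGACTTCGATTGC']
```

If some reads share no overlap, the function returns more than one piece, which corresponds to the gaps that finishing has to close.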
