GNN - Genome News Network  
Protein Structure Becomes Crystal Clear
The National Institute of General Medical Sciences launches 10-year quest to determine the structures of 10,000 proteins
By Bijal P. Trivedi

As Osnat Herzberg aligns a tray under a low-power microscope, clusters of dazzling crystals come into view. Some look like piles of yellow sugar cubes randomly fused together; others stand alone like exquisitely cut diamonds. Another tray reveals crystals like a fistful of pencils with their points spraying out in all directions, and others resemble a spiral of thin leaves of paper. The shelves of Herzberg's laboratory are lined with stacks of transparent palm-sized trays, each nurturing the growth of hundreds of tiny crystals. These crystals are made from bacterial proteins, and growing them is the first step in determining their precise atomic structure. Herzberg's work at the University of Maryland's Center for Advanced Research in Biotechnology in Rockville is one of many mini pilot projects launched to determine the feasibility of the "Protein Structure Initiative," a descendant of the Human Genome Project and possibly the next mega-project for biology.

A protein crystal.

In late September, the National Institute of General Medical Sciences (NIGMS) awarded a total of $30 million to seven US institutions to begin the "Protein Structure Initiative." The goal is to determine the structure of 10,000 proteins, selected from bacteria, fruit flies, worms, yeast and humans, in 10 years. The approximately $4 million awarded to each center will fund research during the first of the next five years.

"I cannot think of anything society could do to advance medicine more rapidly," says David Eisenberg, of the University of California, Los Angeles. "To design a drug, it is of enormous importance to have the crystal structure available; every major pharmaceutical company has an x-ray crystallography unit for doing rational drug design."

Proteins are large globular molecules with a complex topology of ridges, grooves, and bulges, and their three-dimensional structure is an invaluable tool for designing therapeutic drugs. The structure gives researchers an idea of which regions of a bacterial protein, for example, allow it to cause disease, and it puts into perspective the shape and size of the drug that must be designed to interfere with this process.

"If we look at the human genome sequence and ask what's it good for, we can see it's a great tool for diagnostics, because now we can look at our genes and see the variations and gauge susceptibility to various diseases," says Eisenberg, whose lab is a member of the TB Structural Genomics Consortium. "If we ask what will come out of a human structural genomics project, or proteome project, the answer is therapeutics. It will give us the actual three-dimensional structures of every protein, and the information to design drugs."

An x-ray diffraction picture (left) of a protein crystal. A computer program interprets this array of spots and converts the data into a protein structure (right). This close-up shows one section of the protein structure in yellow and purple. The red cage-like structure shows the density of electrons.

Each of the seven centers has a different focus. The Joint Center for Structural Genomics, based at The Scripps Research Institute, La Jolla, California, will concentrate on human and worm proteins. The Midwest Center for Structural Genomics, at the Argonne National Laboratory, Illinois, will not favor proteins from any particular organism, but will focus on reducing the cost of deciphering protein structure from $100,000 to $20,000 per protein. The TB Structural Genomics Consortium, led by scientists at the Los Alamos National Laboratory in New Mexico, will determine the structure of proteins from Mycobacterium tuberculosis, the bacterium that causes TB.

About 2 billion people worldwide are infected with TB. Every year approximately eight million people become sick with active tuberculosis, and of these more than two million die. Five drugs exist today that effectively treat TB, but there are strains of the bacterium that are resistant to each one of these drugs individually, and many that are resistant to two or more.

The TB genome was sequenced by French and English scientists in 1998, revealing a collection of about 4000 genes. The TB consortium (currently more than 50 labs from 12 countries, including the US, Germany, Japan, Britain, France, South Korea, New Zealand and India) aims to determine the structure of 400 of the proteins these genes encode in five years.

"We have identified 60 proteins that are excellent drug targets. We think these are proteins involved in causing infection that are also vulnerable to drugs," says Eisenberg. The UCLA team is focusing on proteins functionally linked to other proteins that are already targeted by known drugs. The drug isoniazid is known to act via a receptor protein in the membrane of host cells; Eisenberg's team has identified other proteins that are linked to the receptor.

Ideally, biologists would like to know the structure of all the proteins encoded by the estimated 25,000 genes in the human genome, as well as the structures of proteins from pathogenic bacteria and viruses. However, the current technology cannot support such an enormous and expensive undertaking.

The first five years of the Protein Structure Initiative are viewed as the pilot stage. Many of the centers will focus on transforming structural genomics into a high-throughput science. Today's laborious and time-consuming processes will be taken over by robots in protein structure factories. By the end of the pilot period each of the seven centers is expected to produce 100 to 200 structures per year.

Scientists simplify the complex protein structure [in the previous illustration] into a schematic called a ribbon diagram (shown above). The ribbon diagram shows how the backbone of the protein twists and folds.

Scaling up to high-throughput science means ironing out bottlenecks in small-scale labs like Osnat Herzberg's, and then ramping up the technology to outfit the new factory-labs. "There are many problems right now," says Herzberg. "Producing pure batches of protein takes a long time. Determining which proteins will actually crystallize is done by trial and error—not all proteins like to form crystals. And even if you get a beautiful crystal, it doesn't mean it will always diffract and produce good pictures." Right now Herzberg and her collaborators are committed to solving ten protein structures a year from the bacterium Haemophilus influenzae, the first free-living organism ever sequenced.

The ultimate goal of the Protein Structure Initiative is to learn how to predict a protein's 3-D structure directly from its gene sequence. This would allow scientists to understand exactly how genetic mutations lead to the changes in protein structure that cause disease. It would also show how SNPs, minute variations between human genes, can change a protein structure in such a way that one person has a normal risk of developing heart disease while a person carrying a different SNP has a tenfold higher risk. The 5,000 to 10,000 structures will provide the raw material needed for modeling any sequence, says Charles Edmonds of the NIGMS.

While the potential benefits for medicine are indisputable, Eisenberg points out that there is more to be gained. His lab could have focused on human proteins, but it chose a bacterium. "We wanted to know how much you can learn about an organism by studying all the proteins in the entire genome. We really want to understand life at the atomic level, and we can't study enough human proteins at this point and get the same feeling as if we studied a simpler organism and looked at all of the proteins," says Eisenberg.

Scientists working at the human genome sequencing centers had to wean themselves away from their specific gene of interest to concentrate on the whole genome sequence. In the same way, says Edmonds, structural biologists need to take a more universal approach and abandon pursuits of particular projects in the interest of producing a complete catalogue of protein structures.
