GNN - Genome News Network  

Here, GNN posts abstracts of scientific papers on whole-genome sequences that have been reported by Genome News Network.

Recently added:
Blochmannia floridanus

Agrobacterium tumefaciens C58; Anabaena sp. strain PCC 7120; Anopheles gambiae PEST; Arabidopsis thaliana
Bacillus anthracis; Bacteroides thetaiotaomicron VPI-5482; Bifidobacterium longum NCC2705; Blochmannia floridanus; Brucella melitensis 16M
Brucella suis 1330 strain; Buchnera aphidicola [symbiont of Schizaphis graminum (Sg)]; Buchnera sp. APS
Caulobacter crescentus; Chlorobium tepidum TLS; Ciona intestinalis; Clostridium perfringens strain 13; Coxiella burnetii RSA 493
D - E - F
Drosophila melanogaster; Encephalitozoon cuniculi (GB-M1); Enterococcus faecalis; Escherichia coli O157:H7; Fugu rubripes
Fusobacterium nucleatum strain ATCC 25586
H - L
Halobacterium sp. NRC-1; Homo sapiens; Lactococcus lactis ssp. lactis IL1403
Methanococcoides burtonii; Methanogenium frigidum; Methanopyrus kandleri AV19; Methanosarcina acetivorans C2A; Mycobacterium leprae; Mycoplasma pulmonis UAB CTIP
N - O
Neisseria meningitidis MC58 (serogroup B); Neisseria meningitidis Z2491; Neurospora crassa; Oceanobacillus iheyensis HTE831; Oryza sativa L. ssp. indica
Oryza sativa L. ssp. japonica
Pasteurella multocida, Pm70; Plasmodium falciparum 3D7; Plasmodium yoelii yoelii; Pseudomonas aeruginosa PAO1; Pseudomonas putida KT2440; Pyrobaculum aerophilum IM2
Ralstonia solanacearum strain GMI1000; Rhodopirellula baltica; Rickettsia conorii strain Malish 7
Salmonella enterica serovar Typhi CT18; Salmonella enterica serovar Typhimurium LT2; Schizosaccharomyces pombe; Shewanella oneidensis MR-1; Shigella flexneri 2a; Strawberry mottle virus
Streptococcus agalactiae strain 2603 V/R; Streptococcus group A strain MGAS8232; Streptococcus mutans UA159; Streptococcus pneumoniae (serotype 4); Streptococcus pneumoniae strain R6; Streptococcus pyogenes M1
Streptomyces avermitilis ATCC31267; Streptomyces coelicolor A3(2); Sulfolobus solfataricus P2; Sulfolobus tokodaii strain 7
T - V - W
Thermoanaerobacter tengcongensis MB4(T); Thermoplasma acidophilum; Tropheryma whipplei TW08/27; Vibrio cholerae El Tor N16961; Wigglesworthia glossinidia brevipalpis
X - Y
Xanthomonas axonopodis pv. citri (strain 306); Xanthomonas campestris pv. campestris (strain ATCC33913); Xylella fastidiosa 9a5c strain (citrus); Xylella fastidiosa Dixon strain (almond)
Xylella fastidiosa Ann-1 strain (oleander); Yersinia pestis strain CO92



The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes.

Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life.

Proc Natl Acad Sci U S A. 2003 Aug 5;100(16):9388-93.

See GNN article Life Inside Carpenter Ants


Mechanisms of thermal adaptation revealed from the genomes of the antarctic Archaea Methanogenium frigidum and Methanococcoides burtonii.

We generated draft genome sequences for two cold-adapted Archaea, Methanogenium frigidum and Methanococcoides burtonii, to identify genotypic characteristics that distinguish them from Archaea with a higher optimal growth temperature (OGT). Comparative genomics revealed trends in amino acid and tRNA composition, and structural features of proteins. Proteins from the cold-adapted Archaea are characterized by a higher content of noncharged polar amino acids, particularly Gln and Thr, and a lower content of hydrophobic amino acids, particularly Leu. Sequence data from nine methanogen genomes (OGT 15 to 98 degrees C) were used to generate 1111 modeled protein structures. Analysis of the models from the cold-adapted Archaea showed a strong tendency in the solvent-accessible area for more Gln, Thr, and hydrophobic residues and fewer charged residues. A cold shock domain (CSD) protein (CspA homolog) was identified in M. frigidum, two hypothetical proteins with CSD-folds in M. burtonii, and a unique winged helix DNA-binding domain protein in M. burtonii. This suggests that these types of nucleic acid binding proteins have a critical role in cold-adapted Archaea. Structural analysis of tRNA sequences from the Archaea indicated that GC content is the major factor influencing tRNA stability in hyperthermophiles, but not in the psychrophiles, mesophiles or moderate thermophiles. Below an OGT of 60 degrees C, the GC content in tRNA was largely unchanged, indicating that any requirement for flexibility of tRNA in psychrophiles is mediated by other means. This is the first time that comparisons have been performed with genome data from Archaea spanning the growth temperature extremes from psychrophiles to hyperthermophiles.

Genome Res. 2003 Jul;13(7):1580-8.

See GNN article Extremophiles, Antarctica, and Extraterrestrial Life


Complete genome sequence of the marine planctomycete Pirellula sp. strain 1.

Pirellula sp. strain 1 ("Rhodopirellula baltica") is a marine representative of the globally distributed and environmentally important bacterial order Planctomycetales. Here we report the complete genome sequence of a member of this independent phylum. With 7.145 megabases, Pirellula sp. strain 1 has the largest circular bacterial genome sequenced so far. The presence of all genes required for heterolactic acid fermentation, key genes for the interconversion of C1 compounds, and 110 sulfatases were unexpected for this aerobic heterotrophic isolate. Although Pirellula sp. strain 1 has a proteinaceous cell wall, remnants of genes for peptidoglycan synthesis were found. Genes for lipid A biosynthesis and homologues to the flagellar L- and P-ring protein indicate a former Gram-negative type of cell wall. Phylogenetic analysis of all relevant markers clearly affiliates the Planctomycetales to the domain Bacteria as a distinct phylum, but a deepest branching is not supported by our analyses.

Proc Natl Acad Sci U S A. 2003 Jul 8;100(14):8298-303.

See GNN article The Small Red Pear of the Sea


Complete genome sequence of the Q-fever pathogen Coxiella burnetii.

The 1,995,275-bp genome of Coxiella burnetii, Nine Mile phase I RSA493, a highly virulent zoonotic pathogen and category B bioterrorism agent, was sequenced by the random shotgun method. This bacterium is an obligate intracellular acidophile that is highly adapted for life within the eukaryotic phagolysosome. Genome analysis revealed many genes with potential roles in adhesion, invasion, intracellular trafficking, host-cell modulation, and detoxification. A previously uncharacterized 13-member family of ankyrin repeat-containing proteins is implicated in the pathogenesis of this organism. Although the lifestyle and parasitic strategies of C. burnetii resemble those of Rickettsiae and Chlamydiae, their genome architectures differ considerably in terms of presence of mobile elements, extent of genome reduction, metabolic capabilities, and transporter profiles. The presence of 83 pseudogenes displays an ongoing process of gene degradation. Unlike other obligate intracellular bacteria, 32 insertion sequences are found dispersed in the chromosome, indicating some plasticity in the C. burnetii genome. These analyses suggest that the obligate intracellular lifestyle of C. burnetii may be a relatively recent innovation.

Proc Natl Acad Sci U S A 2003 Apr 29;100(9):5455-60.

See GNN article Potential Bioweapon: Q Fever Genome Is Sequenced


The genome sequence of the filamentous fungus Neurospora crassa.

Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase genome encodes about 10,000 protein-coding genes, more than twice as many as in the fission yeast Schizosaccharomyces pombe and only about 25% fewer than in the fruitfly Drosophila melanogaster. Analysis of the gene set yields insights into unexpected aspects of Neurospora biology including the identification of genes potentially associated with red light photobiology, genes implicated in secondary metabolism, and important differences in Ca(2+) signalling as compared with plants and animals. Neurospora possesses the widest array of genome defence mechanisms known for any eukaryotic organism, including a process unique to fungi called repeat-induced point mutation (RIP). Genome analysis suggests that RIP has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.

Nature 2003 Apr 24;422(6934):859-68.

See GNN article Mighty Mold Is Sequenced


Role of mobile DNA in the evolution of vancomycin-resistant Enterococcus faecalis.

The complete genome sequence of Enterococcus faecalis V583, a vancomycin-resistant clinical isolate, revealed that more than a quarter of the genome consists of probable mobile or foreign DNA. One of the predicted mobile elements is a previously unknown vanB vancomycin-resistance conjugative transposon. Three plasmids were identified, including two pheromone-sensing conjugative plasmids, one encoding a previously undescribed pheromone inhibitor. The apparent propensity for the incorporation of mobile elements probably contributed to the rapid acquisition and dissemination of drug resistance in the enterococci.

Science 2003 Mar 28;299(5615):2071-4.

See GNN article Mobile DNA: Genomic Studies Illuminate Antibiotic Resistance


A genomic view of the human-Bacteroides thetaiotaomicron symbiosis.

The human gut is colonized with a vast community of indigenous microorganisms that help shape our biology. Here, we present the complete genome sequence of the Gram-negative anaerobe Bacteroides thetaiotaomicron, a dominant member of our normal distal intestinal microbiota. Its 4779-member proteome includes an elaborate apparatus for acquiring and hydrolyzing otherwise indigestible dietary polysaccharides and an associated environment-sensing system consisting of a large repertoire of extracytoplasmic function sigma factors and one- and two-component signal transduction systems. These and other expanded paralogous groups shed light on the molecular mechanisms underlying symbiotic host-bacterial relationships in our intestine.

Science 2003 Mar 28;299(5615):2074-6.

See GNN article Mobile DNA: Genomic Studies Illuminate Antibiotic Resistance


Sequencing and analysis of the genome of the Whipple's disease bacterium Tropheryma whipplei.

BACKGROUND: Whipple's disease is a rare multisystem chronic infection, involving the intestinal tract as well as various other organs. The causative agent, Tropheryma whipplei, is a Gram-positive bacterium about which little is known. Our aim was to investigate the biology of this organism by generating and analysing the complete DNA sequence of its genome. METHODS: We isolated and propagated T. whipplei strain TW08/27 from the cerebrospinal fluid of a patient diagnosed with Whipple's disease. We generated the complete sequence of the genome by the whole genome shotgun method, and analysed it with a combination of automatic and manual bioinformatic techniques. FINDINGS: Sequencing revealed a condensed 925,938 bp genome with a lack of key biosynthetic pathways and a reduced capacity for energy metabolism. A family of large surface proteins was identified, some associated with large amounts of non-coding repetitive DNA, and an unexpected degree of sequence variation. INTERPRETATION: The genome reduction and lack of metabolic capabilities point to a host-restricted lifestyle for the organism. The sequence variation indicates both known and novel mechanisms for the elaboration and variation of surface structures, and suggests that immune evasion and host interaction play an important part in the lifestyle of this persistent bacterial pathogen.

Lancet 2003 Feb 22;361(9358):637-44.

See GNN article Hiding from the World: Scientists Sequence the Elusive Whipple Genome


Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440.

Pseudomonas putida is a metabolically versatile saprophytic soil bacterium that has been certified as a biosafety host for the cloning of foreign genes. The bacterium also has considerable potential for biotechnological applications. Sequence analysis of the 6.18 Mb genome of strain KT2440 reveals diverse transport and metabolic systems. Although there is a high level of genome conservation with the pathogenic Pseudomonad Pseudomonas aeruginosa (85% of the predicted coding regions are shared), key virulence factors including exotoxin A and type III secretion systems are absent. Analysis of the genome gives insight into the non-pathogenic nature of P. putida and points to potential new applications in agriculture, biocatalysis, bioremediation and bioplastic production.

Environ Microbiol 2002 Dec;4(12):799-808.

See GNN article Versatile soil-dwelling microbe is mapped


The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins

The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordates and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains approximately 16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.

Science 2002 Dec 13;298(5601):2157-67.

See GNN article Sea squirt spouts its genome


Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157

We have sequenced the genome of Shigella flexneri serotype 2a, the most prevalent species and serotype that causes bacillary dysentery or shigellosis in man. The whole genome is composed of a 4 607 203 bp chromosome and a 221 618 bp virulence plasmid, designated pCP301. While the plasmid shows minor divergence from that sequenced in serotype 5a, striking characteristics of the chromosome have been revealed. The S. flexneri chromosome has, astonishingly, 314 IS elements, more than 7-fold over those possessed by its close relatives, the non-pathogenic K12 strain and enterohemorrhagic O157:H7 strain of Escherichia coli. There are 13 translocations and inversions compared with the E. coli sequences, all involve a segment larger than 5 kb, and most are associated with deletions or acquired DNA sequences, of which several are likely to be bacteriophage-transmitted pathogenicity islands. Furthermore, S. flexneri, resembling another human-restricted enteric pathogen, Salmonella typhi, also has hundreds of pseudogenes compared with the E. coli strains. All of these could be subjected to investigations towards novel preventative and treatment strategies against shigellosis.

Nucleic Acids Res 2002 Oct 15;30(20):4432-41.

See GNN article Infant mortality: New clues from the sequenced Shigella genome

Genome sequence of Streptococcus mutans UA159, a cariogenic dental pathogen

Streptococcus mutans is the leading cause of dental caries (tooth decay) worldwide and is considered to be the most cariogenic of all of the oral streptococci. The genome of S. mutans UA159, a serotype c strain, has been completely sequenced and is composed of 2,030,936 base pairs. It contains 1,963 ORFs, 63% of which have been assigned putative functions. The genome analysis provides further insight into how S. mutans has adapted to surviving the oral environment through resource acquisition, defense against host factors, and use of gene products that maintain its niche against microbial competitors. S. mutans metabolizes a wide variety of carbohydrates via nonoxidative pathways, and all of these pathways have been identified, along with the associated transport systems whose genes account for almost 15% of the genome. Virulence genes associated with extracellular adherent glucan production, adhesins, acid tolerance, proteases, and putative hemolysins have been identified. Strain UA159 is naturally competent and contains all of the genes essential for competence and quorum sensing. Mobile genetic elements in the form of IS elements and transposons are prominent in the genome and include a previously uncharacterized conjugative transposon and a composite transposon containing genes for the synthesis of antibiotics of the gramicidin/bacitracin family; however, no bacteriophage genomes are present.

Proc Natl Acad Sci U S A 2002 Oct 23; [epub ahead of print].

See GNN article Fighting Cavities: Bacterium that causes tooth decay, S. mutans, is sequenced

The genome sequence of Bifidobacterium longum reflects its adaptation to the human gastrointestinal tract

Bifidobacteria are Gram-positive prokaryotes that naturally colonize the human gastrointestinal tract (GIT) and vagina. Although not numerically dominant in the complex intestinal microflora, they are considered as key commensals that promote a healthy GIT. We determined the 2.26-Mb genome sequence of an infant-derived strain of Bifidobacterium longum, and identified 1,730 possible coding sequences organized in a 60%-GC circular chromosome. Bioinformatic analysis revealed several physiological traits that could partially explain the successful adaptation of this bacterium to the colon. An unexpectedly large number of the predicted proteins appeared to be specialized for catabolism of a variety of oligosaccharides, some possibly released by rare or novel glycosyl hydrolases acting on "nondigestible" plant polymers or host-derived glycoproteins and glycoconjugates. This ability to scavenge from a large variety of nutrients likely contributes to the competitiveness and persistence of bifidobacteria in the colon. Many genes for oligosaccharide metabolism were found in self-regulated modules that appear to have arisen in part from gene duplication or horizontal acquisition. Complete pathways for all amino acids, nucleotides, and some key vitamins were identified; however, routes for Asp and Cys were atypical. More importantly, genome analysis provided insights into the reciprocal interactions of bifidobacteria with their hosts. We identified polypeptides that showed homology to most major proteins needed for production of glycoprotein-binding fimbriae, structures that could possibly be important for adhesion and persistence in the GIT. We also found a eukaryotic-type serine protease inhibitor (serpin) possibly involved in the reported immunomodulatory activity of bifidobacteria.

Proc Natl Acad Sci U S A 2002 Oct 15; [epub ahead of print].

See GNN article Friendly tenants in the human gut: The genome of B. longum

Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis

Shewanella oneidensis is an important model organism for bioremediation studies because of its diverse respiratory capabilities, conferred in part by multicomponent, branched electron transport systems. Here we report the sequencing of the S. oneidensis genome, which consists of a 4,969,803-base pair circular chromosome with 4,758 predicted protein-encoding open reading frames (CDS) and a 161,613-base pair plasmid with 173 CDSs. We identified the first Shewanella lambda-like phage, providing a potential tool for further genome engineering. Genome analysis revealed 39 c-type cytochromes, including 32 previously unidentified in S. oneidensis, and a novel periplasmic [Fe] hydrogenase, which are integral members of the electron transport system. This genome sequence represents a critical step in the elucidation of the pathways for reduction (and bioremediation) of pollutants such as uranium (U) and chromium (Cr), and offers a starting point for defining this organism's complex electron transport systems and metal ion-reducing capabilities.

Nat Biotechnol 2002 Oct 7; [epub ahead of print].

See GNN article Microbe that breaks down metals, S. oneidensis, is sequenced

The genome sequence of the malaria mosquito Anopheles gambiae

Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.

Science 2002 Oct 4;298(5591):129-49.

See GNN article The Parasite and the Mosquito: Malaria's deadly partners are sequenced

Genome sequence of the human malaria parasite Plasmodium falciparum

The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

Nature 2002 Oct 3;419(6906):498-511.

See GNN article The Parasite and the Mosquito: Malaria's deadly partners are sequenced

Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii

Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.

Nature 2002 Oct 3;419(6906):512-9.

See GNN article The Parasite and the Mosquito: Malaria's deadly partners are sequenced

Genome sequence of Oceanobacillus iheyensis isolated from the Iheya Ridge and its unexpected adaptive capabilities to extreme environments

Oceanobacillus iheyensis HTE831 is an alkaliphilic and extremely halotolerant Bacillus-related species isolated from deep-sea sediment. We present here the complete genome sequence of HTE831 along with analyses of genes required for adaptation to highly alkaline and saline environments. The genome consists of 3.6 Mb, encoding many proteins potentially associated with roles in regulation of intracellular osmotic pressure and pH homeostasis. The candidate genes involved in alkaliphily were determined based on comparative analysis with three Bacillus species and two other Gram-positive species. Comparison with the genomes of other major Gram-positive bacterial species suggests that the backbone of the genus Bacillus is composed of approximately 350 genes. This second genome sequence of an alkaliphilic Bacillus-related species will be useful in understanding life in highly alkaline environments and microbial diversity within the ubiquitous bacilli.

Nucleic Acids Res 2002 Sep 15;30(18):3927-35.

See GNN article Japanese extremophile, O. iheyensis, from the deep sea

The Brucella suis genome reveals fundamental similarities between animal and plant pathogens and symbionts

The 3.31-Mb genome sequence of the intracellular pathogen and potential bioterrorism agent, Brucella suis, was determined. Comparison of B. suis with Brucella melitensis has defined a finite set of differences that could be responsible for the differences in virulence and host preference between these organisms, and indicates that phage have played a significant role in their divergence. Analysis of the B. suis genome reveals transport and metabolic capabilities akin to soil/plant-associated bacteria. Extensive gene synteny between B. suis chromosome 1 and the genome of the plant symbiont Mesorhizobium loti emphasizes the similarity between this animal pathogen and plant pathogens and symbionts. A limited repertoire of genes homologous to known bacterial virulence factors was identified.

Proc Natl Acad Sci U S A 2002 Sep 23 [epub ahead of print].

See GNN article Potential bioweapon, Brucella suis, is sequenced

Whole-genome comparative analysis of three phytopathogenic Xylella fastidiosa strains

Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.

Proc Natl Acad Sci U S A 2002 Sep 17;99(19):12403-12408.

See GNN article New strains of fruit and nut pathogen yield their genomes

Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae

The 2,160,267 bp genome sequence of Streptococcus agalactiae, the leading cause of bacterial sepsis, pneumonia, and meningitis in neonates in the U.S. and Europe, is predicted to encode 2,175 genes. Genome comparisons among S. agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, and the other completely sequenced genomes identified genes specific to the streptococci and to S. agalactiae. These in silico analyses, combined with comparative genome hybridization experiments between the sequenced serotype V strain 2603 V/R and 19 S. agalactiae strains from several serotypes using whole-genome microarrays, revealed the genetic heterogeneity among S. agalactiae strains, even of the same serotype, and provided insights into the evolution of virulence mechanisms.

Proc Natl Acad Sci U S A 2002 Sep 17;99(19):12391-12396.

See GNN article To Make a Vaccine, First Sequence a Genome: The fight against group B strep starts with its genome sequence

Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia

Many insects that rely on a single food source throughout their developmental cycle harbor beneficial microbes that provide nutrients absent from their restricted diet. Tsetse flies, the vectors of African trypanosomes, feed exclusively on blood and rely on one such intracellular microbe for nutritional provisioning and fecundity. As a result of co-evolution with hosts over millions of years, these mutualists have lost the ability to survive outside the sheltered environment of their host insect cells. We present the complete annotated genome of Wigglesworthia glossinidia brevipalpis, which is composed of one chromosome of 697,724 base pairs (bp) and one small plasmid, called pWig1, of 5,200 bp. Genes involved in the biosynthesis of vitamin metabolites, apparently essential for host nutrition and fecundity, have been retained. Unexpectedly, this obligate's genome bears hallmarks of both parasitic and free-living microbes, and the gene encoding the important regulatory protein DnaA is absent.

Nat Genet 2002 Sep 3 [epub ahead of print].

See GNN article Wigglesworthia wiggles into the world of sequenced genomes

Whole genome shotgun assembly and analysis of Fugu rubripes

The compact genome of Fugu rubripes has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds. In this 365-megabase vertebrate genome, repetitive DNA accounts for less than one-sixth of the sequence, and gene loci occupy about one-third of the genome. As with the human genome, gene loci are not evenly distributed, but are clustered into sparse and dense regions. Some "giant" genes were observed that had average coding sequence sizes but were spread over genomic lengths significantly larger than those of their human orthologs. Although three-quarters of predicted human proteins have a strong match to Fugu, approximately a quarter of the human proteins had highly diverged from or had no pufferfish homologs, highlighting the extent of protein evolution in the 450 million years since teleosts and mammals diverged. Conserved linkages between Fugu and human genes indicate the preservation of chromosomal segments from the common vertebrate ancestor, but with considerable scrambling of gene order.

Science 2002 Jul 25; [epub ahead of print].

See GNN article Pufferfish genome reveals nearly a thousand potentially new human genes

50 million years of genomic stasis in endosymbiotic bacteria

Comparison of two fully sequenced genomes of Buchnera aphidicola, the obligate endosymbionts of aphids, reveals the most extreme genome stability to date: no chromosome rearrangements or gene acquisitions have occurred in the past 50 to 70 million years, despite substantial sequence evolution and the inactivation and loss of individual genes. In contrast, the genomes of their closest free-living relatives, Escherichia coli and Salmonella spp., are more than 2000-fold more labile in content and gene order. The genomic stasis of B. aphidicola, likely attributable to the loss of phages, repeated sequences, and recA, indicates that B. aphidicola is no longer a source of ecological innovation for its hosts.

Science 2002 Jun 28;296(5577):2376-9.

See GNN article Inside insects, life is unchanged for 50 million years

The complete genome sequence of Chlorobium tepidum TLS, a photosynthetic, anaerobic, green-sulfur bacterium

The complete genome of the green-sulfur eubacterium Chlorobium tepidum TLS was determined to be a single circular chromosome of 2,154,946 bp. This represents the first genome sequence from the phylum Chlorobia, whose members perform anoxygenic photosynthesis by the reductive tricarboxylic acid cycle. Genome comparisons have identified genes in C. tepidum that are highly conserved among photosynthetic species. Many of these have no assigned function and may play novel roles in photosynthesis or photobiology. Phylogenomic analysis reveals likely duplications of genes involved in biosynthetic pathways for photosynthesis and the metabolism of sulfur and nitrogen as well as strong similarities between metabolic processes in C. tepidum and many Archaeal species.

Proc Natl Acad Sci U S A 2002 Jul 9;99(14):9509-14.

See GNN article Light-harvesting bacterium C. tepidum is sequenced

Comparison of the genomes of two Xanthomonas pathogens with differing host specificities

The genus Xanthomonas is a diverse and economically important group of bacterial phytopathogens, belonging to the gamma-subdivision of the Proteobacteria. Xanthomonas axonopodis pv. citri (Xac) causes citrus canker, which affects most commercial citrus cultivars, resulting in significant losses worldwide. Symptoms include canker lesions, leading to abscission of fruit and leaves and general tree decline. Xanthomonas campestris pv. campestris (Xcc) causes black rot, which affects crucifers such as Brassica and Arabidopsis. Symptoms include marginal leaf chlorosis and darkening of vascular tissue, accompanied by extensive wilting and necrosis. Xanthomonas campestris pv. campestris is grown commercially to produce the exopolysaccharide xanthan gum, which is used as a viscosifying and stabilizing agent in many industries. Here we report and compare the complete genome sequences of Xac and Xcc. Their distinct disease phenotypes and host ranges belie a high degree of similarity at the genomic level. More than 80% of genes are shared, and gene order is conserved along most of their respective chromosomes. We identified several groups of strain-specific genes, and on the basis of these groups we propose mechanisms that may explain the differing host specificities and pathogenic processes.

Nature 2002 May 23;417(6887):459-63.

See GNN article Two Xanthomonas bacteria that damage crops are sequenced

A complete sequence of the T. tengcongensis genome

Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4(T) (Genbank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them, 1764 (68.2%) are classified according to homology to other documented proteins, and the rest, 824 CDS (31.8%), are functionally unknown. One of the interesting features of the T. tengcongensis genome is that 86.7% of its genes are encoded on the leading strand of DNA replication. Based on protein sequence similarity, the T. tengcongensis genome is most similar to that of Bacillus halodurans, a mesophilic eubacterium, among all fully sequenced prokaryotic genomes to date. Computational analysis of genes involved in basic metabolic pathways supports the experimental discovery that T. tengcongensis metabolizes sugars as principal energy and carbon source and utilizes thiosulfate and elemental sulfur, but not sulfate, as electron acceptors. T. tengcongensis, as a gram-negative rod by empirical definitions (such as staining), shares many genes that are characteristic of gram-positive bacteria whereas it is missing molecular components unique to gram-negative bacteria. A strong correlation between the G + C content of tDNA and rDNA genes and the optimal growth temperature is found among the sequenced thermophiles. It is concluded that thermophiles are a biologically and phylogenetically divergent group of prokaryotes that have converged to sustain extreme environmental conditions over evolutionary timescales.

Genome Res 2002 May;12(5):689-700.

See GNN article From a hot spring in China, T. tengcongensis is sequenced
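
The CDS percentages quoted in the T. tengcongensis abstract can be rechecked from the counts it gives. The following is a minimal sketch in Python; the helper name `percent_share` and the constant names are illustrative assumptions, not anything taken from the paper.

```python
def percent_share(part: int, total: int, digits: int = 1) -> float:
    """Return part/total as a percentage rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, digits)


# Counts as quoted in the abstract (names are hypothetical).
TOTAL_CDS = 2588       # predicted coding sequences
CLASSIFIED_CDS = 1764  # classified by homology to documented proteins
UNKNOWN_CDS = 824      # functionally unknown

classified_pct = percent_share(CLASSIFIED_CDS, TOTAL_CDS)
unknown_pct = percent_share(UNKNOWN_CDS, TOTAL_CDS)

print(f"classified: {classified_pct}%  unknown: {unknown_pct}%")
```

The two counts add up to the stated total, so the two percentages should be complementary up to rounding.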

Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis

Comparison of the whole-genome sequence of Bacillus anthracis isolated from a victim of a recent bioterrorist anthrax attack with a reference reveals 60 new markers that include single nucleotide polymorphisms, indels and tandem repeats. Genome comparison detected four high-quality SNPs between the two sequenced B. anthracis chromosomes and seven differences between different preparations of the reference genome. These markers have been tested on a collection of anthrax isolates, and were found to divide these samples into distinct families. These results demonstrate that genome-based analysis of microbial pathogens will provide a powerful new tool for investigation of infectious disease outbreaks.

Science 2002 May 8; [epub ahead of print].

See GNN article Florida Anthrax Bacterium Sequenced

Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2)

Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

Nature 2002 May 9;417(6885):141-147.

See GNN article Medicinal microbe Streptomyces coelicolor is sequenced

The genome of M. acetivorans reveals extensive metabolic and physiological diversity

Methanogenesis, the biological production of methane, plays a pivotal role in the global carbon cycle and contributes significantly to global warming. The majority of methane in nature is derived from acetate. Here we report the complete genome sequence of an acetate-utilizing methanogen, Methanosarcina acetivorans C2A. Methanosarcineae are the most metabolically diverse methanogens, thrive in a broad range of environments, and are unique among the Archaea in forming complex multicellular structures. This diversity is reflected in the genome of M. acetivorans. At 5,751,492 base pairs it is by far the largest known archaeal genome. The 4524 open reading frames code for a strikingly wide and unanticipated variety of metabolic and cellular capabilities. The presence of novel methyltransferases indicates the likelihood of undiscovered natural energy sources for methanogenesis, whereas the presence of single-subunit carbon monoxide dehydrogenases raises the possibility of nonmethanogenic growth. Although motility has not been observed in any Methanosarcineae, a flagellin gene cluster and two complete chemotaxis gene clusters were identified. The availability of genetic methods, coupled with its physiological and metabolic diversity, makes M. acetivorans a powerful model organism for the study of archaeal biology. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AE010299.]

Genome Res 2002 Apr;12(4):532-42.

See GNN article Key player in global warming: M. acetivorans is sequenced

The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens

We have determined the complete 1,694,969-nt sequence of the GC-rich genome of Methanopyrus kandleri by using a whole direct genome sequencing approach. This approach is based on unlinking of genomic DNA with the ThermoFidelase version of M. kandleri topoisomerase V and cycle sequencing directed by 2'-modified oligonucleotides (Fimers). Sequencing redundancy (3.3x) was sufficient to assemble the genome with less than one error per 40 kb. Using a combination of sequence database searches and coding potential prediction, 1,692 protein-coding genes and 39 genes for structural RNAs were identified. M. kandleri proteins show an unusually high content of negatively charged amino acids, which might be an adaptation to the high intracellular salinity. Previous phylogenetic analysis of 16S RNA suggested that M. kandleri belonged to a very deep branch, close to the root of the archaeal tree. However, genome comparisons indicate that, in both trees constructed using concatenated alignments of ribosomal proteins and trees based on gene content, M. kandleri consistently groups with other archaeal methanogens. M. kandleri shares the set of genes implicated in methanogenesis and, in part, its operon organization with Methanococcus jannaschii and Methanothermobacter thermoautotrophicum. These findings indicate that archaeal methanogens are monophyletic. A distinctive feature of M. kandleri is the paucity of proteins involved in signaling and regulation of gene expression. Also, M. kandleri appears to have fewer genes acquired via lateral transfer than other archaea. These features might reflect the extreme habitat of this organism.

Proc Natl Acad Sci U S A 2002 Apr 2;99(7):4644-4649.

See GNN article The genome of the hyperthermophile Methanopyrus kandleri

A draft sequence of the rice genome (Oryza sativa L. ssp. indica)

We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.

Science 2002 Apr 5;296(5565):79-92.

See GNN article Two Groups Sequence Rice

A draft sequence of the rice genome (Oryza sativa L. ssp. japonica)

The genome of the japonica subspecies of rice, an important cereal and model monocot, was sequenced and assembled by whole-genome shotgun sequencing. The assembled sequence covers 93% of the 420-megabase genome. Gene predictions on the assembled sequence suggest that the genome contains 32,000 to 50,000 genes. Homologs of 98% of the known maize, wheat, and barley proteins are found in rice. Synteny and gene homology between rice and the other cereal genomes are extensive, whereas synteny with Arabidopsis is limited. Assignment of candidate rice orthologs to Arabidopsis genes is possible in many cases. The rice genome sequence provides a foundation for the improvement of cereals, our most important crops.

Science 2002 Apr 5;296(5565):92-100.

See GNN article Two Groups Sequence Rice

Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks

Acute rheumatic fever (ARF), a sequela of group A Streptococcus (GAS) infection, is the most common cause of preventable childhood heart disease worldwide. The molecular basis of ARF and the subsequent rheumatic heart disease are poorly understood. Serotype M18 GAS strains have been associated for decades with ARF outbreaks in the U.S. As a first step toward gaining new insight into ARF pathogenesis, we sequenced the genome of strain MGAS8232, a serotype M18 organism isolated from a patient with ARF. The genome is a circular chromosome of 1,895,017 bp, and it shares 1.7 Mb of closely related genetic material with strain SF370 (a sequenced serotype M1 strain). Strain MGAS8232 has 178 ORFs absent in SF370. Phages, phage-like elements, and insertion sequences are the major sources of variation between the genomes. The genomes of strain MGAS8232 and SF370 encode many of the same proven or putative virulence factors. Importantly, strain MGAS8232 has genes encoding many additional secreted proteins involved in human-GAS interactions, including streptococcal pyrogenic exotoxin A (scarlet fever toxin) and two uncharacterized pyrogenic exotoxin homologues, all phage-associated. DNA microarray analysis of 36 serotype M18 strains from diverse localities showed that most regions of variation were phages or phage-like elements. Two epidemics of ARF occurring 12 years apart in Salt Lake City, UT, were caused by serotype M18 strains that were genetically identical, or nearly so. Our analysis provides a critical foundation for accelerated research into ARF pathogenesis and a molecular framework to study the plasticity of GAS genomes.

Proc Natl Acad Sci U S A 2002 Apr 2;99(7):4668-4673.

See GNN article Rheumatic Fever Bacterium Sequenced

Genome sequence and analysis of the oral bacterium Fusobacterium nucleatum strain ATCC 25586

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite. The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H(2)S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth.

J Bacteriol 2002 Apr;184(7):2005-18.

See GNN article Scientists sequence oral pathogen Fusobacterium nucleatum

Characterization and complete nucleotide sequence of Strawberry mottle virus: a tentative member of a new family of bipartite plant picorna-like viruses

An isolate of Strawberry mottle virus (SMoV) was transferred from Fragaria vesca to Nicotiana occidentalis and Chenopodium quinoa by mechanical inoculation. Electron micrographs of infected tissues showed the presence of isometric particles of approximately 28 nm in diameter. SMoV-associated tubular structures were also conspicuous, particularly in the plasmodesmata of C. quinoa. DsRNA extraction of SMoV-infected N. occidentalis yielded two bands of 6.3 and 7.8 kbp which were cloned and sequenced. Gaps in the sequence, including the 5' and 3' ends, were filled using RT-PCR and RACE. The genome of SMoV was found to consist of RNA1 and RNA2 of 7036 and 5619 nt, respectively, excluding a poly(A) tail. Each RNA encodes one polyprotein and has a 3' non-coding region of approximately 1150 nt. The polyprotein of RNA1 contains regions with identities to helicase, viral genome-linked protein, protease and polymerase (RdRp), and shares its closest similarity with RNA1 of the tentative nepovirus Satsuma dwarf virus (SDV). The polyprotein of RNA2 displayed some similarity to the large coat protein domain of SDV and related viruses. Phylogenetic analysis of the RdRp region showed that SMoV falls into a separate group containing SDV, Apple latent spherical virus, Naval orange infectious mottling virus and Rice tungro spherical virus. Given the size of RNA2 and the presence of a long 3' non-coding region, SMoV is more typical of a nepovirus, although atypically for a nepovirus it is aphid transmissible. We propose that SMoV is a tentative member of an SDV-like lineage of picorna-like viruses.

J Gen Virol 2002 Jan;83(Pt 1):229-39.

See GNN article Scientists sequence Strawberry mottle virus

The genome sequence of Schizosaccharomyces pombe

We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

Nature 2002 Feb 21;415(6874):871-880.

See GNN article Schizosaccharomyces pombe: Second yeast genome sequenced

Genome sequence of the plant pathogen Ralstonia solanacearum

Ralstonia solanacearum is a devastating, soil-borne plant pathogen with a global distribution and an unusually wide host range. It is a model system for the dissection of molecular determinants governing pathogenicity. We present here the complete genome sequence and its analysis of strain GMI1000. The 5.8-megabase (Mb) genome is organized into two replicons: a 3.7-Mb chromosome and a 2.1-Mb megaplasmid. Both replicons have a mosaic structure providing evidence for the acquisition of genes through horizontal gene transfer. Regions containing genetically mobile elements associated with the percentage of G+C bias may have an important function in genome evolution. The genome encodes many proteins potentially associated with a role in pathogenicity. In particular, many putative attachment factors were identified. The complete repertoire of type III secreted effector proteins can be studied. Over 40 candidates were identified. Comparison with other genomes suggests that bacterial plant pathogens and animal pathogens harbour distinct arrays of specialized type III-dependent effectors.

Nature 2002 Jan 31;415(6871):497-502.

See GNN article Scientists sequence the plant pathogen Ralstonia solanacearum

Complete genome sequence of Clostridium perfringens, an anaerobic flesh-eater

Clostridium perfringens is a Gram-positive anaerobic spore-forming bacterium that causes life-threatening gas gangrene and mild enterotoxaemia in humans, although it colonizes as normal intestinal flora of humans and animals. The organism is known to produce a variety of toxins and enzymes that are responsible for the severe myonecrotic lesions. Here we report the complete 3,031,430-bp sequence of C. perfringens strain 13 that comprises 2,660 protein coding regions and 10 rRNA genes, showing pronounced low overall G + C content (28.6%). The genome contains typical anaerobic fermentation enzymes leading to gas production but no enzymes for the tricarboxylic acid cycle or respiratory chain. Various saccharolytic enzymes were found, but many enzymes for amino acid biosynthesis were lacking in the genome. Twenty genes were newly identified as putative virulence factors of C. perfringens, and we found a total of five hyaluronidase genes that will also contribute to virulence. The genome analysis also proved an efficient method for finding four members of the two-component VirR/VirS regulon that coordinately regulates the pathogenicity of C. perfringens. Clearly, C. perfringens obtains various essential materials from the host by producing several degradative enzymes and toxins, resulting in massive destruction of the host tissues.

Proc Natl Acad Sci U S A 2002 Jan 15; [epub ahead of print].

See GNN article The genome sequence of the flesh-eating Clostridium perfringens

Genome sequence of the hyperthermophilic crenarchaeon Pyrobaculum aerophilum

We determined and annotated the complete 2.2-megabase genome sequence of Pyrobaculum aerophilum, a facultatively aerobic nitrate-reducing hyperthermophilic (Topt = 100°C) crenarchaeon. Clues were found suggesting explanations of the organism's surprising intolerance to sulfur, which may aid in the development of methods for genetic studies of the organism. Many interesting features worthy of further genetic studies were revealed. Whole genome computational analysis confirmed experiments showing that P. aerophilum (and perhaps all crenarchaea) lack 5' untranslated regions in their mRNAs and thus appear not to use a ribosome-binding site (Shine-Dalgarno)-based mechanism for translation initiation at the 5' end of transcripts. Inspection of the lengths and distribution of mononucleotide repeat-tracts revealed some interesting features. For instance, it was seen that mononucleotide repeat-tracts of Gs (or Cs) are highly unstable, a pattern expected for an organism deficient in mismatch repair. This result, together with an independent study on mutation rates, suggests a "mutator" phenotype.

Proc Natl Acad Sci U S A 2002 Jan 15; [epub ahead of print].

See GNN article A sequenced hyperthermophile: Pyrobaculum aerophilum

Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120

The nucleotide sequence of the entire genome of a filamentous cyanobacterium, Anabaena sp. strain PCC 7120, was determined. The genome of Anabaena consisted of a single chromosome (6,413,771 bp) and six plasmids, designated pCC7120alpha (408,101 bp), pCC7120beta (186,614 bp), pCC7120gamma (101,965 bp), pCC7120delta (55,414 bp), pCC7120epsilon (40,340 bp), and pCC7120zeta (5,584 bp). The chromosome bears 5368 potential protein-encoding genes, four sets of rRNA genes, 48 tRNA genes representing 42 tRNA species, and 4 genes for small structural RNAs. The predicted products of 45% of the potential protein-encoding genes showed sequence similarity to known and predicted proteins of known function, and 27% to translated products of hypothetical genes. The remaining 28% lacked significant similarity to genes for known and predicted proteins in the public DNA databases. More than 60 genes involved in various processes of heterocyst formation and nitrogen fixation were assigned to the chromosome based on their similarity to the reported genes. One hundred and ninety-five genes coding for components of two-component signal transduction systems, nearly 2.5 times as many as those in Synechocystis sp. PCC 6803, were identified on the chromosome. Only 37% of the Anabaena genes showed significant sequence similarity to those of Synechocystis, indicating a high degree of divergence of the gene information between the two cyanobacterial strains.

DNA Res 2001 Oct 31;8(5):205-13.

See GNN article Strings of pearls: The genome sequence of Anabaena
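
The Anabaena abstract lists seven replicon sizes but no overall genome size. As a small worked example, the sketch below totals them in Python. The dictionary and variable names are illustrative assumptions, and the computed totals are derived here from the quoted sizes rather than quoted from the paper.

```python
# Replicon sizes in base pairs, as listed in the abstract.
REPLICONS_BP = {
    "chromosome": 6_413_771,
    "pCC7120alpha": 408_101,
    "pCC7120beta": 186_614,
    "pCC7120gamma": 101_965,
    "pCC7120delta": 55_414,
    "pCC7120epsilon": 40_340,
    "pCC7120zeta": 5_584,
}

# Total genome size and the portion carried on the six plasmids.
total_bp = sum(REPLICONS_BP.values())
plasmid_bp = total_bp - REPLICONS_BP["chromosome"]
plasmid_fraction = plasmid_bp / total_bp

print(f"total: {total_bp:,} bp")
print(f"plasmids: {plasmid_bp:,} bp ({plasmid_fraction:.1%} of the genome)")
```

On these figures the six plasmids together carry roughly a ninth of the genome.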

The genome sequence of the facultative intracellular pathogen Brucella melitensis

Brucella melitensis is a facultative intracellular bacterial pathogen that causes abortion in goats and sheep and Malta fever in humans. The genome of B. melitensis strain 16M was sequenced and found to contain 3,294,935 bp distributed over two circular chromosomes of 2,117,144 bp and 1,177,787 bp encoding 3,197 ORFs. By using the bioinformatics suite ERGO, 2,487 (78%) ORFs were assigned functions. The origins of replication of the two chromosomes are similar to those of other α-proteobacteria. Housekeeping genes, including those involved in DNA replication, transcription, translation, core metabolism, and cell wall biosynthesis, are distributed on both chromosomes. Type I, II, and III secretion systems are absent, but genes encoding sec-dependent, sec-independent, and flagella-specific type III, type IV, and type V secretion systems as well as adhesins, invasins, and hemolysins were identified. Several features of the B. melitensis genome are similar to those of the symbiotic Sinorhizobium meliloti.

Proc Natl Acad Sci U S A 2002 Jan 8;99(1):443-448.

See GNN article Sequencing Brucella melitensis, an obscure candidate for biological warfare

The genome of the natural genetic engineer Agrobacterium tumefaciens C58

The 5.67-megabase genome of the plant pathogen Agrobacterium tumefaciens C58 consists of a circular chromosome, a linear chromosome, and two plasmids. Extensive orthology and nucleotide colinearity between the genomes of A. tumefaciens and the plant symbiont Sinorhizobium meliloti suggest a recent evolutionary divergence. Their similarities include metabolic, transport, and regulatory systems that promote survival in the highly competitive rhizosphere; differences are apparent in their genome structure and virulence gene complement. Availability of the A. tumefaciens sequence will facilitate investigations into the molecular basis of pathogenesis and the evolutionary divergence of pathogenic and symbiotic lifestyles.

Science 2001 Dec 14;294(5550):2317-2323.

Genome sequence of the plant pathogen and biotechnology agent Agrobacterium tumefaciens C58

Agrobacterium tumefaciens is a plant pathogen capable of transferring a defined segment of DNA to a host plant, generating a gall tumor. Replacing the transferred tumor-inducing genes with exogenous DNA allows the introduction of any desired gene into the plant. Thus, A. tumefaciens has been critical for the development of modern plant genetics and agricultural biotechnology. Here we describe the genome of A. tumefaciens strain C58, which has an unusual structure consisting of one circular and one linear chromosome. We discuss genome architecture and evolution and additional genes potentially involved in virulence and metabolic parasitism of host plants.

Science 2001 Dec 14;294(5550):2323-2328.

See GNN article The Genome of Agrobacterium tumefaciens: A plant pathogen with a talent for transferring genes

Genome sequence of an industrial microorganism Streptomyces avermitilis: Deducing the ability of producing secondary metabolites

Streptomyces avermitilis is a soil bacterium that carries out not only a complex morphological differentiation but also the production of secondary metabolites, one of which, avermectin, is commercially important in human and veterinary medicine. The major interest in this genus Streptomyces is the diversity of its production of secondary metabolites as an industrial microorganism. A major factor in its prominence as a producer of the variety of secondary metabolites is its possession of several metabolic pathways for biosynthesis. Here we report sequence analysis of S. avermitilis, covering 99% of its genome. At least 8.7 million base pairs exist in the linear chromosome; this is the largest bacterial genome sequence, and it provides insights into the intrinsic diversity of the production of the secondary metabolites of Streptomyces. Twenty-five kinds of secondary metabolite gene clusters were found in the genome of S. avermitilis. Four of them are concerned with the biosyntheses of melanin pigments, in which two clusters encode tyrosinase and its cofactor, another two encode an ochronotic pigment derived from homogentisic acid, and another polyketide-derived melanin. The gene clusters for carotenoid and siderophore biosyntheses are composed of seven and five genes, respectively. There are eight kinds of gene clusters for type-I polyketide compound biosyntheses, and two clusters are involved in the biosyntheses of type-II polyketide-derived compounds. Furthermore, a polyketide synthase that resembles phloroglucinol synthase was detected. Eight clusters are involved in the biosyntheses of peptide compounds that are synthesized by nonribosomal peptide synthetases. These secondary metabolite clusters are widely located in the genome but half of them are near both ends of the genome. The total length of these clusters occupies about 6.4% of the genome.

Proc Natl Acad Sci U S A 2001 Oct 9;98(21):12215-20.

See GNN article Antibiotics from a microbe: The genome of an industrial organism

Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi

Microsporidia are obligate intracellular parasites infesting many animal groups. Lacking mitochondria and peroxysomes, these unicellular eukaryotes were first considered a deeply branching protist lineage that diverged before the endosymbiotic event that led to mitochondria. The discovery of a gene for a mitochondrial-type chaperone combined with molecular phylogenetic data later implied that microsporidia are atypical fungi that lost mitochondria during evolution. Here we report the DNA sequences of the 11 chromosomes of the approximately 2.9-megabase (Mb) genome of Encephalitozoon cuniculi (1,997 potential protein-coding genes). Genome compaction is reflected by reduced intergenic spacers and by the shortness of most putative proteins relative to their eukaryote orthologues. The strong host dependence is illustrated by the lack of genes for some biosynthetic pathways and for the tricarboxylic acid cycle. Phylogenetic analysis lends substantial credit to the fungal affiliation of microsporidia. Because the E. cuniculi genome contains genes related to some mitochondrial functions (for example, Fe-S cluster assembly), we hypothesize that microsporidia have retained a mitochondrion-derived organelle.

Nature 2001 Nov 22;414(6862):450-3.

See GNN article Reduced and specialized: The genome of the parasite E. cuniculi

Complete genome sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7

The complete genomic sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7 which optimally grows at 80 degrees C, at low pH, and under aerobic conditions, has been determined by the whole genome shotgun method with slight modifications. The genomic size was 2,694,756 bp long and the G + C content was 32.8%. The following RNA-coding genes were identified: a single 16S-23S rRNA cluster, one 5S rRNA gene and 46 tRNA genes (including 24 intron-containing tRNA genes). The repetitive sequences identified were SR-type repetitive sequences, long dispersed-type repetitive sequences and Tn-like repetitive elements. The genome contained 2826 potential protein-coding regions (open reading frames, ORFs). By similarity search against public databases, 911 (32.2%) ORFs were related to functional assigned genes, 921 (32.6%) were related to conserved ORFs of unknown function, 145 (5.1%) contained some motifs, and remaining 849 (30.0%) did not show any significant similarity to the registered sequences. The ORFs with functional assignments included the candidate genes involved in sulfide metabolism, the TCA cycle and the respiratory chain. Sequence comparison provided evidence suggesting the integration of plasmid, rearrangement of genomic structure, and duplication of genomic regions that may be responsible for the larger genomic size of the S. tokodaii strain7 genome. The genome contained eukaryote-type genes which were not identified in other archaea and lacked the CCA sequence in the tRNA genes. The result suggests that this strain is closer to eukaryotes among the archaea strains so far sequenced. The data presented in this paper are also available on the internet homepage (

DNA Res 2001 Aug 31;8(4):123-40.

See GNN article Sulfolobus tokodaii: A genome from Japan
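
The ORF breakdown quoted in the S. tokodaii abstract can be checked with a few lines of arithmetic. The following is only a minimal sketch: the counts and the total are copied from the abstract, while the variable names and category labels are illustrative and not taken from the paper.

```python
# ORF categories and counts as quoted in the S. tokodaii abstract.
# The dictionary keys are illustrative labels, not the paper's own terms.
orf_counts = {
    "functionally assigned": 911,
    "conserved, unknown function": 921,
    "motif only": 145,
    "no significant similarity": 849,
}
total_orfs = 2826  # potential protein-coding regions reported in the abstract

# The four categories should account for every predicted ORF.
assert sum(orf_counts.values()) == total_orfs

# Share of each category, rounded to one decimal place as in the abstract.
percentages = {
    label: round(100 * count / total_orfs, 1)
    for label, count in orf_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Under these figures the four categories sum to the reported total, and the rounded shares match the percentages given in the abstract.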

Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18

Salmonella enterica serovar Typhi (S. typhi) is the aetiological agent of typhoid fever, a serious invasive bacterial disease of humans with an annual global burden of approximately 16 million cases, leading to 600,000 fatalities. Many S. enterica serovars actively invade the mucosal surface of the intestine but are normally contained in healthy individuals by the local immune defence mechanisms. However, S. typhi has evolved the ability to spread to the deeper tissues of humans, including liver, spleen and bone marrow. Here we have sequenced the 4,809,037-base pair (bp) genome of a S. typhi (CT18) that is resistant to multiple drugs, revealing the presence of hundreds of insertions and deletions compared with the Escherichia coli genome, ranging in size from single genes to large islands. Notably, the genome sequence identifies over two hundred pseudogenes, several corresponding to genes that are known to contribute to virulence in Salmonella typhimurium. This genetic degradation may contribute to the human-restricted host range for S. typhi. CT18 harbours a 218,150-bp multiple-drug-resistance incH1 plasmid (pHCM1), and a 106,516-bp cryptic plasmid (pHCM2), which shows recent common ancestry with a virulence plasmid of Yersinia pestis.

Nature 2001 Oct 25;413(6858):848-52.

See GNN article Two sequenced Salmonella genomes

Complete genome sequence of Salmonella enterica serovar Typhimurium LT2

Salmonella enterica subspecies I, serovar Typhimurium (S. typhimurium), is a leading cause of human gastroenteritis, and is used as a mouse model of human typhoid fever. The incidence of non-typhoid salmonellosis is increasing worldwide, causing millions of infections and many deaths in the human population each year. Here we sequenced the 4,857-kilobase (kb) chromosome and 94-kb virulence plasmid of S. typhimurium strain LT2. The distribution of close homologues of S. typhimurium LT2 genes in eight related enterobacteria was determined using previously completed genomes of three related bacteria, sample sequencing of both S. enterica serovar Paratyphi A (S. paratyphi A) and Klebsiella pneumoniae, and hybridization of three unsequenced genomes to a microarray of S. typhimurium LT2 genes. Lateral transfer of genes is frequent, with 11% of the S. typhimurium LT2 genes missing from S. enterica serovar Typhi (S. typhi), and 29% missing from Escherichia coli K12. The 352 gene homologues of S. typhimurium LT2 confined to subspecies I of S. enterica, which contains most mammalian and bird pathogens, are useful for studies of epidemiology, host specificity and pathogenesis. Most of these homologues were previously unknown, and 50 may be exported to the periplasm or outer membrane, rendering them accessible as therapeutic or vaccine targets.

Nature 2001 Oct 25;413(6858):852-56.

See GNN article Two sequenced Salmonella genomes

Mechanisms of evolution in Rickettsia conorii and R. prowazekii

Rickettsia conorii is an obligate intracellular bacterium that causes Mediterranean spotted fever in humans. We determined the 1,268,755-nucleotide complete genome sequence of R. conorii, containing 1374 open reading frames. This genome exhibits 804 of the 834 genes of the previously determined R. prowazekii genome plus 552 supplementary open reading frames and a 10-fold increase in the number of repetitive elements. Despite these differences, the two genomes exhibit a nearly perfect colinearity that allowed the clear identification of different stages of gene alterations with gene remnants and 37 genes split in 105 fragments, of which 59 are transcribed. A 38-kilobase sequence inversion was dated shortly after the divergence of the genus.

Science 2001 Sep 14;293(5537):2093-8.

See GNN article Insights into genome evolution: The sequence of Rickettsia conorii

Genome sequence of Yersinia pestis, the causative agent of plague

The Gram-negative bacterium Yersinia pestis is the causative agent of the systemic invasive infectious disease classically referred to as plague, and has been responsible for three human pandemics: the Justinian plague (sixth to eighth centuries), the Black Death (fourteenth to nineteenth centuries) and modern plague (nineteenth century to the present day). The recent identification of strains resistant to multiple drugs and the potential use of Y. pestis as an agent of biological warfare mean that plague still poses a threat to human health. Here we report the complete genome sequence of Y. pestis strain CO92, consisting of a 4.65-megabase (Mb) chromosome and three plasmids of 96.2 kilobases (kb), 70.3 kb and 9.6 kb. The genome is unusually rich in insertion sequences and displays anomalies in GC base-composition bias, indicating frequent intragenomic recombination. Many genes seem to have been acquired from other bacteria and viruses (including adhesins, secretion systems and insecticidal toxins). The genome contains around 150 pseudogenes, many of which are remnants of a redundant enteropathogenic lifestyle. The evidence of ongoing genome fluidity, expansion and decay suggests Y. pestis is a pathogen that has undergone large-scale genetic flux and provides a unique insight into the ways in which new and highly virulent pathogens evolve.

Nature 2001 Oct 4;413(6855):523-7.

See GNN article From stomach bug to blood-borne pathogen: The genome sequence of the plague bacterium

Genome of the bacterium Streptococcus pneumoniae strain R6

Streptococcus pneumoniae is among the most significant causes of bacterial disease in humans. Here we report the 2,038,615-bp genomic sequence of the gram-positive bacterium S. pneumoniae R6. Because the R6 strain is avirulent and, more importantly, because it is readily transformed with DNA from homologous species and many heterologous species, it is the principal platform for investigation of the biology of this important pathogen. It is also used as a primary vehicle for genomics-based development of antibiotics for gram-positive bacteria. In our analysis of the genome, we identified a large number of new uncharacterized genes predicted to encode proteins that either reside on the surface of the cell or are secreted. Among those proteins there may be new targets for vaccine and antibiotic development.

J Bacteriol 2001 Oct;183(19):5709-17.

See GNN article Streptococcus pneumoniae strain R6 is sequenced

The composite genome of the legume symbiont Sinorhizobium meliloti

The scarcity of usable nitrogen frequently limits plant growth. A tight metabolic association with rhizobial bacteria allows legumes to obtain nitrogen compounds by bacterial reduction of dinitrogen (N2) to ammonium (NH4+). We present here the annotated DNA sequence of the alpha-proteobacterium Sinorhizobium meliloti, the symbiont of alfalfa. The tripartite 6.7-megabase (Mb) genome comprises a 3.65-Mb chromosome, and 1.35-Mb pSymA and 1.68-Mb pSymB megaplasmids. Genome sequence analysis indicates that all three elements contribute, in varying degrees, to symbiosis and reveals how this genome may have emerged during evolution. The genome sequence will be useful in understanding the dynamics of interkingdom associations and of life in soil environments.

Science 2001 Jul 27;293(5530):668-72.

See GNN article In symbiosis with alfalfa: The complex genome sequence of Sinorhizobium meliloti

Complete genome sequence of a virulent isolate of Streptococcus pneumoniae

The 2,160,837-base pair genome sequence of an isolate of Streptococcus pneumoniae, a Gram-positive pathogen that causes pneumonia, bacteremia, meningitis, and otitis media, contains 2236 predicted coding regions; of these, 1440 (64%) were assigned a biological role. Approximately 5% of the genome is composed of insertion sequences that may contribute to genome rearrangements through uptake of foreign DNA. Extracellular enzyme systems for the metabolism of polysaccharides and hexosamines provide a substantial source of carbon and nitrogen for S. pneumoniae and also damage host tissues and facilitate colonization. A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low-guanine/cytosine (GC) Gram-positive species. Several surface-exposed proteins that may serve as potential vaccine candidates were identified. Comparative genome hybridization with DNA arrays revealed strain differences in S. pneumoniae that could contribute to differences in virulence and antigenicity.

Science 2001 Jul 20;293(5529):498-506.

See GNN article Sugar Transporters and Foreign DNA: The sequenced Streptococcus pneumoniae genome

The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis

Mycoplasma pulmonis is a wall-less eubacterium belonging to the Mollicutes (trivial name, mycoplasmas) and responsible for murine respiratory diseases. The genome of strain UAB CTIP is composed of a single circular 963,879 bp chromosome with a G + C content of 26.6 mol%, i.e. the lowest reported among bacteria, Ureaplasma urealyticum apart. This genome contains 782 putative coding sequences (CDSs) covering 91.4% of its length and a function could be assigned to 486 CDSs whilst 92 matched the gene sequences of hypothetical proteins, leaving 204 CDSs without significant database match. The genome contains a single set of rRNA genes and only 29 tRNA genes. The replication origin oriC was localized by sequence analysis and by using the G + C skew method. Sequence polymorphisms within stretches of repeated nucleotides generate phase-variable protein antigens whilst a recombinase gene is likely to catalyse the site-specific DNA inversions in major M. pulmonis surface antigens. Furthermore, a hemolysin, secreted nucleases and a glyco-protease are predicted virulence factors. Surprisingly, several of the genes previously reported to be essential for a self-replicating minimal cell are missing in the M. pulmonis genome although this one is larger than the other mycoplasma genomes fully sequenced until now.

Nucleic Acids Res. 2001 May 15;29(10):2145-53.

See GNN article Mycoplasma pulmonis: The genome of a minimalist

The complete genome of the crenarchaeon Sulfolobus solfataricus P2

The genome of the crenarchaeon Sulfolobus solfataricus P2 contains 2,992,245 bp on a single chromosome and encodes 2,977 proteins and many RNAs. One-third of the encoded proteins have no detectable homologs in other sequenced genomes. Moreover, 40% appear to be archaeal-specific, and only 12% and 2.3% are shared exclusively with bacteria and eukarya, respectively. The genome shows a high level of plasticity with 200 diverse insertion sequence elements, many putative nonautonomous mobile elements, and evidence of integrase-mediated insertion events. There are also long clusters of regularly spaced tandem repeats. Different transfer systems are used for the uptake of inorganic and organic solutes, and a wealth of intracellular and extracellular proteases, sugar, and sulfur metabolizing enzymes are encoded, as well as enzymes of the central metabolic pathways and motility proteins. The major metabolic electron carrier is not NADH as in bacteria and eukarya but probably ferredoxin. The essential components required for DNA replication, DNA repair and recombination, the cell cycle, transcriptional initiation and translation, but not DNA folding, show a strong eukaryal character with many archaeal-specific features. The results illustrate major differences between crenarchaea and euryarchaea, especially for their DNA replication mechanism and cell cycle processes and their translational apparatus.

Proc Natl Acad Sci U S A. 2001 Jul 3;98(14):7835-7840.

See GNN article Life at very high temperatures: The genome of Sulfolobus solfataricus is sequenced

Complete genome sequence of an M1 strain of Streptococcus pyogenes

The 1,852,442-bp sequence of an M1 strain of Streptococcus pyogenes, a Gram-positive pathogen, has been determined and contains 1,752 predicted protein-encoding genes. Approximately one-third of these genes have no identifiable function, with the remainder falling into previously characterized categories of known microbial function. Consistent with the observation that S. pyogenes is responsible for a wider variety of human disease than any other bacterial species, more than 40 putative virulence-associated genes have been identified. Additional genes have been identified that encode proteins likely associated with microbial "molecular mimicry" of host characteristics and involved in rheumatic fever or acute glomerulonephritis. The complete or partial sequence of four different bacteriophage genomes is also present, with each containing genes for one or more previously undiscovered superantigen-like proteins. These prophage-associated genes encode at least six potential virulence factors, emphasizing the importance of bacteriophages in horizontal gene transfer and a possible mechanism for generating new strains with increased pathogenic potential.

Proc Natl Acad Sci U S A 2001 Apr 10;98(8):4658-63.

See GNN article Genome sequence of Streptococcus pyogenes, the flesh-eating bacterium

The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.]

Genome Res 2001 May;11(5):731-53.

See GNN article French scientists sequence the genome of a bacterium vital for cheese production

Complete genome sequence of Caulobacter crescentus

The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extra cytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living alpha-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.

Proc Natl Acad Sci U S A 2001 Mar 27;98(7):4136-41.

See GNN article Genome sequence of the bacterium Caulobacter crescentus

Complete genomic sequence of Pasteurella multocida, Pm70

We present here the complete genome sequence of a common avian clone of Pasteurella multocida, Pm70. The genome of Pm70 is a single circular chromosome 2,257,487 base pairs in length and contains 2,014 predicted coding regions, 6 ribosomal RNA operons, and 57 tRNAs. Genome-scale evolutionary analyses based on pairwise comparisons of 1,197 orthologous sequences between P. multocida, Haemophilus influenzae, and Escherichia coli suggest that P. multocida and H. influenzae diverged approximately 270 million years ago and the gamma subdivision of the proteobacteria radiated about 680 million years ago. Two previously undescribed open reading frames, accounting for approximately 1% of the genome, encode large proteins with homology to the virulence-associated filamentous hemagglutinin of Bordetella pertussis. Consistent with the critical role of iron in the survival of many microbial pathogens, in silico and whole-genome microarray analyses identified more than 50 Pm70 genes with a potential role in iron acquisition and metabolism. Overall, the complete genomic sequence and preliminary functional analyses provide a foundation for future research into the mechanisms of pathogenesis and host specificity of this important multispecies pathogen.

Proc Natl Acad Sci U S A 2001 Mar 13;98(6):3460-5.

See GNN article Genome sequence of Pasteurella multocida

Massive gene decay in the leprosy bacillus

Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

Nature 2001 Feb 22;409(6823):1007-11.

See GNN article Genome of bacterium that causes leprosy is sequenced

The sequence of the human genome

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

Science 2001 Feb 16;291(5507):1304-51.

See GNN articles:

One Gene, Many Proteins

Genomes, Proteomes, and Medicine

Whose Genome Is It Anyway?

The landscape of the genome: deserts, islands and oases

Making predictions about the proteome

Don't despair. You have at least twice as many genes as a fruit fly.

Sizing up genomes: Amoeba is king

Genome sequence of enterohaemorrhagic Escherichia coli O157:H7

The bacterium Escherichia coli O157:H7 is a worldwide threat to public health and has been implicated in many outbreaks of haemorrhagic colitis, some of which included fatalities caused by haemolytic uraemic syndrome. Close to 75,000 cases of O157:H7 infection are now estimated to occur annually in the United States. The severity of disease, the lack of effective treatment and the potential for large-scale outbreaks from contaminated food supplies have propelled intensive research on the pathogenesis and detection of E. coli O157:H7 (ref. 4). Here we have sequenced the genome of E. coli O157:H7 to identify candidate genes responsible for pathogenesis, to develop better methods of strain detection and to advance our understanding of the evolution of E. coli, through comparison with the genome of the non-pathogenic laboratory strain E. coli K-12 (ref. 5). We find that lateral gene transfer is far more extensive than previously anticipated. In fact, 1,387 new genes encoded in strain-specific clusters of diverse sizes were found in O157:H7. These include candidate virulence factors, alternative metabolic capacities, several prophages and other new functions—all of which could be targets for surveillance.

Nature 2001 Jan 25;409(6819):529-33.

See GNN article Researchers sequence strain of E. coli linked to disease

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana

The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans—the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

Nature 2000 Dec 14;408(6814):796-815.

See GNN article First complete plant genome yields bountiful harvest of genes

Genome sequence of Halobacterium species NRC-1

We report the complete sequence of an extreme halophile, Halobacterium sp. NRC-1, harboring a dynamic 2,571,010-bp genome containing 91 insertion sequences representing 12 families and organized into a large chromosome and 2 related minichromosomes. The Halobacterium NRC-1 genome codes for 2,630 predicted proteins, 36% of which are unrelated to any previously reported. Analysis of the genome sequence shows the presence of pathways for uptake and utilization of amino acids, active sodium-proton antiporter and potassium uptake systems, sophisticated photosensory and signal transduction pathways, and DNA replication, transcription, and translation systems resembling more complex eukaryotic organisms. Whole proteome comparisons show the definite archaeal nature of this halophile with additional similarities to the Gram-positive Bacillus subtilis and other bacteria. The ease of culturing Halobacterium and the availability of methods for its genetic manipulation in the laboratory, including construction of gene knockouts and replacements, indicate this halophile can serve as an excellent model system among the archaea.

Proc Natl Acad Sci U S A 2000 Oct 24;97(22):12176-81.

See GNN article Some like it salty: Halobacterium genome sequenced

The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum

Thermoplasma acidophilum is a thermoacidophilic archaeon that thrives at 59 degrees C and pH 2, which was isolated from self-heating coal refuse piles and solfatara fields. Species of the genus Thermoplasma do not possess a rigid cell wall, but are only delimited by a plasma membrane. Many macromolecular assemblies from Thermoplasma, primarily proteases and chaperones, have been pivotal in elucidating the structure and function of their more complex eukaryotic homologues. Our interest in protein folding and degradation led us to seek a more complete representation of the proteins involved in these pathways by determining the genome sequence of the organism. Here we have sequenced the 1,564,905-base-pair genome in just 7,855 sequencing reactions by using a new strategy. The 1,509 open reading frames identify Thermoplasma as a typical euryarchaeon with a substantial complement of bacteria-related genes; however, evidence indicates that there has been much lateral gene transfer between Thermoplasma and Sulfolobus solfataricus, a phylogenetically distant crenarchaeon inhabiting the same environment. At least 252 open reading frames, including a complete protein degradation pathway and various transport proteins, resemble Sulfolobus proteins most closely.

Nature 2000 Sep 28;407(6803):508-13.

See GNN article Thermoplasma acidophilum: Living the hot, acidic life

Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS

Almost all aphid species (Homoptera, Insecta) have 60-80 huge cells called bacteriocytes, within which are round-shaped bacteria that are designated Buchnera. These bacteria are maternally transmitted to eggs and embryos through host generations, and the mutualism between the host and the bacteria is so obligate that neither can reproduce independently. Buchnera is a close relative of Escherichia coli, but it contains more than 100 genomic copies per cell, and its genome size is only a seventh of that of E. coli. Here we report the complete genome sequence of Buchnera sp. strain APS, which is composed of one 640,681-base-pair chromosome and two small plasmids. There are genes for the biosyntheses of amino acids essential for the hosts in the genome, but those for non-essential amino acids are missing, indicating complementarity and syntrophy between the host and the symbiont. In addition, Buchnera lacks genes for the biosynthesis of cell-surface components, including lipopolysaccharides and phospholipids, regulator genes and genes involved in defence of the cell. These results indicate that Buchnera is completely symbiotic and viable only in its limited niche, the bacteriocyte.

Nature 2000 Sep 7;407(6800):81-6.

See GNN article Buchnera: the genomic evolution of a bacterium

Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen

Pseudomonas aeruginosa is a ubiquitous environmental bacterium that is one of the top three causes of opportunistic human infections. A major factor in its prominence as a pathogen is its intrinsic resistance to antibiotics and disinfectants. Here we report the complete sequence of P. aeruginosa strain PAO1. At 6.3 million base pairs, this is the largest bacterial genome sequenced, and the sequence provides insights into the basis of the versatility and intrinsic drug resistance of P. aeruginosa. Consistent with its larger genome size and environmental adaptability, P. aeruginosa contains the highest proportion of regulatory genes observed for a bacterial genome and a large number of genes involved in the catabolism, transport and efflux of organic compounds as well as four potential chemotaxis systems. We propose that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.

Nature 2000 Aug 31;406(6799):959-64.

See GNN article The Pseudomonas aeruginosa genome

DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae

Here we determine the complete genomic sequence of the Gram-negative, gamma-Proteobacterium Vibrio cholerae El Tor N16961 to be 4,033,460 base pairs (bp). The genome consists of two circular chromosomes of 2,961,146 bp and 1,072,314 bp that together encode 3,885 open reading frames. The vast majority of recognizable genes for essential cell functions (such as DNA replication, transcription, translation and cell-wall biosynthesis) and pathogenicity (for example, toxins, surface antigens and adhesins) are located on the large chromosome. In contrast, the small chromosome contains a larger fraction (59%) of hypothetical genes compared with the large chromosome (42%), and also contains many more genes that appear to have origins other than the gamma-Proteobacteria. The small chromosome also carries a gene capture system (the integron island) and host 'addiction' genes that are typically found on plasmids; thus, the small chromosome may have originally been a megaplasmid that was captured by an ancestral Vibrio species. The V. cholerae genomic sequence provides a starting point for understanding how a free-living, environmental organism emerged to become a significant human bacterial pathogen.

Nature 2000 Aug 3;406(6795):477-83.

See GNN article Cholera genome sequenced
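
As a quick consistency check on the figures quoted in the Vibrio cholerae abstract above, the following minimal Python sketch adds the two chromosome lengths and compares the sum with the reported genome size. The function name genome_summary and the two derived ratios (the small chromosome's share of the genome and the average number of base pairs per open reading frame) are illustrative assumptions, not values or methods from the paper.

```python
# Figures quoted in the abstract (base pairs and open reading frames).
LARGE_CHROMOSOME_BP = 2_961_146
SMALL_CHROMOSOME_BP = 1_072_314
REPORTED_TOTAL_BP = 4_033_460
REPORTED_ORFS = 3_885


def genome_summary(large_bp: int, small_bp: int, orfs: int) -> dict:
    """Hypothetical helper: derive simple summary numbers from the abstract's figures."""
    total = large_bp + small_bp
    return {
        "total_bp": total,
        # Share of the whole genome carried by the small chromosome.
        "small_fraction": small_bp / total,
        # Average spacing of open reading frames across both chromosomes.
        "bp_per_orf": total / orfs,
    }


summary = genome_summary(LARGE_CHROMOSOME_BP, SMALL_CHROMOSOME_BP, REPORTED_ORFS)

# The two chromosome lengths should add up to the reported genome size.
print("Sum matches reported total:", summary["total_bp"] == REPORTED_TOTAL_BP)
print(f"Small chromosome share: {summary['small_fraction']:.1%}")
print(f"Average bp per ORF: {summary['bp_per_orf']:.0f}")
```

The two chromosome lengths do sum to the reported 4,033,460 bp, and the small chromosome accounts for roughly a quarter of the genome.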

The genome sequence of the plant pathogen Xylella fastidiosa. The Xylella fastidiosa Consortium of the Organization for Nucleotide Sequencing and Analysis

Xylella fastidiosa is a fastidious, xylem-limited bacterium that causes a range of economically important plant diseases. Here we report the complete genome sequence of X. fastidiosa clone 9a5c, which causes citrus variegated chlorosis, a serious disease of orange trees. The genome comprises a 52.7% GC-rich 2,679,305-base-pair (bp) circular chromosome and two plasmids of 51,158 bp and 1,285 bp. We can assign putative functions to 47% of the 2,904 predicted coding regions. Efficient metabolic functions are predicted, with sugars as the principal energy and carbon source, supporting existence in the nutrient-poor xylem sap. The mechanisms associated with pathogenicity and virulence involve toxins, antibiotics and ion sequestration systems, as well as bacterium-bacterium and bacterium-host interactions mediated by a range of proteins. Orthologues of some of these proteins have only been identified in animal and human pathogens; their presence in X. fastidiosa indicates that the molecular basis for bacterial pathogenicity is both conserved and independent of host. At least 83 genes are bacteriophage-derived and include virulence-associated genes from other bacteria, providing direct evidence of phage-mediated horizontal gene transfer.

Nature 2000 Jul 13;406(6792):151-7.

See GNN article Genome of bacteria Xylella fastidiosa, a threat to fruit and nut crops, is sequenced

Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491

Neisseria meningitidis causes bacterial meningitis and is therefore responsible for considerable morbidity and mortality in both the developed and the developing world. Meningococci are opportunistic pathogens that colonize the nasopharynges and oropharynges of asymptomatic carriers. For reasons that are still mostly unknown, they occasionally gain access to the blood, and subsequently to the cerebrospinal fluid, to cause septicaemia and meningitis. N. meningitidis strains are divided into a number of serogroups on the basis of the immunochemistry of their capsular polysaccharides; serogroup A strains are responsible for major epidemics and pandemics of meningococcal disease, and therefore most of the morbidity and mortality associated with this disease. Here we have determined the complete genome sequence of a serogroup A strain of Neisseria meningitidis, Z2491. The sequence is 2,184,406 base pairs in length, with an overall G+C content of 51.8%, and contains 2,121 predicted coding sequences. The most notable feature of the genome is the presence of many hundreds of repetitive elements, ranging from short repeats, positioned either singly or in large multiple arrays, to insertion sequences and gene duplications of one kilobase or more. Many of these repeats appear to be involved in genome fluidity and antigenic variation in this important human pathogen.

Nature 2000 Mar 30;404(6777):502-6.

See GNN article Learning our ABCs: The bacteria that cause meningitis

The genome sequence of Drosophila melanogaster

The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

Science 2000 Mar 24;287(5461):2185-95.

See GNN articles:

The humanized fly

A fruitful collaboration

Complete genome sequence of Neisseria meningitidis serogroup B strain MC58

The 2,272,351-base pair genome of Neisseria meningitidis strain MC58 (serogroup B), a causative agent of meningitis and septicemia, contains 2158 predicted coding regions, 1158 (53.7%) of which were assigned a biological role. Three major islands of horizontal DNA transfer were identified; two of these contain genes encoding proteins involved in pathogenicity, and the third island contains coding sequences only for hypothetical proteins. Insights into the commensal and virulence behavior of N. meningitidis can be gleaned from the genome, in which sequences for structural proteins of the pilus are clustered and several coding regions unique to serogroup B capsular polysaccharide synthesis can be identified. Finally, N. meningitidis contains more genes that undergo phase variation than any pathogen studied to date, a mechanism that controls their expression and contributes to the evasion of the host immune system.

Science 2000 Mar 10;287(5459):1809-15.

See GNN articles:

Bacteria use quick-switch genes to dodge host defenses

New genome sequence focuses search for type B meningitis vaccine
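
The percentage quoted in the Neisseria meningitidis MC58 abstract above can be reproduced from its own counts. The short Python sketch below is only an illustrative check; the function name annotated_percent is an assumption introduced here, not something from the paper.

```python
# Counts quoted in the MC58 abstract.
PREDICTED_CODING_REGIONS = 2158
ASSIGNED_BIOLOGICAL_ROLE = 1158
REPORTED_PERCENT = 53.7


def annotated_percent(assigned: int, predicted: int) -> float:
    """Hypothetical helper: percentage of predicted coding regions with an assigned role."""
    return round(100 * assigned / predicted, 1)


percent = annotated_percent(ASSIGNED_BIOLOGICAL_ROLE, PREDICTED_CODING_REGIONS)
print("Matches reported percentage:", percent == REPORTED_PERCENT)
```

Rounded to one decimal place, 1158 of 2158 gives the 53.7% stated in the abstract.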

. . .
