The signature genes tool developed for the SEED and implemented at the National
Microbial Pathogen Data Resource (NMPDR), www.nmpdr.org, was used to compare the
translated genomes of all completely sequenced strains of Streptococcus pyogenes
(group A streptococcus, GAS) to define a core genome for this human pathogen.
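Comparisons of this kind rest on bidirectional best BLASTP hits (BBHs). Protein a in genome A and protein b in genome B form a BBH when each is the other's best-scoring hit. The sketch below is a generic illustration, not the NMPDR implementation. It makes three assumptions: all-vs-all BLASTP results are available in the standard 12-column BLAST+ tabular format (-outfmt 6), hits are ranked by bit score, and the E-value cutoff is the 1 × 10^-10 quoted in this abstract. The toy identifiers a1, b1, and so on are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One BLASTP hit: (query_id, subject_id, e_value, bit_score)
Hit = Tuple[str, str, float, float]

E_VALUE_CUTOFF = 1e-10  # cutoff quoted in the text; assumed to be inclusive


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST+ -outfmt 6 lines (default 12 columns, tab-separated)."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # column 11 is the E-value, column 12 the bit score
        hits.append((cols[0], cols[1], float(cols[10]), float(cols[11])))
    return hits


def best_hits(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> Dict[str, str]:
    """Map each query to its highest-scoring subject with E-value <= cutoff."""
    best: Dict[str, Tuple[float, str]] = {}
    for query, subject, evalue, bits in hits:
        if evalue > cutoff:
            continue  # hit too weak to count
        if query not in best or bits > best[query][0]:
            best[query] = (bits, subject)
    return {q: s for q, (_, s) in best.items()}


def bidirectional_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF
) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs whose best hits point at each other."""
    forward = best_hits(a_vs_b, cutoff)
    reverse = best_hits(b_vs_a, cutoff)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical toy searches: genome A proteins vs genome B, and the reverse.
a_vs_b = parse_tabular([
    "a1\tb1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "a1\tb2\t40.0\t280\t160\t4\t5\t290\t3\t285\t1e-20\t90",
    "a2\tb2\t35.0\t120\t78\t2\t1\t120\t1\t118\t1e-5\t45",
])
b_vs_a = parse_tabular([
    "b1\ta1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550",
    "b2\ta1\t40.0\t280\t160\t4\t3\t285\t5\t290\t1e-20\t90",
])
print(sorted(bidirectional_best_hits(a_vs_b, b_vs_a)))
```

In the toy data, a2's only hit fails the E-value cutoff. The pair b2 → a1 is a one-way best hit, so only (a1, b1) survives as a BBH.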
The tool allows the user to select a reference genome and compare it with any
number of genomes chosen as a comparison set. The commonality factor is set to 80%
by default but may be reset by the user. For example, the 80% common core of GAS
with respect to the strain with the largest genome (MGAS10750) contains 1,476
proteins that have bidirectional best BlastP hits (BBHs), at an E-value of
1 × 10^-10 or less, in 10 of the 12 available genomes. Increasing the stringency
of the analysis to 100% reduces the number of core proteins to 1,368. In addition to determining
the proteins common to a set of genomes, we used the signature genes tool to
define signature sets of proteins that distinguish strains sharing the same M-type
(e.g., M1 or M12) from the other sequenced strains. The information generated by this genome comparison could
be used to design a microarray for the simultaneous analysis of the core GAS genome
as well as signatures for each sequenced strain or M-type. The bioinformatics analysis
reveals consistencies and inconsistencies among the strains that generate hypotheses
for testing on microarrays. Because protein functions in NMPDR are organized in
subsystems, it is possible to infer functional differences imparted by gene signatures.
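Once BBHs are known, the core-genome and signature calculations reduce to set operations on a presence table. That table records, for each reference protein, the genomes in which it has a BBH. The sketch below is a hypothetical illustration, not NMPDR code, and rests on several assumptions:

- The presence table has already been built.
- The commonality threshold is rounded up. This is consistent with 80% of 12 genomes giving the "10 of the 12" in the text, but the tool's exact rule is not stated.
- The signature rule is deliberately strict: a protein must be present in every in-group genome and absent from every out-group genome. The real tool's criterion may differ.

The genome names, protein IDs, and the `subsystem_of` dictionary, which stands in for SEED subsystem annotations, are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# presence[protein] = genomes (including the reference) where the reference
# protein has a BBH; hypothetical input derived from BBH results
Presence = Dict[str, Set[str]]


def min_genomes(percent: int, n_genomes: int) -> int:
    """Genomes required at a commonality of `percent`, rounded up (assumed rule).

    Integer arithmetic avoids float artifacts such as 0.7 * 10 > 7.
    """
    return (percent * n_genomes + 99) // 100


def core_proteins(presence: Presence, genomes: List[str], percent: int = 80) -> Set[str]:
    """Proteins with BBHs in at least the required number of genomes."""
    needed = min_genomes(percent, len(genomes))
    genome_set = set(genomes)
    return {p for p, found in presence.items() if len(found & genome_set) >= needed}


def signature_proteins(
    presence: Presence, in_group: Iterable[str], out_group: Iterable[str]
) -> Set[str]:
    """Proteins in every in-group genome and in no out-group genome (strict rule)."""
    ins, outs = set(in_group), set(out_group)
    return {p for p, found in presence.items() if ins <= found and not (found & outs)}


def by_subsystem(proteins: Iterable[str], subsystem_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group proteins by (hypothetical) subsystem label to compare functions."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in sorted(proteins):
        groups[subsystem_of.get(p, "no subsystem")].append(p)
    return dict(groups)


# Hypothetical toy data: five genomes, four reference proteins.
genomes = ["ref", "g2", "g3", "g4", "g5"]
presence: Presence = {
    "p1": {"ref", "g2", "g3", "g4", "g5"},
    "p2": {"ref", "g2", "g3", "g4"},
    "p3": {"ref", "g2"},
    "p4": {"ref", "g2", "g3"},
}
subsystem_of = {"p3": "subsystem_A"}

print(sorted(core_proteins(presence, genomes)))               # 80% core
print(sorted(core_proteins(presence, genomes, percent=100)))  # 100% core
sig = signature_proteins(presence, ["ref", "g2"], ["g3", "g4", "g5"])
print(by_subsystem(sig, subsystem_of))
```

Raising `percent` from 80 to 100 can only shrink the core. This mirrors the drop from 1,476 to 1,368 proteins reported for GAS, although the toy numbers here are unrelated to those results.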
Subsystems annotation is used for metabolic reconstruction, analysis of central
machinery and signaling pathways, finding missing genes, integrating regulatory
networks, detection of horizontally transferred genes, and prediction of the functions
of hypothetical proteins. We show examples of how NMPDR applications and the SEED
subsystems can be used to understand the pathogenesis of S. pyogenes and the
evolution of its virulent strains, and to study horizontally transferred toxins.
These genome annotation and comparison tools provide unprecedented information about
GAS biology and pathogenesis and can eventually be applied to all sequenced bacterial
pathogens.