Comparative supragenomic analyses among the pathogens Staphylococcus aureus, Streptococcus pneumoniae, and Haemophilus influenzae using a modification of the finite supragenome model.

Boissy R, Ahmed A, Janto B, Earl J, Hall BG, Hogg JS, Pusch GD, Hiller LN, Powell E, Hayes J, Yu S, Kathju S, Stoodley P, Post JC, Ehrlich GD, Hu FZ - BMC Genomics (2011)

Bottom Line: We developed a revised version of our finite supragenome model to estimate the size of the S. aureus supragenome (3,221 genes, with 2,245 core genes), and compared it with those of Haemophilus influenzae and Streptococcus pneumoniae. Using a multi-species comparative supragenomic analysis enabled by an improved version of our finite supragenome model, we provide data and an interpretation explaining the relatively larger core genome of S. aureus compared to other opportunistic nasopharyngeal pathogens. In addition, we provide independent validation for the efficiency and effectiveness of our orthologous gene clustering algorithm.


Affiliation: Center for Genomic Sciences, Allegheny-Singer Research Institute, Pittsburgh, PA 15212, USA.

ABSTRACT

Background: Staphylococcus aureus is associated with a spectrum of symbiotic relationships with its human host, from carriage to sepsis, and is frequently associated with nosocomial and community-acquired infections; thus, the differential gene content among strains is of interest.

Results: We sequenced three clinical strains and combined these data with 13 publicly available human isolates and one bovine strain for comparative genomic analyses. All genomes were annotated using RAST, and then their gene similarities and differences were delineated. Gene clustering yielded 3,155 orthologous gene clusters, of which 2,266 were core, 755 were distributed, and 134 were unique. Individual genomes contained between 2,524 and 2,648 genes. Gene-content comparisons among all possible S. aureus strain pairs (n = 136) revealed a mean difference of 296 genes and a maximum difference of 476 genes. We developed a revised version of our finite supragenome model to estimate the size of the S. aureus supragenome (3,221 genes, with 2,245 core genes), and compared it with those of Haemophilus influenzae and Streptococcus pneumoniae. There was excellent agreement between RAST's annotations and our CDS clustering procedure, providing for high-fidelity metabolomic subsystem analyses to extend our comparative genomic characterization of these strains.

Conclusions: Using a multi-species comparative supragenomic analysis enabled by an improved version of our finite supragenome model, we provide data and an interpretation explaining the relatively larger core genome of S. aureus compared to other opportunistic nasopharyngeal pathogens. In addition, we provide independent validation for the efficiency and effectiveness of our orthologous gene clustering algorithm.
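The counts reported in the Results are internally consistent, which can be checked with a few lines of arithmetic. The sketch below is only an illustration; the variable names are our own and do not come from the article.

```python
from math import comb

# Strains analysed: three newly sequenced clinical strains,
# 13 publicly available human isolates, and one bovine strain.
n_strains = 3 + 13 + 1

# Number of distinct strain pairs available for gene-content comparison.
n_pairs = comb(n_strains, 2)

# Orthologous gene clusters by category, as reported in the Results.
core, distributed, unique = 2266, 755, 134
total_clusters = core + distributed + unique

print(n_strains, n_pairs, total_clusters)
```

With 17 strains, the number of unordered pairs matches the reported n = 136, and the three cluster categories sum to the reported 3,155 clusters.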



Figure 3: Finite supragenome model results using (K = 6) variable population gene frequency classes. In our previous supragenome analyses carried out with Haemophilus influenzae and Streptococcus pneumoniae we used a version of the finite supragenome model that required fixed population gene frequency classes. This model has been updated to make the optimization function (the log-likelihood of the observed sample gene frequency histogram, i.e., the observed gene frequency class distribution among the S strains examined) dependent on the values of the population gene frequency vector (μ) as well as the values of the corresponding mixture coefficient vector (π, for the probability that a gene in a supragenome will be represented in one of the K classes of population gene frequencies). For a given species, the bottom graph plots the values of the vector μ against the product of the estimate of supragenome size and the values of the vector π, all obtained at the maximization of the log-likelihood function.
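To make the role of μ, π, and N in the caption concrete, the following is a minimal sketch of one way such a log-likelihood could be written. It assumes that each gene falls into class k with probability π_k, that a gene in class k is present in each strain independently with probability μ_k, and that the observed histogram is scored with a multinomial likelihood that includes the genes not seen in any strain. These are assumptions made for illustration; the exact likelihood used in the article may differ, and the function names are hypothetical.

```python
from math import comb, lgamma, log

def class_probabilities(S, mu, pi):
    """P(gene is present in exactly n of S strains), for n = 0..S,
    under an assumed K-class binomial mixture (mu, pi)."""
    return [
        sum(p * comb(S, n) * m**n * (1 - m)**(S - n) for m, p in zip(mu, pi))
        for n in range(S + 1)
    ]

def log_likelihood(N, observed, mu, pi):
    """Multinomial log-likelihood of a sample gene frequency histogram.

    observed[n - 1] is the number of genes seen in exactly n strains
    (n = 1..S); N is the candidate supragenome size, so N - sum(observed)
    genes are assumed to exist but to be unseen in the sample.
    """
    S = len(observed)
    unseen = N - sum(observed)
    if unseen < 0:
        return float("-inf")  # N cannot be smaller than the genes observed
    counts = [unseen] + list(observed)
    probs = class_probabilities(S, mu, pi)
    ll = lgamma(N + 1)  # log N!
    for c, p in zip(counts, probs):
        ll -= lgamma(c + 1)  # minus log c!
        if c > 0:
            if p <= 0:
                return float("-inf")
            ll += c * log(p)
    return ll

# Toy example with S = 3 strains and K = 3 classes (values are made up).
toy_mu = [0.1, 0.5, 1.0]
toy_pi = [0.3, 0.2, 0.5]
toy_hist = [5, 3, 10]
toy_ll = log_likelihood(25, toy_hist, toy_mu, toy_pi)
```

In the revised model described in the caption, both μ and π would be varied (for example by a numerical optimizer) for each candidate N, and the log-likelihood plotted against N; the sketch only evaluates the function at fixed parameter values.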

Mentions: An overview of the results obtained using the revised model is shown for three human bacterial pathogens: Haemophilus influenzae, Streptococcus pneumoniae, and Staphylococcus aureus (Figure 3). The results obtained for these three supragenomes differ significantly in their plots of the log-likelihood of the observed data against the values of supragenome size N that were examined during the optimization. Fortuitously, these results illustrate two contrasting types of supragenomes (H. influenzae and S. aureus) and a third (S. pneumoniae) whose general characteristics are intermediate between these two types. Thus, a broad plateau was observed in this plot for H. influenzae, whereas the log-likelihood plot for S. aureus declined very abruptly at estimated values of N that were significantly greater than the estimated size of its supragenome (Figure 3, upper panels). The revised supragenome model employed herein has the advantage that values of μk (where k < K) are allowed to vary during the maximization of the log-likelihood. Hence a priori estimates of fixed values for these parameters, which were required in our initial supragenome model and which the bottom panels of Figure 3 show are difficult to make, are conveniently avoided. At the extreme case of the lowest population gene frequency class, the values of μ1 and π1 at the maximization of the log-likelihood of the observed data indicate that the H. influenzae supragenome is dominated by a large pool of very rare genes. In contrast, the value for μ1 at the maximization of the log-likelihood of the observed data for the S. aureus supragenome (0.11) is an order of magnitude greater than that of H. influenzae. At the other extreme of population gene frequencies, even though the estimated size of the S. aureus supragenome at 3,221 chromosomal genes is the smallest value for N observed among these three species, the absolute number of S. aureus core genes (2,245) and their fraction of N (i.e., the value of πK = 0.6971) are both significantly greater than the same values for either H. influenzae or S. pneumoniae (Figure 3, lower panels). This estimate that approximately 30% of the S. aureus genes are non-core is in reasonable agreement with the results of earlier, more limited studies that used comparative genomic hybridization to estimate a value for this parameter of 22% [30].
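The relationship between the supragenome size N, the core mixture coefficient πK, and the quoted core and non-core figures can be verified directly. The sketch below assumes that the core gene count is N × πK rounded to the nearest integer, which is our reading of the text rather than a stated formula; the variable names are illustrative.

```python
# Values quoted in the text for the S. aureus supragenome.
N = 3221        # estimated supragenome size (chromosomal genes)
pi_K = 0.6971   # mixture coefficient of the core (highest-frequency) class

# Assumed relationship: core gene count is N * pi_K, rounded.
core_genes = round(N * pi_K)

# Fraction of the supragenome that is non-core.
non_core_fraction = 1 - pi_K

print(core_genes, f"{non_core_fraction:.1%}")
```

Under that assumption the product reproduces the reported 2,245 core genes, and the remainder corresponds to the roughly 30% non-core fraction that the text compares with the earlier hybridization-based estimate of 22%.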


