Reduced set of virulence genes allows high accuracy prediction of bacterial pathogenicity in humans.

Iraola G, Vazquez G, Spangenberg L, Naya H - PLoS ONE (2012)

Bottom Line: An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes (120) is presented and applied to an external validation set. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions.


Affiliation: Unidad de Bioinformática, Institut Pasteur Montevideo, Montevideo, Uruguay.

ABSTRACT
Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of 814 different virulence-related genes among more than 600 finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, Firmicutes). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes (120) is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at http://bacfier.googlecode.com/files/Bacfier_v1_0.zip), which displays not only the prediction (pathogen/non-pathogen) and an associated probability of pathogenicity, but also the presence/absence vector for the analyzed genes, so that it is possible to identify the subset of virulence genes responsible for the classification of the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. Also, we analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions.
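The phrase "cross-fold validation with in-fold feature selection" can be illustrated with a minimal Python sketch. The abstract does not name the classifier or the selection criterion, so both are stand-ins here: genes are ranked by the difference between their frequency in pathogens and in non-pathogens, and a genome is called a pathogen when the selected genes it carries sum to a positive score. The function names and the toy matrix are hypothetical. The point being shown is only that gene selection is repeated inside every fold using the training genomes alone, so held-out genomes never influence which genes are chosen.

```python
from typing import List, Tuple


def select_genes(X: List[List[int]], y: List[int],
                 train_idx: List[int], k: int) -> List[Tuple[int, float]]:
    """Pick the k genes whose frequency differs most between classes.

    Only rows listed in train_idx are used (this is the "in-fold" part).
    Returns (gene_index, freq_in_pathogens - freq_in_non_pathogens) pairs.
    """
    pos = [i for i in train_idx if y[i] == 1]
    neg = [i for i in train_idx if y[i] == 0]
    if not pos or not neg:
        raise ValueError("training fold needs genomes from both classes")
    scored = []
    for g in range(len(X[0])):
        freq_pos = sum(X[i][g] for i in pos) / len(pos)
        freq_neg = sum(X[i][g] for i in neg) / len(neg)
        scored.append((g, freq_pos - freq_neg))
    # Largest absolute difference first; ties broken by gene index.
    scored.sort(key=lambda t: (-abs(t[1]), t[0]))
    return scored[:k]


def predict(row: List[int], selected: List[Tuple[int, float]]) -> int:
    """Stand-in classifier: sum the signed differences of the genes present."""
    score = sum(diff for g, diff in selected if row[g] == 1)
    return 1 if score > 0 else 0


def cross_validate(X: List[List[int]], y: List[int],
                   n_folds: int, k: int) -> float:
    """Return accuracy over all folds, reselecting genes in each fold."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        selected = select_genes(X, y, train_idx, k)  # training rows only
        correct += sum(predict(X[i], selected) == y[i] for i in test_idx)
    return correct / n


# Toy presence/absence matrix (hypothetical): rows are genomes, columns genes.
# Gene 0 marks pathogens, gene 1 marks non-pathogens, genes 2 and 3 are noise.
toy_X = [
    [1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1],
    [1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1],
]
toy_y = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = pathogen, 0 = non-pathogen
toy_accuracy = cross_validate(toy_X, toy_y, n_folds=4, k=2)
print(toy_accuracy)
```

Selecting genes once on the full data set and only then cross-validating would let the test genomes leak into the model, which tends to inflate the reported accuracy.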


Figure 2. Frequency distribution of ABC transporter genes in Alphaproteobacteria and Gammaproteobacteria. For each gene, the abscissa is the number of pathogenic strains in a given taxonomic group in which the gene is present, divided by the total number of pathogenic strains in that group. The ordinate is the same quantity for the non-pathogenic strains in the group. White circles show that genes coding for ABC transporters are more frequent in pathogenic species of Gammaproteobacteria than in non-pathogenic species of this group. The opposite pattern is observed for Alphaproteobacteria (black circles).

Mentions: The analysis was carried out by calculating the frequency of genes belonging to each functional category in pathogenic and non-pathogenic species of each taxon. The working hypothesis was that, if a certain gene is not related to pathogenicity, its frequency would not be biased towards pathogenic or non-pathogenic species; that is, it would be almost equally distributed within both classes. Genes with a high frequency among pathogens and a low frequency among non-pathogens probably contribute to a pathogen-related phenotype, for example genes coding for toxins. Conversely, a gene with a low frequency among pathogens and a high frequency among non-pathogens could indicate the loss of genes coding for redundant functions. For example, proteins that transport certain molecules across membranes, which are essential for a free-living lifestyle, are often dispensable when bacteria are well adapted to the environment inside their hosts. The frequency distribution of ABC transporter genes in Alphaproteobacteria and Gammaproteobacteria clearly exemplifies this situation. Figure 2 shows the frequency of each gene in pathogenic and non-pathogenic organisms. Points falling on the diagonal represent genes whose frequency is balanced between pathogens and non-pathogens. Points closer to the Y axis are more represented in non-pathogens, and points closer to the X axis are more frequent in pathogens. As shown in this figure, ABC genes are strongly associated with non-pathogenic species in Alphaproteobacteria, while they are overrepresented in pathogenic species in Gammaproteobacteria (Figure 2).
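The per-gene coordinates plotted in Figure 2 can be sketched in a few lines of Python. This is only an illustration of the frequency calculation described above, not the authors' code: the function names, the tolerance used to call a gene "balanced", and the toy strains and genes are all hypothetical.

```python
from typing import Dict, Set, Tuple


def gene_frequencies(presence: Dict[str, Set[str]],
                     is_pathogen: Dict[str, bool]) -> Dict[str, Tuple[float, float]]:
    """For one taxonomic group, return {gene: (x, y)}.

    x = fraction of pathogenic strains carrying the gene (abscissa),
    y = fraction of non-pathogenic strains carrying it (ordinate).
    """
    pathogens = [s for s in presence if is_pathogen[s]]
    non_pathogens = [s for s in presence if not is_pathogen[s]]
    if not pathogens or not non_pathogens:
        raise ValueError("need at least one strain in each class")
    all_genes = set().union(*presence.values())
    freqs = {}
    for gene in sorted(all_genes):
        x = sum(gene in presence[s] for s in pathogens) / len(pathogens)
        y = sum(gene in presence[s] for s in non_pathogens) / len(non_pathogens)
        freqs[gene] = (x, y)
    return freqs


def side_of_diagonal(x: float, y: float, tol: float = 0.1) -> str:
    """Label a point by its position relative to the diagonal x = y.

    tol is an arbitrary illustrative threshold, not a value from the paper.
    """
    if x - y > tol:
        return "pathogen-associated"
    if y - x > tol:
        return "non-pathogen-associated"
    return "balanced"


# Toy data (hypothetical strains and genes within a single taxon).
toy_presence = {
    "strain_A": {"toxin1", "abc1"},
    "strain_B": {"toxin1"},
    "strain_C": {"abc1", "abc2"},
    "strain_D": {"abc2"},
}
toy_labels = {"strain_A": True, "strain_B": True,
              "strain_C": False, "strain_D": False}

toy_freqs = gene_frequencies(toy_presence, toy_labels)
for gene, (x, y) in toy_freqs.items():
    print(gene, x, y, side_of_diagonal(x, y))
```

In this toy taxon, `toxin1` lies on the X axis (present only in pathogens), `abc2` lies on the Y axis (present only in non-pathogens), and `abc1` falls on the diagonal.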

