Searching algorithm for type IV secretion system effectors 1.0: a tool for predicting type IV effectors and exploring their genomic context.

Meyer DF, Noroy C, Moumène A, Raffaele S, Albina E, Vachiéry N - Nucleic Acids Res. (2013)

Bottom Line: To help biologists identify putative T4Es from the complete genome of α- and γ-proteobacteria, we developed a Perl-based command line bioinformatics tool called S4TE (searching algorithm for type-IV secretion system effectors). The tool predicts and ranks T4E candidates by using a combination of 13 sequence characteristics, including homology to known effectors, homology to eukaryotic domains, and the presence of subcellular localization signals or secretion signals. The algorithm also provides a GC% and local gene density analysis, which strengthens the selection of T4E candidates.


Affiliation: CIRAD, UMR CMAEE, F-97170 Petit-Bourg, Guadeloupe, France, INRA, UMR1309 CMAEE, F-34398, Montpellier, France, Université des Antilles et de la Guyane, 97159 Pointe-à-Pitre cedex, Guadeloupe, France, INRA, Laboratoire des Interactions Plantes-Microorganismes, UMR441, Castanet-Tolosan, France and CNRS, Laboratoire des Interactions Plantes-Microorganismes, UMR2594, Castanet-Tolosan, France.

ABSTRACT
Type IV effectors (T4Es) are proteins produced by pathogenic bacteria to manipulate host cell gene expression and processes, divert the cell machinery for their own profit and circumvent the immune responses. T4Es have been characterized for some bacteria but many remain to be discovered. To help biologists identify putative T4Es from the complete genome of α- and γ-proteobacteria, we developed a Perl-based command line bioinformatics tool called S4TE (searching algorithm for type-IV secretion system effectors). The tool predicts and ranks T4E candidates by using a combination of 13 sequence characteristics, including homology to known effectors, homology to eukaryotic domains, and the presence of subcellular localization signals or secretion signals. S4TE is modular: specific motif searches are run independently, and their outputs are then combined to generate a score and rank the strongest T4E candidates. Users can adjust various search parameters, such as the weight of each module, the selection threshold or the input databases. The algorithm also provides a GC% and local gene density analysis, which strengthens the selection of T4E candidates. S4TE is a unique prediction tool for T4Es that is useful upstream of experimental biology.
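The modular scoring described in the abstract (independent feature searches, per-module weights, a selection threshold, then ranking) can be illustrated with a minimal Python sketch. This is not the S4TE implementation, which is written in Perl; the function names, feature names, weights and threshold below are all hypothetical and chosen only to show the general idea of a weighted, thresholded ranking.

```python
from typing import Dict, List, Tuple

def score_protein(hits: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Sum the weights of every feature module that reported a hit."""
    return sum(weights[name] for name, hit in hits.items() if hit)

def rank_candidates(
    proteins: Dict[str, Dict[str, bool]],
    weights: Dict[str, float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Keep proteins scoring at or above the threshold, best score first."""
    scored = [(pid, score_protein(hits, weights)) for pid, hits in proteins.items()]
    kept = [(pid, s) for pid, s in scored if s >= threshold]
    # Sort by descending score, then by identifier for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Hypothetical weights and module outputs (illustrative values only).
weights = {"effector_homology": 3.0, "eukaryotic_domain": 2.0, "coiled_coil": 1.0}
proteins = {
    "protA": {"effector_homology": True, "eukaryotic_domain": False, "coiled_coil": True},
    "protB": {"effector_homology": False, "eukaryotic_domain": True, "coiled_coil": False},
    "protC": {"effector_homology": False, "eukaryotic_domain": False, "coiled_coil": True},
}
ranked = rank_candidates(proteins, weights, threshold=2.0)
print(ranked)
```

Changing a weight or the threshold in this sketch changes which candidates are retained, which mirrors the kind of user adjustment the abstract describes.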


Figure 2. Distribution of the number of features that detected effector candidates in L. pneumophila. (A) Cumulative numbers of effectors correctly detected (true positives, TP) and called in error (false positives, FP) by S4TE in the L. pneumophila genome. (B) Accuracy, sensitivity and specificity of the S4TE analysis of the L. pneumophila genome with combinations of 3, 4, 5, 6 and 7 features.

Mentions: Beyond the final outcome of S4TE, we sought to determine the relative importance of each search feature in the algorithm's prediction of L. pneumophila T4Es. The effectors were unevenly distributed across features; some features were highly selective and specific, whereas others were less efficient (Table 2). For Legionella, we confirmed the importance of the hydrophilic profile over the full length of the protein and at its C-terminus (PPV = 69% for feature 12), the charge of the C-terminus (PPV = 70% for feature 10) and the presence of eukaryotic domains (PPV = 71% for feature 3), coiled-coil domains (PPV = 76% for feature 8) and the E-block motif (PPV = 72% for feature 13) (Table 2). We then counted the true positives (TP) and false positives (FP) identified by S4TE in L. pneumophila according to the number of matching features. TP and FP were well discriminated for combinations of 3 and 4 features (Figure 2A). Even with a slight increase in FP, combinations of 5, 6 and 7 features remained discriminating (Figure 2A). Although accuracy increased from 93% with a combination of 3 features to 95% with 7 features, specificity decreased from 99% to 97% (Figure 2B). The steady rise in sensitivity, from 26% with 3 features to 81% with 7 features, shows the importance of our multi-criterion approach for identifying a majority of candidate T4Es (Figure 2B). The complete list of feature combinations that generated hits for L. pneumophila was used to propose two performance indicators, SIL and PIL (see Materials and Methods section and Supplementary Table S2). These indicators are appended to each predicted effector in the result file and tell the user how well the same combination of features performed on L. pneumophila, thus providing additional help in selecting the right candidates for further biological evaluation.
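The performance measures quoted above (PPV, accuracy, sensitivity and specificity) follow the standard confusion-matrix definitions. The short Python sketch below shows how they are computed from counts of true and false positives and negatives. The counts used are hypothetical round numbers, not the L. pneumophila figures from the paper, and the function name is an assumption made for illustration.

```python
from typing import Dict

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics; a zero denominator yields 0.0."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        # Fraction of real effectors that were detected.
        "sensitivity": ratio(tp, tp + fn),
        # Fraction of non-effectors that were correctly rejected.
        "specificity": ratio(tn, tn + fp),
        # Fraction of predicted effectors that are real (positive predictive value).
        "ppv": ratio(tp, tp + fp),
        # Fraction of all proteins that were classified correctly.
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because non-effectors vastly outnumber effectors in a genome, accuracy and specificity can stay high even when sensitivity is low, which is consistent with the pattern reported for combinations of only 3 features.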

