Prediction of type III secretion signals in genomes of gram-negative bacteria.

Löwer M, Schneider G - PLoS ONE (2009)

Bottom Line: Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. The classifiers achieved a cross-validated Matthews correlation of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% predicted candidate proteins), their chromosomes (11%) and plasmids (13%), as well as 213 Firmicute genomes (7%). We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes.


Affiliation: Johann Wolfgang Goethe-University, Chair for Chem- and Bioinformatics, Frankfurt, Germany.

ABSTRACT

Background: Pathogenic bacteria infecting both animals and plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown.

Methodology/principal findings: In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases, and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. The classifiers achieved a cross-validated Matthews correlation of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% predicted candidate proteins), their chromosomes (11%) and plasmids (13%), as well as 213 Firmicute genomes (7%).

Conclusions/significance: We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).
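
The Matthews correlation quoted in the abstract is a standard summary of a binary classifier's confusion matrix. As a minimal sketch of how such a value is computed (not the authors' code; the function name `matthews_correlation` and the example counts are hypothetical, and returning 0.0 for an empty margin is a common convention assumed here):

```python
import math


def matthews_correlation(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly predicted effectors / non-effectors
    fp/fn: wrongly predicted effectors / missed effectors
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0.0:
        # Assumed convention: report 0.0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for illustration only; they are not taken from the paper.
    print(matthews_correlation(tp=80, tn=90, fp=10, fn=20))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (perfect agreement), which makes it less sensitive to class imbalance than plain accuracy.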

Figure 2 (pone-0005917-g002): Three-layered feedforward neural networks were trained on the prediction of T3SS effector proteins. In this schematic, artificial neurons are drawn as circles (white: fan-out neuron; black: sigmoidal activation). For clarity, not all neurons are shown. The output neuron computes values between 0 and 1, which can be interpreted as the probability of an input sequence window being part of a T3SS effector signal.

We used MATLAB version R2007a [25] and SVMlight version 6.02 [26] to train the classifier models. The ANNs had a feed-forward architecture with a single hidden neuron layer (Figure 2). All neurons in the hidden layer and the single output neuron had sigmoidal activation [16]. We used gradient descent backpropagation learning with momentum and an adaptive learning rate, as described previously [16]. Early termination of the training process was implemented by splitting the training data into a smaller training set and a validation set, and stopping the training when the calculated error for the validation data rose for a predefined number of training cycles. For each set of training data, the number of hidden neurons was systematically varied from one to ten. For binary (yes/no) classification, the output of the ANN was converted to a binary value using a threshold of θ = 0.5. The overall function modelled by the implemented ANN is given by Eq. (1), where logsig is a sigmoidal transfer function (activation function) limiting the neuron output to the interval [0,1], v and w are the connection weights, and the remaining parameters are the hidden neurons' bias values and Θ, the bias of the output neuron.
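
The network function and stopping rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the names `ann_output`, `classify`, `early_stop_epoch`, the bias symbol `b`, and the additive sign convention for the biases are all hypothetical, and the stopping rule is read here as "validation error rose for `patience` consecutive cycles".

```python
import math
from typing import List, Optional, Sequence


def logsig(z: float) -> float:
    """Logistic sigmoid; maps any real input into the interval [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def ann_output(x: Sequence[float],
               w: Sequence[Sequence[float]],
               b: Sequence[float],
               v: Sequence[float],
               theta_out: float) -> float:
    """Single-hidden-layer feed-forward network with sigmoidal units.

    Assumed form (bias sign and symbol names are illustrative):
        y = logsig( sum_k v[k] * logsig( sum_j w[k][j] * x[j] + b[k] ) + theta_out )
    """
    hidden: List[float] = [
        logsig(sum(w_kj * x_j for w_kj, x_j in zip(w_k, x)) + b_k)
        for w_k, b_k in zip(w, b)
    ]
    return logsig(sum(v_k * h_k for v_k, h_k in zip(v, hidden)) + theta_out)


def classify(score: float, threshold: float = 0.5) -> bool:
    """Binary decision; treating score == threshold as positive is an assumption."""
    return score >= threshold


def early_stop_epoch(val_errors: Sequence[float], patience: int) -> Optional[int]:
    """Index of the cycle at which training would stop, or None if it never does.

    Stops once the validation error has risen for `patience` consecutive cycles.
    """
    rises = 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] > val_errors[epoch - 1]:
            rises += 1
            if rises >= patience:
                return epoch
        else:
            rises = 0
    return None


if __name__ == "__main__":
    # Toy weights for one input window with two features and two hidden neurons.
    score = ann_output(x=[1.0, 0.0], w=[[2.0, -1.0], [0.5, 0.5]],
                       b=[0.0, -0.5], v=[1.5, -1.0], theta_out=0.0)
    print(score, classify(score))
    print(early_stop_epoch([0.50, 0.40, 0.41, 0.42, 0.43], patience=3))
```

In the study's setup, a sweep over one to ten hidden neurons would amount to repeating the training with `w`, `b` and `v` sized accordingly and comparing the resulting validation performance.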

