Improved contact predictions using the recognition of protein like contact patterns.

Skwark MJ, Raimondi D, Michel M, Elofsson A - PLoS Comput. Biol. (2014)

Bottom Line: Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but follow precise rules governed by the structure of the protein and are thus interdependent. The improved contact prediction enables improved structure prediction.


Affiliation: Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Solna, Sweden; Department of Information and Computer Science, Aalto University, Aalto, Finland.

ABSTRACT
Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but follow precise rules governed by the structure of the protein and are thus interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independently of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is superior to state-of-the-art machine learning methods for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.


pcbi-1003889-g006: (a–c) ROC plot depicting the PPV values for different predictors. The x-axis represents the number of contact predictions relative to the length of the protein. At L = 1, on average one prediction is included for each residue in a protein. (a) Performance on the PSICOV set, (b) performance on the new dataset, (c) performance on the CASP10 dataset. (d–f) Positive predictive value plotted versus the effective number of sequences for predictions considering the top contacts per protein. The lines show a running average and the red dots show individual predictions by PconsC2.

Mentions: In all three datasets PconsC2 shows higher performance than all the other methods; see Figure 6a–c and S1. At one prediction per residue (L = 1), the PPV values of PconsC2 range from 0.75 in the PSICOV set to 0.5 in the independent dataset. In comparison, plmDCA PPV values range from 0.5 to 0.25, and CMAPpro has PPV values of approximately 0.45 to 0.3. In all sets the PPV values are at least 0.1 units higher for PconsC2 than for the best of the other methods. The improvement exists for all sequence separations; see Table 3.
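The PPV figures quoted above are measured on the highest-scoring predicted contacts, with the number of predictions scaled to the protein length. A minimal Python sketch of that kind of evaluation follows. It is an illustration only, not the authors' evaluation code: the function name `top_contacts_ppv`, the dictionary input format, and the default `min_separation` of 5 are all assumptions made for this example.

```python
def top_contacts_ppv(scored_pairs, true_contacts, length,
                     per_residue=1.0, min_separation=5):
    """Positive predictive value of the top-ranked predicted contacts.

    scored_pairs   -- dict mapping (i, j) with i < j to a predicted score
    true_contacts  -- set of (i, j) pairs that are contacts in the structure
    length         -- number of residues in the protein
    per_residue    -- predictions per residue (1.0 selects `length` pairs)
    min_separation -- minimum sequence separation j - i (assumed default)
    """
    # Keep only pairs that are far enough apart in sequence.
    candidates = [
        (score, pair)
        for pair, score in scored_pairs.items()
        if pair[1] - pair[0] >= min_separation
    ]
    # Rank by score, highest first.
    candidates.sort(key=lambda item: item[0], reverse=True)

    n_selected = min(int(round(per_residue * length)), len(candidates))
    if n_selected == 0:
        return 0.0

    # PPV = true positives / number of predictions considered.
    hits = sum(1 for _, pair in candidates[:n_selected] if pair in true_contacts)
    return hits / n_selected


# Toy example with made-up scores for a 4-residue chain.
toy_scores = {(0, 1): 0.95, (0, 2): 0.9, (0, 3): 0.8, (1, 3): 0.7}
toy_truth = {(0, 2), (1, 3)}
toy_ppv = top_contacts_ppv(toy_scores, toy_truth, length=4,
                           per_residue=0.5, min_separation=2)
```

In the toy example, the pair (0, 1) is dropped by the separation filter, the two highest-scoring remaining pairs are (0, 2) and (0, 3), and only the first is a true contact, so `toy_ppv` is 0.5.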

