Limits...
Improved contact predictions using the recognition of protein like contact patterns.

Skwark MJ, Raimondi D, Michel M, Elofsson A - PLoS Comput. Biol. (2014)

Bottom Line: However, these approaches do not consider the fact that the contacts in a protein are neither randomly, nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent.In addition to being superior to earlier methods based on statistical inferences, in comparison to state of the art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs.The improved contact prediction enables improved structure prediction.

View Article: PubMed Central - PubMed

Affiliation: Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Solna, Sweden; Department of Information and Computer Science, Aalto University, Aalto, Finland.

ABSTRACT
Given sufficient large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly, nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independently on the number of aligned sequences, residue separation or secondary structure type, but is largest for β-sheet containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state of the art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.

Show MeSH

Related in: MedlinePlus

(a) Comparison of the quality of models generated using the PconsFold pipeline [40] and contacts predicted by either PconsC or PconsC2.In PconsFold, Rosetta [2] is used together with predicted contacts from PconsC (blue) or PconsC2 (red). Each dot represents one model and the lines show the running average. The average TM-score for models generated with PconsC contacts is 0.34 vs. 0.37 for models generated using PconsC2. (b) Comparison of agreement with predicted contacts for the native structure and the top-ranked model. The colours of each dot represent the TM-score of the model. Dots below the diagonal line indicates that the model agrees worse with predicted contacts than the native structure, that is it should be possible to obtain better models given a more efficient modelling procedure.
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4222596&req=5

pcbi-1003889-g008: (a) Comparison of the quality of models generated using the PconsFold pipeline [40] and contacts predicted by either PconsC or PconsC2.In PconsFold, Rosetta [2] is used together with predicted contacts from PconsC (blue) or PconsC2 (red). Each dot represents one model and the lines show the running average. The average TM-score for models generated with PconsC contacts is 0.34 vs. 0.37 for models generated using PconsC2. (b) Comparison of agreement with predicted contacts for the native structure and the top-ranked model. The colours of each dot represent the TM-score of the model. Dots below the diagonal line indicates that the model agrees worse with predicted contacts than the native structure, that is it should be possible to obtain better models given a more efficient modelling procedure.

Mentions: The average quality increases with the effective number of sequences in the protein family, see Figure 8. However, the variation is also rather large, that is for some small families good models are generated, while for some large families the models are not optimal. The average improvement obtained by PconsC2 is quite constant for families with more than 50 effective sequences.


Improved contact predictions using the recognition of protein like contact patterns.

Skwark MJ, Raimondi D, Michel M, Elofsson A - PLoS Comput. Biol. (2014)

(a) Comparison of the quality of models generated using the PconsFold pipeline [40] and contacts predicted by either PconsC or PconsC2.In PconsFold, Rosetta [2] is used together with predicted contacts from PconsC (blue) or PconsC2 (red). Each dot represents one model and the lines show the running average. The average TM-score for models generated with PconsC contacts is 0.34 vs. 0.37 for models generated using PconsC2. (b) Comparison of agreement with predicted contacts for the native structure and the top-ranked model. The colours of each dot represent the TM-score of the model. Dots below the diagonal line indicates that the model agrees worse with predicted contacts than the native structure, that is it should be possible to obtain better models given a more efficient modelling procedure.
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4222596&req=5

pcbi-1003889-g008: (a) Comparison of the quality of models generated using the PconsFold pipeline [40] and contacts predicted by either PconsC or PconsC2.In PconsFold, Rosetta [2] is used together with predicted contacts from PconsC (blue) or PconsC2 (red). Each dot represents one model and the lines show the running average. The average TM-score for models generated with PconsC contacts is 0.34 vs. 0.37 for models generated using PconsC2. (b) Comparison of agreement with predicted contacts for the native structure and the top-ranked model. The colours of each dot represent the TM-score of the model. Dots below the diagonal line indicates that the model agrees worse with predicted contacts than the native structure, that is it should be possible to obtain better models given a more efficient modelling procedure.
Mentions: The average quality increases with the effective number of sequences in the protein family, see Figure 8. However, the variation is also rather large, that is for some small families good models are generated, while for some large families the models are not optimal. The average improvement obtained by PconsC2 is quite constant for families with more than 50 effective sequences.

Bottom Line: However, these approaches do not consider the fact that the contacts in a protein are neither randomly, nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent.In addition to being superior to earlier methods based on statistical inferences, in comparison to state of the art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs.The improved contact prediction enables improved structure prediction.

View Article: PubMed Central - PubMed

Affiliation: Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Solna, Sweden; Department of Information and Computer Science, Aalto University, Aalto, Finland.

ABSTRACT
Given sufficient large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly, nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independently on the number of aligned sequences, residue separation or secondary structure type, but is largest for β-sheet containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state of the art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.

Show MeSH
Related in: MedlinePlus