Improved contact predictions using the recognition of protein like contact patterns.

Skwark MJ, Raimondi D, Michel M, Elofsson A - PLoS Comput. Biol. (2014)

Bottom Line: However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. In addition to being superior to earlier methods based on statistical inferences, in comparison to state-of-the-art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.


Affiliation: Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Solna, Sweden; Department of Information and Computer Science, Aalto University, Aalto, Finland.

ABSTRACT
Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independently of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state-of-the-art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.


Figure 4 (pcbi-1003889-g004): a) Performance of PconsC2 at different sequence separations compared to PconsC, plmDCA and PSICOV, considering the top L contacts per protein. Curves are smoothed with a rolling average window of 5 residues. b) Number of contacts predicted at different sequence separations. The red line represents the distribution of observed contacts in the dataset, normalised so that the total number of contacts is identical to the number of predicted contacts.
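
The normalisation described for panel b can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `normalise_observed` and the dictionaries mapping sequence separation to contact counts are assumptions made for the example. The observed counts are rescaled by a single factor so that their sum equals the total number of predicted contacts, which lets the two distributions be drawn on the same axis.

```python
def normalise_observed(observed_counts, predicted_counts):
    """Rescale observed contact counts so their total matches the predictions.

    Both arguments map sequence separation |i - j| to a number of contacts.
    (Hypothetical helper; the data layout is an assumption for illustration.)
    """
    total_observed = sum(observed_counts.values())
    total_predicted = sum(predicted_counts.values())
    if total_observed == 0:
        raise ValueError("no observed contacts to normalise")
    scale = total_predicted / total_observed
    return {sep: n * scale for sep, n in observed_counts.items()}


# Toy counts: 40 observed contacts are scaled down to match 20 predictions.
observed = {6: 10, 12: 30}
predicted = {6: 15, 12: 5}
normalised = normalise_observed(observed, predicted)
```

After rescaling, only the shape of the observed distribution is compared with the predictions, not its absolute size.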

Mentions: Contacts at different sequence separations provide different types of information, and the underlying contact prediction methods, plmDCA and PSICOV, behave quite differently in this respect. Both methods predict a lower fraction of long-range contacts than is observed in proteins, but PSICOV predicts more long-range contacts among its top-ranked predictions than plmDCA (see Figure 4). However, the accuracy of these long-range predictions is lower. The reverse is true for short-range contacts (separated by fewer than 10 residues): here PSICOV predicts fewer but more accurate contacts. The distribution of predictions from PconsC is quite similar to the distribution from plmDCA, just with a higher accuracy.
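
The kind of comparison described here (how many top-ranked contacts fall at each sequence separation, and how accurate they are) can be sketched in a few lines. This is an illustrative sketch rather than the evaluation code used in the paper: the function names, the representation of predictions as `(i, j, score)` tuples, the toy data, and the choice of a centred window that is truncated at the ends are all assumptions. It selects the top L scored pairs (L being the sequence length), groups them by separation |i - j|, reports the fraction that are true contacts, and provides a rolling average for smoothing such curves.

```python
from collections import defaultdict


def top_l_contacts(predictions, length):
    """Return the `length` highest-scoring (i, j, score) predictions."""
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    return ranked[:length]


def ppv_by_separation(predictions, true_contacts, length):
    """Map sequence separation |i - j| -> (n_predicted, fraction_correct)."""
    counts = defaultdict(lambda: [0, 0])  # separation -> [predicted, correct]
    for i, j, _score in top_l_contacts(predictions, length):
        sep = abs(i - j)
        counts[sep][0] += 1
        if (min(i, j), max(i, j)) in true_contacts:
            counts[sep][1] += 1
    return {sep: (n, c / n) for sep, (n, c) in sorted(counts.items())}


def rolling_average(values, window=5):
    """Centred rolling mean; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for k in range(len(values)):
        chunk = values[max(0, k - half): k + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy example: a 4-residue "protein", so the top L = 4 pairs are evaluated.
predictions = [(0, 3, 0.9), (0, 2, 0.8), (1, 3, 0.7), (0, 1, 0.2), (1, 2, 0.6)]
true_contacts = {(0, 3), (1, 3)}
result = ppv_by_separation(predictions, true_contacts, length=4)
```

Running the same functions on the output of two predictors for the same protein would show, per separation, which one places more of its top L predictions there and which one is more often correct.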

