Bayesian model of protein primary sequence for secondary structure prediction.

Li Q, Dahl DB, Vannucci M - PLoS ONE (2014)

Bottom Line: The most successful methods use machine learning approaches that are quite accurate, but do not directly incorporate structural information. As a step towards improving secondary structure prediction given the primary structure, we propose a Bayesian model based on the knob-socket model of protein packing in secondary structure. Software implementing the methods is provided as a web application and a stand-alone implementation.


Affiliation: Department of Statistics, Rice University, Houston, Texas, United States of America.

ABSTRACT
Determining the primary structure (i.e., amino acid sequence) of a protein has become cheaper, faster, and more accurate. Higher order protein structure provides insight into a protein's function in the cell. Understanding a protein's secondary structure is a first step towards this goal. Therefore, a number of computational methods have been developed to predict secondary structure from just the primary amino acid sequence. The most successful methods use machine learning approaches that are quite accurate, but do not directly incorporate structural information. As a step towards improving secondary structure prediction given the primary structure, we propose a Bayesian model based on the knob-socket model of protein packing in secondary structure. The method considers the packing influence of residues on the secondary structure determination, including those packed close in space but distant in sequence. By assessing our method on 2 test sets, we show how incorporating multiple sequence alignment data, as PSIPRED does, provides balance and improves the accuracy of the predictions. Software implementing the methods is provided as a web application and a stand-alone implementation.

Figure 2: Local structural motifs used to model protein secondary structure as defined by the knob-socket model. On the top for each type of secondary structure, ribbon diagrams of the protein backbone are presented with black spheres at Cα positions. On the bottom, two-dimensional lattice representations show the local residue interactions that define secondary structure, where solid lines represent covalent contacts between residues and broken lines represent packing interactions. Because only local interactions are considered when predicting secondary structure, only the socket portion of the knob-socket model is used. The knob portion signifies interactions at the level of tertiary structure, that is, packing of non-local residues distant in the protein sequence. Each of the 4 types of secondary structure is described in more detail. (a) Helix Model: Relative residue positions and interactions are shown. Two types of sockets are distinguished by shade, one in dark grey and the other in light grey. (b) Strand Model: Double-sided sheet sockets are shown. Two sockets in white face one direction, and a socket in dark grey faces the other side. The side-chain-only socket is shown in light grey. (c) Coil Model: Three types of coil sockets are shown: a closed socket, with all three residues in contact with one another; an open socket, with two of the residue pairs in contact but no contact between the third pair; and a strained socket, with no contact between one pair of residues. (d) Turn Model: The three-residue sockets in the 5-residue turn are shown.

Improving on the previous models of packing in helices [32] and sheets [33], the knob-socket model provides a simple and general motif for describing packing in protein structure. It has been shown to relate the primary sequence to the packing structure at both the secondary and tertiary structure levels, in both helices [30] and sheets [31]. Whereas the earlier knobs-into-holes [32] and ridges-into-grooves [33] models are each limited to describing packing at defined angles within a single type of secondary structure, the knob-socket model encompasses all packing within proteins, at all angles and between all types of secondary structure. The knob-socket model simplifies the convoluted packing of side chains into regular patterns in which a single knob residue from one element of secondary structure packs into a socket formed by 3 residues from another element of secondary structure. Because the compositions of both the knobs and the sockets exhibit preferences for certain amino acids, the knob-socket model not only relates primary sequence to tertiary packing structure, but also associates the primary sequence with secondary structure packing. At the level of secondary structure, only the local 3-residue socket plays a role in this model, since the knob residue defines tertiary packing structure. The repetitive main-chain hydrogen bonding of regular secondary structure produces a consistent arrangement of sockets. These arrangements define the secondary structure packing motifs that provide the sequence patterns used to identify secondary structure (Fig. 2). This is the case even for the irregular coil secondary structure.
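As a rough illustration of how a local 3-residue socket ties a sequence pattern to secondary structure, the sketch below lists the residue triples found in a sequence at a given set of relative positions and counts their composition. It is only a sketch under stated assumptions: the function name `enumerate_sockets`, the offset pattern `(0, 1, 4)`, and the example sequence are hypothetical choices made for this example and are not the socket definitions or the software from the article.

```python
from collections import Counter


def enumerate_sockets(sequence, offsets):
    """Return the residue triples found at the given relative offsets.

    `offsets` is a 3-tuple of non-negative positions relative to residue i.
    The offsets are an illustrative assumption, not the article's socket
    definitions (those are given in its Fig. 2).
    """
    if len(offsets) != 3:
        raise ValueError("a socket is formed by exactly 3 residues")
    if min(offsets) < 0:
        raise ValueError("offsets must be non-negative")
    span = max(offsets)
    sockets = []
    # Slide along the sequence; stop where the widest offset would run off the end.
    for i in range(len(sequence) - span):
        sockets.append(tuple(sequence[i + k] for k in offsets))
    return sockets


# Hypothetical example sequence and offset pattern.
seq = "ALKEELLKA"
triples = enumerate_sockets(seq, (0, 1, 4))

# Count how often each amino acid triple occurs; composition counts like these
# are the kind of sequence pattern that a statistical model could score.
composition = Counter(triples)
```

Passing the offsets in as a parameter keeps the sketch agnostic about which relative positions form a socket in each type of secondary structure.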

