Identifying tandem Ankyrin repeats in protein structures.

Chakrabarty B, Parekh N - BMC Bioinformatics (2014)

Bottom Line: The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. The proposed method is evaluated on a set of 370 proteins, comprising 125 known Ankyrin proteins with the remainder being non-solenoid proteins, and its predictions are compared with UniProt annotation, a sequence-based approach (RADAR), and a structure-based approach (ConSole). This method is especially useful in correctly identifying new members of a repeat family.

Affiliation: Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India. broto.chakrabarty@research.iiit.ac.in.

ABSTRACT

Background: Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units, together with insertions/deletions between them, make their identification difficult at the sequence level, so structure-based approaches are desirable. The proposed graph spectral approach represents the protein structure as a graph in order to detect one of the most frequently observed structural repeats, the Ankyrin repeat.

Results: A large number of studies have shown that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra of a protein structure graph exhibit a unique repetitive profile for contiguous repeating units, enabling the detection of both the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins, comprising 125 known Ankyrin proteins with the remainder being non-solenoid proteins, and its predictions are compared with UniProt annotation, a sequence-based approach (RADAR), and a structure-based approach (ConSole). To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. The method is implemented as a web server, AnkPred, which is freely available at 'bioinf.iiit.ac.in/AnkPred'.
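As an illustration of the kind of quantity such a graph spectral analysis works with, the following is a minimal sketch and not the AnkPred implementation. It assumes a residue contact graph built from Cα coordinates with a hypothetical 7 Å cutoff, and it assumes that the profile of interest is the eigenvector belonging to the largest eigenvalue of the adjacency matrix. The function names and the toy coordinates are illustrative only.

```python
import math


def contact_graph(coords, cutoff=7.0):
    """Adjacency matrix linking residues whose C-alpha atoms lie within
    `cutoff` angstroms of each other (the cutoff is an assumed value)."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def principal_eigenvector(adj, iterations=500, tol=1e-10):
    """Power iteration on (A + I). The identity shift keeps the eigenvectors
    of A unchanged and prevents oscillation on bipartite graphs."""
    n = len(adj)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(adj[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        nxt = [x / norm for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, vec)) < tol:
            return nxt
        vec = nxt
    return vec


# Toy input: five points on a straight line, 3.8 angstroms apart.
# This is not a protein; it only exercises the two functions.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]
profile = principal_eigenvector(contact_graph(toy_coords))
```

Plotting such a per-residue profile against residue number is one way a repetitive pattern across contiguous repeat units could be inspected; the detection rules themselves are not given in the abstract and are not reproduced here.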

Conclusions: AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.


Figure 10: Predicted Ankyrin repeat protein 1OUV (chain A). (a) Secondary structure representation from PDBsum. (b) Structural alignment of a predicted ANK repeat copy (shown in blue) with a repeat copy of the designed ANK protein 1N0R (shown in orange). (c) Alevc plot with dotted and solid lines showing the start and end of the predicted ANK boundaries.

Mentions: We ran the proposed algorithm on the complete PDB. A total of 98,341 structures, representing proteins or proteins in complex with nucleic acids, were downloaded. After removing short fragments of < 50 residues (as these are unlikely to contain two contiguous copies of ANK motifs) and proteins with no secondary structure assigned, a total of 94,975 structures were used for analysis. The proposed algorithm identified 819 protein structures containing at least two tandemly repeated ANK motifs. Of these, 181 are annotated as known ANK proteins in UniProt, Pfam, PROSITE and PDB, of which ~ 50 structures contain designed Ankyrin repeat proteins (DARPins). Of the annotated proteins, 178 were correctly predicted and only 3 were missed by our approach: 1SW6 (chain A), 2ETB (chain A) and 3ZRH (chain A). In the first two cases the approach missed the ANK motifs because the UniProt-annotated repeat regions contain 3–4 helices, whereas according to the rules defined in the algorithm an ANK motif comprises two anti-parallel helices. In 3ZRH the two annotated copies of ANK repeats are not contiguous but separated by 23 residues, and were hence missed by our approach. Thus, the remaining 641 structures are proposed as previously unrecognised Ankyrin repeat proteins and are listed in Additional file 2. Of these, 27 proteins are annotated as containing other repeat types, viz., 9 TPR, 7 Pumilio repeat, 2 HEAT, 2 Annexin repeat, 2 Tumor necrosis factor receptor (TNFR-Cys), 2 Mitochondrial termination factor repeat (MTERF), 2 Clathrin heavy chain repeat (CHCR) and 1 HAT (Additional file 2). Structurally, TPR, HEAT and HAT motifs are very similar to the ANK repeat motif: each comprises two anti-parallel helices forming a Helix–Turn–Helix core, and they are of similar lengths, ~ 30–34 residues. The major difference is that the ANK motif has a long loop ending in a β turn, which is not present in TPR, HEAT and HAT motifs.
Even with such strong similarity between these structural motifs, only 13 false positives (9 TPR, 3 HEAT and 1 HAT) are reported by our approach. To check the reliability of our prediction in these proteins, we carried out structure-structure superposition of the predicted ANK repeat region with a DARPin motif from 1N0R using the CEalign module in PyMOL [45]. For example, in protein 1OUV (chain A), seven copies of TPR are reported in the UniProt database from 29–278 (Additional file 2), containing 14 helices H1-H14, as shown in the secondary structure representation from PDBsum [46] in Figure 10(a). The superposition is good, with a root mean square deviation (RMSD) of < 3 Å for all three predicted ANK repeat units, as shown in Figure 10(b). The Alevc profile in the predicted Ankyrin region from 185 to 292 in Figure 10(c) is also very similar to that of a typical ANK motif in Figure 1(a). In this case, the predicted ANK repeat motifs lie within the TPR-annotated region; each comprises one helix from each of two adjacent TPR repeats and can be represented as $H_{i,2}$–$H_{i+1,1}$, where $H_{i,2}$ is the second helix of the $i$th TPR motif and $H_{i+1,1}$ is the first helix of the $(i+1)$th TPR motif. The 7 annotated TPR regions were structurally aligned with a representative TPR motif from the designed protein 1NA0, and the RMSD for each repeat unit was < 2 Å (results not shown), suggesting that the UniProt annotation is also correct. However, the β turn between the two helices within a TPR motif was observed to be longer than that of the typical designed TPR motif and to resemble the terminal loop of the ANK motif. This suggests the possibility of multi-repeat architecture in complex proteins. For 21 other repeat proteins, a similar multi-repeat architecture was observed.
In the case of HEAT repeat protein 3LWW (chain A), the annotation in UniProt is six continuous copies from 124–441 and two distant copies from 602–641 and 687–726. The predicted ANK repeat lies in the non-HEAT region from 520–621, with a very small overlap of 20 residues with a HEAT repeat. In this case two different repeats are present in different regions of the protein; in total, 10 proteins containing two different, non-overlapping repeat types were observed (marked ‘*’ in Additional file 2). For proteins that exhibit multi-repeat architecture, it would be interesting to analyze the interaction sites, which would help in confirming multiple annotations/functions in these proteins with complex architecture. Thus, the structure-based approach proposed here is promising for detecting tandem structural repeats in proteins and is powerful enough to distinguish between very similar structural repeats, viz. Ankyrin and TPR/HEAT/HAT.
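The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable names and the derived detection fraction are illustrative additions and do not appear in the paper.

```python
# Counts quoted in the text.
downloaded = 98_341       # structures downloaded from the PDB
analysed = 94_975         # structures left after filtering
predicted = 819           # structures with at least two tandem ANK motifs
known_ank = 181           # annotated as ANK in UniProt, Pfam, PROSITE and PDB
known_detected = 178      # annotated ANK proteins recovered by the method

# Derived quantities.
removed = downloaded - analysed        # short fragments or no secondary structure
missed = known_ank - known_detected    # 1SW6, 2ETB and 3ZRH
novel = predicted - known_detected     # proposed as previously unrecognised
detected_fraction = known_detected / known_ank

# Other repeat annotations among the newly predicted structures.
other_repeats = {
    "TPR": 9, "Pumilio": 7, "HEAT": 2, "Annexin": 2,
    "TNFR-Cys": 2, "MTERF": 2, "CHCR": 2, "HAT": 1,
}
other_total = sum(other_repeats.values())

print(removed, missed, novel, other_total, round(detected_fraction, 3))
```

The subtraction 819 − 178 reproduces the 641 newly proposed structures, and the eight listed repeat types sum to the 27 proteins with other annotations.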
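The 20-residue overlap reported for 3LWW follows directly from the residue ranges quoted above. A minimal sketch, assuming inclusive residue numbering; the helper name `overlap_length` is hypothetical:

```python
def overlap_length(a, b):
    """Number of residues shared by two inclusive residue ranges (start, end)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Residue ranges for 3LWW (chain A) as quoted in the text.
predicted_ank = (520, 621)
heat_regions = [(124, 441), (602, 641), (687, 726)]

overlaps = [overlap_length(predicted_ank, region) for region in heat_regions]
print(overlaps)
```

Only the HEAT copy at 602–641 intersects the predicted ANK region, sharing residues 602–621.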


