Identifying tandem Ankyrin repeats in protein structures.

Chakrabarty B, Parekh N - BMC Bioinformatics (2014)

Bottom Line: The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. The method is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and the remaining non-solenoid proteins, and its predictions are compared with UniProt annotation, a sequence-based approach (RADAR), and a structure-based approach (ConSole). This method is especially useful in correctly identifying new members of a repeat family.


Affiliation: Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India. broto.chakrabarty@research.iiit.ac.in.

ABSTRACT

Background: Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, so structure-based approaches are desirable. The proposed graph spectral approach represents the protein structure as a graph in order to detect one of the most frequently observed structural repeats, the Ankyrin repeat.

Results: A large number of studies have shown that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra of a protein structure graph exhibit a unique repetitive profile for contiguous repeating units, enabling the detection of both the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and the remaining non-solenoid proteins, and its predictions are compared with UniProt annotation, a sequence-based approach (RADAR), and a structure-based approach (ConSole). To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. The method is implemented as a web server, AnkPred, which is freely available at 'bioinf.iiit.ac.in/AnkPred'.

Conclusions: AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.
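
To make the graph-spectral idea in the abstract concrete, the following is a minimal sketch, not the AnkPred implementation. It assumes that residues are graph nodes, that an edge joins two residues whose C-alpha atoms lie within a distance cutoff, and that the per-residue profile is taken from the eigenvector of the largest eigenvalue of the adjacency matrix. The 7 Å cutoff, the function names, and the toy helix coordinates are all illustrative assumptions, and the secondary structure rules mentioned above are not modelled.

```python
import numpy as np


def contact_graph(ca_coords, cutoff=7.0):
    """Adjacency matrix of a residue contact graph.

    Two residues are connected when their C-alpha atoms are within
    `cutoff` angstroms (the cutoff value is an assumption of this sketch).
    """
    coords = np.asarray(ca_coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def principal_eigenvector(adj):
    """Absolute components of the eigenvector of the largest eigenvalue."""
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column belongs to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(adj)
    return np.abs(eigvecs[:, -1])


def toy_helix(n_residues, rise=1.5, radius=2.3, turn_deg=100.0):
    """Hypothetical idealised helix used only as stand-in coordinates."""
    idx = np.arange(n_residues)
    angle = np.deg2rad(turn_deg) * idx
    return np.column_stack(
        [radius * np.cos(angle), radius * np.sin(angle), rise * idx]
    )


coords = toy_helix(40)
adj = contact_graph(coords)
profile = principal_eigenvector(adj)  # one value per residue
```

With real data, `coords` would hold the C-alpha coordinates of one chain, and the resulting per-residue profile could then be plotted against residue number to look for the repetitive pattern described above.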


Figure 8: MSA of the repeat regions in protein 3EU9. (a) predicted by the proposed approach, and (b) annotated in UniProt database.

Mentions: The proteins in Table 2 have been selected to present examples of both agreement and disagreement. Below we discuss a few examples in which our prediction differs from the annotation in the UniProt database. For example, for protein 3EU9 (chain A), UniProt annotates five copies of the ANK motif spanning residues 89–253, while our approach predicts seven copies, with one extra copy on either side at residues 57–88 and 258–281. From the 3-D structure of 3EU9 in Figure 7(a) and the Alevc profile in Figure 7(b), it is clear that the predicted terminal repeats (shown in red) exhibit an Alevc profile similar to that of the five intermediate repeats (shown in gray). The structural alignment of these predicted terminal repeats with a representative structural ANK motif (from the designed protein 1N0R), obtained with the Cealign module in PyMOL [45], is shown in Figure 7(c) and (d); the Root Mean Square Deviation (RMSD) for each terminal copy is less than 1 Å, indicating high structural similarity with the ANK motif. At the sequence level, however, these terminal repeats are not well conserved, as is clear from the MSA of the predicted regions in Figure 8(a) compared with that of the UniProt-annotated repeat regions in Figure 8(b). ConSole predicts one additional terminal copy, for a total of six copies, but the boundaries of the ConSole copies are shifted by around 10 residues relative to the UniProt annotation. In general, terminal repeats are incomplete or less conserved at the sequence level, which makes them difficult to detect. In 52 other proteins (see Additional file 1), additional copies of the ANK repeat have been predicted by the proposed approach, thus improving the annotation of the complete repeat region in these 53 proteins. In 16 of these cases, one extra copy is also predicted by ConSole.
For protein 3SO8 (chain A, UniProt ID: Q9H9E1), three ANK repeats spanning residues 181–279 were annotated in an earlier release of UniProt (release 2012_08), whereas our approach predicts five repeats spanning residues 149–310, i.e., one extra repeat at each end. In the more recent release of the UniProt database (release 2014_05), the protein is annotated as having five copies of the ANK motif spanning residues 148–313, which is in agreement with the prediction of the proposed approach (Table 2).
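
The RMSD values quoted above come from the Cealign module in PyMOL, which also determines which residues correspond to each other. As a simpler stand-in, and only to show what an RMSD after optimal superposition measures, the sketch below applies the Kabsch algorithm to two coordinate sets that are assumed to be already matched residue by residue and of equal length. The function name and the random test coordinates are hypothetical; this is not the Cealign procedure.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q after optimal rigid superposition.

    P and Q are (N, 3) arrays whose rows are assumed to be matched
    one-to-one (for example, equivalent C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P0 @ R.T  # rotate P onto Q
    return float(np.sqrt(((P_rot - Q0) ** 2).sum() / len(P)))


# Usage with made-up coordinates: a rotated and shifted copy of the same
# points should superpose almost perfectly.
rng = np.random.default_rng(0)
motif = rng.normal(size=(30, 3))
theta = 0.7
rot_z = np.array(
    [
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ]
)
moved = motif @ rot_z.T + np.array([1.0, 2.0, 3.0])
rmsd_same = kabsch_rmsd(motif, moved)
```

For a comparison like the one in Figure 7(c) and (d), the two inputs would be the matched C-alpha coordinates of a predicted terminal repeat and of the reference ANK motif; a value below 1 Å is what the text above takes as indicating high structural similarity.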

