Automatic classification of protein structures using low-dimensional structure space mappings.

Asarnow D, Singh R - BMC Bioinformatics (2014)

Bottom Line: We demonstrate that MPSS constitute highly accurate representations of protein fold space and enable automatic classification of SCOP Superfamily and Fold-level relationships. The results from our automatic classification approach are remarkably similar to those found at the distantly homologous Superfamily level and the quite remotely homologous Fold level of SCOP. Furthermore, our research demonstrates that projection into a low-dimensional space using MDS is a more effective noise-reducing transformation of pairwise distances than the various probability- and alignment-length-based transformations currently used by structure alignment algorithms.


ABSTRACT

Background: Protein function is closely intertwined with protein structure. Discovery of meaningful structure-function relationships is of utmost importance in protein biochemistry and has led to the creation of high-quality, manually curated classification databases, such as the gold-standard SCOP (Structural Classification of Proteins) database. The SCOP database and its counterparts, such as CATH, provide a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure and are widely employed in structural and computational biology. Since manual classification is both subjective and highly laborious, automated classification of novel structures is an increasingly active area of research. The design of methods for automated structure classification has become even more important in recent years because of the rapid growth in the number of solved structures arising from various structural biology initiatives. In this paper we propose an approach to the problem of structure classification based on creating and tessellating low-dimensional maps of the protein structure space (MPSS). Given a set of protein structures, an MPSS is a low-dimensional embedding of structural similarity-based distances between the molecules. In an MPSS, the proteins under consideration (such as all the proteins in the PDB or sub-samplings thereof) are represented as a point cloud, and structural relatedness maps to spatial adjacency of the points. We present methods and results showing that MPSS can be used to create tessellations of the protein space comparable to the clade systems within SCOP. Though we have used SCOP as the gold standard, the proposed approach is equally applicable to other structural classifications.

Methods: In the proposed approach, we first construct MPSS using pairwise alignment distances obtained from four established structure alignment algorithms (CE, Dali, FATCAT and MATT). The low-dimensional embeddings are then computed using an embedding technique called multidimensional scaling (MDS). Next, using the remotely homologous Superfamily and Fold levels of the hierarchical SCOP database, a distance threshold is determined to relate adjacency in the low-dimensional map to functional relationships. In our approach, the optimal threshold is the value that maximizes the total true classification rate vis-a-vis the SCOP classification. We also show that determining such a threshold is often straightforward once the structural relationships are represented using MPSS.
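The two steps described in the Methods (embedding pairwise distances with MDS, then choosing a distance threshold against a reference classification) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: classical (Torgerson) MDS stands in for whichever MDS variant the authors used, the "total true classification rate" is read as the sum of the true positive and true negative rates over all pairs, and the function names and toy data are hypothetical. Real input would be alignment distances from CE, Dali, FATCAT or MATT and SCOP labels.

```python
import numpy as np


def classical_mds(dist, n_dims):
    """Embed points in n_dims dimensions from a symmetric distance matrix
    using classical (Torgerson) MDS. Assumption: the paper may use a
    different MDS variant."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering       # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]     # indices of the largest n_dims
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def pairwise_euclidean(coords):
    """Full matrix of Euclidean distances between rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def best_threshold(dist, labels):
    """Return (threshold, score) where pairs with distance <= threshold are
    predicted to share a class and score = TPR + TNR (an assumed reading of
    'total true classification rate')."""
    labels = np.asarray(labels)
    rows, cols = np.triu_indices(len(labels), k=1)  # each unordered pair once
    d = dist[rows, cols]
    same = labels[rows] == labels[cols]
    best_t, best_score = None, -1.0
    for t in np.unique(d):                          # candidate thresholds
        pred = d <= t
        tpr = (pred & same).sum() / max(same.sum(), 1)
        tnr = (~pred & ~same).sum() / max((~same).sum(), 1)
        if tpr + tnr > best_score:
            best_t, best_score = t, tpr + tnr
    return best_t, best_score


# Toy stand-in for structural distances: two well-separated groups of points.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.1, (5, 3)), rng.normal(5, 0.1, (5, 3))])
labels = [0] * 5 + [1] * 5

embedded = classical_mds(pairwise_euclidean(points), n_dims=2)
threshold, score = best_threshold(pairwise_euclidean(embedded), labels)
```

In this toy case the two groups are far apart, so a single threshold separates same-group from different-group pairs. With real alignment distances the classes overlap, and the chosen threshold trades off the two rates.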

Results and conclusion: We demonstrate that MPSS constitute highly accurate representations of protein fold space and enable automatic classification of SCOP Superfamily and Fold-level relationships. The results from our automatic classification approach are remarkably similar to those found at the distantly homologous Superfamily level and the quite remotely homologous Fold level of SCOP. The significance of our results is underlined by the fact that most automated methods developed thus far have only managed to match the closest-homology Family level of the SCOP hierarchy and tend to differ considerably at the Superfamily and Fold levels. Furthermore, our research demonstrates that projection into a low-dimensional space using MDS is a more effective noise-reducing transformation of pairwise distances than the various probability- and alignment-length-based transformations currently used by structure alignment algorithms.

Figure 8: AUC for SCOP Fold prediction vs. MPSS dimensionality. Fold classification AUC values are shown for all 24 PSS representations. The AUC is plotted against the MPSS dimensionality; AUCs for the pairwise distances are drawn as flat lines for reference. Detailed descriptions of the trends found in this figure are given in the text. The figure shows that the MPSS Dali(S12) is the most accurate at this SCOP level.

Mentions: The AUC values for prediction of SCOP Superfamily and Fold memberships are plotted versus MPSS dimensionality in Figures 7 and 8, respectively. As mentioned previously, the tested dimensionalities range from 3 to 120. The AUCs obtained using the pairwise PSS representations are displayed as flat lines so that they can be referenced easily across each figure.
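As an illustration of how an AUC-versus-dimensionality curve of this kind could be produced, the sketch below computes the AUC for predicting "same class" from small pairwise distance, using the rank interpretation of AUC (the probability that a same-class pair is closer than a different-class pair). Everything in it is an assumption made for illustration: the function name and toy points are hypothetical, and truncating toy coordinates to the first k entries is only a stand-in for an MDS embedding of dimensionality k (the paper tests dimensionalities from 3 to 120).

```python
from itertools import combinations
from math import dist as euclidean


def same_class_auc(points, labels):
    """AUC for predicting 'same class' from small pairwise distance.

    Equals the probability that a randomly chosen same-class pair is closer
    than a randomly chosen different-class pair (ties count one half).
    """
    same, diff = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        (same if labels[i] == labels[j] else diff).append(d)
    wins = sum((s < d) + 0.5 * (s == d) for s in same for d in diff)
    return wins / (len(same) * len(diff))


# Toy data: the first coordinate interleaves the classes, the second
# separates them, and the third adds small variation.
points = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.3), (2.0, 0.0, 0.2),
          (0.5, 10.0, 0.2), (1.5, 10.0, 0.1), (2.5, 10.0, 0.3)]
labels = ["a", "a", "a", "b", "b", "b"]

# Stand-in for embeddings of increasing dimensionality: keep the first k coordinates.
auc_by_dim = {k: same_class_auc([p[:k] for p in points], labels) for k in (1, 2, 3)}
```

Plotting `auc_by_dim` against k, with the AUC of the unembedded pairwise distances drawn as a horizontal reference line, would give a figure in the style described above.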

