Limits...
A framework for protein structure classification and identification of novel protein structures.

Kim YJ, Patel JM - BMC Bioinformatics (2006)

Bottom Line: Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database.The framework consists of a set of components for comparing, classifying, and clustering protein structures.These components allow us to accurately classify proteins into known folds, to detect new protein folds, and to provide a way of clustering the new folds.

View Article: PubMed Central - HTML - PubMed

Affiliation: Computer Science and Engineering, University of Michigan, Ann Arbor, USA. youjkim@eecs.umich.edu

ABSTRACT

Background: Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, the need for automated and accurate methods for protein classification is increasingly important.

Results: In this paper we present a unified framework for protein structure classification and identification of novel protein structures. The framework consists of a set of components for comparing, classifying, and clustering protein structures. These components allow us to accurately classify proteins into known folds, to detect new protein folds, and to provide a way of clustering the new folds. In our evaluation with SCOP 1.69, our method correctly classifies 86.0%, 87.7%, and 90.5% of new domains at family, superfamily, and fold levels. Furthermore, for protein domains that belong to new domain families, our method is able to produce clusters that closely correspond to the new families in SCOP 1.69. As a result, our method can also be used to suggest new classification groups that contain novel folds.

Conclusion: We have developed a method called proCC for automatically classifying and clustering domains. The method is effective in classifying new domains and suggesting new domain families, and it is also very efficient. A web site offering access to proCC is freely available at http://www.eecs.umich.edu/periscope/procc.

Show MeSH
Visualization of the classification decision boundary. This figure shows the classification boundary created for entries in SCOP 1.67 using SCOP 1.65 as the database. The SVM is used to detect the boundary between "Classified" and "Unclassified" entries. This trained SVM will then be used to predict class labels for SCOP 1.69.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC1622760&req=5

Figure 2: Visualization of the classification decision boundary. This figure shows the classification boundary created for entries in SCOP 1.67 using SCOP 1.65 as the database. The SVM is used to detect the boundary between "Classified" and "Unclassified" entries. This trained SVM will then be used to predict class labels for SCOP 1.69.

Mentions: To automate the classification process, we need a classification decision model which defines clear boundaries between classification and non-classification. Using SCOP as the gold standard, we generated a classification decision model that reflects the rules used for creating new folds in SCOP. In order to create such classification decision model, we used a support vector machine (SVM) to capture nonlinear classification decision boundaries in SCOP. As a training set for the decision model, we picked SCOP version 1.65 and version 1.67 and trained the model as follows: Using domains in SCOP 1.67 as the queries, and domains in SCOP 1.65 as the database, we perform structure comparison using the method described in the structure comparison section. For each query, we calculate the F1 and F2 scores. If the SCOP label of a query protein domain is the same as its nearest neighbor's SCOP label, the query with its F1 and F2 is used as a positive example, otherwise, it is used as a negative example in the training set for the SVM. The resulting training dataset is shown in Figure 2.


A framework for protein structure classification and identification of novel protein structures.

Kim YJ, Patel JM - BMC Bioinformatics (2006)

Visualization of the classification decision boundary. This figure shows the classification boundary created for entries in SCOP 1.67 using SCOP 1.65 as the database. The SVM is used to detect the boundary between "Classified" and "Unclassified" entries. This trained SVM will then be used to predict class labels for SCOP 1.69.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC1622760&req=5

Figure 2: Visualization of the classification decision boundary. This figure shows the classification boundary created for entries in SCOP 1.67 using SCOP 1.65 as the database. The SVM is used to detect the boundary between "Classified" and "Unclassified" entries. This trained SVM will then be used to predict class labels for SCOP 1.69.
Mentions: To automate the classification process, we need a classification decision model which defines clear boundaries between classification and non-classification. Using SCOP as the gold standard, we generated a classification decision model that reflects the rules used for creating new folds in SCOP. In order to create such classification decision model, we used a support vector machine (SVM) to capture nonlinear classification decision boundaries in SCOP. As a training set for the decision model, we picked SCOP version 1.65 and version 1.67 and trained the model as follows: Using domains in SCOP 1.67 as the queries, and domains in SCOP 1.65 as the database, we perform structure comparison using the method described in the structure comparison section. For each query, we calculate the F1 and F2 scores. If the SCOP label of a query protein domain is the same as its nearest neighbor's SCOP label, the query with its F1 and F2 is used as a positive example, otherwise, it is used as a negative example in the training set for the SVM. The resulting training dataset is shown in Figure 2.

Bottom Line: Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database.The framework consists of a set of components for comparing, classifying, and clustering protein structures.These components allow us to accurately classify proteins into known folds, to detect new protein folds, and to provide a way of clustering the new folds.

View Article: PubMed Central - HTML - PubMed

Affiliation: Computer Science and Engineering, University of Michigan, Ann Arbor, USA. youjkim@eecs.umich.edu

ABSTRACT

Background: Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, the need for automated and accurate methods for protein classification is increasingly important.

Results: In this paper we present a unified framework for protein structure classification and identification of novel protein structures. The framework consists of a set of components for comparing, classifying, and clustering protein structures. These components allow us to accurately classify proteins into known folds, to detect new protein folds, and to provide a way of clustering the new folds. In our evaluation with SCOP 1.69, our method correctly classifies 86.0%, 87.7%, and 90.5% of new domains at family, superfamily, and fold levels. Furthermore, for protein domains that belong to new domain families, our method is able to produce clusters that closely correspond to the new families in SCOP 1.69. As a result, our method can also be used to suggest new classification groups that contain novel folds.

Conclusion: We have developed a method called proCC for automatically classifying and clustering domains. The method is effective in classifying new domains and suggesting new domain families, and it is also very efficient. A web site offering access to proCC is freely available at http://www.eecs.umich.edu/periscope/procc.

Show MeSH