Multi-view methods for protein structure comparison using latent dirichlet allocation.

Shivashankar S, Srivathsan S, Ravindran B, Tendulkar AV - Bioinformatics (2011)

Bottom Line: Efficiently retrieving structures similar to a given protein involves two major issues: (i) an effective protein structure representation that captures the inherent relationships between fragments and facilitates efficient comparison between structures and (ii) an effective framework to address different retrieval requirements. In this article, we propose an improved representation of protein structures using the latent Dirichlet allocation topic model. We evaluate the proposed representation and retrieval framework on the benchmark dataset developed by Kolodny and co-workers.


Affiliation: Department of Computer Science and Engineering, IIT Madras, Chennai-600 036.

ABSTRACT

Motivation: With rapidly expanding protein structure databases, efficiently retrieving structures similar to a given protein is an important problem. It involves two major issues: (i) an effective protein structure representation that captures the inherent relationships between fragments and facilitates efficient comparison between structures and (ii) an effective framework to address different retrieval requirements. Recently, researchers proposed a vector space model of proteins using a bag-of-fragments representation (FragBag), which corresponds to the basic information retrieval model.
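To make the bag-of-fragments idea concrete, here is a minimal Python sketch; it is not the FragBag implementation. It assumes each protein has already been reduced to a list of fragment-library indices (the hypothetical `fragment_ids` argument); how backbone segments are matched to library fragments is outside the sketch. The library size of 5 and the toy proteins are made up for illustration.

```python
import math
from collections import Counter


def bag_of_fragments(fragment_ids, library_size):
    """Count how often each library fragment occurs in one protein."""
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy proteins over a hypothetical 5-fragment library (illustrative data only).
query = bag_of_fragments([0, 1, 1, 3], library_size=5)
close = bag_of_fragments([0, 1, 1, 1, 3], library_size=5)
far = bag_of_fragments([2, 4, 4, 4], library_size=5)
```

Ranking database proteins by `cosine_similarity(query, target)` then gives the basic vector space retrieval that the abstract refers to: here `close` shares most fragments with `query` and scores high, while `far` shares none and scores zero.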

Results: In this article, we propose an improved representation of protein structures using the latent Dirichlet allocation (LDA) topic model. Another important requirement is to retrieve proteins whether they are close or remote homologs. To meet these diverse objectives, we propose a multi-viewpoint-based framework that combines multiple representations and retrieval techniques. We evaluate the proposed representation and retrieval framework on the benchmark dataset developed by Kolodny and co-workers. The results indicate that the proposed techniques outperform state-of-the-art methods.
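One way to picture a multi-view score is the sketch below. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not give the combination rule, so the weighted sum, the `exp(-KL)` mapping from divergence to similarity, and all function and parameter names are hypothetical. The topic distributions are taken as given inputs; in practice they would come from LDA inference on the fragment counts.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Asymmetric KL divergence D(p || q) between two discrete distributions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def cosine_similarity(u, v):
    """Cosine similarity of two term-frequency vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multiview_score(query_tf, target_tf, query_topics, target_topics, lam1, lam2):
    """Weighted sum of a TF view and a topic view (an assumed combination rule)."""
    tf_view = cosine_similarity(query_tf, target_tf)
    # Assumption: turn the divergence into a similarity in (0, 1] via exp(-KL).
    topic_view = math.exp(-kl_divergence(query_topics, target_topics))
    return lam1 * tf_view + lam2 * topic_view
```

Because KL divergence is asymmetric, `kl_divergence(query_topics, target_topics)` and the reversed call generally differ, so the direction of comparison is itself a design choice. The weights `lam1` and `lam2` let one view dominate when close homologs are wanted and the other when remote homologs matter more.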

Availability: http://www.cse.iitm.ac.in/~ashishvt/research/protein-lda/.

Contact: ashishvt@cse.iitm.ac.in.


Figure 6: Comparison of the average AUC at SAS threshold of 2.0 Å, across libraries, obtained using TF, LDA and multiview model using the best weights from Table 3.

Mentions: The performance of the LDA representation with retrieval based on asymmetric KL divergence, and of multi-viewpoint retrieval using TF and LDA (multiview model I), is compared against the naive vector space model with cosine similarity on the seven chosen libraries. For multi-viewpoint-based retrieval, the best weight combination (λ1, λ2) for each library is chosen for the plot. The results are shown in Figures 6, 7 and 8 for SAS thresholds of 2, 3.5 and 5 Å, respectively. Table 9 gives the overall ranking of structural and filter methods, including the relative position of the proposed techniques (Kosloff and Kolodny, 2008). Our method outperforms all the filter-and-match methods. We perform a paired t-test and a paired sign test on the per-query AUC values obtained using the proposed models and the baseline state-of-the-art filter-and-match method (FragBag). Based on these tests, our results are significantly better than the state of the art at the 1% significance level. Our results are also competitive with state-of-the-art structure comparison methods that operate on the complete 3D representation, and our method is much faster than these methods.
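The paired sign test mentioned above can be sketched as follows. This is a generic exact two-sided sign test written for illustration, not the authors' analysis code; the function name and the choice to drop tied queries are assumptions. Each input is a list of per-query AUC values, aligned so that position i in both lists refers to the same query.

```python
from math import comb


def paired_sign_test(ours, baseline):
    """Two-sided exact sign test on paired per-query scores; ties are dropped."""
    wins = sum(1 for a, b in zip(ours, baseline) if a > b)
    losses = sum(1 for a, b in zip(ours, baseline) if a < b)
    n = wins + losses
    if n == 0:
        return 1.0  # every pair tied: no evidence either way
    k = min(wins, losses)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A returned p-value below 0.01 would correspond to significance at the 1% level. Unlike the paired t-test, the sign test uses only the direction of each per-query difference, so it does not assume the AUC differences are normally distributed.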

