Multi-view methods for protein structure comparison using latent Dirichlet allocation.

Shivashankar S, Srivathsan S, Ravindran B, Tendulkar AV - Bioinformatics (2011)



Affiliation: Department of Computer Science and Engineering, IIT Madras, Chennai-600 036.

ABSTRACT

Motivation: With rapidly expanding protein structure databases, efficiently retrieving structures similar to a given protein is an important problem. It involves two major issues: (i) an effective protein structure representation that captures the inherent relationships between fragments and facilitates efficient comparison between structures and (ii) an effective framework to address different retrieval requirements. Recently, researchers proposed a vector space model of proteins using a bag-of-fragments representation (FragBag), which corresponds to the basic information retrieval model.
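The FragBag vector space model can be sketched roughly as follows. This is a minimal illustration, assuming each protein has already been encoded as a sequence of fragment-library indices (building the library and assigning backbone segments to library fragments is not shown). Cosine similarity is used here as a common vector-space measure; it is an assumption, not necessarily the scoring used by FragBag, and the helper names and toy proteins are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def fragbag_vector(fragment_ids: Sequence[int], library_size: int) -> list[float]:
    """Bag-of-fragments histogram: entry i counts occurrences of library fragment i."""
    counts = Counter(fragment_ids)
    return [float(counts.get(i, 0)) for i in range(library_size)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical proteins, already encoded as fragment-library indices;
# a library of 6 fragments is chosen only for illustration.
V = 6
p1 = [0, 1, 1, 2, 3]
p2 = [0, 1, 2, 2, 3]
p3 = [4, 5, 5, 4]
v1, v2, v3 = (fragbag_vector(p, V) for p in (p1, p2, p3))
print(round(cosine(v1, v2), 3))
print(cosine(v1, v3))
```

Because the vectors count only identical fragment indices, `p1` and `p3` score 0.0 regardless of how structurally alike their fragments might be; exact identity is the only thing this basic model rewards.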

Results: In this article, we propose an improved representation of protein structures using the latent Dirichlet allocation (LDA) topic model. Another important requirement is to retrieve similar proteins, whether they are close or remote homologs. To meet these diverse objectives, we propose a multi-viewpoint-based framework that combines multiple representations and retrieval techniques. We evaluate the proposed representation and retrieval framework on the benchmark dataset developed by Kolodny and co-workers. The results indicate that the proposed techniques outperform state-of-the-art methods.
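A minimal sketch of how an LDA topic representation of proteins could be obtained, treating each protein as a "document" whose "words" are fragment-library indices. It uses collapsed Gibbs sampling, one standard LDA inference method; the abstract does not say which inference procedure or hyperparameters the authors used, so `alpha`, `beta`, `n_topics`, `iters` and the function name `lda_gibbs` are illustrative assumptions.

```python
import random
from typing import Sequence


def lda_gibbs(docs: Sequence[Sequence[int]], vocab_size: int, n_topics: int,
              alpha: float = 0.1, beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> list[list[float]]:
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # fragments of doc d assigned to topic k
    nkw = [[0] * vocab_size for _ in topics]  # times fragment w is assigned to topic k
    nk = [0] * n_topics                       # total fragments assigned to topic k
    z = []                                    # topic of every fragment token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's topic proportions (theta).
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


# Toy corpus: three hypothetical proteins encoded as fragment-library indices.
docs = [[0, 1, 1, 2, 3], [0, 1, 2, 2, 3], [4, 5, 5, 4]]
theta = lda_gibbs(docs, vocab_size=6, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is a probability distribution over topics, so proteins can then be compared in this low-dimensional topic space instead of over raw fragment counts. A production system would more likely use an established LDA implementation than a hand-written sampler.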

Availability: http://www.cse.iitm.ac.in/~ashishvt/research/protein-lda/.

Contact: ashishvt@cse.iitm.ac.in.


Figure 4: Multi-viewpoint-based IR.

Mentions: As mentioned earlier, retrieval may serve different objectives in different applications, for example retrieving similar proteins whether they are close homologs or remote homologs. Text-based IR researchers have shown that retrieval based on a combination of multiple query representations, multiple document representations or multiple IR techniques gives significantly better results than a technique based on a single representation, especially when retrieval requirements differ across users. These techniques are referred to in the literature as multi-viewpoint-based IR (Powell and French, 1998). The schema of multi-viewpoint IR is given in Figure 4. The intuition is that retrieving information about an author, publication or book requires exact keyword matching, whereas a topic-based query such as ‘sports news’ must allow more than exact keyword matches. Motivated by the success of multi-viewpoint-based text IR, we propose a multi-viewpoint-based retrieval system for protein structure collections. Protein structure similarity should be captured not only by matching identical fragments in the structures; similar (not just identical) fragments must also be considered. This is achieved by modeling protein structures with LDA, which maps fragments to a topic space using their co-occurrence information. Comparing protein structures in topic space therefore performs a soft matching that also takes similar fragments into account.
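The soft-matching idea and the combination of viewpoints can be illustrated with a toy example. This sketch assumes a hypothetical topic-fragment matrix `phi` (in practice it would come from a trained LDA model) and approximates each protein's topic profile by averaging p(topic | fragment) under a uniform topic prior, which is a cheap stand-in rather than full LDA inference. The two views are fused by a weighted sum of cosine scores (a CombSUM-style rule); the weight, the fusion rule and all function names are assumptions, not the authors' method.

```python
import math
from collections import Counter
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fragbag(frags: Sequence[int], vocab_size: int) -> list[float]:
    """Exact-match view: counts of each fragment-library index."""
    c = Counter(frags)
    return [float(c.get(i, 0)) for i in range(vocab_size)]


def topic_vector(frags: Sequence[int], phi: Sequence[Sequence[float]]) -> list[float]:
    """Topic view: average p(topic | fragment) over the protein's fragments,
    via Bayes' rule with a uniform topic prior (approximation, not LDA inference)."""
    n_topics = len(phi)
    out = [0.0] * n_topics
    for w in frags:
        col = [phi[k][w] for k in range(n_topics)]
        total = sum(col)
        for k in range(n_topics):
            out[k] += col[k] / total / len(frags)
    return out


def fused_score(a: Sequence[int], b: Sequence[int], phi: Sequence[Sequence[float]],
                vocab_size: int, weight: float = 0.5) -> float:
    """Weighted sum of the exact-match and topic-space similarity scores."""
    exact = cosine(fragbag(a, vocab_size), fragbag(b, vocab_size))
    soft = cosine(topic_vector(a, phi), topic_vector(b, phi))
    return weight * exact + (1 - weight) * soft


# Hypothetical topic-fragment matrix: fragments 0 and 1 load on topic 0,
# fragments 2 and 3 on topic 1.
phi = [[0.45, 0.45, 0.05, 0.05],
       [0.05, 0.05, 0.45, 0.45]]
a, b = [0, 0, 2], [1, 1, 3]  # the two proteins share no fragment index
print(cosine(fragbag(a, 4), fragbag(b, 4)))
print(round(cosine(topic_vector(a, phi), topic_vector(b, phi)), 3))
print(round(fused_score(a, b, phi, 4), 3))
```

Proteins `a` and `b` share no fragment index, so the exact-match view scores 0, but because their fragments load on the same topics the topic-space view scores close to 1 and the fused score is about 0.5. Tuning `weight` shifts the balance between close-homolog (exact) and remote-homolog (soft) retrieval in this toy setting.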

