Limits...
RINQ: Reference-based Indexing for Network Queries.

Gülsoy G, Kahveci T - Bioinformatics (2011)

Bottom Line: We perform pairwise alignment only for the remaining networks.We also propose a supervised method to pick references that have a large chance of filtering the unpromising database networks.Extensive experimental evaluation suggests that (i) our method reduced the running time of a single query on a database of around 300 networks from over 2 days to only 8 h; (ii) our method outperformed the state of the art method Closure Tree and SAGA by a factor of three or more; and (iii) our method successfully identified statistically and biologically significant relationships across networks and organisms.

View Article: PubMed Central - PubMed

Affiliation: Computer and Information Sciences and Engineering Department, University of Florida, Gainesville, FL 32611, USA. ggulsoy@cise.ufl.edu

ABSTRACT
We consider the problem of similarity queries in biological network databases. Given a database of networks, similarity query returns all the database networks whose similarity (i.e. alignment score) to a given query network is at least a specified similarity cutoff value. Alignment of two networks is a very costly operation, which makes exhaustive comparison of all the database networks with a query impractical. To tackle this problem, we develop a novel indexing method, named RINQ (Reference-based Indexing for Biological Network Queries). Our method uses a set of reference networks to eliminate a large portion of the database quickly for each query. A reference network is a small biological network. We precompute and store the alignments of all the references with all the database networks. When our database is queried, we align the query network with all the reference networks. Using these alignments, we calculate a lower bound and an approximate upper bound to the alignment score of each database network with the query network. With the help of upper and lower bounds, we eliminate the majority of the database networks without aligning them to the query network. We also quickly identify a small portion of these as guaranteed to be similar to the query. We perform pairwise alignment only for the remaining networks. We also propose a supervised method to pick references that have a large chance of filtering the unpromising database networks. Extensive experimental evaluation suggests that (i) our method reduced the running time of a single query on a database of around 300 networks from over 2 days to only 8 h; (ii) our method outperformed the state of the art method Closure Tree and SAGA by a factor of three or more; and (iii) our method successfully identified statistically and biologically significant relationships across networks and organisms.

Show MeSH
Sample queries from our experiments. (a) A subnetwork of ErbB signaling pathway of Rattus Norvegicus (rat). (b) A subnetwork of VEGF signaling pathway of Mus Musculus (mouse).
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3117351&req=5

Figure 7: Sample queries from our experiments. (a) A subnetwork of ErbB signaling pathway of Rattus Norvegicus (rat). (b) A subnetwork of VEGF signaling pathway of Mus Musculus (mouse).

Mentions: Figure 7 depicts two of our top queries. The query in Figure 7a is a subnetwork of the ErbB signaling network of Rattus norvegicus. This query is ranked as the first for 90 and 80% similarity and the second for the 70% similarity. ErbB signaling network is closely related to cancer (Hynes and Lane, 2005). Mutations and dysregulations of ErbB network proteins are frequently observed in cancer cells. ErbB signaling network consists of ErbB family of receptor proteins and a number of intracellular signaling pathways (Yarden and Sliwkowski, 2001). ErbB family of receptors are cell membrane proteins. Extracellular growth factors bind to these proteins. In response to this binding, ErbB proteins can trigger many different cell signaling networks. One of the most important intracellular targets of ErbB receptors is MAPK signaling network. Our first query is a part of ErbB signaling pathway that is responsible for triggering the MAPK signaling pathway.Fig. 7.


RINQ: Reference-based Indexing for Network Queries.

Gülsoy G, Kahveci T - Bioinformatics (2011)

Sample queries from our experiments. (a) A subnetwork of ErbB signaling pathway of Rattus Norvegicus (rat). (b) A subnetwork of VEGF signaling pathway of Mus Musculus (mouse).
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3117351&req=5

Figure 7: Sample queries from our experiments. (a) A subnetwork of ErbB signaling pathway of Rattus Norvegicus (rat). (b) A subnetwork of VEGF signaling pathway of Mus Musculus (mouse).
Mentions: Figure 7 depicts two of our top queries. The query in Figure 7a is a subnetwork of the ErbB signaling network of Rattus norvegicus. This query is ranked as the first for 90 and 80% similarity and the second for the 70% similarity. ErbB signaling network is closely related to cancer (Hynes and Lane, 2005). Mutations and dysregulations of ErbB network proteins are frequently observed in cancer cells. ErbB signaling network consists of ErbB family of receptor proteins and a number of intracellular signaling pathways (Yarden and Sliwkowski, 2001). ErbB family of receptors are cell membrane proteins. Extracellular growth factors bind to these proteins. In response to this binding, ErbB proteins can trigger many different cell signaling networks. One of the most important intracellular targets of ErbB receptors is MAPK signaling network. Our first query is a part of ErbB signaling pathway that is responsible for triggering the MAPK signaling pathway.Fig. 7.

Bottom Line: We perform pairwise alignment only for the remaining networks.We also propose a supervised method to pick references that have a large chance of filtering the unpromising database networks.Extensive experimental evaluation suggests that (i) our method reduced the running time of a single query on a database of around 300 networks from over 2 days to only 8 h; (ii) our method outperformed the state of the art method Closure Tree and SAGA by a factor of three or more; and (iii) our method successfully identified statistically and biologically significant relationships across networks and organisms.

View Article: PubMed Central - PubMed

Affiliation: Computer and Information Sciences and Engineering Department, University of Florida, Gainesville, FL 32611, USA. ggulsoy@cise.ufl.edu

ABSTRACT
We consider the problem of similarity queries in biological network databases. Given a database of networks, similarity query returns all the database networks whose similarity (i.e. alignment score) to a given query network is at least a specified similarity cutoff value. Alignment of two networks is a very costly operation, which makes exhaustive comparison of all the database networks with a query impractical. To tackle this problem, we develop a novel indexing method, named RINQ (Reference-based Indexing for Biological Network Queries). Our method uses a set of reference networks to eliminate a large portion of the database quickly for each query. A reference network is a small biological network. We precompute and store the alignments of all the references with all the database networks. When our database is queried, we align the query network with all the reference networks. Using these alignments, we calculate a lower bound and an approximate upper bound to the alignment score of each database network with the query network. With the help of upper and lower bounds, we eliminate the majority of the database networks without aligning them to the query network. We also quickly identify a small portion of these as guaranteed to be similar to the query. We perform pairwise alignment only for the remaining networks. We also propose a supervised method to pick references that have a large chance of filtering the unpromising database networks. Extensive experimental evaluation suggests that (i) our method reduced the running time of a single query on a database of around 300 networks from over 2 days to only 8 h; (ii) our method outperformed the state of the art method Closure Tree and SAGA by a factor of three or more; and (iii) our method successfully identified statistically and biologically significant relationships across networks and organisms.

Show MeSH