Limits...
A fast approach to global alignment of protein-protein interaction networks.

Kollias G, Sathe M, Mohammadi S, Grama A - BMC Res Notes (2013)

Bottom Line: Recent work on computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced.We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure.Our heuristic matching techniques maintain comparable - if not better - quality in resulting alignments.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, USA. gkollias@purdue.edu

ABSTRACT

Background: Global network alignment has been proposed as an effective tool for computing functional orthology. Commonly used global alignment techniques such as IsoRank rely on a two-step process: the first step is an iterative diffusion-based approach for assigning similarity scores to all possible node pairs (matchings); the second step applies a maximum-weight bipartite matching algorithm to this similarity score matrix to identify orthologous node pairs. While demonstrably successful in identifying orthologies beyond those based on sequences, this two-step process is computationally expensive. Recent work on computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced. The use of these accelerated methods renders the bipartite matching step as the dominant computational cost. This motivates a critical assessment of the tradeoffs of computational cost and solution quality (matching quality, topological matches, and biological significance) associated with the bipartite matching step. In this paper we utilize the state-of-the-art core diffusion-based step in IsoRank for similarity matrix computation, and couple it with two heuristic bipartite matching algorithms - a matrix-based greedy approach, and a tunable, adaptive, auction-based matching algorithm developed by us. We then compare our implementations against the performance and quality characteristics of the solution produced by the reference IsoRank binary, which also implements an optimal matching algorithm.

Results: Using heuristic matching algorithms in the IsoRank pipeline exhibits dramatic speedup improvements; typically ×30 times faster for the total alignment process in most cases of interest. More surprisingly, these improvements in compute times are typically accompanied by better or comparable topological and biological quality for the network alignments generated. These measures are quantified by the number of conserved edges in the alignment graph, the percentage of enriched components, and the total number of covered Gene Ontology (GO) terms.

Conclusions: We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure. Our heuristic matching techniques maintain comparable - if not better - quality in resulting alignments. A consequence of our work is that network-alignment based orthologies can be computed within minutes (as compared to hours) on typical protein interaction networks, enabling a more comprehensive tuning of alignment parameters for refined orthologies.

Show MeSH

Related in: MedlinePlus

Conserved components identified by the mat3_auction method. Conserved components identified by the mat3_auction method. Each component is coherently annotated with specific branch of the biological process. (a) Peroxisome organization. (b) RNA splicing. (c) Histone modification. (d) Ribosome biogenesis.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3672019&req=5

Figure 3: Conserved components identified by the mat3_auction method. Conserved components identified by the mat3_auction method. Each component is coherently annotated with specific branch of the biological process. (a) Peroxisome organization. (b) RNA splicing. (c) Histone modification. (d) Ribosome biogenesis.

Mentions: We also evaluate the performance of the mat3_auction method by extracting components that are highly enriched in both species with respect to a unique GO term. We manually curated the components identified by mat3_auction and selected four significant components for our case study, spanning four different species pairs. These components, which are shown in Figure 3, reveal that there is a strong correlation between structural conservation and functional similarity, which has been faithfully recovered by mat3_auction. Most of these components have more than one co-enriched GO term, but interestingly enough, these terms are coherent in the sense that they describe the same function in more or less detail. For example, component 3(b) is annotated with RNA processing, which is a generalization of the term nuclear mRNA splicing via spliceosome, another enriched term in 3(b). Similarly, component 3(c) is enriched with both histone acetylation and histone modification. The first term is clearly a refinement of the type of modification.


A fast approach to global alignment of protein-protein interaction networks.

Kollias G, Sathe M, Mohammadi S, Grama A - BMC Res Notes (2013)

Conserved components identified by the mat3_auction method. Conserved components identified by the mat3_auction method. Each component is coherently annotated with specific branch of the biological process. (a) Peroxisome organization. (b) RNA splicing. (c) Histone modification. (d) Ribosome biogenesis.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3672019&req=5

Figure 3: Conserved components identified by the mat3_auction method. Conserved components identified by the mat3_auction method. Each component is coherently annotated with specific branch of the biological process. (a) Peroxisome organization. (b) RNA splicing. (c) Histone modification. (d) Ribosome biogenesis.
Mentions: We also evaluate the performance of the mat3_auction method by extracting components that are highly enriched in both species with respect to a unique GO term. We manually curated the components identified by mat3_auction and selected four significant components for our case study, spanning four different species pairs. These components, which are shown in Figure 3, reveal that there is a strong correlation between structural conservation and functional similarity, which has been faithfully recovered by mat3_auction. Most of these components have more than one co-enriched GO term, but interestingly enough, these terms are coherent in the sense that they describe the same function in more or less detail. For example, component 3(b) is annotated with RNA processing, which is a generalization of the term nuclear mRNA splicing via spliceosome, another enriched term in 3(b). Similarly, component 3(c) is enriched with both histone acetylation and histone modification. The first term is clearly a refinement of the type of modification.

Bottom Line: Recent work on computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced.We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure.Our heuristic matching techniques maintain comparable - if not better - quality in resulting alignments.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, USA. gkollias@purdue.edu

ABSTRACT

Background: Global network alignment has been proposed as an effective tool for computing functional orthology. Commonly used global alignment techniques such as IsoRank rely on a two-step process: the first step is an iterative diffusion-based approach for assigning similarity scores to all possible node pairs (matchings); the second step applies a maximum-weight bipartite matching algorithm to this similarity score matrix to identify orthologous node pairs. While demonstrably successful in identifying orthologies beyond those based on sequences, this two-step process is computationally expensive. Recent work on computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced. The use of these accelerated methods renders the bipartite matching step as the dominant computational cost. This motivates a critical assessment of the tradeoffs of computational cost and solution quality (matching quality, topological matches, and biological significance) associated with the bipartite matching step. In this paper we utilize the state-of-the-art core diffusion-based step in IsoRank for similarity matrix computation, and couple it with two heuristic bipartite matching algorithms - a matrix-based greedy approach, and a tunable, adaptive, auction-based matching algorithm developed by us. We then compare our implementations against the performance and quality characteristics of the solution produced by the reference IsoRank binary, which also implements an optimal matching algorithm.

Results: Using heuristic matching algorithms in the IsoRank pipeline exhibits dramatic speedup improvements; typically ×30 times faster for the total alignment process in most cases of interest. More surprisingly, these improvements in compute times are typically accompanied by better or comparable topological and biological quality for the network alignments generated. These measures are quantified by the number of conserved edges in the alignment graph, the percentage of enriched components, and the total number of covered Gene Ontology (GO) terms.

Conclusions: We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure. Our heuristic matching techniques maintain comparable - if not better - quality in resulting alignments. A consequence of our work is that network-alignment based orthologies can be computed within minutes (as compared to hours) on typical protein interaction networks, enabling a more comprehensive tuning of alignment parameters for refined orthologies.

Show MeSH
Related in: MedlinePlus