Limits...
Markov clustering versus affinity propagation for the partitioning of protein interaction graphs.

Vlasblom J, Wodak SJ - BMC Bioinformatics (2009)

Bottom Line: A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases.MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.

View Article: PubMed Central - HTML - PubMed

Affiliation: Molecular Structure and Function Program, Hospital for Sick Children, Toronto, Ontario, Canada. jim.vlasblom@utoronto.ca

ABSTRACT

Background: Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly executed using clustering procedures, which aim at detecting densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interactions graphs. A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.

Results: In this work we compare the performance of the Affinity Propagation (AP) and Markov Clustering (MCL) procedures. To this end we derive an unweighted network of protein-protein interactions from a set of 408 protein complexes from S. cervisiae hand curated in-house, and evaluate the performance of the two clustering algorithms in recalling the annotated complexes. In doing so the parameter space of each algorithm is sampled in order to select optimal values for these parameters, and the robustness of the algorithms is assessed by quantifying the level of complex recall as interactions are randomly added or removed to the network to simulate noise. To evaluate the performance on a weighted protein interaction graph, we also apply the two algorithms to the consolidated protein interaction network of S. cerevisiae, derived from genome scale purification experiments and to versions of this network in which varying proportions of the links have been randomly shuffled.

Conclusion: Our analysis shows that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the AP algorithm. The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases. MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.

Show MeSH
Original unweighted protein interaction graph and graphs of curated complexes linked through their shared components. (a)Unweighted protein interaction graph comprising 1628 proteins (nodes) and 11 249 interactions (edges) generated from the 408 hand curated complexes of S. cerevisiae[28]. (b, c)two copies of a portion of the graph in (a), where complexes (nodes) are linked to one another whenever they share at least 2 components, and the node size is proportional to the number of unique proteins each complex contains. (b)and (c)have the AP and MCL clusters respectively mapped onto the curated complexes, so that pie charts show proportions of complex components that are annotated to the same AP or MCL cluster. The mapped clusters are computed from versions of the original unweighted network shown in (a) in which 20% of the edges were randomly added and 20% randomly removed. Complexes whose components distribute among many clusters appear as multi-colored pie graphs, whereas those that are annotated to the same cluster appear solid-colored. The bright red color indicates the proportion of components that were assigned to singleton clusters by the AP or MCL algorithm. All the comparisons were performed with partitions obtained by optimizing the MCL and AP parameters respectively (see Methods). The pie graphs were generated using the GenePro plugin[32] for Cytoscape[31].
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2682798&req=5

Figure 1: Original unweighted protein interaction graph and graphs of curated complexes linked through their shared components. (a)Unweighted protein interaction graph comprising 1628 proteins (nodes) and 11 249 interactions (edges) generated from the 408 hand curated complexes of S. cerevisiae[28]. (b, c)two copies of a portion of the graph in (a), where complexes (nodes) are linked to one another whenever they share at least 2 components, and the node size is proportional to the number of unique proteins each complex contains. (b)and (c)have the AP and MCL clusters respectively mapped onto the curated complexes, so that pie charts show proportions of complex components that are annotated to the same AP or MCL cluster. The mapped clusters are computed from versions of the original unweighted network shown in (a) in which 20% of the edges were randomly added and 20% randomly removed. Complexes whose components distribute among many clusters appear as multi-colored pie graphs, whereas those that are annotated to the same cluster appear solid-colored. The bright red color indicates the proportion of components that were assigned to singleton clusters by the AP or MCL algorithm. All the comparisons were performed with partitions obtained by optimizing the MCL and AP parameters respectively (see Methods). The pie graphs were generated using the GenePro plugin[32] for Cytoscape[31].

Mentions: Both algorithms are first applied to partition unweighted protein interaction graphs. The original version of these graphs was built from a set of 408 S. cerevisiae protein complexes hand curated in-house[28] (see Additional File 1). In this graph, nodes represent individual proteins from these complexes, and any two proteins belonging to the same complex are linked by an edge. Figure 1a illustrates this graph as rendered by the Cytoscape[31] network visualization and analysis software. This rather disjoint graph is comprised of 11,249 interactions and 1,628 proteins, where the majority of the proteins are linked only to members of the same complex, forming distinct cliques, and only a small fraction are linked to members of different complexes. This graph is clearly a less challenging test for clustering procedures than protein interaction networks built from experimental data, since those networks include an appreciable level of spurious links (False Positive links). Networks built from experimental data typically feature more links between proteins in different complexes and not all members of a given complex are always linked to one another. To better mimic these more realistic networks we randomly add or remove links to this original network in various proportions, as done by Brohée et al. [15] thereby generating different versions of the original network which include varying levels of noise, representing different proportions of False Positives (FP) and False Negatives (FN) (links deleted from the graph, but present in the original network).


Markov clustering versus affinity propagation for the partitioning of protein interaction graphs.

Vlasblom J, Wodak SJ - BMC Bioinformatics (2009)

Original unweighted protein interaction graph and graphs of curated complexes linked through their shared components. (a)Unweighted protein interaction graph comprising 1628 proteins (nodes) and 11 249 interactions (edges) generated from the 408 hand curated complexes of S. cerevisiae[28]. (b, c)two copies of a portion of the graph in (a), where complexes (nodes) are linked to one another whenever they share at least 2 components, and the node size is proportional to the number of unique proteins each complex contains. (b)and (c)have the AP and MCL clusters respectively mapped onto the curated complexes, so that pie charts show proportions of complex components that are annotated to the same AP or MCL cluster. The mapped clusters are computed from versions of the original unweighted network shown in (a) in which 20% of the edges were randomly added and 20% randomly removed. Complexes whose components distribute among many clusters appear as multi-colored pie graphs, whereas those that are annotated to the same cluster appear solid-colored. The bright red color indicates the proportion of components that were assigned to singleton clusters by the AP or MCL algorithm. All the comparisons were performed with partitions obtained by optimizing the MCL and AP parameters respectively (see Methods). The pie graphs were generated using the GenePro plugin[32] for Cytoscape[31].
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2682798&req=5

Figure 1: Original unweighted protein interaction graph and graphs of curated complexes linked through their shared components. (a)Unweighted protein interaction graph comprising 1628 proteins (nodes) and 11 249 interactions (edges) generated from the 408 hand curated complexes of S. cerevisiae[28]. (b, c)two copies of a portion of the graph in (a), where complexes (nodes) are linked to one another whenever they share at least 2 components, and the node size is proportional to the number of unique proteins each complex contains. (b)and (c)have the AP and MCL clusters respectively mapped onto the curated complexes, so that pie charts show proportions of complex components that are annotated to the same AP or MCL cluster. The mapped clusters are computed from versions of the original unweighted network shown in (a) in which 20% of the edges were randomly added and 20% randomly removed. Complexes whose components distribute among many clusters appear as multi-colored pie graphs, whereas those that are annotated to the same cluster appear solid-colored. The bright red color indicates the proportion of components that were assigned to singleton clusters by the AP or MCL algorithm. All the comparisons were performed with partitions obtained by optimizing the MCL and AP parameters respectively (see Methods). The pie graphs were generated using the GenePro plugin[32] for Cytoscape[31].
Mentions: Both algorithms are first applied to partition unweighted protein interaction graphs. The original version of these graphs was built from a set of 408 S. cerevisiae protein complexes hand curated in-house[28] (see Additional File 1). In this graph, nodes represent individual proteins from these complexes, and any two proteins belonging to the same complex are linked by an edge. Figure 1a illustrates this graph as rendered by the Cytoscape[31] network visualization and analysis software. This rather disjoint graph is comprised of 11,249 interactions and 1,628 proteins, where the majority of the proteins are linked only to members of the same complex, forming distinct cliques, and only a small fraction are linked to members of different complexes. This graph is clearly a less challenging test for clustering procedures than protein interaction networks built from experimental data, since those networks include an appreciable level of spurious links (False Positive links). Networks built from experimental data typically feature more links between proteins in different complexes and not all members of a given complex are always linked to one another. To better mimic these more realistic networks we randomly add or remove links to this original network in various proportions, as done by Brohée et al. [15] thereby generating different versions of the original network which include varying levels of noise, representing different proportions of False Positives (FP) and False Negatives (FN) (links deleted from the graph, but present in the original network).

Bottom Line: A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases.MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.

View Article: PubMed Central - HTML - PubMed

Affiliation: Molecular Structure and Function Program, Hospital for Sick Children, Toronto, Ontario, Canada. jim.vlasblom@utoronto.ca

ABSTRACT

Background: Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly executed using clustering procedures, which aim at detecting densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interactions graphs. A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.

Results: In this work we compare the performance of the Affinity Propagation (AP) and Markov Clustering (MCL) procedures. To this end we derive an unweighted network of protein-protein interactions from a set of 408 protein complexes from S. cervisiae hand curated in-house, and evaluate the performance of the two clustering algorithms in recalling the annotated complexes. In doing so the parameter space of each algorithm is sampled in order to select optimal values for these parameters, and the robustness of the algorithms is assessed by quantifying the level of complex recall as interactions are randomly added or removed to the network to simulate noise. To evaluate the performance on a weighted protein interaction graph, we also apply the two algorithms to the consolidated protein interaction network of S. cerevisiae, derived from genome scale purification experiments and to versions of this network in which varying proportions of the links have been randomly shuffled.

Conclusion: Our analysis shows that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the AP algorithm. The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases. MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.

Show MeSH