Prediction of problematic complexes from PPI networks: sparse, embedded, and small complexes.

Yong CH, Wong L - Biol. Direct (2015)

Bottom Line: Many complexes are sparsely connected in the PPI network and do not form dense clusters that can be derived by clustering algorithms. Many complexes are also small (composed of two or three distinct proteins), so that traditional topological markers such as density are ineffective. With our approach, protein complexes can be more accurately and comprehensively predicted, allowing a clearer elucidation of the modular machinery of the cell.


Affiliation: Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, Singapore, 117456, Singapore. yongchernhan@gmail.com.

ABSTRACT

Background: The prediction of protein complexes from high-throughput protein-protein interaction (PPI) data remains an important challenge in bioinformatics. Three groups of complexes have been identified as problematic to discover. First, many complexes are sparsely connected in the PPI network, and do not form dense clusters that can be derived by clustering algorithms. Second, many complexes are embedded within highly-connected regions of the PPI network, which makes it difficult to accurately delimit their boundaries. Third, many complexes are small (composed of two or three distinct proteins), so that traditional topological markers such as density are ineffective.

Results: We have previously proposed three approaches to address these challenges. First, Supervised Weighting of Composite Networks (SWC) integrates diverse data sources with supervised weighting, and successfully fills in missing co-complex edges in sparse complexes to allow them to be predicted. Second, network decomposition (DECOMP) splits the PPI network into spatially and temporally coherent subnetworks, allowing complexes embedded within highly-connected regions to be more clearly demarcated. Finally, Size-Specific Supervised Weighting (SSS) integrates diverse data sources with supervised learning to weight edges in a size-specific manner (that is, by whether they belong to a small complex or a large complex), and improves the prediction of small complexes. Here we integrate these three approaches into a single system. We test the integrated approach on the prediction of yeast and human complexes, and show that it outperforms SWC, DECOMP, or SSS when run individually, achieving the highest precision and recall levels.

Conclusion: Three groups of protein complexes remain challenging to predict from PPI data: sparse complexes, embedded complexes, and small complexes. Our previous approaches have addressed each of these challenges individually, through data integration, PPI-network decomposition, and supervised learning. Here we integrate these approaches into a single complex-discovery system, which improves the prediction of all three types of challenging complexes. With our approach, protein complexes can be more accurately and comprehensively predicted, allowing a clearer elucidation of the modular machinery of the cell.




Fig. 5: Statistics of human reference complexes

Mentions: To investigate the performance of our integrated approach with respect to the three challenges that we highlighted, we stratify the reference complexes in terms of their sizes, extraneous edges, and densities. First, to quantify whether a complex is embedded within a highly-connected region of the PPI network, we derive EXT, the number of external proteins that are highly connected to it, where an external protein is highly connected if it is connected to at least half of the proteins in the complex. Second, to quantify how sparse a complex is, we derive DENS, the density of each complex, defined as the number of PPI edges in the complex divided by the total number of possible edges in the complex. In our analysis, we stratify the complexes into large and small complexes, and further stratify the large complexes into low, medium, and high DENS (corresponding to DENS in [0, 0.35], (0.35, 0.7], and (0.7, 1] respectively), and low and high EXT (corresponding to EXT ≤ 3 and EXT > 3 respectively), to give seven strata in total (one for small complexes, and six for large complexes). Figures 4 and 5 show the size distribution of the yeast and human complexes, together with the DENS and EXT strata of the large complexes. Note that the least challenging complexes to predict are the large complexes with high DENS and low EXT, and these correspond to only around 15% and 5% of complexes in yeast and human respectively; the remaining complexes are challenging in some way, with the majority of them falling into the small-complex category.
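The stratification described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`dens`, `ext`, `stratum`), the representation of the PPI network as a set of two-protein `frozenset` edges, and the toy proteins are all assumptions made for illustration. The DENS and EXT thresholds and the definition of a small complex (two or three distinct proteins) are taken from the text.

```python
from itertools import combinations


def dens(complex_proteins, edges):
    """DENS: PPI edges inside the complex divided by the number of possible edges."""
    members = sorted(set(complex_proteins))
    n = len(members)
    if n < 2:
        return 0.0  # assumption: density is undefined for one protein, so report 0
    possible = n * (n - 1) // 2
    internal = sum(1 for pair in combinations(members, 2) if frozenset(pair) in edges)
    return internal / possible


def ext(complex_proteins, edges):
    """EXT: external proteins connected to at least half of the complex's proteins."""
    members = set(complex_proteins)
    links = {}  # external protein -> number of complex members it interacts with
    for edge in edges:
        if len(edge) != 2:
            continue  # assumption: self-loops are ignored
        a, b = tuple(edge)
        if a in members and b not in members:
            links[b] = links.get(b, 0) + 1
        elif b in members and a not in members:
            links[a] = links.get(a, 0) + 1
    return sum(1 for count in links.values() if count >= len(members) / 2)


def stratum(complex_proteins, edges, small_max=3):
    """Assign a complex to one of the seven strata described in the text."""
    members = set(complex_proteins)
    if len(members) <= small_max:
        return "small"
    d = dens(members, edges)
    e = ext(members, edges)
    dens_level = "low" if d <= 0.35 else "medium" if d <= 0.7 else "high"
    ext_level = "low" if e <= 3 else "high"
    return f"large/{dens_level}-DENS/{ext_level}-EXT"


# Toy network (hypothetical proteins): a fully connected 4-protein complex
# plus one external protein X that touches half of its members.
edges = {frozenset(p) for p in [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("X", "A"), ("X", "B"),
]}
print(stratum(["A", "B", "C", "D"], edges))
```

In this toy network the complex {A, B, C, D} has all six possible edges and one highly connected external protein, so it falls into the least challenging stratum (large, high DENS, low EXT). A complex with only two or three proteins is assigned to the small stratum regardless of its DENS or EXT.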

