Probabilistic inference of biological networks via data integration.

Rogers MF, Campbell C, Ying Y - Biomed Res Int (2015)

Bottom Line: There is significant interest in inferring the structure of subcellular networks of interaction. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%.

View Article: PubMed Central - PubMed

Affiliation: Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Bristol BS8 1UB, UK.

ABSTRACT
There is significant interest in inferring the structure of subcellular networks of interaction. Here we consider supervised interactive network inference in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, motivating the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning that can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%.
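The pairwise-kernel and weighted-combination ideas in the abstract can be sketched in a few lines. The sketch below is an illustration, not the authors' implementation: it assumes the standard tensor product pairwise kernel form K((a,b),(c,d)) = K(a,c)K(b,d) + K(a,d)K(b,c) over a base kernel on individual genes/proteins, and it stands in for learned multiple-kernel-learning weights with a fixed weight vector; all function names are hypothetical.

```python
import numpy as np

def tensor_product_pairwise_kernel(K, pairs1, pairs2):
    """Tensor product pairwise kernel over gene pairs:
    K_pair((a,b),(c,d)) = K[a,c]*K[b,d] + K[a,d]*K[b,c].
    K is a precomputed (symmetric) base kernel matrix over individual genes;
    pairs1/pairs2 are lists of (row, col) index pairs into K.
    The symmetrized form makes the value invariant to pair orientation."""
    out = np.zeros((len(pairs1), len(pairs2)))
    for i, (a, b) in enumerate(pairs1):
        for j, (c, d) in enumerate(pairs2):
            out[i, j] = K[a, c] * K[b, d] + K[a, d] * K[b, c]
    return out

def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices. In real MKL the weights are
    learned from data; here they are fixed and merely normalized to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

# Toy usage: a linear base kernel over 5 genes with 3 features each.
X = np.random.RandomState(0).randn(5, 3)
K = X @ X.T
pairs = [(0, 1), (2, 3), (1, 4)]
G = tensor_product_pairwise_kernel(K, pairs, pairs)
```

The resulting Gram matrix `G` (or a weighted combination of several such matrices) can be fed to any kernel classifier that accepts precomputed kernels; a real MKL setup would learn the weights by optimizing a margin- or alignment-based objective rather than fixing them.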

Figure 1: Comparison of average rankings for accuracy (a) and AUC (b) for 20 small data sets using unweighted pairwise kernels. The dot for each kernel identifies its mean rank; horizontal bars depict the Nemenyi test critical region for α = 0.05. The tensor product kernel consistently had the highest ranking while the symmetric direct sum kernel had the lowest. The differences between the remaining three kernels become clearer when we consider AUC as well as accuracy: the metric learning kernel has higher rankings than the other two on both measures.

Mentions: For small data sets, the tensor product kernel consistently yields the highest accuracy ranking of any pairwise kernel (mean 1.0) while the symmetric direct sum kernel consistently yields the lowest (Figure 1). The metric learning, cosine-like, and Cartesian graph product pairwise kernels yield intermediate rankings, though the metric learning kernel (mean 2.0) was consistently ranked higher than the other two. When we rank the kernels based on AUC score as well as accuracy, we again see that the metric learning kernel yields higher performance than the cosine-like or Cartesian graph product kernels, but the relative ordering of those two differs between the measures, making it difficult to identify a clear winner between them. The tensor product kernel's high accuracy and AUC rankings are statistically significant (α = 0.01) when compared to all but the metric learning kernel, but the differences between the tensor product and metric learning kernels are not statistically significant at α = 0.05. Results for medium and large data sets (not shown) are nearly identical, but the smaller data size yields less statistical power.
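The ranking comparison in Figure 1 follows the standard Friedman/Nemenyi procedure: each kernel is ranked on every data set, mean ranks are compared, and two kernels differ significantly when their mean ranks differ by more than a critical difference CD = q_α·√(k(k+1)/6N). A minimal sketch (our own illustration, not the authors' code; no tie correction):

```python
import math
import numpy as np

# q_alpha values for the Nemenyi test at alpha = 0.05
# (studentized range statistic divided by sqrt(2)), indexed by k.
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}

def mean_ranks(scores):
    """scores: (N data sets) x (k kernels) array of accuracies or AUCs.
    Returns the mean rank of each kernel, with rank 1 = best.
    Double argsort converts best-first order into per-row ranks."""
    order = np.argsort(-scores, axis=1)
    ranks = np.argsort(order, axis=1) + 1
    return ranks.mean(axis=0)

def nemenyi_cd(k, n, alpha=0.05):
    """Critical difference in mean rank for k classifiers over n data sets."""
    assert alpha == 0.05, "only alpha = 0.05 tabulated in this sketch"
    return Q_ALPHA_05[k] * math.sqrt(k * (k + 1) / (6.0 * n))
```

For the setting in Figure 1 (k = 5 kernels, N = 20 data sets), `nemenyi_cd(5, 20)` gives a critical difference of about 1.36 in mean rank, so a kernel with mean rank 1.0 differs significantly from any kernel whose mean rank exceeds roughly 2.4.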

