Predicting protein functions using incomplete hierarchical labels.

Yu G, Zhu H, Domeniconi C - BMC Bioinformatics (2015)

Bottom Line: Protein function prediction aims to assign biological or biochemical functions to proteins. It is a challenging computational problem characterized by several factors: (1) the number of function labels (annotations) is large; (2) a protein may be associated with multiple labels; (3) the function labels are structured in a hierarchy; and (4) the labels are incomplete. The proposed method (PILL) can serve as a valuable tool for protein function prediction using incomplete labels. The Matlab code of PILL is available upon request.


Affiliation: Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China. guoxian85@gmail.com.

ABSTRACT

Background: Protein function prediction aims to assign biological or biochemical functions to proteins. It is a challenging computational problem characterized by several factors: (1) the number of function labels (annotations) is large; (2) a protein may be associated with multiple labels; (3) the function labels are structured in a hierarchy; and (4) the labels are incomplete. Current predictive models often assume that the labels of the labeled proteins are complete, i.e. that no label is missing. In real scenarios, however, we may know only some of the hierarchical labels of a protein, and we may not know whether additional ones are actually present. This scenario of incomplete hierarchical labels, a challenging and practical problem, is seldom studied in protein function prediction.

Results: In this paper, we propose an algorithm to Predict protein functions using Incomplete hierarchical LabeLs (PILL for short). PILL takes into account both the hierarchical and the flat taxonomy similarity between function labels, and defines a Combined Similarity (ComSim) to measure the correlation between labels. PILL estimates the missing labels of a protein from ComSim and the known labels of that protein, and uses a regularization term to exploit the interactions between proteins for function prediction. PILL is shown to outperform other related techniques in replenishing the missing labels and in predicting the functions of completely unlabeled proteins on publicly available PPI datasets annotated with MIPS Functional Catalogue and Gene Ontology labels.

Conclusion: The empirical study shows that it is important to consider the incomplete annotation for protein function prediction. The proposed method (PILL) can serve as a valuable tool for protein function prediction using incomplete labels. The Matlab code of PILL is available upon request.
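
The two ingredients named in the abstract (label correlation to estimate missing labels, and protein interactions to spread labels between neighbours) can be illustrated with a small Python sketch. This is not the PILL implementation: the abstract does not give the ComSim formula or the regularized objective, so the sketch assumes a plain cosine similarity between label columns in place of ComSim and a simple iterative neighbour-averaging step in place of the regularization. All function names, the toy matrices, and the parameters alpha and n_iter are hypothetical.

```python
import numpy as np

def cosine_label_similarity(Y):
    """Label-label similarity from co-annotation (assumed stand-in for ComSim)."""
    Y = np.asarray(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=0)
    norms[norms == 0] = 1.0  # labels with no annotated protein get similarity 0
    return (Y.T @ Y) / np.outer(norms, norms)

def estimate_missing_labels(Y, S):
    """Score every label of a protein by its mean similarity to the known labels."""
    Y = np.asarray(Y, dtype=float)
    counts = Y.sum(axis=1, keepdims=True)
    counts[counts == 0] = 1.0  # unlabeled proteins keep an all-zero row
    Y_est = (Y @ S) / counts
    return np.maximum(Y_est, Y)  # known labels stay at 1

def smooth_over_network(Y_est, W, alpha=0.5, n_iter=50):
    """Guilt by association: mix each protein's scores with its neighbours' mean."""
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated proteins receive nothing from neighbours
    P = W / deg
    F = Y_est.copy()
    for _ in range(n_iter):
        F = alpha * (P @ F) + (1 - alpha) * Y_est
    return F

# Toy data (hypothetical): 4 proteins x 3 labels; protein 3 is completely unlabeled.
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
# Symmetric interaction matrix: protein 3 interacts with proteins 0 and 1.
W = np.array([[0, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0],
              [1, 1, 0, 0]], dtype=float)

S = cosine_label_similarity(Y)
Y_est = estimate_missing_labels(Y, S)
F = smooth_over_network(Y_est, W)
print(np.round(F, 3))
```

In this toy example protein 1 gains a non-zero score for label 1 purely from label correlation, and the unlabeled protein 3 receives scores for labels 0 and 1 purely from its interaction partners, which mirrors the two roles described in the abstract.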

Figure 4: The benefit of using function correlation and the guilt by association rule on the proteins in CollingsPPI annotated with FunCat labels. PILL-FC uses only the function correlation between function labels, PILL-GbA uses only the guilt by association rule, and PILL uses both the function correlation and the guilt by association rule.

Mentions: We conducted experiments to study the benefit of using function correlations and the guilt by association rule. We define two variants of PILL: (i) PILL-FC just utilizes the estimated , without using the second term (the 'Guilt by Association' rule) in Eq. (8), and (ii) PILL-GbA uses only the second term in Eq. (8) and does not use function correlations to estimate the missing labels. The recorded results (AvgROC and 1-RankLoss) on CollingsPPI with respect to FunCat labels are given in Figure 4. The results on CollingsPPI and KroganPPI with respect to other evaluation metrics are reported in Figures S6-8 of Additional file 1.
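
For readers unfamiliar with the two metrics reported in Figure 4, the following Python sketch shows one common way to compute them for multi-label predictions. The exact definitions used in the paper are not given in this excerpt, so the formulas here are assumptions based on standard multi-label practice: 1-RankLoss is taken as one minus the fraction of (relevant, irrelevant) label pairs that a protein's scores order wrongly, averaged over proteins, and AvgROC is taken as the mean per-label area under the ROC curve. The function names and toy arrays are hypothetical.

```python
import numpy as np

def one_minus_rank_loss(Y_true, scores):
    """Mean over proteins of 1 - (share of mis-ordered relevant/irrelevant pairs)."""
    values = []
    for y, s in zip(Y_true, scores):
        pos = s[y == 1]
        neg = s[y == 0]
        if len(pos) == 0 or len(neg) == 0:
            continue  # undefined when a protein has no relevant or no irrelevant label
        # A pair is mis-ordered when the irrelevant label scores at least as high.
        wrong = (neg[None, :] >= pos[:, None]).sum()
        values.append(1.0 - wrong / (len(pos) * len(neg)))
    return float(np.mean(values))

def average_auc(Y_true, scores):
    """Mean per-label AUC: chance that a positive protein outranks a negative one."""
    aucs = []
    for j in range(Y_true.shape[1]):
        pos = scores[Y_true[:, j] == 1, j]
        neg = scores[Y_true[:, j] == 0, j]
        if len(pos) == 0 or len(neg) == 0:
            continue  # AUC is undefined for labels with only one class present
        diff = pos[:, None] - neg[None, :]
        aucs.append(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
    return float(np.mean(aucs))

# Toy ground truth and predicted scores (hypothetical): 2 proteins x 3 labels.
Y_true = np.array([[1, 0, 0],
                   [0, 1, 1]])
scores = np.array([[0.9, 0.2, 0.4],
                   [0.6, 0.5, 0.7]])

rank_quality = one_minus_rank_loss(Y_true, scores)
auc_quality = average_auc(Y_true, scores)
print(rank_quality, auc_quality)
```

Both values lie in [0, 1] and higher is better, which is why the figure reports 1-RankLoss rather than RankLoss itself: every bar can then be read in the same direction.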

