Predicting pathway membership via domain signatures.

Fröhlich H, Fellmann M, Sültmann H, Poustka A, Beissbarth T - Bioinformatics (2008)

Bottom Line: In contrast, information on contained protein domains can be obtained for a significantly higher number of genes, e.g. from the InterPro database. Moreover, for signaling pathways we reveal that it is even possible to forecast accurately the membership to individual pathway components. The R package gene2pathway is a supplement to this article.

Affiliation: German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany. h.froehlich@dkfz-heidelberg.de

ABSTRACT

Motivation: Functional characterization of genes is of great importance for understanding complex cellular processes. Valuable information for this purpose can be obtained from pathway databases such as KEGG. However, only a small fraction of genes has been annotated with pathway information so far. In contrast, information on contained protein domains is available for a significantly larger number of genes, e.g. from the InterPro database.

Results: We present a classification model that, for a specific gene of interest, predicts the mapping to a KEGG pathway based on the gene's domain signature. The classifier makes explicit use of the hierarchical organization of pathways in the KEGG database. Furthermore, we take into account that a specific gene can be mapped to several pathways at the same time. The classification method produces a score for every possible mapping position of the gene in the KEGG hierarchy. Evaluations of our model, which combines an SVM with a ranking perceptron approach, show high prediction performance. Moreover, for signaling pathways we show that it is even possible to accurately predict membership in individual pathway components.
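To make the data representation behind such a classifier concrete, the following is a minimal sketch in Python. It is not the authors' implementation and does not use the gene2pathway package. It only illustrates two ideas from the Results paragraph: a domain signature encoded as a binary vector, and a multi-label target in which a gene mapped to a pathway is also mapped to every ancestor of that pathway in the hierarchy. The domain identifiers (DOM_A to DOM_D), the function names and the small hierarchy slice are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Set

# Toy slice of a KEGG-style hierarchy (child -> parent; None marks a top-level
# branch). Assumption: a hand-picked excerpt for illustration only.
PARENT: Dict[str, Optional[str]] = {
    "Environmental Information Processing": None,
    "Signal Transduction": "Environmental Information Processing",
    "Calcium signaling pathway": "Signal Transduction",
    "MAPK signaling pathway": "Signal Transduction",
    "Metabolism": None,
    "Carbohydrate Metabolism": "Metabolism",
    "Inositol phosphate metabolism": "Carbohydrate Metabolism",
}


def domain_signature(domains: Set[str], vocabulary: List[str]) -> List[int]:
    """Encode the domains found in a gene product as a binary vector."""
    return [1 if d in domains else 0 for d in vocabulary]


def expand_labels(pathways: Set[str]) -> Set[str]:
    """Add every ancestor of each annotated pathway to the label set."""
    labels: Set[str] = set()
    for node in pathways:
        current: Optional[str] = node
        while current is not None:
            labels.add(current)
            current = PARENT[current]
    return labels


def label_vector(pathways: Set[str], nodes: List[str]) -> List[int]:
    """Binary multi-label target over all hierarchy nodes."""
    expanded = expand_labels(pathways)
    return [1 if n in expanded else 0 for n in nodes]


# Hypothetical domain vocabulary; real identifiers would come from InterPro.
VOCABULARY = ["DOM_A", "DOM_B", "DOM_C", "DOM_D"]

signature = domain_signature({"DOM_B", "DOM_D"}, VOCABULARY)
targets = label_vector(
    {"Calcium signaling pathway", "Inositol phosphate metabolism"},
    list(PARENT),
)
```

A classifier trained on such pairs can then assign one score to each hierarchy node, which corresponds to the "scoring of all possible mapping positions" described above. How the SVM and ranking perceptron are combined to produce those scores is not specified in this abstract and is not sketched here.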

Availability: The R package gene2pathway is a supplement to this article.


Figure 3: Predicted pathway component (shaded) for PLCH2 in the Calcium signaling pathway.

Mentions: In a second step of our analysis, we selected those genes that were either known to be involved in signal transduction according to KEGG annotation (458 genes) or predicted by our model to map to the corresponding KEGG hierarchy branch with confidence > 99% (164 genes). Comparing our pathway component predictions for the 458 genes with the original KEGG information revealed a very high median accuracy of ∼100%, with a median F1-value > 80% and precision and recall in the same range (Fig. 2B). As an example application of our model, Figure 3 depicts the predicted connected component for PLCH2 (confidence = 100%) in the calcium signaling pathway; no KEGG annotation was previously available for this gene. The gene has an associated GO function ‘calcium ion binding’ and GO process ‘intracellular signaling cascade’ (The Gene Ontology Consortium, 2004).
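The evaluation measures named above can be illustrated with a short Python sketch. It assumes that, for each gene, the predicted and the KEGG-annotated pathway components are available as sets of component identifiers. The exact per-gene definition used by the authors is not given in this excerpt, so the set-based formulation, the function names and the toy identifiers below are assumptions.

```python
from statistics import median
from typing import Iterable, Set, Tuple


def precision_recall_f1(predicted: Set[str], actual: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for one gene."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def median_f1(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Median F1 over (predicted, actual) pairs, one pair per gene."""
    return median(precision_recall_f1(p, a)[2] for p, a in pairs)


# Toy example with hypothetical component identifiers c1..c5.
p, r, f1 = precision_recall_f1({"c1", "c2", "c3"}, {"c2", "c3", "c4", "c5"})
```

In this toy case two of the three predicted components are correct and two of the four annotated components are recovered, so precision is 2/3, recall is 1/2 and F1 is 4/7. Reporting the median over genes, as in the text, makes the summary robust to a few genes with very poor predictions.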