Query-based biclustering of gene expression data using Probabilistic Relational Models.

Zhao H, Cloots L, Van den Bulcke T, Wu Y, De Smet R, Storms V, Meysman P, Engelen K, Marchal K - BMC Bioinformatics (2011)



Affiliation: Microbial and Molecular Systems, KU Leuven, Leuven 3001, Belgium. hui.zhao@biw.kuleuven.be

ABSTRACT

Background: With the availability of large-scale expression compendia, it is now possible to view one's own findings in the light of what is already available and to retrieve genes with an expression profile similar to a set of genes of interest (i.e., a query or seed set) for a subset of conditions. To that end, a query-based strategy is needed that maximally exploits the coexpression behaviour of the seed genes to guide the biclustering but that is, at the same time, robust against the presence of noisy genes in the seed set, as seed genes are often assumed, but not guaranteed, to be coexpressed in the queried compendium. Therefore, we developed ProBic, a query-based biclustering strategy based on Probabilistic Relational Models (PRMs) that uses prior distributions to extract the information contained within the seed set.

Results: We applied ProBic to a large-scale Escherichia coli compendium to extend partially described regulons with potentially novel members. We compared ProBic's performance with that of previously published query-based biclustering algorithms, namely ISA and QDB, from the perspective of bicluster expression quality, robustness of the outcome against noisy seed sets, and biological relevance. This comparison shows that ProBic is able to retrieve biologically relevant, high-quality biclusters that retain their seed genes and that it is particularly strong in handling noisy seeds.

Conclusions: ProBic is a query-based biclustering algorithm developed in a flexible framework, designed to detect biologically relevant, high-quality biclusters that retain relevant seed genes even in the presence of noise or when dealing with low-quality seed sets.



Figure 4: Biological relevance of the obtained biclusters. Histogram displaying the percentage of biclusters (derived from a total of 225 different seed sets) that were found to be enriched in ‘functional categories’ related to the functions of the original seed genes or TFs, and enriched in ‘motifs’ that represent the simple and complex regulons from which the seed genes were derived. The motif enrichment indicates to what extent the additionally recruited genes contain motifs similar to those of the seed genes. Colours: ProBic (blue), QDB (green), ISA (red).

Mentions: To test the biological relevance of the biclusters obtained using the 225 seed sets, we assessed to what extent the functional classes enriched amongst the bicluster genes were similar to the classes to which either the TF of the regulon used as seed set or at least one of the seed genes belonged. As an independent assessment of how well the obtained biclusters recapitulate the original simple and complex regulons, we calculated whether the regulatory motifs of the corresponding simple/complex regulons were overrepresented in the obtained biclusters. Figure 4 shows that both ISA and ProBic largely outperform QDB at the level of motif and functional overrepresentation. Biclusters retrieved by ISA and ProBic show comparable motif enrichment, with slightly better functional enrichment for those derived from ISA than for those obtained by ProBic. For poorly informative seeds, ProBic mainly finds ‘empty’ biclusters or biclusters containing only seed genes, whereas ISA drifts away to larger biclusters that no longer contain the seeds (see also Additional File 6: Behavior of the different algorithms towards seed genes). Both situations gave rise to a similar loss in motif enrichment. Drifted biclusters, while no longer containing the seed genes, can still contain genes that are functionally related to the seed genes, in which case they still contribute to the functional overrepresentation.
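
The overrepresentation analysis and the seed behaviour described above can be illustrated with a minimal sketch. The text does not state which statistical test or significance threshold was used, so the one-sided hypergeometric test and the cutoff `alpha=0.05` below are assumptions made for illustration only. All function names are hypothetical and are not part of ProBic, ISA, or QDB.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the genome (background)
    annotated:  background genes carrying the functional class or motif
    sample:     number of genes in the bicluster
    hits:       bicluster genes carrying the functional class or motif
    ASSUMPTION: the source does not name the test; this is a common choice.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def percent_enriched(pvalues, alpha=0.05):
    """Percentage of biclusters whose p-value is at or below alpha.

    ASSUMPTION: alpha=0.05 is an illustrative threshold, not from the source.
    """
    if not pvalues:
        return 0.0
    return 100.0 * sum(p <= alpha for p in pvalues) / len(pvalues)


def seed_behaviour(seed_genes, bicluster_genes):
    """Label a bicluster by how it relates to its seed set.

    'empty'     : no genes were retained
    'seed-only' : only seed genes were retained
    'drifted'   : genes were found, but none of them are seeds
    'extended'  : seeds were retained and extra genes were recruited
    The labels are hypothetical names for the situations the text describes.
    """
    seeds, genes = set(seed_genes), set(bicluster_genes)
    if not genes:
        return "empty"
    if not (genes & seeds):
        return "drifted"
    if genes <= seeds:
        return "seed-only"
    return "extended"


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 3 annotated, bicluster of 3 genes.
    p = hypergeom_enrichment_pvalue(population=10, annotated=3, sample=3, hits=3)
    print(f"enrichment p-value: {p:.5f}")
    print(seed_behaviour({"geneA", "geneB"}, {"geneC", "geneD"}))
```

Under this sketch, a drifted bicluster can still yield a low p-value for a functional class shared with the seed genes, which is consistent with the observation that such biclusters still contribute to the functional overrepresentation.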

