Fuzzy association rules for biological data analysis: a case study on yeast.

Lopez FJ, Blanco A, Garcia F, Cano C, Marin A - BMC Bioinformatics (2008)

Bottom Line: A number of association rules have been found, many of them agreeing with previous research in the area. In addition, a comparison between crisp and fuzzy results shows the fuzzy associations to be more reliable than the crisp ones. An integrative approach such as the one carried out in this work can unveil significant knowledge that is currently hidden and dispersed across existing biological databases.


Affiliation: Department of Computer Science and AI, University of Granada, 18071, Granada, Spain. fjavier@decsai.ugr.es

ABSTRACT

Background: The mapping of diverse genomes in recent years has generated huge amounts of biological data, which are currently dispersed across many databases. Integrating the information available in these databases is required to unveil possible associations among already known data. Biological data are often imprecise and noisy. Fuzzy set theory is especially suitable for modeling imprecise data, while association rules are well suited to integrating heterogeneous data.
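To make "modeling imprecise data" concrete, the sketch below defines linguistic labels for a numeric attribute with trapezoidal membership functions and contrasts them with crisp intervals. The attribute (ORF length in base pairs), the label names, and every breakpoint and cut point are hypothetical choices for illustration, not the partitions used by the authors.

```python
import math

# Hypothetical linguistic labels for ORF length (bp); each is a trapezoid (a, b, c, d).
# Breakpoints are illustrative only; adjacent labels overlap so degrees sum to 1.
LENGTH_LABELS = {
    "short": (0, 0, 300, 500),
    "medium": (300, 500, 1200, 1600),
    "long": (1200, 1600, math.inf, math.inf),
}


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear ramps in between."""
    if b <= x <= c:
        return 1.0  # plateau of full membership
    if x <= a or x >= d:
        return 0.0  # outside the support of the label
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)  # falling edge


def fuzzy_labels(x):
    """Degree to which a length x matches each linguistic label."""
    return {name: trapezoid(x, *abcd) for name, abcd in LENGTH_LABELS.items()}


def crisp_label(x):
    """Crisp alternative with hard (hypothetical) cut points at 400 and 1400 bp."""
    if x < 400:
        return "short"
    if x < 1400:
        return "medium"
    return "long"


for length in (390, 410):
    print(length, crisp_label(length), fuzzy_labels(length))
```

A 20 bp difference flips the crisp label from "short" to "medium", while the fuzzy degrees shift only slightly (0.55/0.45 versus 0.45/0.55). Absorbing this kind of boundary imprecision is what fuzzy sets are meant to do.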

Results: In this work we propose a novel fuzzy methodology for biological knowledge extraction, based on a fuzzy association rule mining method. We apply this methodology to a yeast genome dataset containing heterogeneous information on structural and functional genome features. A number of association rules have been found, many of them agreeing with previous research in the area. In addition, a comparison between crisp and fuzzy results shows the fuzzy associations to be more reliable than the crisp ones.

Conclusion: An integrative approach such as the one carried out in this work can unveil significant knowledge that is currently hidden and dispersed across existing biological databases. Fuzzy association rules are shown to model this knowledge intuitively, using linguistic labels and a few easily understood parameters.
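As a hedged illustration of how a fuzzy rule and its few parameters can be evaluated, the sketch below scores a rule of the form "ORF length is short → G+C content is high" with one common definition from the fuzzy association rule literature: the support of a set of items is the mean, over all records, of the minimum of the items' membership degrees, and confidence is support(antecedent ∪ consequent) / support(antecedent). The records, degrees, item names, and the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE` are hypothetical, and the authors' own quality measures may differ (fuzzy rule miners often use alternatives such as the certainty factor).

```python
# Hypothetical membership degrees of four ORFs in two fuzzy items.
records = [
    {"length=short": 1.0, "gc=high": 0.9},
    {"length=short": 0.8, "gc=high": 0.6},
    {"length=short": 0.3, "gc=high": 0.7},
    {"length=short": 0.0, "gc=high": 0.2},
]

MIN_SUPPORT = 0.3     # hypothetical user-set threshold
MIN_CONFIDENCE = 0.7  # hypothetical user-set threshold


def support(items, data):
    """Fuzzy support: mean over records of the min membership across items (min t-norm)."""
    return sum(min(r[i] for i in items) for r in data) / len(data)


def confidence(antecedent, consequent, data):
    """Fuzzy confidence: support(antecedent + consequent) / support(antecedent)."""
    denom = support(antecedent, data)
    return support(antecedent + consequent, data) / denom if denom > 0 else 0.0


ante, cons = ["length=short"], ["gc=high"]
s = support(ante + cons, records)
c = confidence(ante, cons, records)
accepted = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(f"support={s:.3f} confidence={c:.3f} accepted={accepted}")
```

With these toy degrees the rule has support 0.45 and confidence of about 0.86, so it passes both hypothetical thresholds. The only user-facing parameters are the label definitions and the two thresholds.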


Figure 3: Biclusters 3 & 4. This figure shows the gene expression pattern represented by biclusters 3 (A) and 4 (B).

Mentions: The first four biclusters in Table 7 represent gene expression profiles obtained from the Cell Cycle microarray experiments. The association rules in Table 7 state that bicluster 1 is formed by genes whose products are located in the nucleus and in some non-membrane-bound organelles (a category that includes ribosomes, the cytoskeleton and chromosomes). This bicluster was obtained by the EDA biclustering algorithm, and Figure 2A depicts the expression pattern it represents. As can be seen, bicluster 1 contains genes that are over-expressed at the beginning of the cell cycle and under-expressed at the end. The periodicity of these genes' expression levels across the two cell cycles covered by the microarray dataset is clear.

Bicluster 2 was also obtained by the EDA biclustering algorithm. The ORFs associated with this bicluster have medium length and high responsiveness and carry out an oxidoreductase function. The expression pattern represented by this cluster can be seen in Figure 2B.

The next two rules in Table 7 refer to bicluster 3, which was also obtained by the EDA biclustering algorithm. ORFs in bicluster 3 yield proteins that carry out their activities in the nucleus and participate in DNA metabolism. Figure 3A confirms the correspondence between the biological process DNA metabolism and the expression behavior of the genes in this cluster: they are over-expressed in the S phase of the cell cycle (samples 2–3 and 10–12), when DNA replication takes place.

Finally, some relations are shown for bicluster 4 (Figure 3B). This bicluster was obtained by the Gene & Sample Shaving biclustering algorithm and represents ORFs whose expression changes sharply from under-expressed to over-expressed at the transition between cell cycles (time points 7 to 10). The rules in Table 7 relate bicluster 4 to short ORFs with a high G+C proportion. This makes sense because, as described above, it is known that short ORFs tend to be GC-rich.
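The rules for bicluster 4 mix a crisp item (a gene either belongs to the bicluster or it does not) with a conjunction of fuzzy labels ("short" and "high G+C"). One simple way to evaluate such a rule, assumed here rather than taken from the paper, is to encode crisp items as memberships of exactly 0 or 1 and to combine all items with the minimum t-norm. The gene names, degrees, and the rule direction (bicluster 4 → short AND high G+C) are hypothetical, since the text does not state how the rules are oriented.

```python
# Hypothetical genes: crisp bicluster-4 membership (0 or 1) plus fuzzy label degrees.
genes = {
    "geneA": {"bicluster4": 1, "short": 0.9, "gc_high": 0.8},
    "geneB": {"bicluster4": 1, "short": 0.7, "gc_high": 0.4},
    "geneC": {"bicluster4": 0, "short": 1.0, "gc_high": 0.9},
    "geneD": {"bicluster4": 1, "short": 0.2, "gc_high": 0.9},
}


def degree_all(gene, items):
    """Degree to which a gene satisfies every item (min t-norm).
    Crisp items contribute 0 or 1, so they behave like ordinary set membership."""
    return min(gene[i] for i in items)


def rule_confidence(antecedent, consequent, data):
    """Sum of degree(antecedent AND consequent) divided by sum of degree(antecedent)."""
    num = sum(degree_all(g, antecedent + consequent) for g in data.values())
    den = sum(degree_all(g, antecedent) for g in data.values())
    return num / den if den > 0 else 0.0


conf = rule_confidence(["bicluster4"], ["short", "gc_high"], genes)
print(f"confidence = {conf:.2f}")
```

Here geneC is short and GC-rich but lies outside the bicluster, so its crisp membership of 0 removes it from both sums; the three members contribute 0.8, 0.4 and 0.2, giving a confidence of about 0.47.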
