Biclustering of gene expression data by correlation-based scatter search.

Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS - BioData Min (2011)

Bottom Line: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions chosen according to quality and diversity criteria. In this algorithm, the proposed fitness function is based on the linear correlation among genes in order to detect shifting and scaling patterns, and an improvement method is included to select only positively correlated genes. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology Database.


Affiliation: Dpt, Lenguajes y Sistemas Informáticos, ETSII, University of Seville, Avd, Reina Mercedes s/n, 41012, Seville, Spain. janepo@us.es.

ABSTRACT

Background: The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can identify a group of genes that are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but this measure may fail to detect patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns. Discovering this type of pattern is important, since genes commonly show similar behavior even though their expression levels vary over different ranges or magnitudes.

Methods: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions chosen according to quality and diversity criteria. This paper presents a Scatter Search algorithm for finding biclusters in gene expression data. In this algorithm, the proposed fitness function is based on the linear correlation among genes in order to detect shifting and scaling patterns, and an improvement method is included to select only positively correlated genes.
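The contrast between the Mean Squared Residue and a correlation-based measure can be illustrated with a small sketch. It is not the authors' fitness function (which also involves the penalization parameters M1 and M2); the helper names and the toy expression values below are assumptions made only for illustration. The Mean Squared Residue is computed in the usual Cheng and Church form, and the correlation measure is simply the mean absolute Pearson correlation over all gene pairs.

```python
import numpy as np

def mean_squared_residue(bicluster):
    """Mean Squared Residue of a genes-by-conditions submatrix."""
    row_means = bicluster.mean(axis=1, keepdims=True)
    col_means = bicluster.mean(axis=0, keepdims=True)
    residue = bicluster - row_means - col_means + bicluster.mean()
    return float((residue ** 2).mean())

def average_abs_correlation(bicluster):
    """Mean absolute Pearson correlation over all pairs of genes (rows).

    Illustrative stand-in for a correlation-based merit function.
    """
    corr = np.corrcoef(bicluster)
    upper = np.triu_indices(bicluster.shape[0], k=1)
    return float(np.abs(corr[upper]).mean())

# Toy expression profile of one gene across five conditions (made-up values).
base = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Pure shifting pattern: every gene is the base profile plus a constant.
shifted = np.vstack([base, base + 2.0, base - 1.5])

# Shifting and scaling pattern: every gene is a positive multiple plus a constant.
scaled_shifted = np.vstack([base, 2.0 * base + 1.0, 0.5 * base - 3.0])

msr_shift = mean_squared_residue(shifted)
msr_scale = mean_squared_residue(scaled_shifted)
corr_shift = average_abs_correlation(shifted)
corr_scale = average_abs_correlation(scaled_shifted)

print(f"shifting only:        MSR={msr_shift:.4f}  mean |corr|={corr_shift:.4f}")
print(f"shifting and scaling: MSR={msr_scale:.4f}  mean |corr|={corr_scale:.4f}")
```

In this toy case the Mean Squared Residue is zero for the purely shifted genes but clearly positive once scaling is introduced, whereas the correlation stays at 1 in both cases, which is the motivation given above for a correlation-based measure.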

Results: The proposed algorithm has been tested on three real data sets (the Yeast Cell Cycle dataset, the human B-cell lymphoma dataset and the Yeast Stress dataset), finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology Database.

Figure 7: Comparison of different biclustering algorithms on the Gasch Yeast data set: percentage of biclusters enriched by a GO Biological Process category for each method at different significance levels (p-values from p = 0.001% to p = 5%).

Mentions: Figure 7 shows, for each method, the percentage of enriched biclusters, that is, biclusters in which one or several GO terms are overrepresented, at different levels of significance (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1 and 5, expressed as percentages). In this figure, SScorr11 denotes the proposed Scatter Search with penalization parameters M1 = 1 and M2 = 1. Analogously, SScorr1010 is the Scatter Search with M1 = 10 and M2 = 10. At p = 0.01, the proportion of biclusters significantly enriched by any GO Biological Process category is over 30% for SScorr11 and SScorr1010, over 21% for CC, over 17% for OPSM, 2% for BiMax and 0% for the rest. SScorr1010 improves on the results of the other methods at small levels of significance, except for CC at p = 0.001, the most restrictive level. Both Scatter Search algorithms obtain a greater percentage of significant biclusters than CC for p = 0.005 and p = 0.01, whereas CC obtains a greater percentage than SScorr11 when the p-value ranges from p = 0.05 to p = 5. This is due to the volume of the biclusters, since it is easier to find functional enrichment in large groups of genes than in small ones. Table 2 presents information about the size of the biclusters obtained by the different methods. Note that the biclusters obtained by the CC algorithm have more genes than those obtained by the algorithms based on Scatter Search. SScorr1010 finds biclusters with more genes than SScorr11, and therefore it improves on the results of CC for all levels of significance from p = 0.005 to p = 5. The remaining methods find a lower percentage of biclusters at the specified p-values than the proposed method, although OPSM presents good results for high levels of significance (p > 0.05).
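The quantity plotted in Figure 7 can be sketched as follows. This is only an illustration of how a "percentage of enriched biclusters" per significance level might be computed from one best GO-term p-value per bicluster; the function name and the p-values are hypothetical, and the enrichment test or any multiple-testing correction behind the real p-values is not specified here. Significance levels are given in percent, as in the figure caption.

```python
def percent_enriched(best_p_values, levels_percent):
    """Percentage of biclusters with at least one GO term below each level.

    best_p_values: smallest GO-term p-value found for each bicluster.
    levels_percent: significance levels expressed in percent (e.g. 5 means 0.05).
    """
    total = len(best_p_values)
    result = {}
    for level in levels_percent:
        threshold = level / 100.0
        hits = sum(1 for p in best_p_values if p < threshold)
        result[level] = 100.0 * hits / total
    return result

# Significance levels used in Figure 7, in percent.
levels = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5]

# Hypothetical best p-values for ten biclusters (not data from the paper).
hypothetical_p = [2e-6, 8e-5, 3e-4, 4e-3, 0.02, 0.2, 0.6, 0.9, 1e-7, 0.03]

summary = percent_enriched(hypothetical_p, levels)
for level in levels:
    print(f"p = {level}%: {summary[level]:.0f}% of biclusters enriched")
```

Because each threshold is looser than the previous one, the percentages can only stay equal or grow from left to right, which is why methods producing larger biclusters tend to catch up at the higher significance levels.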

