Biclustering of gene expression data by correlation-based scatter search.

Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS - BioData Min (2011)

Bottom Line: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions chosen according to quality and diversity criteria. In this algorithm, the proposed fitness function is based on the linear correlation among genes, so that shifting and scaling patterns can be detected, and an improvement method is included to select only positively correlated genes. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology database.


Affiliation: Dpt. Lenguajes y Sistemas Informáticos, ETSII, University of Seville, Avd. Reina Mercedes s/n, 41012, Seville, Spain. janepo@us.es

ABSTRACT

Background: The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes that are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue (MSR) as the merit function, but biologically interesting and relevant patterns, such as shifting and scaling patterns, may not be detected using this measure. It is nevertheless important to discover patterns of this type, since genes commonly show similar behavior even though their expression levels vary in different ranges or magnitudes.
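The limitation of the MSR can be illustrated with a small sketch. The expression values, shifts and scale factors below are made up for illustration and do not come from the paper; the function is the standard Cheng and Church residue, written here with NumPy. Genes that differ only by an additive shift give a residue of zero, whereas genes that differ by a multiplicative scale give a positive residue even though they are perfectly linearly correlated.

```python
import numpy as np

def mean_squared_residue(bicluster):
    """Mean squared residue of a bicluster (rows = genes, columns = conditions)."""
    b = np.asarray(bicluster, dtype=float)
    row_means = b.mean(axis=1, keepdims=True)
    col_means = b.mean(axis=0, keepdims=True)
    residue = b - row_means - col_means + b.mean()
    return float((residue ** 2).mean())

# Illustrative values only (not taken from any data set in the paper).
base = np.array([1.0, 3.0, 2.0, 5.0])      # one expression profile
shifts = np.array([0.0, 2.0, 10.0])        # additive offset per gene
scales = np.array([1.0, 2.0, 5.0])         # multiplicative factor per gene

shifted = base + shifts[:, None]           # pure shifting pattern
scaled = base * scales[:, None]            # pure scaling pattern

msr_shifted = mean_squared_residue(shifted)
msr_scaled = mean_squared_residue(scaled)

# Every pair of genes in the scaled bicluster has linear correlation 1.
min_corr_scaled = float(np.corrcoef(scaled).min())

print(msr_shifted, msr_scaled, min_corr_scaled)
```

In this toy case a correlation-based measure treats both biclusters as perfect, while the MSR penalizes the scaled one.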

Methods: Scatter Search is an evolutionary technique based on the evolution of a small set of solutions chosen according to quality and diversity criteria. This paper presents a Scatter Search algorithm for finding biclusters in gene expression data. In this algorithm, the proposed fitness function is based on the linear correlation among genes, so that shifting and scaling patterns can be detected, and an improvement method is included to select only positively correlated genes.
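As a rough sketch of the two ingredients named above, the following code computes the average pairwise linear correlation among the genes of a bicluster and applies a simple filter that keeps only positively correlated genes. Both function names are hypothetical, and the filter (keep genes positively correlated with a chosen reference gene) is an assumption made for illustration; the abstract does not specify how the paper's improvement method works.

```python
import numpy as np

def correlation_summary(bicluster):
    """Mean and standard deviation of the pairwise correlations between genes (rows)."""
    corr = np.corrcoef(np.asarray(bicluster, dtype=float))
    upper = corr[np.triu_indices_from(corr, k=1)]   # each gene pair once
    return float(upper.mean()), float(upper.std())

def keep_positively_correlated(bicluster, reference=0):
    """Assumed improvement step: keep genes positively correlated with a reference gene."""
    data = np.asarray(bicluster, dtype=float)
    corr = np.corrcoef(data)
    keep = np.flatnonzero(corr[reference] > 0)
    return data[keep], keep

# Illustrative profiles: three shifted/scaled copies of one pattern and one inverted copy.
base = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
genes = np.vstack([
    base,
    2.0 * base + 1.0,    # scaled and shifted
    0.5 * base - 3.0,    # scaled and shifted
    -1.0 * base + 6.0,   # negatively correlated with the others
])

rho_before, sigma_before = correlation_summary(genes)
filtered, kept = keep_positively_correlated(genes)
rho_after, sigma_after = correlation_summary(filtered)

print(kept, rho_before, rho_after)
```

Removing the inverted gene raises the average correlation of this toy bicluster from about 0 to 1, which is the kind of effect a positive-correlation filter is meant to have.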

Results: The proposed algorithm has been tested with three real data sets: the Yeast Cell Cycle data set, the human B-cell lymphoma data set and the Yeast Stress data set. It found a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology database.



Figure 4: Results for the Lymphoma data set. Several biclusters found by Algorithm 1 in the Lymphoma data set.

Mentions: Table 1 shows information for four biclusters selected from the 100 biclusters obtained by applying the proposed Scatter Search, together with the average over the 100 biclusters (in bold). For each bicluster, the table gives an identifier, the number of genes, the number of conditions, the volume, the average correlation ρ(B), and the standard deviation σ(B). The MSR and the variance of gene values are also reported so that the quality of the biclusters can be compared with that of other algorithms. The variance of gene values measures how different the gene expression levels are. Figures 3 and 4 present the four biclusters reported in Table 1 for the Yeast and Lymphoma data sets, respectively. Figures 5 and 6 depict biclusters from the GaschYeast data set. The biclusters bi1-GaschYeastN1, bi1-GaschYeastN10, bi1-GaschYeastN11 and bi1-GaschYeastN25 in Figure 5 were obtained with M1 = 1 and M2 = 1, and the biclusters bi2-GaschYeastN1, bi2-GaschYeastN4, bi2-GaschYeastN9 and bi2-GaschYeastN27 in Figure 6 with M1 = 10 and M2 = 10. The greater the penalty values, the greater the volume of the obtained biclusters. The values M1 = M2 = 1 were chosen to find biclusters with a low number of genes, so that the shifting and scaling patterns can be shown visually. However, the main goal is to find groups of genes sharing the same GO terms, for which biclusters with a high number of genes are more suitable. Thus, M1 = M2 = 10 was used to obtain biclusters with a higher volume.
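The effect of the penalty values on bicluster volume can be sketched with a toy calculation. The form of the merit function below is an assumption, not something stated in this excerpt: it is taken to be minimized and to add size penalties M1 / (number of genes) and M2 / (number of conditions) to the correlation terms. The two candidate biclusters and their ρ and σ values are hypothetical.

```python
def assumed_fitness(rho, sigma, n_genes, n_conditions, m1, m2):
    """Assumed merit (lower is better): correlation terms plus size penalties."""
    return (1.0 - rho) + sigma + m1 / n_genes + m2 / n_conditions

# Hypothetical candidates: a small, tightly correlated bicluster and a larger, looser one.
small = dict(rho=0.98, sigma=0.01, n_genes=10, n_conditions=10)
large = dict(rho=0.85, sigma=0.05, n_genes=40, n_conditions=15)

def preferred(m1, m2):
    """Return which candidate has the lower assumed fitness for given penalties."""
    f_small = assumed_fitness(m1=m1, m2=m2, **small)
    f_large = assumed_fitness(m1=m1, m2=m2, **large)
    return "small" if f_small < f_large else "large"

choice_low = preferred(1, 1)      # M1 = M2 = 1
choice_high = preferred(10, 10)   # M1 = M2 = 10

volume_small = small["n_genes"] * small["n_conditions"]   # volume = genes x conditions
volume_large = large["n_genes"] * large["n_conditions"]

print(choice_low, volume_small)
print(choice_high, volume_large)
```

Under these assumptions, low penalties favor the small bicluster and high penalties favor the large one, consistent with the observation that greater penalty values yield greater volumes.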

