CorrelaGenes: a new tool for the interpretation of the human transcriptome.

Cremaschi P, Rovida S, Sacchi L, Lisa A, Calvi F, Montecucco A, Biamonti G, Bione S, Sacchi G - BMC Bioinformatics (2014)

Bottom Line: The significance of the correlation is measured by coupling the Lift, a well-known standard ARM index, with the χ² p value. The manually curated selection of the comparisons and the developed algorithm constitute a new approach in the field of gene expression profiling studies. The preliminary results of the simulation showed how CorrelaGenes could contribute to the characterization of molecular pathways and biological processes, integrating data obtained from other applications and available in public repositories.


ABSTRACT

Background: The amount of gene expression data available in public repositories has grown exponentially in recent years, creating a need for new data mining tools that transform these data into information easily accessible to biologists.

Results: By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) database, we developed a new bioinformatics tool aimed at identifying genes whose expression appears simultaneously altered in different experimental conditions, suggesting co-regulation or coordinated action in the same biological process. To accomplish this task, we used the 978 human GEO Curated DataSets and manually selected 2,109 pair-wise comparisons based on their biological rationale. The lists of differentially expressed genes obtained from the selected comparisons were stored in a PostgreSQL database and used as the data source for the CorrelaGenes tool. Our application uses a customized Association Rule Mining (ARM) algorithm to identify sets of genes showing expression profiles correlated with a gene of interest. The significance of the correlation is measured by coupling the Lift, a well-known standard ARM index, with the χ² p value. The manually curated selection of the comparisons and the developed algorithm constitute a new approach in the field of gene expression profiling studies. A simulation performed on 100 randomly selected target genes allowed us to evaluate the efficiency of the procedure and to obtain preliminary data demonstrating the consistency of the results.

Conclusions: The preliminary results of the simulation showed how CorrelaGenes could contribute to the characterization of molecular pathways and biological processes, integrating data obtained from other applications and available in public repositories.
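
As an illustration of the two indexes named in the abstract, the sketch below computes the Lift and a χ² p value for one target gene and one candidate gene from co-occurrence counts. It uses the textbook definitions only (Lift as the ratio of observed to expected co-occurrence, Pearson χ² on a 2x2 table with one degree of freedom and no continuity correction). The abstract describes the ARM algorithm as customized, so this is not necessarily the authors' implementation; the function name `lift_and_chi2` and the example counts are hypothetical.

```python
import math


def lift_and_chi2(n_total, n_target, n_candidate, n_both):
    """Return (lift, chi2, p_value) for a 2x2 co-occurrence table.

    n_total     -- number of comparisons considered
    n_target    -- comparisons in which the target gene is differentially expressed
    n_candidate -- comparisons in which the candidate gene is differentially expressed
    n_both      -- comparisons in which both genes are differentially expressed
    """
    # Lift: observed co-occurrence divided by the co-occurrence expected
    # if the two genes were altered independently of each other.
    lift = (n_both * n_total) / (n_target * n_candidate)

    # Cells of the 2x2 contingency table.
    a = n_both                                      # both altered
    b = n_target - n_both                           # target only
    c = n_candidate - n_both                        # candidate only
    d = n_total - n_target - n_candidate + n_both   # neither

    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table sums to zero")

    # Pearson chi-square statistic (no continuity correction).
    chi2 = n_total * (a * d - b * c) ** 2 / denominator

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return lift, chi2, p_value


# Hypothetical counts, chosen only to exercise the function.
lift, chi2, p_value = lift_and_chi2(n_total=2109, n_target=100, n_candidate=80, n_both=20)
print(f"Lift = {lift:.4f}, chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A Lift of 1 corresponds to independence between the two genes; values above 1 indicate that they are altered together more often than expected by chance.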


Figure 5: Impact of the ARM indexes on the number of genes in the output lists. (A) Box-plot of the number of genes with respect to different thresholds of χ² p value. (B) Box-plot of the number of genes with respect to different thresholds of Lift.

Mentions: Different thresholds of the χ² p value and of the Lift index were evaluated (Figure 5). Tightening the χ² p value threshold by successive factors of 10, starting from 0.05 and 0.01, produced an almost linear reduction in the number of related genes (Figure 5A). In contrast, even small increases in the Lift threshold drastically reduced the number of genes in the output lists (Figure 5B). Raising the Lift threshold from 1 to 4 halved the number of related genes, while for Lift values greater than 5 the median number of selected genes was always below 40. A box-plot showing the combined effect of the χ² p value and Lift thresholds is presented in Additional File 4.
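
The threshold evaluation described above can be sketched as a simple filter over candidate genes. The snippet below counts how many candidates survive a given pair of thresholds. The `(lift, p_value)` pairs are made-up toy values, not data from the paper, and the choice of inclusive comparisons (`>=` for Lift, `<=` for the p value) is an assumption, since the text does not state how CorrelaGenes treats values exactly at the threshold.

```python
def count_passing(rules, min_lift, max_p):
    """Count candidate genes whose Lift and chi-square p value pass both thresholds."""
    return sum(1 for lift, p_value in rules if lift >= min_lift and p_value <= max_p)


# Toy (lift, p_value) pairs for five hypothetical candidate genes.
rules = [(1.2, 0.04), (2.5, 0.008), (4.1, 0.0009), (6.3, 0.00002), (0.9, 0.3)]

# Tighten the p value threshold by factors of 10 at a fixed Lift threshold.
for max_p in (0.05, 0.005, 0.0005):
    print(f"Lift >= 1, p <= {max_p}: {count_passing(rules, 1.0, max_p)} genes")

# Raise the Lift threshold at a fixed p value threshold.
for min_lift in (1.0, 4.0, 5.0):
    print(f"Lift >= {min_lift}, p <= 0.05: {count_passing(rules, min_lift, 0.05)} genes")
```

Repeating such counts over many target genes and summarizing them per threshold would give the kind of distribution that a box-plot like Figure 5 displays.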

