Database integration of 4923 publicly-available samples of breast cancer molecular and clinical data.

Planey CR, Butte AJ - AMIA Jt Summits Transl Sci Proc (2013)



Affiliation: Stanford Biomedical Informatics; Stanford, CA.

ABSTRACT
We outline a paradigm for meta-microarray database creation and integration with clinical variables. We use as our implementation example a breast cancer database linking RNA expression measurements (by microarray) and clinical variables, such as survival metrics and tumor size. Such an endeavor involves integrating across different microarray datasets as well as clinical parameters. To this end, we created a data curation and processing pipeline, formal database ontology, and SQL schema to optimally query, analyze and visualize data from over 30 publicly available breast cancer microarray studies listed in the Gene Expression Omnibus (GEO). We demonstrate several pilot examples using this database. This methodology serves as a model for future meta-analyses of complex public clinical datasets, in particular those in the field of cancer.
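The abstract mentions an SQL schema that links expression measurements to clinical variables, but its tables are not given here. The sketch below is a minimal illustration built on an assumed four-table layout. The table and column names (`dataset`, `sample`, `clinical`, `expression`, `tumor_size_cm`, and so on) are hypothetical. It uses Python's built-in `sqlite3` module to show one cross-study query such a schema makes possible: the mean expression of one gene among larger tumors, grouped by GEO dataset. The accessions, gene symbol and values are toy placeholders, not data from the study.

```python
import sqlite3

# Hypothetical minimal schema (not the authors' actual schema): one table each
# for GEO datasets, samples, clinical variables and expression values.
SCHEMA = """
CREATE TABLE dataset (
    gse_id      TEXT PRIMARY KEY,  -- GEO series accession
    platform_id TEXT NOT NULL      -- GEO platform (array type) accession
);
CREATE TABLE sample (
    gsm_id TEXT PRIMARY KEY,       -- GEO sample accession
    gse_id TEXT NOT NULL REFERENCES dataset(gse_id),
    site   TEXT                    -- collection site, when reported
);
CREATE TABLE clinical (
    gsm_id          TEXT PRIMARY KEY REFERENCES sample(gsm_id),
    tumor_size_cm   REAL,
    survival_months REAL,
    event           INTEGER        -- 1 = event observed, 0 = censored
);
CREATE TABLE expression (
    gsm_id TEXT NOT NULL REFERENCES sample(gsm_id),
    gene   TEXT NOT NULL,
    value  REAL NOT NULL,
    PRIMARY KEY (gsm_id, gene)
);
"""

# Mean expression of one gene among samples above a tumor-size cutoff,
# reported separately for each GEO dataset.
QUERY = """
SELECT s.gse_id, AVG(e.value) AS mean_expr, COUNT(*) AS n
FROM expression AS e
JOIN sample   AS s ON s.gsm_id = e.gsm_id
JOIN clinical AS c ON c.gsm_id = e.gsm_id
WHERE e.gene = ? AND c.tumor_size_cm > ?
GROUP BY s.gse_id
ORDER BY s.gse_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with toy placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dataset VALUES (?, ?)",
                     [("GSE_A", "GPL_X"), ("GSE_B", "GPL_Y")])
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                     [("S1", "GSE_A", "site1"), ("S2", "GSE_A", "site1"),
                      ("S3", "GSE_B", None), ("S4", "GSE_B", None)])
    conn.executemany("INSERT INTO clinical VALUES (?, ?, ?, ?)",
                     [("S1", 3.0, 60.0, 0), ("S2", 1.5, 24.0, 1),
                      ("S3", 2.5, 12.0, 1), ("S4", 4.0, 30.0, 0)])
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                     [("S1", "GENE_A", 8.0), ("S2", "GENE_A", 6.0),
                      ("S3", "GENE_A", 5.0), ("S4", "GENE_A", 7.0)])
    return conn


def mean_expression_by_dataset(conn, gene, min_tumor_size_cm):
    """Return (gse_id, mean expression, sample count) rows for one gene."""
    return conn.execute(QUERY, (gene, min_tumor_size_cm)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_expression_by_dataset(conn, "GENE_A", 2.0))
    # [('GSE_A', 8.0, 1), ('GSE_B', 6.0, 2)]
```

In this layout, platform and site live on the `dataset` and `sample` tables. Later steps can then group or filter by array type and collection site without re-reading the raw GEO files.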


Figure 1. Pipeline for meta-microarray analysis across GEO cohorts.

We next followed a general heuristic to pre-process and analyze all the microarray data. The workflow is outlined in Figure 1. After selecting the final 30 datasets, we downloaded the microarray files for each patient. Most of the steps outlined are standard to any microarray analysis. A meta-analysis, however, introduces two additional steps that are critical to the final expression values: analysis of samples measured on different arrays, and handling of samples collected from different sites but included in the same GEO dataset. Large datasets in GEO often pool samples measured on different arrays or collected at different sites, whether they were assembled from retrospective searches through a hospital's tissue bank or from a clinical trial.
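The two meta-analysis-specific steps above involve samples measured on different arrays and samples from different sites within one GEO dataset. The text here does not say how the authors normalized them. The sketch below shows one simple approach, and it is not presented as the authors' method. It treats each combination of GEO dataset, platform and site as a batch, then z-scores each gene within its batch. The field names (`gse_id`, `platform_id`, `site`, `expr`) and the toy values are hypothetical. Dedicated batch-correction methods such as ComBat are more elaborate alternatives.

```python
from collections import defaultdict
from statistics import mean, pstdev


def batch_key(sample: dict) -> tuple:
    # Assumption: samples sharing GEO dataset, array platform and collection
    # site are directly comparable; anything else is a separate batch.
    return (sample["gse_id"], sample["platform_id"], sample.get("site"))


def standardize_within_batches(samples: list[dict], gene: str) -> dict[str, float]:
    """Return a per-sample z-score for `gene`, computed within each batch."""
    batches = defaultdict(list)
    for s in samples:
        batches[batch_key(s)].append(s)

    z_scores = {}
    for members in batches.values():
        values = [m["expr"][gene] for m in members]
        mu, sd = mean(values), pstdev(values)
        for m in members:
            # A single-sample or zero-variance batch has no spread to scale by,
            # so its samples are set to 0.0 instead of dividing by zero.
            z_scores[m["gsm_id"]] = (m["expr"][gene] - mu) / sd if sd > 0 else 0.0
    return z_scores


# Toy samples: GSE_A mixes two platforms whose raw scales differ about tenfold.
samples = [
    {"gsm_id": "S1", "gse_id": "GSE_A", "platform_id": "GPL_X", "site": "site1", "expr": {"GENE_A": 10.0}},
    {"gsm_id": "S2", "gse_id": "GSE_A", "platform_id": "GPL_X", "site": "site1", "expr": {"GENE_A": 12.0}},
    {"gsm_id": "S3", "gse_id": "GSE_A", "platform_id": "GPL_Y", "site": "site1", "expr": {"GENE_A": 100.0}},
    {"gsm_id": "S4", "gse_id": "GSE_A", "platform_id": "GPL_Y", "site": "site1", "expr": {"GENE_A": 104.0}},
    {"gsm_id": "S5", "gse_id": "GSE_B", "platform_id": "GPL_X", "site": None, "expr": {"GENE_A": 7.0}},
]

if __name__ == "__main__":
    print(standardize_within_batches(samples, "GENE_A"))
    # {'S1': -1.0, 'S2': 1.0, 'S3': -1.0, 'S4': 1.0, 'S5': 0.0}
```

After standardization, S1 and S3 land on the same scale, even though their raw values differ about tenfold because they were measured on different platforms. Per-batch z-scoring has a cost, though. It also removes genuine biological differences between batches, for example when one site enrolled mostly larger tumors. That is one reason model-based corrections are often preferred.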

