Robust microarray meta-analysis identifies differentially expressed genes for clinical prediction.

Phan JH, Young AN, Wang MD - ScientificWorldJournal (2012)

Bottom Line: Combining multiple microarray datasets increases sample size and leads to improved reproducibility in identification of informative genes and subsequent clinical prediction. Although microarrays have increased the rate of genomic data collection, sample size is still a major issue when identifying informative genetic biomarkers. Results indicate that rank average meta-analysis consistently performs well compared to five other meta-analysis methods.

Affiliation: Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA.

ABSTRACT
Combining multiple microarray datasets increases sample size and leads to improved reproducibility in identification of informative genes and subsequent clinical prediction. Although microarrays have increased the rate of genomic data collection, sample size is still a major issue when identifying informative genetic biomarkers. Because of this, feature selection methods often suffer from false discoveries, resulting in poorly performing predictive models. We develop a simple meta-analysis-based feature selection method that captures the knowledge in each individual dataset and combines the results using a simple rank average. In a comprehensive study that measures robustness in terms of clinical application (i.e., breast, renal, and pancreatic cancer), microarray platform heterogeneity, and classifier (i.e., logistic regression, diagonal LDA, and linear SVM), we compare the rank average meta-analysis method to five other meta-analysis methods. Results indicate that rank average meta-analysis consistently performs well compared to five other meta-analysis methods.

Figure 2: Procedure for comparing the predictive performance of six microarray meta-analysis-based feature selection (FS) methods. (a) Features are selected from microarray datasets using the rank average meta-analysis method (pink box), several other meta-analysis methods (orange boxes: mDEDS, rank products, Choi, and Wang), and a naive method (blue box) that aggregates samples into a larger dataset. For each individual dataset, rank average meta-analysis chooses, from among several basic FS methods (SAM, fold change, rank sum, t-test, mRMRD, and mRMRQ), the single method that optimizes prediction performance (via cross-validation) over the top 20 features. A simple weighted average of gene ranks from all individual datasets produces the final set of rank average meta-analysis features. The rank products, Choi, and Wang methods use one basic FS method to select features from multiple datasets, while the mDEDS method uses all six basic FS methods. (b) Features are selected from two or more datasets from each group to build a classifier (pink boxes), which is trained with samples from only one dataset (yellow boxes). The performance of the classifier is assessed using independent datasets (datasets not used for training or feature selection, green boxes). The predictive performance of a microarray meta-analysis-based FS method is an average over all permutations of training and validation datasets (blue boxes). In the example, datasets 1–4 consist of one-channel Affymetrix arrays while dataset 5 (in the case of heterogeneous data) consists of two-channel arrays.
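
The rank-combination step described in panel (a) can be illustrated with a minimal sketch. It assumes each dataset has already been scored by its chosen basic FS method, with higher scores meaning more informative genes. The function names, the `weights` argument, and the default of equal weights are hypothetical choices made for illustration; the caption does not say how the weights are set.

```python
import numpy as np

def rank_genes(scores):
    """Convert per-gene scores to ranks, where rank 1 is the highest score."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # gene indices, best first
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def rank_average(score_lists, weights=None, top_k=20):
    """Weighted average of gene ranks across datasets.

    score_lists: one score vector per dataset, all over the same genes.
    weights:     hypothetical per-dataset weights (equal if omitted).
    Returns the indices of the top_k genes and the averaged ranks.
    """
    ranks = np.vstack([rank_genes(s) for s in score_lists])
    if weights is None:
        weights = np.ones(len(score_lists))
    weights = np.asarray(weights, dtype=float)
    avg_rank = weights @ ranks / weights.sum()
    top = np.argsort(avg_rank, kind="stable")[:top_k]  # smallest average rank first
    return top, avg_rank
```

Calling `rank_average([scores_a, scores_b, scores_c], top_k=20)` returns the 20 genes with the smallest average rank. Averaging ranks rather than raw scores keeps the combination independent of each basic FS method's score scale.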

We use classification performance to assess meta-analysis-based FS methods with the assumption that improved FS leads to higher prediction performance when classifying samples from an independent dataset. We assess prediction performance using independent training and testing datasets because of the small sample size of some of the datasets and because we want to reflect clinical scenarios in which predictive models would likely be derived from data collected from a separate batch of patients. We compare our proposed rank average meta-analysis method to other meta-analysis methods including: (1) the rank products method [13], (2) the mDEDS method [14], (3) Choi et al.'s method of interstudy variability [10], (4) Wang et al.'s method of weighting differential expression by variance [11], and (5) a naive method that aggregates samples from multiple datasets. The rank products, mDEDS, Choi, and Wang methods can be applied to multiple datasets as well as to single datasets. For each method and each dataset group, we compute single-dataset performance, combined homogeneous-dataset performance (from two to four datasets combined), and combined heterogeneous-dataset performance (Figure 2(a)).
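
The evaluation protocol can be sketched as an enumeration of dataset roles: some datasets supply features, one supplies training samples, and an independent dataset is used for testing. The sketch below assumes that the training dataset is drawn from the datasets used for feature selection; the text does not confirm this, so treat it as one possible reading. The names `enumerate_splits`, `average_performance`, and the `evaluate` callback are hypothetical.

```python
from itertools import combinations
from statistics import mean

def enumerate_splits(dataset_ids, n_fs):
    """Yield (fs_sets, train, test) role assignments.

    fs_sets: the n_fs datasets used for feature selection.
    train:   one dataset used to train the classifier (assumed here to be
             one of the feature-selection datasets).
    test:    one dataset used for neither feature selection nor training.
    """
    for fs_sets in combinations(dataset_ids, n_fs):
        held_out = [d for d in dataset_ids if d not in fs_sets]
        for train in fs_sets:
            for test in held_out:
                yield fs_sets, train, test

def average_performance(dataset_ids, n_fs, evaluate):
    """Average a user-supplied score over every role assignment.

    evaluate(fs_sets, train, test) is a placeholder for selecting features,
    training a classifier, and scoring it on the test dataset.
    """
    scores = [evaluate(fs, train, test)
              for fs, train, test in enumerate_splits(dataset_ids, n_fs)]
    return mean(scores)
```

With four datasets and two used for feature selection, this enumeration yields 24 role assignments, and the reported performance for a method would be the mean over all of them.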