Seq-ing improved gene expression estimates from microarrays using machine learning.

Korir PK, Geeleher P, Seoighe C - BMC Bioinformatics (2015)

Bottom Line: Quantifying gene expression by RNA-Seq has several advantages over microarrays, including greater dynamic range and gene expression estimates on an absolute, rather than a relative scale. This approach can be used to accurately estimate absolute expression levels from microarray data, at both gene and transcript level, which has not previously been possible. This methodology will facilitate re-analysis of archived microarray data and broaden the utility of the vast quantities of data still being generated.


Affiliation: School of Biochemistry and Cell Biology, University College Cork, Western Road, Cork, Ireland. paul.korir@gmail.com.

ABSTRACT

Background: Quantifying gene expression by RNA-Seq has several advantages over microarrays, including greater dynamic range and gene expression estimates on an absolute, rather than a relative scale. Nevertheless, microarrays remain in widespread use, demonstrated by the ever-growing numbers of samples deposited in public repositories.

Results: We propose a novel approach to microarray analysis that attains many of the advantages of RNA-Seq. This method, called Machine Learning of Transcript Expression (MaLTE), leverages samples for which both microarray and RNA-Seq data are available, using a Random Forest to learn the relationship between the fluorescence intensity of sets of microarray probes and RNA-Seq transcript expression estimates. We trained MaLTE on data from the Genotype-Tissue Expression (GTEx) project, consisting of Affymetrix gene arrays and RNA-Seq from over 700 samples across a broad range of human tissues.

Conclusion: This approach can be used to accurately estimate absolute expression levels from microarray data, at both gene and transcript level, which has not previously been possible. This methodology will facilitate re-analysis of archived microarray data and broaden the utility of the vast quantities of data still being generated.
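
The learning step described in the Results can be illustrated with a minimal sketch. This is not the MaLTE implementation: it assumes scikit-learn's RandomForestRegressor as the Random Forest, uses simulated probe intensities in place of real array data, and the function names (simulate_gene, fit_and_predict) and all parameter values are hypothetical. It fits one model for a single gene on samples that have both data types, then predicts expression for held-out samples from probe intensities alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def simulate_gene(n_samples=120, n_probes=6, seed=0):
    """Simulate one gene: RNA-Seq expression plus noisy probe intensities.

    All numbers here are invented for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Stand-in for the RNA-Seq expression estimate of this gene in each sample.
    rnaseq = rng.uniform(0.0, 10.0, size=n_samples)
    # Each probe responds to expression with its own gain, offset and noise.
    gain = rng.uniform(0.5, 1.5, size=n_probes)
    offset = rng.uniform(2.0, 6.0, size=n_probes)
    noise = rng.normal(0.0, 0.5, size=(n_samples, n_probes))
    probes = rnaseq[:, None] * gain + offset + noise
    return probes, rnaseq


def fit_and_predict(probes, rnaseq, n_train=80, seed=0):
    """Train on samples with both data types, predict for the remainder."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    # Features: probe intensities; target: RNA-Seq expression estimate.
    model.fit(probes[:n_train], rnaseq[:n_train])
    predicted = model.predict(probes[n_train:])
    return predicted, rnaseq[n_train:]


probes, rnaseq = simulate_gene()
predicted, observed = fit_and_predict(probes, rnaseq)
# Agreement between predictions and held-out RNA-Seq values for this gene.
r = np.corrcoef(predicted, observed)[0, 1]
print(f"held-out Pearson r = {r:.2f}")
```

Because the model is trained against RNA-Seq values, its predictions are on the RNA-Seq scale rather than a relative intensity scale, which is the property the abstract highlights.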



Fig. 3: Cross-sample correlation with RNA-Seq. For each gene, the cross-sample correlation was determined between the gene expression values estimated from the microarrays using MaLTE, median-polish and PLIER. Density plots show the distribution across genes of (a) Pearson and (b) Spearman correlation coefficients. Mean and median values of the correlation coefficients are provided in parentheses next to the method name in the legend. Vertical lines show mean cross-sample correlation for MaLTE (solid), median-polish (dashed) and PLIER (dotted).

Mentions: The correlation of microarray and RNA-Seq estimates of gene expression has been investigated previously in several studies [17, 32, 33]. Because not all genes vary substantially across samples, while within individual samples mRNA abundance ranges over several orders of magnitude [34], cross-sample correlations tend to be lower than within-sample correlations. MaLTE significantly outperformed median-polish and PLIER in cross-sample correlation (Fig. 3). For example, the mean cross-sample Pearson correlation in test data was 0.76 for MaLTE, compared to 0.72 for median-polish (p < 1×10^−322, from a Wilcoxon rank sum test of cross-sample correlations) and 0.68 for PLIER (Fig. 3a). Mean Spearman cross-sample correlations obtained from MaLTE were also much higher (0.69, compared to 0.64 for median-polish and 0.61 for PLIER; Fig. 3b).
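
To make the cross-sample correlation concrete, the sketch below computes it per gene with the Python standard library only. It is an illustration under stated assumptions, not the authors' analysis code: the gene names and expression values are invented toy data, and the helper names (pearson, average_ranks, spearman, cross_sample_correlations) are hypothetical. For each gene, the microarray-derived estimates across samples are correlated with the RNA-Seq estimates for the same samples; Spearman correlation is obtained as the Pearson correlation of the ranks.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cross_sample_correlations(array_est, rnaseq_est, method=pearson):
    """Per-gene correlation across samples between two sets of estimates."""
    return {gene: method(array_est[gene], rnaseq_est[gene]) for gene in array_est}


# Toy data: each list holds one gene's expression across the same four samples.
array_est = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [2.0, 1.0, 4.0, 3.0]}
rnaseq_est = {"geneA": [1.0, 4.0, 9.0, 16.0], "geneB": [1.0, 2.0, 3.0, 4.0]}

pearson_by_gene = cross_sample_correlations(array_est, rnaseq_est, pearson)
spearman_by_gene = cross_sample_correlations(array_est, rnaseq_est, spearman)
print(pearson_by_gene)
print(spearman_by_gene)
```

Summarising the resulting per-gene coefficients by their mean gives the kind of figure quoted above for each method, and the per-gene distributions are what the density plots in Fig. 3 display.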

