Microarrays, deep sequencing and the true measure of the transcriptome.

Malone JH, Oliver B - BMC Biol. (2011)

Bottom Line: Microarrays first made the analysis of the transcriptome possible, and have produced much important information. Today, however, researchers are increasingly turning to direct high-throughput sequencing -- RNA-Seq -- which has considerable advantages for examining transcriptome fine structure -- for example in the detection of allele-specific expression and splice junctions. We conclude that microarrays remain useful and accurate tools for measuring expression levels, and RNA-Seq complements and extends microarray measurements.


Affiliation: Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA. malonej@niddk.nih.gov

ABSTRACT
Microarrays first made the analysis of the transcriptome possible, and have produced much important information. Today, however, researchers are increasingly turning to direct high-throughput sequencing -- RNA-Seq -- which has considerable advantages for examining transcriptome fine structure -- for example in the detection of allele-specific expression and splice junctions. In this article, we discuss the relative merits of the two techniques, the inherent biases in each, and whether all of the vast body of array work needs to be revisited using the newer technology. We conclude that microarrays remain useful and accurate tools for measuring expression levels, and RNA-Seq complements and extends microarray measurements.


Figure 3: Comparison of array and RNA-Seq data for measuring differential gene expression in the heads of male and female D. pseudoobscura. (a) Results for female heads; (b) results for male heads; (c) fold change (female/male). We used custom-designed NimbleGen arrays designed against an early release of the D. pseudoobscura annotation. The array consists of 50-mer probes selected without bias to gene position, with an average of 10 probes per gene model. A full description of the array platform can be found in GEO under platform number GPL4631. Robust Multi-array Averaging (RMA) [50] was used to normalize the array experiments; normalization improves the correlation between array and sequencing results. A full description of the analysis and all sequencing data can be found in [51]. Colored circles are genes identified as differentially expressed between females and males by microarray analysis with four biological replicates. One of the four biological replicates was prepared for sequencing by fragmenting RNA using alkaline hydrolysis and constructing a cDNA library. For these analyses, we generated about 6 million 36 base pair reads on the Illumina GA I platform. The number of reads per kilobase per million mapped reads (RPKM) was calculated by counting the uniquely mapping reads from the default Illumina mapper (ELAND; the same pattern holds for Bowtie) against the same coding sequence models that were used to construct the microarray probes. The correlation between fluorescence intensity, as a surrogate for gene expression, and the RPKM metric obtained by mRNA-Seq is high (Pearson's r = 0.90-0.91; Spearman's rho = 0.90-0.91) and slightly higher for just the genes identified as differentially expressed by microarrays (Pearson's r = 0.89-0.92; Spearman's rho = 0.90-0.94). For the fold change measurements in (c), the congruence is reasonable for the entire data set (Pearson's r = 0.62; Spearman's rho = 0.54) but high for the genes with sex-biased expression (Pearson's r = 0.92; Spearman's rho = 0.89).
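The RPKM metric and the two correlation coefficients quoted in the caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the tie-handling choice in `ranks`, and the toy numbers are all assumptions made for illustration, and only the standard library is used.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def pearson(xs, ys):
    """Pearson's r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical gene: 120 uniquely mapped reads, 2,000 bp coding model,
# 6 million mapped reads in the library.
print(rpkm(120, 2000, 6_000_000))  # 10.0

# Hypothetical per-gene values (not data from the paper).
log2_array_intensity = [6.1, 7.4, 8.9, 10.2, 12.5, 13.1]
log2_rpkm = [-0.5, 1.2, 2.8, 4.1, 6.9, 8.0]
print(pearson(log2_array_intensity, log2_rpkm))
print(spearman(log2_array_intensity, log2_rpkm))
```

Spearman's rho depends only on rank order, so it is unaffected by the slight non-linearity between array intensity and RPKM, whereas Pearson's r is sensitive to it.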

Mentions: A key first question is whether, when used to ask exactly the same question, both techniques give the same answer. Comparing expression metrics from array intensities to RNA-Seq density shows a strong congruence (Figure 3). The relationship is not quite linear, as there appears to be a slight compression in the array data at the high end, but the vast majority of the expression values are similar between the methods. Scatter increases at low expression, which is not surprising, as background correction methods for arrays are complicated when signal levels approach noise levels. Similarly, RNA-Seq is a sampling method, and stochastic events become a source of error in the quantification of rare transcripts [47]. There is, however, one consistent difference in our comparisons in Drosophila: a large range of expression values at the low end on arrays is undetectable by RNA-Seq. We cannot explain this difference, but whatever the cause, it does not affect the measurement of differential expression at expression levels that are detectable by RNA-Seq (Figure 3). In our experiment, we used biological replicate samples for the arrays and applied moderated t-tests to detect those genes that were differentially expressed between females and males. In the analysis in Figure 3, our goal was to compare expression measurements between the platforms. The genes showing sex-biased expression (red and blue dots in Figure 3) are in outstanding agreement between microarrays and RNA-Seq. We have observed similar congruence in the extremely deep RNA-Seq data in modENCODE D. melanogaster female and male samples [37]. Annotated sex-biased genes based on the extensive array-based literature [48] and the deeply sequenced modENCODE samples report the same biology.
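To make the point about sampling error in rare transcripts concrete, the sketch below treats the read count for a gene as a Poisson draw. The Poisson model is an assumption introduced here for illustration, not a method described in the text, and the function names and example values are hypothetical.

```python
import math


def expected_reads(rpkm, gene_length_bp, total_mapped_reads):
    """Expected read count for a gene at a given RPKM and sequencing depth."""
    return rpkm * gene_length_bp * total_mapped_reads / 1e9


def poisson_cv(expected):
    """Coefficient of variation of a Poisson count with the given mean."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 1.0 / math.sqrt(expected)


def prob_undetected(expected):
    """Probability of observing zero reads under the Poisson assumption."""
    return math.exp(-expected)


# Hypothetical 1,500 bp gene in a library of about 6 million mapped reads.
for level in (0.1, 1.0, 50.0):
    mean = expected_reads(level, 1500, 6_000_000)
    print(level, mean, poisson_cv(mean), prob_undetected(mean))
```

Under this assumption the relative error shrinks as one over the square root of the expected count, so at a fixed depth the scatter grows quickly as expression falls, and a transcript expected to yield less than one read is frequently missed altogether.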

