Revisiting adverse effects of cross-hybridization in Affymetrix gene expression data: do they matter for correlation analysis?

Klebanov L, Chen L, Yakovlev A - Biol. Direct (2007)

Bottom Line: Okoniewski and Miller (BMC Bioinformatics 2006, 7: Article 276) came to the conclusion that the process of multiple targeting in short oligonucleotide microarrays induces spurious correlations and that this effect may degrade inference on correlation coefficients. The design of their study and supporting simulations cast serious doubt upon the validity of this conclusion. As the problem stands now, there is no compelling reason to believe that multiple targeting causes a large-scale effect on the correlation structure of Affymetrix gene expression data.


Affiliation: Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue, Rochester, Box 630, New York 14642, USA. levkleb@yahoo.com

ABSTRACT

Background: This work was undertaken in response to a recently published paper by Okoniewski and Miller (BMC Bioinformatics 2006, 7: Article 276). The authors of that paper came to the conclusion that the process of multiple targeting in short oligonucleotide microarrays induces spurious correlations and that this effect may degrade inference on correlation coefficients. The design of their study and supporting simulations cast serious doubt upon the validity of this conclusion. The work by Okoniewski and Miller prompted us to revisit the issue by means of experimentation with biological data and probabilistic modeling of cross-hybridization effects.

Results: We have identified two serious flaws in the study by Okoniewski and Miller: (1) the data used in their paper are not amenable to correlation analysis; (2) the proposed simulation model is inadequate for studying the effects of cross-hybridization. Using two other data sets, we have shown that removing multiply targeted probe sets does not lead to a shift in the histogram of sample correlation coefficients towards smaller values. A more realistic approach to mathematical modeling of cross-hybridization demonstrates that this process is far more complex than the simplistic model considered by the authors. A diversity of correlation effects (such as the induction of positive or negative correlations) caused by cross-hybridization can be expected in theory, but there are natural limitations on the ability to provide quantitative insights into such effects because they are not directly observable.

Conclusion: The proposed stochastic model is instrumental in studying general regularities in hybridization interaction between probe sets in microarray data. As the problem stands now, there is no compelling reason to believe that multiple targeting causes a large-scale effect on the correlation structure of Affymetrix gene expression data. Our analysis suggests that the observed long-range correlations in microarray data are biological in origin rather than the result of a technological flaw.
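
The comparison described in the Results (the histogram of sample correlation coefficients with and without multiply targeted probe sets) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the boolean mask `multiply_targeted`, the bin edges, and the synthetic data are all assumptions introduced here, and Pearson correlation is assumed as the sample correlation coefficient.

```python
import numpy as np

def pairwise_correlations(expr):
    """Return the Pearson correlations for all distinct row pairs.

    expr: 2-D array, rows = probe sets, columns = arrays (samples).
    """
    corr = np.corrcoef(expr)                 # rows are treated as variables
    upper = np.triu_indices_from(corr, k=1)  # each pair once, no diagonal
    return corr[upper]

def compare_after_removal(expr, multiply_targeted):
    """Correlations for all probe sets, and for the subset that remains
    after dropping rows flagged in the boolean mask `multiply_targeted`."""
    keep = ~np.asarray(multiply_targeted, dtype=bool)
    return pairwise_correlations(expr), pairwise_correlations(expr[keep])

# Synthetic stand-in data (hypothetical): 50 probe sets, 30 arrays.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 30))
flagged = np.zeros(50, dtype=bool)
flagged[:10] = True                          # pretend the first 10 are multiply targeted

all_r, kept_r = compare_after_removal(expr, flagged)

# Use common bin edges so the two histograms are directly comparable.
edges = np.linspace(-1.0, 1.0, 21)
hist_all, _ = np.histogram(all_r, bins=edges, density=True)
hist_kept, _ = np.histogram(kept_r, bins=edges, density=True)
print(np.median(all_r), np.median(kept_r))
```

With real data, a shift of the second histogram towards smaller values would be the signature of spurious correlation induced by multiple targeting; the abstract reports that no such shift was observed in the two data sets examined.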

Figure 5: Variation coefficients for expression levels of miRNAs in SKBr3 breast cancer cells.

Mentions: Remark 2. Regulatory interactions between genes participating in biochemical pathways or networks may (or may not) cause only a short-range correlation or the so-called clumpy dependence [21]. Superimposed on this causal dependence are the effects caused by different species of noncoding RNA implicated in the regulation of large sets of genes. The global term "noncoding RNA" (ncRNA) refers to a large class of transcripts that do not encode a protein product. An important subclass of functional ncRNAs is represented by microRNAs (miRNAs). These small transcripts, typically 21-25 nt long, have been the subject of intense study in recent years. Using either miRNA transfection into cultured cells [22] or miRNA antagonists in vivo [23], it has been shown that a particular miRNA may affect hundreds of genes by interfering with their transcripts. This is a large-scale effect, but it is still doubtful whether the observed long-range correlation between gene expression levels, involving thousands of genes, can be exhaustively explained by this mechanism. The two major modes of miRNA action are mRNA cleavage and translational inhibition. In the latter case, all untranslated mRNAs are eventually fated to degradation as well. While there is some similarity between such effects and those of cross-hybridization (binding to a common transcript), they call for a different stochastic model that would allow for the cognate mRNA degradation. Nevertheless, it is interesting to see whether the expression levels of miRNAs are subject to a much higher variation than those of the mRNAs presented in Figure 4. Figure 5 displays variation coefficients of expression levels for different miRNAs in SKBr3 breast cancer cells (untreated controls, n = 38) produced by spotted oligonucleotide microarrays. The data were retrieved from the Gene Expression Omnibus Database (see [24], GSE3798).
It is clear that some classes of miRNA are much more variable than any of the protein-coding mRNAs in Figure 4, despite the fact that the inter-sample variability is generally expected to be lower in vitro than in vivo. This suggests that, when trying to explain the nature of long-range correlations in gene expression data involving gigantic sets of genes, one should look more closely at the "dark matter" of ncRNAs and at confounding effects caused by heterogeneity of biological tissues and/or subjects [13,14,25] rather than at technical flaws of microarray technology.
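
For readers who want to reproduce a plot like Figure 5, a variation coefficient per transcript can be computed along the following lines. This is a sketch under stated assumptions: the text does not say exactly how the coefficients were computed, so the usual definition (sample standard deviation divided by sample mean, on the original rather than log scale) is assumed, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def variation_coefficients(expr):
    """Sample coefficient of variation (std / mean) for each row.

    expr: 2-D array, rows = transcripts, columns = samples.
    Assumes expression values on the original (not log) scale
    with positive row means.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    std = expr.std(axis=1, ddof=1)   # ddof=1 gives the sample standard deviation
    return std / mean

# Toy matrix (hypothetical values): two transcripts, four samples.
toy = np.array([
    [10.0, 10.0, 10.0, 10.0],   # constant expression, so no variation
    [ 5.0, 15.0,  5.0, 15.0],   # same mean, large inter-sample variation
])
cv = variation_coefficients(toy)
print(cv)
```

Because the coefficient is scale-free, it allows the variability of miRNA and mRNA expression levels to be compared even when the two were measured on different platforms, which is the comparison drawn between Figures 4 and 5.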

