Matching of array CGH and gene expression microarray features for the purpose of integrative genomic analyses.

van Wieringen WN, Unger K, Leday GG, Krijgsman O, de Menezes RX, Ylstra B, van de Wiel MA - BMC Bioinformatics (2012)

Bottom Line: Although important, the matching of the platforms is not detailed, or is insufficiently detailed, in many integrative analyses. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in their use of the available data, and that they may even lead to different results for individual genes. Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new.


Affiliation: Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands. w.vanwieringen@vumc.nl

ABSTRACT

Background: An increasing number of genomic studies that interrogate more than one molecular level are being published. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Such analyses often require knowledge of which elements of one platform link to those of another. Although important, the matching of the platforms is not detailed, or is insufficiently detailed, in many integrative analyses.

Results: We describe, illustrate and discuss six matching procedures. They are implemented in the R package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic, and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in their use of the available data, and that they may even lead to different results for individual genes.

Conclusions: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R package sigaR, available from Bioconductor.

Figure 8: Each panel depicts the relation between the matched gene expression and DNA copy number data (as produced by one of the matching procedures) for the A_23_P168211 probe from the TCGA data set (with Agilent expression arrays). The red line is the best-fitting piecewise linear spline (obtained with the method described in [30]). The pink area represents the 95% confidence intervals for the fitted relationship. The vertical dashed lines separate the samples with a normal copy number from those with a gain, and those with a gain from those with an amplification.

Mentions: The choice of matching procedure also shows in the DNA copy number data, because the procedures either select different features or summarize data from multiple features in different ways. For the vast majority of genes, the DNA copy number signatures vary little or not at all between the matching procedures (Figure 7). As a result, the p-values and Spearman’s rank correlations differ too, but again only slightly. Occasionally, however, a data point is affected more seriously by the choice of matching procedure. Figure 8 shows that the distanceAny method has one data point (indicated by the orange circle) that deviates from its counterpart in the other matched DNA copy number signatures. In this particular case, the deviation is due to the large window size chosen, and the problem vanishes if the window size is decreased.
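
The window-size effect described above can be illustrated with a minimal sketch. This is not the sigaR implementation of distanceAny: the function name `match_by_window`, the choice of the mean as summary, and all positions and copy number values are hypothetical, chosen only to show how a large window can pull in a distant feature and shift the matched value for one sample.

```python
from statistics import mean


def match_by_window(gene_mid, features, window):
    """Summarize the copy number features near a gene (hypothetical rule).

    gene_mid : genomic midpoint of the gene (base pairs)
    features : list of (position, copy_number) pairs for one sample
    window   : maximum distance (base pairs) from gene_mid for a feature
               to be included in the summary

    Returns the mean copy number of the selected features,
    or None when no feature falls inside the window.
    """
    selected = [cn for pos, cn in features if abs(pos - gene_mid) <= window]
    return mean(selected) if selected else None


# Hypothetical sample: two features close to the gene with a normal
# copy number, and one distant feature located in a gained region.
features = [(990_000, 0.0), (1_010_000, 0.1), (1_400_000, 1.2)]
gene_mid = 1_000_000

# A narrow window uses only the two nearby features.
narrow = match_by_window(gene_mid, features, window=50_000)

# A wide window also includes the distant gained feature,
# which raises the matched copy number for this sample.
wide = match_by_window(gene_mid, features, window=500_000)

print("narrow window:", narrow)
print("wide window:", wide)
```

In this toy setting the matched value under the wide window is noticeably higher than under the narrow one, which mirrors the deviating data point in Figure 8 and why decreasing the window size removes it.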

