Matching of array CGH and gene expression microarray features for the purpose of integrative genomic analyses.

van Wieringen WN, Unger K, Leday GG, Krijgsman O, de Menezes RX, Ylstra B, van de Wiel MA - BMC Bioinformatics (2012)

Bottom Line: Although matching is important, many integrative analyses describe the matching of the platforms insufficiently or not at all. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in the use of the available data, and may even lead to different results for individual genes. Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new.


Affiliation: Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands. w.vanwieringen@vumc.nl

ABSTRACT

Background: An increasing number of genomic studies interrogating more than one molecular level are being published. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Such analyses often require knowledge of which elements of one platform link to those of another. Although this matching is important, many integrative analyses describe it insufficiently or not at all.

Results: We describe, illustrate and discuss six matching procedures. They are implemented in the R package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in the use of the available data, and may even lead to different results for individual genes.

Conclusions: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R package sigaR, available from Bioconductor.

Figure 6: OverlapPlus matching. After the overlap approach has been applied, the DNA copy number of unmatched genes is ‘estimated’ by interpolating the gene dosage of the closest array CGH features. In both panels no feature overlaps the gene. In the top panel the overlapPlus approach interpolates the DNA copy number data between features j−1 and j, as there is no breakpoint between them. In the bottom panel, however, features j−1 and j are separated by a breakpoint, and the overlapPlus procedure does not interpolate.

Mentions: As the name suggests, the overlapPlus matching procedure extends the overlap approach. To this end, overlapPlus alters the objective of feature matching: the aim is no longer to match features of both platforms to each other, but to assign to each gene on the expression array its corresponding DNA copy number. This is achieved by first applying the overlap matching procedure. DNA copy number information is then interpolated to genomic regions not covered by the DNA copy number platform, so that genes mapping to these uncovered regions are assigned an “estimate” of their gene dosage (Figure 6). The interpolation is warranted by the discrete nature of the underlying biological phenomenon: copy number changes only at breakpoints, so between two neighbouring features that are not separated by a breakpoint it is expected to be constant. This interpolation principle has been proposed by, among others, Autio et al. [13]. Table 6 details the steps of the overlapPlus algorithm.
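The two-stage logic (overlap first, then breakpoint-aware interpolation for the remaining genes) can be sketched in a few lines. This is a minimal Python illustration, not the sigaR implementation or the exact procedure of Table 6. The `Feature` and `Gene` classes, the `segment` label used to detect a breakpoint between features j−1 and j, the averaging of several overlapping features, and linear interpolation at the gene midpoint are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    """Hypothetical array CGH feature; `segment` labels the copy-number segment it lies in."""
    chrom: str
    start: int
    end: int
    value: float
    segment: int


@dataclass(frozen=True)
class Gene:
    """Hypothetical expression-array gene with genomic coordinates."""
    chrom: str
    start: int
    end: int


def overlap_value(gene: Gene, features: List[Feature]) -> Optional[float]:
    """Overlap step: mean value of features overlapping the gene (assumed aggregation)."""
    hits = [f.value for f in features
            if f.chrom == gene.chrom and f.start <= gene.end and f.end >= gene.start]
    return sum(hits) / len(hits) if hits else None


def flanking(gene: Gene, features: List[Feature]) -> Tuple[Optional[Feature], Optional[Feature]]:
    """Closest feature ending before the gene (j-1) and starting after it (j)."""
    left = [f for f in features if f.chrom == gene.chrom and f.end < gene.start]
    right = [f for f in features if f.chrom == gene.chrom and f.start > gene.end]
    prev = max(left, key=lambda f: f.end) if left else None
    nxt = min(right, key=lambda f: f.start) if right else None
    return prev, nxt


def overlap_plus(genes: List[Gene], features: List[Feature]) -> List[Optional[float]]:
    """Assign each gene a copy-number value, or None if it cannot be estimated."""
    result: List[Optional[float]] = []
    for g in genes:
        value = overlap_value(g, features)
        if value is None:
            prev, nxt = flanking(g, features)
            # Interpolate only if both neighbours exist and no breakpoint separates them.
            if prev is not None and nxt is not None and prev.segment == nxt.segment:
                mid = (g.start + g.end) / 2
                x0 = (prev.start + prev.end) / 2  # midpoint of feature j-1
                x1 = (nxt.start + nxt.end) / 2    # midpoint of feature j (x1 > x0)
                w = (mid - x0) / (x1 - x0)
                value = prev.value + w * (nxt.value - prev.value)
        result.append(value)
    return result
```

For example, with features on chromosome 1 at 100–200 (value 0.5, segment 1), 400–500 (0.25, segment 1) and 800–900 (−0.4, segment 2), a gene at 150–180 overlaps the first feature and gets 0.5, a gene at 250–350 lies between two features of the same segment and is interpolated to 0.375, and a gene at 600–700 sits across a breakpoint and stays unassigned (None). If segmented rather than raw values are used, the two flanking values within a segment coincide and the interpolation simply copies that value.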