Matching of array CGH and gene expression microarray features for the purpose of integrative genomic analyses.

van Wieringen WN, Unger K, Leday GG, Krijgsman O, de Menezes RX, Ylstra B, van de Wiel MA - BMC Bioinformatics (2012)

Bottom Line: Although the matching of platforms is important, many integrative analyses do not detail it, or detail it insufficiently. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in their use of the available data, and that they may even lead to different results for individual genes. Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new.


Affiliation: Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands. w.vanwieringen@vumc.nl

ABSTRACT

Background: An increasing number of published genomic studies interrogate more than one molecular level. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Often such analyses require knowledge of which elements of one platform link to those of another. Although this matching of the platforms is important, many integrative analyses do not detail it, or detail it insufficiently.

Results: We describe, illustrate and discuss six matching procedures. They are implemented in the R-package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic, and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in their use of the available data, and that they may even lead to different results for individual genes.

Conclusions: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R-package sigaR, available from Bioconductor.


Figure 5: OverlapAny matching: the DNA copy number data of features (mapping to the same chromosome as the gene) are averaged with weights proportional to their percentage of overlap with the gene’s sequence. In the top panel, the percentage of overlap between each feature’s sequence and the gene’s sequence is represented by the horizontal solid arrows. In the bottom panel, the features’ DNA copy number data are averaged; the width of the arrows reflects the weight (proportional to the percentage of overlap) of each feature’s contribution to the average.

Mentions: A gene may span a genomic region that is interrogated by multiple DNA copy number features. The overlap matching procedure then assigns the DNA copy number data of one arbitrarily chosen feature to the gene, so potentially relevant information on the gene’s DNA copy number is ignored. Like the distanceAny matching approach, the overlapAny approach takes into account (via a weighting scheme) the data of all features that overlap the gene’s sequence (Figure 5). In contrast to the distanceAny method, the weights are proportional to the features’ percentage of overlap. Table 5 describes the steps of the overlapAny algorithm.
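
The overlap-weighted averaging described above can be illustrated with a minimal Python sketch. This is not the sigaR implementation and does not reproduce Table 5; the function names (`overlap_length`, `overlap_any`), the dictionary layout of genes and features, and the use of closed base-pair intervals are all assumptions made for illustration. The sketch also assumes that the "percentage of overlap" is measured relative to the gene's length, in which case normalising the weights makes them equivalent to the raw overlap length in base pairs.

```python
def overlap_length(feat_start, feat_end, gene_start, gene_end):
    """Base pairs shared by two closed intervals (0 if they are disjoint)."""
    return max(0, min(feat_end, gene_end) - max(feat_start, gene_start) + 1)


def overlap_any(gene, features):
    """Overlap-weighted average of copy number data for one gene.

    gene:     dict with keys 'chr', 'start', 'end' (hypothetical layout)
    features: list of dicts with keys 'chr', 'start', 'end' and
              'cn' (list of per-sample copy number values)
    Returns a list of per-sample averages, or None if no feature on the
    gene's chromosome overlaps the gene.
    """
    weighted = []
    for feat in features:
        if feat["chr"] != gene["chr"]:
            continue  # only features on the same chromosome are considered
        ov = overlap_length(feat["start"], feat["end"], gene["start"], gene["end"])
        if ov > 0:
            weighted.append((ov, feat["cn"]))

    if not weighted:
        return None

    total = sum(w for w, _ in weighted)  # normalising constant for the weights
    n_samples = len(weighted[0][1])
    return [
        sum(w * cn[i] for w, cn in weighted) / total
        for i in range(n_samples)
    ]


# Toy example: a 100 bp gene covered by two features (25 bp and 75 bp overlap).
gene = {"chr": "1", "start": 100, "end": 199}
features = [
    {"chr": "1", "start": 50, "end": 124, "cn": [0.0, 1.0]},   # 25 bp overlap
    {"chr": "1", "start": 125, "end": 300, "cn": [0.4, 0.2]},  # 75 bp overlap
    {"chr": "1", "start": 500, "end": 600, "cn": [9.0, 9.0]},  # no overlap
    {"chr": "2", "start": 100, "end": 199, "cn": [9.0, 9.0]},  # other chromosome
]
matched = overlap_any(gene, features)
print(matched)
```

In this toy example the two overlapping features receive weights 0.25 and 0.75, so the matched values are approximately 0.3 and 0.4 for the two samples. A plain overlap procedure would instead return the data of only one of the two features.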

