Matching of array CGH and gene expression microarray features for the purpose of integrative genomic analyses.

van Wieringen WN, Unger K, Leday GG, Krijgsman O, de Menezes RX, Ylstra B, van de Wiel MA - BMC Bioinformatics (2012)

Bottom Line: Although the matching of platforms is important, many integrative analyses do not detail it, or do so insufficiently. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in their use of the available data, and shows that they may even lead to different results for individual genes. Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new.


Affiliation: Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands. w.vanwieringen@vumc.nl

ABSTRACT

Background: An increasing number of genomic studies interrogating more than one molecular level are being published. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Often such analyses require knowledge of which elements of one platform link to those of another. Although the matching of platforms is important, many integrative analyses do not detail it, or do so insufficiently.

Results: We describe, illustrate and discuss six matching procedures. They are implemented in the R-package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic, and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in their use of the available data, and shows that they may even lead to different results for individual genes.

Conclusions: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R-package sigaR, available from Bioconductor.


Figure 3: Distance Any matching: the DNA copy number data of features (mapping to the same chromosome as the gene) are averaged with weights reciprocal to their distance to the gene. In the top panel, the distances between the features’ and the gene’s midpoints are represented by the horizontal solid arrows. In the bottom panel, the features’ DNA copy number data are averaged; the width of each arrow reflects the weight (reciprocal to the distance) of that feature’s contribution to the average.

Mentions: Several features may be comparably close to the gene to be matched. The distance matching procedure, however, works in accordance with the winner-takes-all principle: the closest feature, even if only marginally closer than the runner-up, is assigned to the gene. This resembles the philosophy of a greedy algorithm. A more democratic approach would allow all features (not only one) to contribute, possibly to varying degrees, to the matching (Figure 3). The distanceAny approach does exactly this, and takes the runners-up into account. To this end, the distanceAny matching procedure assigns a weighted average of the DNA copy number features to the gene. When running over the genome, this is similar in spirit to a moving average. Weights may be chosen reciprocal to the distance, and may be limited to a neighborhood of the gene. The details of the distanceAny algorithm are contained in Table 3.
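
The contrast between the two procedures can be illustrated with a minimal Python sketch. It is not the sigaR implementation and does not reproduce Table 3. The function names, the `max_dist` neighborhood parameter and the `eps` offset (added to each distance so that a feature located exactly at the gene's midpoint does not cause a division by zero) are assumptions made for illustration only. All features are assumed to map to the same chromosome as the gene.

```python
from typing import Optional, Sequence


def distance_match(
    gene_mid: float,
    feature_mids: Sequence[float],
    feature_values: Sequence[float],
) -> float:
    """Winner-takes-all: return the value of the single closest feature."""
    closest = min(
        range(len(feature_mids)),
        key=lambda i: abs(feature_mids[i] - gene_mid),
    )
    return feature_values[closest]


def distance_any_match(
    gene_mid: float,
    feature_mids: Sequence[float],
    feature_values: Sequence[float],
    max_dist: Optional[float] = None,  # hypothetical neighborhood limit
    eps: float = 1.0,                  # hypothetical offset, avoids 1/0
) -> Optional[float]:
    """Weighted average of feature values, weights = 1 / (distance + eps)."""
    weights = []
    values = []
    for mid, value in zip(feature_mids, feature_values):
        dist = abs(mid - gene_mid)
        # Skip features outside the neighborhood, if one is specified.
        if max_dist is not None and dist > max_dist:
            continue
        weights.append(1.0 / (dist + eps))
        values.append(value)
    if not weights:
        return None  # no feature lies within the neighborhood
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    mids = [90.0, 111.0, 200.0]   # feature midpoints (illustrative values)
    cn = [0.0, 1.0, 1.0]          # DNA copy number values (illustrative)
    print(distance_match(100.0, mids, cn))
    print(distance_any_match(100.0, mids, cn, max_dist=50.0))
```

In this toy input the two nearest features are almost equally far from the gene but carry different copy number values, so the winner-takes-all result depends entirely on the marginally closer one, whereas the weighted average lies between the two.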

