Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis.

Neerincx PB, Casel P, Prickett D, Nie H, Watson M, Leunissen JA, Groenen MA, Klopp C - BMC Proc (2009)

Bottom Line: In this manuscript we compare their annotation strategies and results. The differences in updated annotation packages had a significant effect on GO term enrichment analysis with consensus on only 67.2% of the enriched terms. This is important as it can have a significant effect on functional microarray analysis as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.


Affiliation: Laboratory of Bioinformatics, Wageningen University and Research centre (WUR), P.O. Box 569, 6700 AN Wageningen, The Netherlands. pieter.neerincx@gmail.com

ABSTRACT

Background: Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies.

Results: IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The 3 pipelines can assign oligos to target specificity categories, although with varying degrees of resolution. Target specificity is judged based on the number and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis with consensus on only 67.2% of the enriched terms.

Conclusion: In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability allowing users to fine-tune the balance between reliability and coverage. This is important as it can have a significant effect on functional microarray analysis as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.



Figure 4: Ensembl Annotation Assigned to Oligos: Coverage & Consensus. Venn diagram representing oligos linked to Ensembl gene IDs by the 3 annotation pipelines (A). Colours represent oligos linked to at least one Ensembl gene by all 3 pipelines (417:black), not linked to any Ensembl gene by any of the 3 pipelines (245:white), linked to at least one Ensembl gene only by IMAD (26:red), only by OligoRAP (2:blue), only by sigReannot (13:yellow), by IMAD & OligoRAP (3:purple), by OligoRAP & sigReannot (6:green) or by IMAD & sigReannot (79:orange). When an oligo is linked to at least one Ensembl gene by all 3 pipelines, this does not necessarily mean it is linked to the same Ensembl genes; the extent of such consensus is depicted in a pie chart (B). Agreement between all 3 pipelines is subdivided into agreement on the presence and agreement on the absence of links to Ensembl genes. Where only 2 pipelines agree, this is not subdivided and hence represents a mix of consensus on presence or absence of annotation. A pipeline's initial indicates that the pipeline shares the consensus; a dash in place of an initial indicates that it does not. Reasons for a lack of consensus are sorted by impact (C) and were counted per oligo: ++ = extra hits were found because of this reason, -- = hits were missing because of this reason. If an oligo had multiple hits, multiple reasons can apply, but multiple occurrences of the same reason were counted only once.
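The distinction between coverage (panel A) and consensus (panel B) can be illustrated with a minimal sketch. It assumes only that each pipeline yields a mapping from oligo ID to a set of Ensembl gene IDs; the oligo IDs, gene IDs and the helper names `venn_region` and `full_consensus` are hypothetical and are not taken from the study or from any of the three pipelines.

```python
from collections import Counter

# Hypothetical toy annotation: oligo ID -> set of Ensembl gene IDs per pipeline.
# These IDs are invented for illustration and are not data from the study.
annotations = {
    "IMAD": {"oligo1": {"G1"}, "oligo2": {"G2", "G3"}, "oligo3": {"G4"}},
    "OligoRAP": {"oligo1": {"G1"}, "oligo2": {"G2"}},
    "sigReannot": {"oligo1": {"G1"}, "oligo2": {"G2", "G3"}, "oligo3": {"G4"}},
}
oligos = ["oligo1", "oligo2", "oligo3", "oligo4"]


def venn_region(oligo):
    """Return the pipelines that link this oligo to at least one gene (panel A)."""
    return tuple(sorted(name for name, ann in annotations.items() if ann.get(oligo)))


def full_consensus(oligo):
    """True when all pipelines assign exactly the same gene set (panel B).

    Agreement on absence (no gene from any pipeline) also counts as consensus.
    """
    gene_sets = [frozenset(ann.get(oligo, set())) for ann in annotations.values()]
    return len(set(gene_sets)) == 1


# Count oligos per Venn region and record consensus per oligo.
regions = Counter(venn_region(o) for o in oligos)
consensus = {o: full_consensus(o) for o in oligos}
```

In this toy data `oligo2` is covered by all three pipelines yet lacks consensus, because one pipeline assigns a smaller gene set. This is the kind of case that makes consensus lower than coverage.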

Mentions: A subset of 791 oligos was selected from the experimental data provided for the workshop to assess the effect of the different annotation strategies on coverage. These oligos were selected because they showed differential gene expression signals. Hence these probes clearly bind transcripts, and any orphan oligos in the updated annotation produced by sigReannot, OligoRAP and IMAD indicate false negatives due to incomplete data sources, incomplete annotation strategies or both. The focus for this comparison is on Ensembl gene ID assignments, as all three pipelines provide these and hence they can be easily compared. Figure 4A shows a Venn diagram representing the number of oligos covered with at least one Ensembl gene ID versus probes without any links to Ensembl genes. Slightly more than half (52.7%) of the oligos are linked to at least one Ensembl gene by all three pipelines. Unfortunately, the second-largest group (31.0%) consists of oligos that could not be linked to any Ensembl gene by any pipeline. Although IMAD and OligoRAP can fetch annotation from additional sources to boost annotation coverage, this tends to be less informative, because (apart from assembly gaps) had there been a lot of high-quality annotation available for a hit, this would have resulted in an Ensembl gene model. When there was not enough convincing experimental evidence for an Ensembl gene model, this often means the hit is covered by only a few ESTs or even a single EST. For the remaining 16.3% of the oligos the coverage differs. The bulk of these (10.0%) represent oligos linked to Ensembl only by IMAD+sigReannot. Probes linked to Ensembl only by IMAD or only by sigReannot correspond to 3.3% and 1.6%, respectively. Finally, OligoRAP appears to be the most conservative in linking oligos to Ensembl genes, as the numbers of oligos linked to Ensembl only by OligoRAP, only by IMAD+OligoRAP and only by OligoRAP+sigReannot are all less than 1%.
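The percentages quoted above follow directly from the oligo counts in the Figure 4 caption. The short sketch below recomputes them as a consistency check; the counts come from the caption, while the dictionary keys and variable names are illustrative.

```python
# Oligo counts per Venn region, as given in the Figure 4 caption.
venn_counts = {
    "all three pipelines": 417,
    "no pipeline": 245,
    "IMAD + sigReannot only": 79,
    "IMAD only": 26,
    "sigReannot only": 13,
    "OligoRAP + sigReannot only": 6,
    "IMAD + OligoRAP only": 3,
    "OligoRAP only": 2,
}

total = sum(venn_counts.values())  # should equal the 791 selected oligos

# Percentage of the subset per region, rounded to one decimal as in the text.
percent = {region: round(100 * n / total, 1) for region, n in venn_counts.items()}

# Oligos for which coverage differs: everything except the two unanimous groups.
differing = total - venn_counts["all three pipelines"] - venn_counts["no pipeline"]
percent_differing = round(100 * differing / total, 1)

for region, value in percent.items():
    print(f"{region}: {value}%")
print(f"coverage differs: {percent_differing}%")
```

The eight regions sum to the 791 selected oligos, and the three regions involving OligoRAP but not all three pipelines each fall below 1%, in line with the text.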

