Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis.

Neerincx PB, Casel P, Prickett D, Nie H, Watson M, Leunissen JA, Groenen MA, Klopp C - BMC Proc (2009)

Bottom Line: This manuscript compares the annotation strategies and results of the IMAD, OligoRAP and sigReannot pipelines. The differences in their updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms. This lack of consensus on almost one third of the enriched terms shows that the choice of annotation pipeline can have a significant effect on functional microarray analysis.
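The 67.2% consensus figure depends on how "consensus" is defined, which this summary does not spell out. The Python sketch below assumes one plausible definition: the number of GO terms enriched under all three annotation packages divided by the number enriched under at least one. The function name and the toy term sets are hypothetical and do not come from the study.

```python
from typing import Dict, Set


def go_term_consensus(enriched: Dict[str, Set[str]]) -> float:
    """Share of enriched GO terms found with every annotation package.

    Assumed definition: |terms enriched under all packages| divided by
    |terms enriched under at least one package|.
    """
    term_sets = list(enriched.values())
    union: Set[str] = set().union(*term_sets)
    if not union:
        return 0.0  # nothing enriched anywhere, so no consensus to measure
    shared = set.intersection(*term_sets)
    return len(shared) / len(union)


# Toy enriched-term sets per pipeline (illustrative only, not study data).
example = {
    "IMAD": {"GO:0006955", "GO:0006954", "GO:0008152"},
    "OligoRAP": {"GO:0006955", "GO:0006954"},
    "sigReannot": {"GO:0006955", "GO:0006954", "GO:0007165"},
}
print(f"{go_term_consensus(example):.1%}")  # 50.0%
```

With the toy sets, two of the four distinct terms are shared by all three pipelines, so the sketch reports 50.0%. A looser definition (for example, terms shared by any two pipelines) would give a different number.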


Affiliation: Laboratory of Bioinformatics, Wageningen University and Research centre (WUR), P.O. Box 569, 6700 AN Wageningen, The Netherlands. pieter.neerincx@gmail.com

ABSTRACT

Background: Reliable annotation linking oligonucleotide probes to target genes is essential for the functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies.

Results: IMAD, OligoRAP and sigReannot update both the annotation and the estimated target specificity. All three pipelines can assign oligos to target specificity categories, although with varying degrees of resolution. Target specificity is judged on the number and type of oligo-versus-target-gene alignments (hits), which are determined by filter thresholds that users can adjust to their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules that differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked the oligo to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation potential filtering of oligo-versus-target-gene alignments and to different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms.
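The three oligo categories in these results partition the comparison subset (52.7% + 31.0% + 16.3% = 100.0%). A minimal Python sketch of that bookkeeping follows, assuming each pipeline's output has been reduced to a set of Ensembl gene IDs per oligo and that "consensus" means identical gene sets. The text does not state whether the 44.0% is relative to the whole subset or to the 52.7% group, so the sketch simply reports every category as a share of the whole subset. All names and the toy gene IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

PIPELINES = ("IMAD", "OligoRAP", "sigReannot")


def coverage_category(genes: Dict[str, Set[str]]) -> str:
    """Place one oligo in a category based on the Ensembl genes each pipeline assigned."""
    linked = [p for p in PIPELINES if genes.get(p)]
    if not linked:
        return "none_linked"
    if len(linked) < len(PIPELINES):
        return "coverage_differs"
    # Every pipeline linked the oligo; assume consensus means identical gene sets.
    first = genes[PIPELINES[0]]
    if all(genes[p] == first for p in PIPELINES[1:]):
        return "all_linked_consensus"
    return "all_linked_differing"


def summarise(oligos: List[Dict[str, Set[str]]]) -> Dict[str, float]:
    """Percentage of oligos per category, relative to the whole subset."""
    if not oligos:
        return {}
    counts = Counter(coverage_category(o) for o in oligos)
    return {cat: 100.0 * n / len(oligos) for cat, n in counts.items()}


# Toy data with hypothetical gene IDs.
oligos = [
    {"IMAD": {"geneA"}, "OligoRAP": {"geneA"}, "sigReannot": {"geneA"}},
    {"IMAD": {"geneA"}, "OligoRAP": {"geneB"}, "sigReannot": {"geneA"}},
    {"IMAD": set(), "OligoRAP": set(), "sigReannot": set()},
    {"IMAD": {"geneC"}, "OligoRAP": set(), "sigReannot": {"geneC"}},
]
print(summarise(oligos))
```

In this scheme, "all_linked_consensus" plus "all_linked_differing" corresponds to the group for which all pipelines found an Ensembl gene, "none_linked" to the group with no assignment, and "coverage_differs" to the remainder.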

Conclusion: In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge how reliable each assignment is, allowing users to fine-tune the balance between reliability and coverage. This is important because annotation differences can have a significant effect on functional microarray analysis, as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.
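One way to picture the proposed metadata is to store, with each oligo-to-gene assignment, how that link was established, and to let users filter on it. The Python sketch below assumes just two hypothetical link types, a direct alignment and an indirect link through some intermediate record, ranked by reliability; it is an illustration, not the data model of IMAD, OligoRAP or sigReannot.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable, Set


class LinkType(IntEnum):
    """Hypothetical reliability ranking: a higher value means more direct evidence."""
    INDIRECT = 1  # gene reached through an intermediate record
    DIRECT = 2    # oligo aligns to a transcript of the gene itself


@dataclass(frozen=True)
class Assignment:
    oligo_id: str
    gene_id: str
    link: LinkType


def filter_by_reliability(
    assignments: Iterable[Assignment], minimum: LinkType
) -> Dict[str, Set[str]]:
    """Keep only the assignments whose link type is at least `minimum`."""
    kept: Dict[str, Set[str]] = {}
    for a in assignments:
        if a.link >= minimum:
            kept.setdefault(a.oligo_id, set()).add(a.gene_id)
    return kept


def coverage(kept: Dict[str, Set[str]], all_oligos: Set[str]) -> float:
    """Fraction of oligos that still have at least one gene after filtering."""
    if not all_oligos:
        return 0.0
    return len(kept.keys() & all_oligos) / len(all_oligos)


# Toy example with hypothetical identifiers.
assignments = [
    Assignment("oligo1", "geneA", LinkType.DIRECT),
    Assignment("oligo2", "geneB", LinkType.INDIRECT),
]
all_oligos = {"oligo1", "oligo2", "oligo3"}
strict = filter_by_reliability(assignments, LinkType.DIRECT)
lenient = filter_by_reliability(assignments, LinkType.INDIRECT)
print(coverage(strict, all_oligos), coverage(lenient, all_oligos))
```

Raising the minimum link type from INDIRECT to DIRECT drops the coverage of the toy set from two oligos out of three to one, which is the reliability-versus-coverage trade-off the conclusion describes.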


Figure 3: Comparison of Target Specificity Classes (TSCs). (A) Overview of how the TSCs defined by the three pipelines (partially) overlap or are divided into smaller sub-categories. O = OligoRAP, S = sigReannot, I = IMAD; numbers indicate the corresponding TSCs. LCS = Longest Contiguous Stretch. IMAD does not differentiate between hit types. OligoRAP and sigReannot differentiate between High Quality alignments (HQ hits, called "hits" by sigReannot and "primary hits" by OligoRAP) and Low Quality alignments (LQ hits, called "noise" by sigReannot and "secondary hits" by OligoRAP). (B) Grouping of the more detailed TSCs into 3 base TSCs for comparison of the results: one hit (I1 = O1+O2 = S1+S3+S4), multiple hits (I2 = O3+O4+O5 = S2+S5+S7) or no hits at all (I3 = O6 = S6).
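The grouping in panel B is a many-to-one mapping, so it can be transcribed directly into a lookup table. A minimal Python sketch, with the class codes copied from the caption and the helper names (`to_base_tsc`, `same_base_tsc`) being hypothetical:

```python
from typing import Dict

# Detailed TSC code -> base TSC, transcribed from the Figure 3 caption.
# I1 = one hit, I2 = multiple hits, I3 = no hits at all.
BASE_TSC: Dict[str, str] = {
    # OligoRAP
    "O1": "I1", "O2": "I1",
    "O3": "I2", "O4": "I2", "O5": "I2",
    "O6": "I3",
    # sigReannot
    "S1": "I1", "S3": "I1", "S4": "I1",
    "S2": "I2", "S5": "I2", "S7": "I2",
    "S6": "I3",
    # IMAD already reports the base classes
    "I1": "I1", "I2": "I2", "I3": "I3",
}


def to_base_tsc(code: str) -> str:
    """Collapse a pipeline-specific TSC code into its base TSC."""
    try:
        return BASE_TSC[code]
    except KeyError:
        raise ValueError(f"unknown TSC code: {code!r}") from None


def same_base_tsc(*codes: str) -> bool:
    """True if all given codes fall into the same base TSC."""
    return len({to_base_tsc(c) for c in codes}) == 1


print(same_base_tsc("O2", "S4", "I1"))  # True
print(same_base_tsc("O3", "S1", "I1"))  # False
```

Checking agreement on the collapsed classes, rather than on the raw codes, is what makes the three pipelines' results directly comparable despite their different resolutions.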

Mentions: Based on the number and type of hits, oligos can be assigned to target specificity classes (TSCs). An overview of how TSCs overlap or differ between the 3 pipelines is given in Figure 3A. IMAD focuses on the big picture, which is how most biologists approach oligo annotation: are my oligos gene-specific or not? This results in three TSCs: gene-specific, non-specific and orphan oligos. OligoRAP and sigReannot, on the other hand, provide more resolution by differentiating between high quality (HQ) and low quality (LQ) alignments, resulting in 6 and 7 TSCs, respectively (O1 to O6 and S1 to S7 in Figure 3). OligoRAP uses two thresholds per filter, one for LQ and one for HQ hits, to assign oligos to TSCs. sigReannot takes a different approach: it uses the percentage sequence identity exclusively to filter for HQ hits, and the length of the longest contiguous stretch (LCS) for LQ hits. Figure 3B shows how the more specialised TSCs of OligoRAP and sigReannot can be combined into the more generic TSCs of IMAD for easier comparison of the results produced by the three pipelines.
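The two hit-filtering strategies described here can be contrasted in a short Python sketch. The metric names, the `Hit` record and every threshold value are hypothetical placeholders chosen for illustration, not the pipelines' actual filters or defaults; only the structure follows the text: two thresholds (LQ and HQ) per filter in the OligoRAP style, and percentage identity for HQ plus LCS length for LQ in the sigReannot style.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Hit:
    """One oligo-versus-target alignment (hypothetical field names)."""
    percent_identity: float
    aligned_length: int
    longest_contiguous_stretch: int  # LCS, in nucleotides


# OligoRAP style: two thresholds per filter, given as (LQ minimum, HQ minimum).
# Filter names and values are placeholders, not OligoRAP defaults.
TWO_LEVEL_FILTERS: Dict[str, Tuple[float, float]] = {
    "percent_identity": (75.0, 95.0),
    "aligned_length": (30, 50),
}


def classify_two_level(hit: Hit) -> str:
    """HQ if every filter passes its HQ threshold, else LQ if every filter passes its LQ threshold."""
    if all(getattr(hit, name) >= hq for name, (_, hq) in TWO_LEVEL_FILTERS.items()):
        return "HQ"
    if all(getattr(hit, name) >= lq for name, (lq, _) in TWO_LEVEL_FILTERS.items()):
        return "LQ"
    return "discarded"


# sigReannot style: identity alone decides HQ, LCS length alone decides LQ.
# Both values are placeholders, not sigReannot defaults.
HQ_MIN_IDENTITY = 95.0
LQ_MIN_LCS = 15


def classify_identity_lcs(hit: Hit) -> str:
    if hit.percent_identity >= HQ_MIN_IDENTITY:
        return "HQ"
    if hit.longest_contiguous_stretch >= LQ_MIN_LCS:
        return "LQ"
    return "discarded"


for hit in (Hit(97.0, 60, 40), Hit(80.0, 60, 10), Hit(70.0, 25, 10)):
    print(hit, classify_two_level(hit), classify_identity_lcs(hit))
```

The hit at 80% identity with a short LCS counts as an LQ hit under the first scheme but is discarded under the second, a small-scale example of how differing filter rules can move an oligo into a different TSC.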

