A statistical framework for consolidating "sibling" probe sets for Affymetrix GeneChip data.

Li H, Zhu D, Cook M - BMC Genomics (2008)

Bottom Line: We found that consolidating sibling probe sets that behave similarly across treatments results in a large increase in the number of differentially expressed genes under various statistical criteria. Consolidating sibling probe sets by pooling data from each greatly improves the estimate of a gene's expression level and results in the identification of more biologically relevant genes. Sibling probe sets that do not qualify for consolidation may represent annotation errors or other artifacts, or may correspond to differentially processed transcripts of the same gene that require further analysis.


Affiliation: Bioinformatics Center, Stowers Institute for Medical Research, 1000 E 50th St, Kansas City, MO 64110, USA. hul@stowers-institute.org

ABSTRACT

Background: An Affymetrix GeneChip typically contains multiple probe sets per gene, defined as sibling probe sets in this study. These probe sets may or may not behave similarly across treatments. How best to consolidate sibling probe sets for analysis is an open problem. We propose an Analysis of Variance (ANOVA) framework to decide which sibling probe sets can be consolidated.

Results: The ANOVA model allows us to separate sibling probe sets into two types: those that behave similarly across treatments and those that behave differently. We found that consolidating sibling probe sets of the former type results in a large increase in the number of differentially expressed genes under various statistical criteria. The approach to selecting sibling probe sets suitable for consolidation is implemented in the R language and is freely available from http://research.stowers-institute.org/hul/affy/.

Conclusion: Our ANOVA analysis of sibling probe sets provides a statistical framework for selecting sibling probe sets for consolidation. Consolidating sibling probe sets by pooling data from each greatly improves the estimate of a gene's expression level and results in the identification of more biologically relevant genes. Sibling probe sets that do not qualify for consolidation may represent annotation errors or other artifacts, or may correspond to differentially processed transcripts of the same gene that require further analysis.
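One back-of-the-envelope way to see why pooling helps uses standard fixed-effects ANOVA arithmetic. This is our illustration, not a derivation from the paper. Assume a balanced design with a treatments, b consolidated sibling probe sets, r arrays per treatment and a common error variance σ² (these symbols are ours). Each treatment mean is then averaged over br values instead of r, and the error term gains degrees of freedom. The calculation treats values from different probe sets as independent, which is optimistic because they are measured on the same arrays.

```latex
\begin{aligned}
\operatorname{SE}\bigl(\bar y_{1\cdot\cdot}-\bar y_{2\cdot\cdot}\bigr) &= \sigma\sqrt{2/(br)}
  &&\text{vs.}\quad \sigma\sqrt{2/r}\ \text{for a single probe set},\\
\mathrm{df}_E^{\text{two-way}} &= ab(r-1)
  &&\text{vs.}\quad \mathrm{df}_E^{\text{one-way}} = a(r-1).
\end{aligned}
```

For example, with a = 2, b = 3 and r = 3, the error degrees of freedom rise from 4 to 12, and the standard error of the treatment difference shrinks by a factor of 1/√3 ≈ 0.58.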


Figure 2: The algorithm flowchart. The figure outlines the identification and consolidation of qualified sibling probe sets based on statistical tests. We are interested in identifying genes that are differentially expressed across treatments. The analysis starts from properly normalized and summarized expression scores for each probe set. For genes represented by multiple probe sets (sibling probe sets), a non-significant interaction effect (trt*ps) between treatment (trt) and probe set (ps) supports consolidating the sibling probe sets, and P-values of the treatment effect are obtained from the two-way ANOVA model. For genes represented by a single probe set, and for sibling probe sets that are not eligible for consolidation (i.e. with a significant interaction effect trt*ps), P-values of the treatment effect are reported from the one-way ANOVA model. The P-values are then combined into a final result for screening differentially expressed genes across treatments.
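The routing in the flowchart can be written as a decision rule. The sketch below uses standard fixed-effects ANOVA notation with the treatment (τ), probe set (ψ) and interaction (τψ) effects named in the text. The indices (i for treatment, j for sibling probe set, k for replicate array), the balanced-design sizes a, b and r, and the threshold α are our assumptions, and the models may differ in detail from the paper's Eq. 1 and Eq. 3.

```latex
\begin{aligned}
&\text{Two-way (sibling probe sets):} && y_{ijk} = \mu + \tau_i + \psi_j + (\tau\psi)_{ij} + \varepsilon_{ijk}\\
&\text{Interaction test:} && F_{\tau\psi} = \frac{SS_{\tau\psi}/[(a-1)(b-1)]}{SS_E/[ab(r-1)]}\\
&\text{One-way (probe set } j \text{ alone):} && y^{(j)}_{ik} = \mu + \tau_i + \varepsilon_{ik}\\
&\text{Reported treatment P-value:} &&
p =
\begin{cases}
p_{\tau}^{\text{two-way}}, & b > 1 \text{ and } p_{\tau\psi} \ge \alpha \quad \text{(consolidate)},\\
p_{\tau}^{\text{one-way}}(j) \text{ for each } j, & \text{otherwise.}
\end{cases}
\end{aligned}
```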

Mentions: The outline of the automatic identification and consolidation of qualified sibling probe sets based on statistical evidence is shown in Fig. 2. We start our analysis from properly normalized and summarized expression scores for each probe set, e.g. RMA score [21], GCRMA score [22] or Model-Based Expression Index (MBEI) [23]. Using a two-way ANOVA model that includes treatment (τ), probe set (ψ) and their interaction effect (τψ), we ask whether the differential expression over treatments follows the same trend across sibling probe sets. A non-significant interaction effect indicates that the sibling probe sets share the same trend of differential expression over treatments. As shown in the top row of Fig. 3, several probe sets show similar expression profiles (slopes) between wild type and treatment (knock-out) and will be consolidated. Consequently, the P-value of the treatment effect should be reported from the two-way ANOVA model (Eq. 1), since it accounts for all measurements from the sibling probe sets of the same gene. A significant interaction effect indicates that the expression profiles of the probe sets differ in slope, as shown in the middle and bottom rows of Fig. 3. These sibling probe sets are more appropriately treated as independent probe sets, although they share the same gene symbol. For independent probe sets or single probe sets, we compare differential expression over treatments using a one-way ANOVA model (Eq. 3), and the P-values of the treatment effect are reported from that model.
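The per-gene routing described above can be sketched in Python. This is a minimal, self-contained sketch under stated assumptions, not the authors' R implementation. It assumes a balanced design in which every probe set is measured on the same number of arrays per treatment, fixed effects, errors treated as independent (a plain two-way ANOVA ignores the fact that sibling probe sets share arrays), and a hypothetical interaction threshold `alpha = 0.05`. The function names (`two_way_anova`, `one_way_anova`, `screen_gene`) and the toy scores are illustrative only. F-test P-values come from the regularized incomplete beta function, so only the standard library is needed.

```python
import math
from statistics import mean

# F-distribution upper tail via the regularized incomplete beta function.
# Continued fraction (modified Lentz), in the style of Numerical Recipes' betacf.
def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even- and odd-step continued-fraction coefficients for step m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged on the odd step
            break
    return h

def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2), using P(F > f) = I_{d2/(d2 + d1*f)}(d2/2, d1/2)."""
    if f <= 0.0:
        return 1.0
    return _reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def two_way_anova(cells):
    """cells[i][j]: replicate scores for treatment i, sibling probe set j (balanced).
    Returns (P-value of treatment effect, P-value of treatment x probe-set interaction)."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    assert all(len(cell) == r for row in cells for cell in row), "balanced design assumed"
    grand = mean(v for row in cells for cell in row for v in cell)
    cell_m = [[mean(cell) for cell in row] for row in cells]
    trt_m = [mean(row) for row in cell_m]                            # treatment means
    ps_m = [mean(cell_m[i][j] for i in range(a)) for j in range(b)]  # probe-set means
    ss_trt = b * r * sum((m - grand) ** 2 for m in trt_m)
    ss_ps = a * r * sum((m - grand) ** 2 for m in ps_m)
    ss_cells = r * sum((m - grand) ** 2 for row in cell_m for m in row)
    ss_int = ss_cells - ss_trt - ss_ps
    ss_err = sum((v - cell_m[i][j]) ** 2
                 for i in range(a) for j in range(b) for v in cells[i][j])
    df_trt, df_int, df_err = a - 1, (a - 1) * (b - 1), a * b * (r - 1)
    ms_err = ss_err / df_err
    return (f_sf(ss_trt / df_trt / ms_err, df_trt, df_err),
            f_sf(ss_int / df_int / ms_err, df_int, df_err))

def one_way_anova(groups):
    """groups[i]: replicate scores for treatment i. Returns the treatment-effect P-value."""
    n, k = sum(len(g) for g in groups), len(groups)
    grand = mean(v for g in groups for v in g)
    ss_trt = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_err = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return f_sf((ss_trt / (k - 1)) / (ss_err / (n - k)), k - 1, n - k)

def screen_gene(gene, alpha=0.05):
    """gene: {probe_set_id: {treatment: [scores]}} for one gene symbol.
    Returns {label: treatment P-value}, routed as in the flowchart."""
    ps_ids = sorted(gene)
    trts = sorted(gene[ps_ids[0]])  # assumes all probe sets share treatment labels
    if len(ps_ids) > 1:
        cells = [[gene[p][t] for p in ps_ids] for t in trts]
        p_trt, p_int = two_way_anova(cells)
        if p_int >= alpha:  # non-significant trt*ps: consolidate the siblings
            return {"+".join(ps_ids): p_trt}
    # single probe set, or siblings with a significant interaction: one-way each
    return {p: one_way_anova([gene[p][t] for t in trts]) for p in ps_ids}

# Toy log-scale scores (made up): two treatments x three arrays per probe set.
gene_a = {"ps1": {"WT": [5.0, 5.2, 4.9], "KO": [6.0, 6.1, 5.8]},
          "ps2": {"WT": [7.0, 7.1, 6.8], "KO": [8.1, 7.9, 8.0]}}  # parallel trends
gene_b = {"ps1": {"WT": [5.0, 5.2, 4.9], "KO": [6.0, 6.1, 5.8]},
          "ps2": {"WT": [8.1, 7.9, 8.0], "KO": [7.0, 7.1, 6.8]}}  # opposite trends
print(screen_gene(gene_a))  # a single consolidated entry, keyed "ps1+ps2"
print(screen_gene(gene_b))  # separate entries for "ps1" and "ps2"
```

Keying a consolidated gene by its joined probe-set IDs is just one illustrative choice. In a genome-wide screen, the per-gene dictionaries would be merged into one table of treatment P-values before any significance criterion is applied.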

