Improving the analysis of designed studies by combining statistical modelling with study design information.

Thissen U, Wopereis S, van den Berg SA, Bobeldijk I, Kleemann R, Kooistra T, van Dijk KW, van Ommen B, Smilde AK - BMC Bioinformatics (2009)

Bottom Line: Knowledge about the study design can be used to decompose the total data into data blocks that are associated with specific effects. Subsequent statistical analysis can be improved by this decomposition if it is applied to selected combinations of effects. It was shown that ANOVA-PLS leads to a better statistical model that is more reliable and more interpretable than standard PLS analysis.


Affiliation: Dutch nutrigenomics consortium of the Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands. uwe.thissen@tno.nl

ABSTRACT

Background: In the life sciences, so-called designed studies are used to study complex biological systems. The data derived from these studies follow a study design aimed at generating relevant information while reducing unwanted variation (noise). Knowledge about the study design can be used to decompose the total data into data blocks that are associated with specific effects. Subsequent statistical analysis can be improved by this decomposition if it is applied to selected combinations of effects.
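As an illustration of what "decomposing the total data into data blocks associated with specific effects" can mean, the sketch below splits a data matrix from a balanced two-factor design into a grand-mean block, two main-effect blocks, an interaction block and a residual block, so that the blocks add back up to the data. It assumes two design factors (called `factor_a` and `factor_b` here, e.g. a hypothetical diet and time factor) and equal numbers of replicates per cell; the function names and the toy data are illustrative only, not the authors' code.

```python
from collections import defaultdict


def column_means(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def level_means(X, labels):
    """Mean row for every level of a factor (labels may be tuples for interaction cells)."""
    groups = defaultdict(list)
    for row, label in zip(X, labels):
        groups[label].append(row)
    return {label: column_means(rows) for label, rows in groups.items()}


def anova_decompose(X, factor_a, factor_b):
    """Split X into mean, main-effect, interaction and residual blocks (two-factor design)."""
    grand = column_means(X)
    mean_a = level_means(X, factor_a)
    mean_b = level_means(X, factor_b)
    mean_ab = level_means(X, list(zip(factor_a, factor_b)))
    blocks = {name: [] for name in ("mean", "a", "b", "ab", "residual")}
    for row, la, lb in zip(X, factor_a, factor_b):
        eff_a = [m - g for m, g in zip(mean_a[la], grand)]    # main effect of factor a
        eff_b = [m - g for m, g in zip(mean_b[lb], grand)]    # main effect of factor b
        cell = mean_ab[(la, lb)]                              # cell mean for this combination
        eff_ab = [c - g - ea - eb for c, g, ea, eb in zip(cell, grand, eff_a, eff_b)]
        residual = [x - c for x, c in zip(row, cell)]         # within-cell variation
        blocks["mean"].append(list(grand))
        blocks["a"].append(eff_a)
        blocks["b"].append(eff_b)
        blocks["ab"].append(eff_ab)
        blocks["residual"].append(residual)
    return blocks


# Hypothetical toy data: 8 samples, 2 variables, 2 x 2 design with 2 replicates per cell.
X = [[1.0, 2.0], [3.0, 2.5], [2.0, 4.0], [4.0, 3.5],
     [5.0, 1.0], [6.0, 1.5], [7.0, 0.5], [9.0, 2.0]]
diet = ["ctrl"] * 4 + ["hfd"] * 4
time = ["t1", "t1", "t2", "t2"] * 2
blocks = anova_decompose(X, diet, time)
```

Because the design is balanced, the effect blocks are mutually orthogonal, so their sums of squares add up to the total (centred) sum of squares. A subsequent analysis can then be run on any chosen sum of blocks instead of on `X` itself.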

Results: The benefit of this approach was demonstrated with an analysis that combines multivariate PLS (Partial Least Squares) regression with data decomposition from ANOVA (Analysis of Variance): ANOVA-PLS. As a case study, a nutritional intervention study in Apolipoprotein E3-Leiden (APOE3Leiden) transgenic mice is used to study the relation between liver lipidomics and a plasma inflammation marker, Serum Amyloid A. The performance of ANOVA-PLS was compared with that of PLS regression on the non-decomposed data with respect to the quality of the modelled relation, model reliability, and interpretability.
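The regression step of ANOVA-PLS relates a matrix of lipid levels to a single response (here, Serum Amyloid A), which is a PLS1 problem. The sketch below is a minimal PLS1 fit using the standard NIPALS algorithm, returning regression coefficients on the original variable scale. In an ANOVA-PLS setting, the `X` passed in would be a sum of selected effect matrices rather than the raw data; which effects to combine is a modelling choice, and the example below is only a sanity check on hypothetical data, not the model used in the study.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns (coefficients, intercept) on the original X scale."""
    n, n_vars = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xr = [[x - m for x, m in zip(row, x_mean)] for row in X]  # centred X, deflated below
    yr = [v - y_mean for v in y]                               # centred y, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(n_vars)]  # X'y direction
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no covariance with y left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xr]                                       # scores
        tt = dot(t, t)
        p = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(n_vars)]  # X loadings
        q = dot(yr, t) / tt                                                   # y loading
        Xr = [[Xr[i][j] - t[i] * p[j] for j in range(n_vars)] for i in range(n)]  # deflate X
        yr = [yr[i] - q * t[i] for i in range(n)]                             # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    # Coefficients b = W (P'W)^-1 q. In NIPALS, P'W is upper triangular with a unit
    # diagonal, so back substitution solves (P'W) c = q.
    k = len(W)
    c = [0.0] * k
    for i in reversed(range(k)):
        c[i] = (Q[i] - sum(dot(P[i], W[j]) * c[j] for j in range(i + 1, k))) / dot(P[i], W[i])
    coef = [sum(W[a][j] * c[a] for a in range(k)) for j in range(n_vars)]
    intercept = y_mean - dot(x_mean, coef)
    return coef, intercept


# Sanity check on hypothetical data with y = 2*x1 - x2 + 3: with as many components
# as variables, PLS reduces to ordinary least squares and recovers the exact coefficients.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2 * a - b + 3 for a, b in X]
coef, intercept = pls1_fit(X, y, n_components=2)
```

With fewer components than variables, the same function gives the shrunken, latent-variable solution that makes PLS useful for many correlated lipid species.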

Conclusion: ANOVA-PLS was shown to yield a better statistical model that is more reliable and more interpretable than standard PLS analysis. A subsequent biological interpretation derived more relevant metabolites from this model. The concept of combining data decomposition with a subsequent statistical analysis, as in ANOVA-PLS, is, however, not limited to PLS regression in metabolomics; it can be applied to many statistical methods and many different types of data.

Figure 7: Regression coefficients of models 1 and 4. The regression coefficients are shown for models 1 and 4, respectively. The bars show the mean regression coefficients ± 2 × standard deviation.

Mentions: The performances and the final regression coefficients of the two models are shown in Figures 6 and 7, respectively. Together with Table 2, these figures show that the two models are very similar. However, some differences exist. The confidence intervals of model 4 are narrower than those of model 1 (overall standard deviation of 0.048 versus 0.076). Furthermore, model 1 yields 8 significant metabolites (metabolites with a relative standard deviation, RSD, below 50%), whereas model 4 yields 15. In total, 16 unique metabolites were significant in at least one of the two models, 7 of which are common to both. The following metabolites were not significant in either model: 14 of 19 FFAs (F16:0, F16:1, F16:2, F18:0, F18:2, F18:3, F18:4, F20:0, F20:1, F20:2, F20:3, F20:4, F22:3, F22:4), 4 of 6 LPCs (L16:0, L18:0, L18:2, L18:3), 6 of 10 PhCs (P32:0, P32:1, P34:2, P36:3, P36:4, P38:4) and 19 of 24 TGs (T44:0, T44:1, T46:0, T46:1, T48:0, T48:1, T48:2, T48:3, T50:5, T50:2, T50:4, T52:2, T52:3, T52:4, T54:2, T54:3, T54:4, T54:5, T56:5). The differences between significant and non-significant metabolites were not caused by trivial chemical differences such as molecule size or degree of saturation. Table 3 shows the significant metabolites of the best model (model 4) and their overlap with model 1.
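The significance rule and the error bars in Figure 7 can be sketched as follows: for each metabolite, take repeated estimates of its regression coefficient, report mean ± 2 × standard deviation, and call it significant when RSD = SD / |mean| is below 50%. The text does not say how the repeated estimates were obtained, so the input here is assumed to be a list of coefficients from some resampling scheme; the metabolite labels (`lipid_A`, `lipid_B`) and values are hypothetical. The last lines only check the counts reported above for internal consistency.

```python
import statistics


def coefficient_summary(samples, rsd_threshold=0.5):
    """Summarise repeated estimates of one regression coefficient (resampling is assumed)."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    rsd = sd / abs(mean) if mean != 0 else float("inf")
    return {
        "mean": mean,
        "sd": sd,
        "interval": (mean - 2 * sd, mean + 2 * sd),  # error bar: mean +/- 2 SD
        "rsd": rsd,
        "significant": rsd < rsd_threshold,           # RSD below 50% by default
    }


def significant_set(coef_samples, rsd_threshold=0.5):
    """Names of variables whose coefficient RSD is below the threshold."""
    return {name for name, s in coef_samples.items()
            if coefficient_summary(s, rsd_threshold)["significant"]}


# Hypothetical coefficient estimates for two lipids.
coef_samples = {
    "lipid_A": [0.10, 0.12, 0.11, 0.09],   # stable sign and size: low RSD
    "lipid_B": [0.05, -0.04, 0.02, -0.01], # mean near zero: high RSD
}
selected = significant_set(coef_samples)

# Consistency of the reported counts, assuming FFAs, LPCs, PhCs and TGs are all
# the modelled metabolites: union by inclusion-exclusion equals total minus never-significant.
n_union = 8 + 15 - 7                   # model 1 + model 4 - common
never_significant = 14 + 4 + 6 + 19    # FFAs + LPCs + PhCs + TGs
total_measured = 19 + 6 + 10 + 24
assert n_union == total_measured - never_significant == 16
```

The RSD criterion rewards coefficients whose sign and magnitude are stable across repeated fits, which is why narrower intervals (as for model 4) tend to yield more significant metabolites.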
