Probability genotype imputation method and integrated weighted lasso for QTL identification.

Demetrashvili N, Van den Heuvel ER, Wit EC - BMC Genet. (2013)


Affiliation: Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, Groningen 9747 AG, The Netherlands. n.demetrashvili@rug.nl.

ABSTRACT

Background: Many QTL studies share two features: (1) marker information is often missing, and (2) of the many markers involved in the biological process, only a few are causal. In statistics, the second issue falls under the headings "sparsity" and "causal inference". The goal of this work is to develop a two-step statistical methodology for QTL mapping with markers that have binary genotypes. The first step introduces a novel imputation method for missing genotypes. The outcomes of this imputation method are probabilities, which serve as weights in the second step, a weighted lasso. This sparse phenotype inference is then used to select a set of markers that are predictive of the trait of interest.
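The abstract states only that the imputed probabilities "serve as weights" in the weighted lasso. A minimal sketch of one way this can work, under assumptions of our own (not necessarily the authors' construction): each line with missing genotypes is expanded into one pseudo-observation per possible genotype combination, each weighted by its imputed probability, with missing markers in a line treated as independent; the weighted lasso then minimises a weighted squared-error loss plus an L1 penalty by plain coordinate descent. The names expand_missing and weighted_lasso are hypothetical.

```python
import itertools

import numpy as np


def expand_missing(X, P):
    """Expand lines with missing genotypes into weighted pseudo-observations.

    X: (n, p) array of 0/1 genotypes, np.nan where a genotype is missing.
    P: (n, p) array of imputed probabilities P(genotype = 1); read only where X is nan.
    Assumption: missing markers within a line are treated as independent, so a
    genotype combination gets the product of its per-marker probabilities.
    """
    rows, idx, weights = [], [], []
    for i in range(X.shape[0]):
        miss = np.flatnonzero(np.isnan(X[i]))
        # One pseudo-observation per combination of the missing genotypes
        # (a single one, with weight 1, if nothing is missing).
        for combo in itertools.product((0.0, 1.0), repeat=len(miss)):
            x = X[i].copy()
            if len(miss):
                x[miss] = combo
            wt = 1.0
            for j, g in zip(miss, combo):
                wt *= P[i, j] if g == 1.0 else 1.0 - P[i, j]
            rows.append(x)
            idx.append(i)
            weights.append(wt)
    return np.array(rows), np.array(idx), np.array(weights)


def weighted_lasso(X, y, w, lam, n_iter=500, tol=1e-8):
    """Minimise 0.5 * sum_i w_i (y_i - b0 - x_i @ beta)^2 + lam * ||beta||_1.

    Weights are normalised to sum to one; the intercept b0 is not penalised and
    is handled by weighted centring. Solved by cyclic coordinate descent.
    """
    w = w / w.sum()
    xm, ym = w @ X, w @ y
    Xc, yc = X - xm, y - ym
    beta = np.zeros(X.shape[1])
    r = yc.copy()                      # current residual yc - Xc @ beta
    denom = w @ (Xc ** 2)              # weighted sum of squares per column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(X.shape[1]):
            if denom[j] == 0.0:
                continue               # constant column: coefficient stays 0
            old = beta[j]
            rho = w @ (Xc[:, j] * r) + denom[j] * old
            # Soft-thresholding step for the L1 penalty.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / denom[j]
            if beta[j] != old:
                r -= Xc[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return ym - xm @ beta, beta
```

With lam = 0 the solver reduces to weighted least squares on the expanded data; larger values of lam set more marker coefficients exactly to zero, which is what makes the selection sparse.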

Results: Simulation studies validate the proposed methodology under a wide range of realistic settings, in which it also outperforms alternative imputation and variable selection methods. The methodology was applied to an Arabidopsis experiment containing 69 markers for 165 recombinant inbred lines of an F8 generation. The results confirm previously identified regions; however, several new markers are also found. On the basis of the inferred ROC behavior, these markers show good potential for being real, especially for the germination trait Gmax.

Conclusions: Our imputation method shows higher accuracy, in terms of sensitivity and specificity, than an alternative imputation method. The proposed weighted lasso also outperforms commonly used multiple regression, as well as the traditional lasso and the adaptive lasso with three weighting schemes. This means that, under realistic missing-data settings, the methodology can be used for QTL identification.
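Sensitivity and specificity are the usual rates built from true and false positives and negatives. A minimal sketch, assuming a simulation setting in which the truly causal markers are known and the method's output is a set of selected markers; the function name and the toy numbers are hypothetical.

```python
def sensitivity_specificity(selected, causal, n_markers):
    """Return (sensitivity, specificity) of a marker selection.

    selected: indices chosen by the method; causal: indices of the true QTL;
    n_markers: total number of markers considered.
    """
    selected, causal = set(selected), set(causal)
    tp = len(selected & causal)        # causal markers that were found
    fn = len(causal - selected)        # causal markers that were missed
    fp = len(selected - causal)        # non-causal markers wrongly selected
    tn = n_markers - tp - fn - fp      # non-causal markers correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical example: 69 markers, 3 causal, 3 selected, 2 of them correct.
sens, spec = sensitivity_specificity({1, 5, 20}, {1, 5, 9}, 69)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")  # sensitivity=0.667 specificity=0.985
```

Sweeping a tuning parameter (for example the lasso penalty) and plotting sensitivity against 1 - specificity gives an ROC curve of the kind referred to in the Results.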

Figure 6: Plots of weighted lasso regression coefficients vs. number of steps in weighted lasso for traits Gmax, T50, T10, U8416, and AUC100.

Mentions: We adjusted all 7014 observations of every trait using regression model (9). The number of regression parameters ϕ0,…,ϕm is high because some of the input variables are categorical; in total we estimated m=15 parameters. The residuals were then modeled using wlasso, as described in previous sections. The marker effects, given by the wlasso regression coefficients, are presented in Figure 6; these values will be used in the simulation studies below. The BICs for all five traits are shown in Figure 7. For each trait, the selected markers are those in the wlasso model with the minimal BIC value. Thus, on the basis of BIC, 29, 10, 15, 11 and 22 markers are selected for Gmax, T50, T10, U8416 and AUC100, respectively. The number and identity of the selected markers clearly differ between traits; together they cover 39 of the 69 available markers.
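A minimal sketch of the two stages described above, on simulated toy data: residuals from an ordinary least-squares fit standing in for regression model (9) (whose exact form is not reproduced here), a coordinate-descent lasso path over a decreasing penalty grid, and selection of the step with minimal BIC. Assumptions: the lasso is unweighted for brevity, categorical covariates are already dummy-coded in Z, and BIC = n log(RSS/n) + df log n with df the number of nonzero coefficients, a common choice that may differ from the paper's exact definition. All function names are hypothetical.

```python
import numpy as np


def adjust_trait(y, Z):
    """Residuals of y after OLS on covariates Z (an intercept column is added here)."""
    A = np.column_stack([np.ones(len(y)), Z])
    phi, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ phi


def lasso_path(X, r, lams, n_iter=1000, tol=1e-10):
    """Coordinate-descent lasso on centred data, warm-started along decreasing lams.

    Minimises (1 / 2n) * ||rc - Xc @ beta||^2 + lam * ||beta||_1 for each lam.
    """
    Xc = X - X.mean(axis=0)
    rc = r - r.mean()
    n, p = Xc.shape
    denom = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    path = []
    for lam in lams:
        res = rc - Xc @ beta           # residual for the warm-started beta
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(p):
                if denom[j] == 0.0:
                    continue
                old = beta[j]
                rho = Xc[:, j] @ res / n + denom[j] * old
                beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / denom[j]
                if beta[j] != old:
                    res -= Xc[:, j] * (beta[j] - old)
                    max_change = max(max_change, abs(beta[j] - old))
            if max_change < tol:
                break
        path.append(beta.copy())
    return np.array(path), Xc, rc


def select_by_bic(path, Xc, rc):
    """Return (best step, indices of selected markers, BIC per step)."""
    n = len(rc)
    bics = []
    for beta in path:
        rss = np.sum((rc - Xc @ beta) ** 2)
        df = np.count_nonzero(beta)     # assumed degrees of freedom
        bics.append(n * np.log(rss / n) + df * np.log(n))
    best = int(np.argmin(bics))
    return best, np.flatnonzero(path[best]), np.array(bics)
```

A typical call adjusts the trait with adjust_trait, builds the penalty grid downward from the smallest penalty that zeroes every coefficient, and passes the resulting path to select_by_bic; the selected marker set then differs from trait to trait, as in the text.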
