Probability genotype imputation method and integrated weighted lasso for QTL identification.

Demetrashvili N, Van den Heuvel ER, Wit EC - BMC Genet. (2013)

Bottom Line: The results confirm previously identified regions; however, several new markers are also found. Our imputation method shows higher accuracy in terms of sensitivity and specificity than an alternative imputation method. This means that under realistic missing data settings this methodology can be used for QTL identification.


Affiliation: Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, Groningen 9747 AG, The Netherlands. n.demetrashvili@rug.nl.

ABSTRACT

Background: Many QTL studies have two common features: (1) marker information is often missing, and (2) among the many markers involved in the biological process, only a few are causal. In statistics, the second issue falls under the headings "sparsity" and "causal inference". The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. The outcomes of the proposed imputation method are probabilities, which serve as weights in the second step, a weighted lasso. This sparse phenotype inference is employed to select a set of predictive markers for the trait of interest.

Results: Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in such studies. The methodology was applied to an Arabidopsis experiment containing 69 markers for 165 recombinant inbred lines of an F8 generation. The results confirm previously identified regions; however, several new markers are also found. On the basis of the inferred ROC behavior, these markers show good potential for being real, especially for the germination trait Gmax.

Conclusions: Our imputation method shows higher accuracy in terms of sensitivity and specificity than an alternative imputation method. Also, the proposed weighted lasso outperforms the commonly practiced multiple regression as well as the traditional lasso and the adaptive lasso with three weighting schemes. This means that under realistic missing data settings this methodology can be used for QTL identification.



Figure 19: Comparison of ROC curves across 5 models for evenly-spaced markers with the MCAR mechanism when σ² = 1.

Mentions: Our wlasso is compared with the classical lasso and an adaptive lasso with three weighting schemes [18]. In the adaptive lasso, we estimated the weight vector as $\hat{w} = 1/|\hat{\beta}|^{\gamma}$, where $\hat{\beta}$ is a vector of ordinary least squares estimates and γ = 0.5, 1, 2 [18]. All five models were applied to the simulated data described above. For the lasso and adaptive lasso, the imputed probabilities were rounded to zeros and ones after the imputation (ignoring our weighting procedure). Thus, the input matrix to the lasso and adaptive lasso contained genotype values in {0,1}. We investigated settings for evenly-spaced and clustered markers with both MCAR and MAR mechanisms. Again, the results were summarized using ROC curves. The ROC curves of the five models for evenly-spaced and clustered markers with the MCAR mechanism, when the residual error variance σ² = 1, are shown in Figures 19 and 20. Similar plots for MAR are presented in Figures 21 and 22. Clearly, the wlasso is more accurate than the other four alternatives. This accuracy is maintained across all investigated variances (σ² = 0.5, 1, 2, 3); see Additional files 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. An obvious advantage of the wlasso is observed for both evenly-spaced and clustered markers with the MAR mechanism (see Additional files 6, 7, 10, 11, 14, and 15). For clustered markers with MCAR, the wlasso loses its obvious advantage as the residual error variance increases (σ² = 2, 3), but it still remains at the same accuracy level as the lasso (see Additional files 9 and 13). For evenly-spaced markers with MCAR and large residual error variances (σ² = 2, 3), however, the wlasso maintains noticeably higher accuracy than the other four approaches (see Additional files 8 and 12).
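
The adaptive-lasso baseline in this comparison can be illustrated with a minimal sketch. It assumes the standard adaptive-lasso weights w_j = 1/|β̂_j|^γ, with β̂ the ordinary least squares estimates, and fits the weighted penalty by rescaling columns and running a plain coordinate-descent lasso. The function names, the simulated data, the placeholder "imputed probabilities", and the penalty value `lam` are all hypothetical choices made for illustration; this is not the authors' code and it does not implement their wlasso.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]            # remove marker j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]            # add the updated marker j back
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive lasso with assumed weights w_j = 1 / |beta_ols_j|**gamma."""
    Xc = X - X.mean(axis=0)                # centring stands in for an intercept
    yc = y - y.mean()
    beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    w = 1.0 / (np.abs(beta_ols) ** gamma + 1e-12)
    # A penalty sum_j w_j * |b_j| equals a plain lasso on columns X_j / w_j.
    b_scaled = lasso_cd(Xc / w, yc, lam)
    return b_scaled / w


def sensitivity_specificity(selected, causal, p):
    """Sensitivity and specificity of a selected marker set among p markers."""
    selected, causal = set(selected), set(causal)
    tp = len(selected & causal)
    fp = len(selected - causal)
    return tp / len(causal), 1.0 - fp / (p - len(causal))


# Hypothetical simulation: 165 lines, 20 binary markers, 3 causal markers.
rng = np.random.default_rng(0)
n, p, causal = 165, 20, [2, 7, 11]
G = rng.integers(0, 2, size=(n, p)).astype(float)   # true genotypes
missing = rng.random((n, p)) < 0.10                  # MCAR missingness
P = np.where(missing, rng.random((n, p)), G)         # placeholder imputed probabilities
X = (P >= 0.5).astype(float)                         # rounded to {0, 1} for the baselines
y = G[:, causal] @ np.array([2.0, -2.0, 1.5]) + rng.normal(0.0, 1.0, n)  # sigma^2 = 1

beta = adaptive_lasso(X, y, lam=0.05, gamma=1.0)
selected = np.flatnonzero(beta).tolist()
sens, spec = sensitivity_specificity(selected, causal, p)
```

Sweeping `lam` over a grid and recording one (1 − specificity, sensitivity) pair per value traces out a ROC curve of the kind used to compare the five models.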

