Probability genotype imputation method and integrated weighted lasso for QTL identification.

Demetrashvili N, Van den Heuvel ER, Wit EC - BMC Genet. (2013)

Bottom Line: The results confirm previously identified regions; however, several new markers are also found. Our imputation method shows higher accuracy in terms of sensitivity and specificity than the alternative imputation method. This means that under realistic missing-data settings this methodology can be used for QTL identification.


Affiliation: Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, Groningen 9747 AG, The Netherlands. n.demetrashvili@rug.nl.

ABSTRACT

Background: Many QTL studies have two common features: (1) marker information is often missing, and (2) among the many markers involved in the biological process, only a few are causal. In statistics, the second issue falls under the headings "sparsity" and "causal inference". The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. The outcomes of the proposed imputation method are probabilities, which serve as weights in the second step, the weighted lasso. Sparse phenotype inference is then employed to select a set of predictive markers for the trait of interest.

Results: Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in these studies. The methodology was applied to an Arabidopsis experiment containing 69 markers for 165 recombinant inbred lines of an F8 generation. The results confirm previously identified regions; however, several new markers are also found. On the basis of the inferred ROC behavior, these markers show good potential for being real, especially for the germination trait Gmax.

Conclusions: Our imputation method shows higher accuracy in terms of sensitivity and specificity than the alternative imputation method. In addition, the proposed weighted lasso outperforms the commonly practiced multiple regression as well as the traditional lasso and the adaptive lasso with three weighting schemes. This means that under realistic missing-data settings this methodology can be used for QTL identification.

Figure 17: Comparison of ROC curves across four models for evenly spaced markers with the MAR mechanism when σ² = 1.

Mentions: Nearest marker imputation and multiple regression are frequently employed methods for QTL analysis. Our imputation was compared with nearest marker imputation, and wlasso was compared with multiple regression. We therefore examined four models: (1) our imputation and wlasso, (2) our imputation and multiple regression, (3) nearest marker imputation and wlasso, (4) nearest marker imputation and multiple regression. Whenever wlasso was included in a simulation, we studied the TPR and FPR across a range of the tuning parameter λ. Whenever multiple regression was included in a simulation, we used the full significance level range [0,1] in increments of 0.015. This increment yields 67 steps, which is of the same order as the number of steps used for wlasso. All four models were applied to the simulated data described above and were studied for evenly spaced and clustered markers with MCAR and MAR mechanisms. The results were summarized using ROC curves. The ROC curves across the four models for clustered markers when the residual error variance is small (σ² = 0.5) are shown in Figures 13 and 14. Model 1 outperforms the others under all scenarios. We also show similar plots for larger residual error variance (σ² = 3) when the markers are evenly spaced (see Figures 15 and 16). As the residual error variance increases, the advantage of Model 1 over the other models in terms of sensitivity and 1 − specificity becomes more pronounced. We also show the results for intermediate variance (σ² = 1) when markers are evenly spaced and clustered (see Figures 17 and 18). Interestingly, for smaller residual error variance (σ² ≤ 1), the ROC curves of Model 2 are slightly above the curves of Model 3. This implies that the probabilistic imputation method with multiple regression has slightly higher accuracy than nearest marker imputation with wlasso.
For larger residual error variance (σ² > 1), the ROC curves of Models 2 and 3 lie approximately on top of each other, implying that wlasso and the probabilistic imputation method each contribute roughly the same improvement. However, comparing our imputation method directly with nearest marker imputation for wlasso (Model 1 vs. Model 3) and for multiple regression (Model 2 vs. Model 4) demonstrates that our imputation method outperforms nearest marker imputation.
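
The ROC construction described for the multiple-regression models can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy p-values, and the selection rule "a marker is selected when its p-value is at most the significance level" are all assumptions made for illustration. The sketch builds the significance grid on [0,1] with increments of 0.015 and computes one (FPR, TPR) point per level.

```python
def significance_grid(step=0.015):
    """Significance levels 0, step, 2*step, ... that do not exceed 1."""
    n_increments = int(1.0 / step + 1e-9)  # full increments that fit in [0, 1]
    return [k * step for k in range(n_increments + 1)]


def tpr_fpr(selected, causal, n_markers):
    """True and false positive rates of a selected marker set.

    selected, causal: iterables of marker indices; n_markers: total markers.
    """
    selected, causal = set(selected), set(causal)
    n_pos = len(causal)
    n_neg = n_markers - n_pos
    tpr = len(selected & causal) / n_pos  # sensitivity
    fpr = len(selected - causal) / n_neg  # 1 - specificity
    return tpr, fpr


def roc_points(p_values, causal, alphas):
    """One (FPR, TPR) point per significance level.

    p_values: dict mapping marker index -> p-value (hypothetical input).
    Assumed rule: a marker is selected when its p-value is <= alpha.
    """
    points = []
    for alpha in alphas:
        selected = [m for m, p in p_values.items() if p <= alpha]
        tpr, fpr = tpr_fpr(selected, causal, len(p_values))
        points.append((fpr, tpr))
    return points


# Toy example with made-up p-values for five markers; markers 0 and 2 are causal.
toy_p = {0: 0.001, 1: 0.2, 2: 0.04, 3: 0.7, 4: 0.5}
grid = significance_grid()
curve = roc_points(toy_p, causal={0, 2}, alphas=grid)
print(len(grid), curve[0], curve[-1])
```

For wlasso, the same `tpr_fpr` helper could be applied to the set of markers with nonzero coefficients at each value of λ, replacing the loop over significance levels with a loop over the λ path.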

