Probability genotype imputation method and integrated weighted lasso for QTL identification.

Demetrashvili N, Van den Heuvel ER, Wit EC - BMC Genet. (2013)

Bottom Line: The results confirm previously identified regions; however, several new markers are also found. Our imputation method shows higher accuracy in terms of sensitivity and specificity than an alternative imputation method. This means that under realistic missing data settings this methodology can be used for QTL identification.


Affiliation: Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, Groningen 9747 AG, The Netherlands. n.demetrashvili@rug.nl.

ABSTRACT

Background: Many QTL studies have two common features: (1) marker information is often missing, and (2) among the many markers involved in the biological process, only a few are causal. In statistics, the second issue falls under the headings "sparsity" and "causal inference". The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. The outcomes of the proposed imputation method are probabilities, which serve as weights in the second step, a weighted lasso. This sparse phenotype inference is employed to select a set of predictive markers for the trait of interest.

Results: Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in such studies. The methodology was applied to an Arabidopsis experiment containing 69 markers for 165 recombinant inbred lines of an F8 generation. The results confirm previously identified regions; however, several new markers are also found. On the basis of the inferred ROC behavior these markers show good potential for being real, especially for the germination trait Gmax.

Conclusions: Our imputation method shows higher accuracy in terms of sensitivity and specificity than an alternative imputation method. Also, the proposed weighted lasso outperforms commonly practiced multiple regression as well as the traditional lasso and the adaptive lasso with three weighting schemes. This means that under realistic missing data settings this methodology can be used for QTL identification.
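
To make the second step of the abstract concrete, here is a minimal sketch of a generic weighted lasso fitted by coordinate descent. It minimizes (1/(2n)) * ||y - X b||^2 + lam * sum_j w_j * |b_j| on centered data. How the paper turns imputation probabilities into the weights w_j is not stated in this excerpt, so the weights are simply passed in; the function names, the toy data, and the value of lam are all hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Coordinate descent for a lasso with per-marker penalty weights.

    X: list of rows (binary marker genotypes), y: list of trait values.
    weights[j] scales the L1 penalty on marker j (assumed given).
    Returns the coefficients for the centered problem (no intercept).
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]           # residuals for beta = 0
    beta = [0.0] * p
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:                     # constant column, skip
                continue
            # Correlation of marker j with the partial residual.
            rho = sum(row[j] * (e + row[j] * beta[j])
                      for row, e in zip(Xc, resid)) / n
            new = soft_threshold(rho, lam * weights[j]) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [e - row[j] * delta for row, e in zip(Xc, resid)]
                beta[j] = new
    return beta


# Toy data (hypothetical): marker 0 drives the trait, marker 1 does not.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.1, -0.1, 2.1, 1.9, -0.1, 0.1, 1.9, 2.1]

equal = weighted_lasso(X, y, weights=[1.0, 1.0], lam=0.1)
heavy = weighted_lasso(X, y, weights=[10.0, 1.0], lam=0.1)
print(equal, heavy)
```

With equal weights, marker 0 is selected and marker 1 is shrunk to zero. Raising the weight on marker 0 penalizes it enough to remove it as well, which shows how per-marker weights steer the selection.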


Figure 8: Plot of LRT statistic (on log(10) scale) of markers associated with T50 vs. genetic distance; dashed lines separate 5 chromosomes.

Mentions: We computed the LRT under the full and restricted models on a base-10 logarithm scale for the markers selected by wlasso. We present the 39 marker names, genetic distances and respective LRT statistics in Table 1. The larger the LRT statistic, the stronger the evidence in favor of a QTL effect at a particular location on the chromosome. LRT statistics below 0.01 were set to 0. We would like to emphasize that detecting peaks that contain several markers increased our confidence that the region is associated with a trait. For example, the beginning of chromosome 3 (with markers MSAT399, ATHCHIB2, MSAT305754) and the middle of chromosome 4 (MSAT415, CIW7, MSAT418) provide strong evidence that these regions are associated with Gmax (we see a set of genes with a high LRT statistic). Loci on chromosomes 3 and 5 show strong evidence for all five traits. We also visualized the LRT statistic across genetic distance and present the plot for T50 in Figure 8. Five peaks with eight markers are seen for T50: peak1-MSAT399, ATHCHIB2; peak2-MSAT332; peak3-MSAT49, MSAT468; peak4-MSAT514, NGA139; peak5-MSAT520037. We have the highest confidence in peak3, since it contains multiple markers with a large LRT statistic. All peaks except the one on chromosome 4 have been detected by biologists as well [12]. Thus, we have found one additional peak for T50.
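
As a rough illustration of the LRT computation described above, the sketch below compares a full linear model (all selected markers) with a restricted model that drops one marker. It assumes Gaussian errors, and it reads "on a base-10 logarithm scale" as a LOD-style statistic, (n/2) * log10(RSS_restricted / RSS_full). The text does not give the formula, so both choices are assumptions. The function names and the toy data are hypothetical; only the rule that values below 0.01 are set to 0 comes from the text.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def lrt_log10(X, y, j, floor=0.01):
    """LOD-style LRT for marker j: full model vs. model without marker j."""
    n = len(y)
    restricted = [[v for k, v in enumerate(r) if k != j] for r in X]
    stat = (n / 2.0) * math.log10(rss(restricted, y) / rss(X, y))
    return stat if stat >= floor else 0.0   # values below the floor become 0


# Toy data (hypothetical): marker 0 affects the trait, marker 1 does not.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [1, 1]]
y = [0.1, -0.1, 2.1, 1.9, -0.1, 0.1, 1.9, 2.1]

stats = [lrt_log10(X, y, j) for j in range(2)]
print(stats)
```

On this toy data the statistic for marker 0 is about 8.02, while the statistic for marker 1 falls below 0.01 and is set to 0.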

