Empirical evaluations of analytical issues arising from predicting HLA alleles using multiple SNPs.

Zhang XC, Li SS, Wang H, Hansen JA, Zhao LP - BMC Genet. (2011)

Bottom Line: Specifically, we have found that utilizing imputed in addition to genotyped SNPs generally yields comparable if not better performance in prediction accuracies. Further, when the training set includes multi-ethnic populations, the resulting models are reliable and perform well for the same subpopulations across all HLA genes. In contrast, the predictive models built from single ethnic populations have superior performance within the same ethnic population, but are not likely to perform well in other ethnic populations.

Affiliation: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA.

ABSTRACT

Background: Numerous immune-mediated diseases have been associated with the class I and II HLA genes located within the major histocompatibility complex (MHC) consisting of highly polymorphic alleles encoded by the HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1 loci. Genotyping for HLA alleles is complex and relatively expensive. Recent studies have demonstrated the feasibility of predicting HLA alleles using MHC SNPs inside and outside of HLA that are typically included in SNP arrays and are commonly available in genome-wide association studies (GWAS). We have recently described a novel method, complementary to previous methods, for accurately predicting HLA alleles using unphased flanking SNP genotypes. In this manuscript, we address several practical issues relevant to the application of this methodology.

Results: Applying this new methodology to three large independent study cohorts, we have evaluated the performance of the predictive models in ethnically diverse populations. Specifically, we have found that utilizing imputed in addition to genotyped SNPs generally yields comparable if not better performance in prediction accuracies. Our evaluation also supports the idea that predictive models trained on one population are transferable to other populations of the same ethnicity. Further, when the training set includes multi-ethnic populations, the resulting models are reliable and perform well for the same subpopulations across all HLA genes. In contrast, the predictive models built from single ethnic populations have superior performance within the same ethnic population, but are not likely to perform well in other ethnic populations.

Conclusions: The empirical explorations reported here provide further evidence in support of the application of this approach for predicting HLA alleles with GWAS-derived SNP data. Utilizing all available samples, we have built "state of the art" predictive models for HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1. The HLA allele predictive models, along with the program used to carry out the prediction, are available on our website.

Figure 3: Comparison of cross-platform prediction accuracies among four genotyping arrays. Each square panel shows the accuracies of HLA predictive models built using SNPs observed and imputed from one genotyping array and validated using SNPs observed and imputed from another array. The confidence threshold was set at CT = 0.

Mentions: Results for assessing the performance of predictive models, trained using genotypes observed and imputed from one platform but applied to genotype data observed and imputed from another platform, are shown in Figure 3, which has five panels corresponding to five HLA genes. Within each panel, we show HLA prediction accuracies validated on four technologies (x-axis), when the corresponding model is trained on each of four different data sets (y-axis). The diagonal shows prediction accuracies when training and validation sets are genotyped on the same technology. As the color bar shows, red indicates higher accuracy and yellow indicates relatively lower accuracy. From visual inspection, when training and validation sets are genotyped on different technologies, predictive models trained using Illumina 550K SNPs seem to slightly outperform models using other platforms for the majority of HLA genes, at both intermediate and high resolutions. However, the accuracies of prediction at each HLA gene are generally comparable among the four genotyping technologies. Detailed accuracy estimates are shown in Additional file 1: Table S4.
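
The layout of each panel (training platform on the y-axis, validation platform on the x-axis, same-platform results on the diagonal) can be illustrated with a minimal Python sketch. This is not the authors' program. The function names, the `predict` callback, and the definition of accuracy (fraction of retained calls that match the typed allele, where a call is retained if its confidence is at least CT) are assumptions made for illustration. Under that assumed definition, CT = 0 retains every call.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation of one prediction: (allele, confidence in [0, 1]).
Call = Tuple[str, float]


def accuracy_at_threshold(
    calls: Sequence[Call], truth: Sequence[str], ct: float = 0.0
) -> Tuple[float, float]:
    """Return (accuracy, call_rate) over calls whose confidence is >= ct.

    Assumed definition: accuracy is the share of retained calls that equal
    the typed allele; call_rate is the share of all calls that are retained.
    """
    kept = [(pred, true) for (pred, conf), true in zip(calls, truth) if conf >= ct]
    if not kept:
        return float("nan"), 0.0
    correct = sum(pred == true for pred, true in kept)
    return correct / len(kept), len(kept) / len(truth)


def cross_platform_matrix(
    platforms: Sequence[str],
    predict: Callable[[str, str], List[Call]],
    truth: Dict[str, List[str]],
    ct: float = 0.0,
) -> Dict[str, Dict[str, float]]:
    """Build matrix[train][valid] for one HLA gene.

    `predict(train, valid)` is a hypothetical callback that returns the calls
    made on the `valid` platform's data by a model trained on `train`.
    Entries with train == valid form the diagonal of the panel.
    """
    return {
        train: {
            valid: accuracy_at_threshold(predict(train, valid), truth[valid], ct)[0]
            for valid in platforms
        }
        for train in platforms
    }
```

Raising `ct` above 0 in this sketch trades call rate for accuracy, which is why the threshold is stated alongside the accuracies in the figure legend.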