A hidden Markov model for haplotype inference for present-absent data of clustered genes using identified haplotypes and haplotype patterns.

Wu J, Chen GB, Zhi D, Liu N, Zhang K - Front Genet (2014)

Bottom Line: Ambiguity arises when a specific KIR gene is present, since the exact copy number (one or two) of that gene is unknown. Haplotype inference for these genes is therefore more challenging because a large portion of the information is missing. Our simulation results showed that the incorporation of identified haplotypes and partial haplotype patterns can improve accuracy for haplotype inference.


Affiliation: Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA.

ABSTRACT
The majority of killer cell immunoglobulin-like receptor (KIR) genes are detected as either present or absent using locus-specific genotyping technology. Ambiguity arises when a specific KIR gene is present, since the exact copy number (one or two) of that gene is unknown. Haplotype inference for these genes is therefore more challenging because a large portion of the information is missing. Meanwhile, many haplotypes and partial haplotype patterns have been previously identified owing to tight linkage disequilibrium (LD) among these clustered genes, and these can be incorporated to facilitate haplotype inference. In this paper, we developed a hidden Markov model (HMM) based method that can incorporate identified haplotypes or partial haplotype patterns for haplotype inference from present-absent data of clustered genes (e.g., KIR genes). Through extensive simulations for KIR genes, we compared its performance with a previously developed expectation maximization (EM) based method in terms of haplotype assignments and haplotype frequency estimation. The simulation results showed that the new HMM based method outperformed the previous method when some incorrect haplotypes were included as identified haplotypes and/or the standard deviation of haplotype frequencies was small. We also compared the performance of our method with two methods that do not use previously identified haplotypes and haplotype patterns: an EM based method, HAPLORE, and an HMM based method, MaCH. Our simulation results showed that the incorporation of identified haplotypes and partial haplotype patterns can improve accuracy for haplotype inference. The new software package HaploHMM is available and can be downloaded at http://www.soph.uab.edu/ssg/files/People/KZhang/HaploHMM/haplohmm-index.html.
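
To illustrate the ambiguity described in the abstract, the following minimal sketch enumerates the haplotype pairs that are compatible with one present-absent genotype, first without restriction and then restricted to a list of previously identified haplotypes. It is not the HaploHMM implementation; the 0/1 encoding, the function name compatible_pairs, and the three-gene example are assumptions made for illustration only.

```python
from itertools import product

def compatible_pairs(genotype, candidates=None):
    """Return unordered haplotype pairs consistent with a present/absent genotype.

    genotype:   tuple of 0/1 per gene (1 = gene detected, 0 = not detected).
    candidates: optional collection of haplotypes (tuples of 0/1); if given,
                both haplotypes of a pair must come from this collection.
    """
    n = len(genotype)
    if candidates is None:
        pool = list(product((0, 1), repeat=n))  # every possible haplotype
    else:
        pool = sorted(set(candidates))          # hypothetical "identified" list
    pairs = []
    for i, h1 in enumerate(pool):
        for h2 in pool[i:]:
            # A gene is observed as present when at least one haplotype carries
            # it, so one copy and two copies look identical in the data.
            if all((a | b) == g for a, b, g in zip(h1, h2, genotype)):
                pairs.append((h1, h2))
    return pairs

# Hypothetical individual: genes 1 and 2 present, gene 3 absent.
genotype = (1, 1, 0)
all_pairs = compatible_pairs(genotype)

# Hypothetical list of previously identified haplotypes.
known = [(1, 1, 0), (1, 0, 0), (0, 0, 0)]
restricted = compatible_pairs(genotype, known)

print(len(all_pairs), "pairs without prior knowledge")
print(len(restricted), "pairs using identified haplotypes")
```

Each present gene multiplies the number of compatible pairs, which is why restricting the search to identified haplotypes can shrink the candidate set considerably.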



Figure 1: Average values of six measures (IH, SAD, SE, IE, IME, and ISE) over 500 replicates with the sample size of 100 and the assumption of HWE under different haplotype frequency distributions. The x-axis represents the standard deviation of haplotype frequencies used in simulations. Results were obtained when no incorrect haplotypes were included as identified haplotypes.

Mentions: Figures 1 and 2 show the average values of six measures over 500 replicates with a sample size of 100 and the assumption of HWE under different haplotype frequency distributions (Table 2). The average IH values from HaploHMM and HaploIHP ranged from 0.72 to 0.88 and were much higher than the average values from MaCH and HAPLORE, which ranged from 0.38 to 0.57. The average IH values decreased slightly as the standard deviation of haplotype frequencies used in simulations increased. When the correct haplotypes were used as identified haplotypes, HaploIHP always performed better than HaploHMM. When some incorrect haplotypes were included as identified haplotypes, HaploHMM had slightly larger IH values when the standard deviation of haplotype frequencies was larger than 0.11. It is worth noting that the EM based method (HaploIHP) had larger IH values than the HMM based greedy method (HaploHMM) when the identified haplotypes and haplotype patterns were used, while the EM based method (HAPLORE) had smaller IH values than the HMM based method (MaCH) when the identified haplotypes and haplotype patterns were not used. This is because HaploIHP uses the identified haplotypes and haplotype patterns to reduce the number of compatible haplotypes in the EM, which results in more accurate estimation of haplotypes, whereas HAPLORE produces many haplotypes with small frequencies owing to the large number of compatible haplotypes arising from the missing data, and thus has smaller IH values. In terms of SAD, the sum of differences between true haplotype frequencies and estimated haplotype frequencies from HaploHMM ranged from 0.25 to 0.57 and was always larger than that from HaploIHP. This is not unexpected, since HaploHMM used only 200 haplotypes from the last round of HMM iteration to estimate haplotype frequencies, while HaploIHP used the EM algorithm. The average values of SAD from HaploIHP and HaploHMM were smaller than those from MaCH and HAPLORE.
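
As a rough illustration of the SAD measure discussed above, the sketch below computes a sum of absolute differences between true and estimated haplotype frequencies. The exact definition used in the paper is not reproduced in this excerpt, so reading SAD as a sum of absolute differences over the union of haplotypes is an assumption, as are the function name and the example frequencies.

```python
def sum_abs_diff(true_freq, est_freq):
    """Sum of absolute differences between two haplotype frequency tables.

    Both arguments map a haplotype label to its frequency. A haplotype that is
    missing from one table is treated as having frequency 0 there (assumption).
    """
    haplotypes = set(true_freq) | set(est_freq)
    return sum(abs(true_freq.get(h, 0.0) - est_freq.get(h, 0.0))
               for h in haplotypes)

# Hypothetical frequencies, for illustration only.
true_freq = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
est_freq = {"h1": 0.6, "h2": 0.3, "h4": 0.1}

sad = sum_abs_diff(true_freq, est_freq)
print(round(sad, 3))
```

Under this reading, a value of 0 means the estimated frequencies match the true ones exactly, and larger values indicate poorer frequency estimation.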

