Limits...
ISHAPE: new rapid and accurate software for haplotyping.

Delaneau O, Coulonges C, Boelle PY, Nelson G, Spadoni JL, Zagury JF - BMC Bioinformatics (2007)

Bottom Line: For adjacent SNPs Ishape2 is superior to the other software both in terms of speed and accuracy.For SNPs spaced by 5 Kb, Ishape2 yields similar results to Phase2.1 in terms of accuracy, and both outperform the other software.These results show that the Ishape heuristic approach for haplotyping is very competitive in terms of accuracy and speed and deserves to be evaluated extensively for possible future widespread use.

View Article: PubMed Central - HTML - PubMed

Affiliation: Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France. olivier.delaneau@gmail.com <olivier.delaneau@gmail.com>

ABSTRACT

Background: We have developed a new haplotyping program based on the combination of an iterative multiallelic EM algorithm (IEM), bootstrap resampling and a pseudo Gibbs sampler. The use of the IEM-bootstrap procedure considerably reduces the space of possible haplotype configurations to be explored, greatly reducing computation time, while the adaptation of the Gibbs sampler with a recombination model on this restricted space maintains high accuracy. On large SNP datasets (>30 SNPs), we used a segmented approach based on a specific partition-ligation strategy. We compared this software, Ishape (Iterative Segmented HAPlotyping by Em), with reference programs such as Phase, Fastphase, and PL-EM. Analogously with Phase, there are 2 versions of Ishape: Ishape1 which uses a simple coalescence model for the pseudo Gibbs sampler step, and Ishape2 which uses a recombination model instead.

Results: We tested the program on 2 types of real SNP datasets derived from Hapmap: adjacent SNPs (high LD) and SNPs spaced by 5 Kb (lower level of LD). In both cases, we tested 100 replicates for each size: 10, 20, 30, 40, 50, 60, and 80 SNPs. For adjacent SNPs Ishape2 is superior to the other software both in terms of speed and accuracy. For SNPs spaced by 5 Kb, Ishape2 yields similar results to Phase2.1 in terms of accuracy, and both outperform the other software. In terms of speed, Ishape2 runs about 4 times faster than Phase2.1 with 10 SNPs, and about 10 times faster with 80 SNPs. For the case of 5kb-spaced SNPs, Fastphase may run faster with more than 100 SNPs.

Conclusion: These results show that the Ishape heuristic approach for haplotyping is very competitive in terms of accuracy and speed and deserves to be evaluated extensively for possible future widespread use.

Show MeSH
Measure of the impact of the number of bootstrap samples on the size and the relevance of the candidate haplotypes space for the APOE gene. Black line and left scale are for ICR (capture rate of true haplotypes configurations). Dashed line and right scale are for ANCR (average number of candidates per genotype).
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC1919397&req=5

Figure 2: Measure of the impact of the number of bootstrap samples on the size and the relevance of the candidate haplotypes space for the APOE gene. Black line and left scale are for ICR (capture rate of true haplotypes configurations). Dashed line and right scale are for ANCR (average number of candidates per genotype).

Mentions: First, we evaluated the impact of the number of bootstrap resamplings on the size and the relevance of the space of the candidate haplotypes produced. To characterize the relationships between these points, we applied our algorithm several times with a growing number of bootstrap resamplings (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096) on two datasets, GH1 and APOE (see Materials and Methods section) with different levels of missing data (0%, 2%, 5% and 10%). For each, we replicated the test 100 times and measured the ICR and the number of candidate haplotype configurations. Figure 2 summarizes the results obtained for the APOE gene. We observed a limit in the size of the haplotype space and ability of the bootstrap to capture the true haplotypes configurations was reached after 256 samplings whatever the percentage of missing data (see ICR curve in Figure 2). On our test platform (see Materials and Methods), 512 bootstrap resamplings took 0.3 and 0.7 seconds for 0% and 10% missing data (respectively).


ISHAPE: new rapid and accurate software for haplotyping.

Delaneau O, Coulonges C, Boelle PY, Nelson G, Spadoni JL, Zagury JF - BMC Bioinformatics (2007)

Measure of the impact of the number of bootstrap samples on the size and the relevance of the candidate haplotypes space for the APOE gene. Black line and left scale are for ICR (capture rate of true haplotypes configurations). Dashed line and right scale are for ANCR (average number of candidates per genotype).
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC1919397&req=5

Figure 2: Measure of the impact of the number of bootstrap samples on the size and the relevance of the candidate haplotypes space for the APOE gene. Black line and left scale are for ICR (capture rate of true haplotypes configurations). Dashed line and right scale are for ANCR (average number of candidates per genotype).
Mentions: First, we evaluated the impact of the number of bootstrap resamplings on the size and the relevance of the space of the candidate haplotypes produced. To characterize the relationships between these points, we applied our algorithm several times with a growing number of bootstrap resamplings (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 and 4096) on two datasets, GH1 and APOE (see Materials and Methods section) with different levels of missing data (0%, 2%, 5% and 10%). For each, we replicated the test 100 times and measured the ICR and the number of candidate haplotype configurations. Figure 2 summarizes the results obtained for the APOE gene. We observed a limit in the size of the haplotype space and ability of the bootstrap to capture the true haplotypes configurations was reached after 256 samplings whatever the percentage of missing data (see ICR curve in Figure 2). On our test platform (see Materials and Methods), 512 bootstrap resamplings took 0.3 and 0.7 seconds for 0% and 10% missing data (respectively).

Bottom Line: For adjacent SNPs Ishape2 is superior to the other software both in terms of speed and accuracy.For SNPs spaced by 5 Kb, Ishape2 yields similar results to Phase2.1 in terms of accuracy, and both outperform the other software.These results show that the Ishape heuristic approach for haplotyping is very competitive in terms of accuracy and speed and deserves to be evaluated extensively for possible future widespread use.

View Article: PubMed Central - HTML - PubMed

Affiliation: Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France. olivier.delaneau@gmail.com <olivier.delaneau@gmail.com>

ABSTRACT

Background: We have developed a new haplotyping program based on the combination of an iterative multiallelic EM algorithm (IEM), bootstrap resampling and a pseudo Gibbs sampler. The use of the IEM-bootstrap procedure considerably reduces the space of possible haplotype configurations to be explored, greatly reducing computation time, while the adaptation of the Gibbs sampler with a recombination model on this restricted space maintains high accuracy. On large SNP datasets (>30 SNPs), we used a segmented approach based on a specific partition-ligation strategy. We compared this software, Ishape (Iterative Segmented HAPlotyping by Em), with reference programs such as Phase, Fastphase, and PL-EM. Analogously with Phase, there are 2 versions of Ishape: Ishape1 which uses a simple coalescence model for the pseudo Gibbs sampler step, and Ishape2 which uses a recombination model instead.

Results: We tested the program on 2 types of real SNP datasets derived from Hapmap: adjacent SNPs (high LD) and SNPs spaced by 5 Kb (lower level of LD). In both cases, we tested 100 replicates for each size: 10, 20, 30, 40, 50, 60, and 80 SNPs. For adjacent SNPs Ishape2 is superior to the other software both in terms of speed and accuracy. For SNPs spaced by 5 Kb, Ishape2 yields similar results to Phase2.1 in terms of accuracy, and both outperform the other software. In terms of speed, Ishape2 runs about 4 times faster than Phase2.1 with 10 SNPs, and about 10 times faster with 80 SNPs. For the case of 5kb-spaced SNPs, Fastphase may run faster with more than 100 SNPs.

Conclusion: These results show that the Ishape heuristic approach for haplotyping is very competitive in terms of accuracy and speed and deserves to be evaluated extensively for possible future widespread use.

Show MeSH