Cloud computing-based TagSNP selection algorithm for human genome data.

Hung CL, Chen WP, Hua GJ, Zheng H, Tsai SJ, Lin YL - Int J Mol Sci (2015)



Affiliation: Department of Computer Science and Communication Engineering, Providence University, Taichung 43301, Taiwan. clhung@pu.edu.tw.

ABSTRACT
Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human traits. Haplotypes are sets of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed that, within certain haplotype blocks, a few distinct common haplotypes account for most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. On chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To improve its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The resulting MapReduce-parallelized combinatorial algorithm performed well on real-world HapMap data, and the gain in computational efficiency was proportional to the number of processors used.
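The abstract does not spell out how the work is divided among MapReduce tasks. The Python sketch below is one way to picture it, not the authors' implementation: each mapper takes a subset of start positions i and emits ((i, j), δ(i, j)) for every end position j within a maximum block size w, and a reducer gathers the scores into one table. The diversity function (number of distinct haplotype patterns over SNPs i..j), the round-robin split, and all function names are hypothetical, and the map and reduce phases run sequentially in one process instead of on a Hadoop cluster.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[int, int]  # (i, j): 1-based start and end SNP positions


def diversity(haps: List[str], i: int, j: int) -> int:
    """Hypothetical diversity score: number of distinct patterns over SNPs i..j."""
    return len({h[i - 1:j] for h in haps})


def map_phase(haps: List[str], starts: Iterable[int], w: int) -> List[Tuple[Key, int]]:
    """Mapper: emit ((i, j), score) for each start i in this split and each j <= min(i + w - 1, L)."""
    L = len(haps[0])
    emitted = []
    for i in starts:
        for j in range(i, min(i + w - 1, L) + 1):
            emitted.append(((i, j), diversity(haps, i, j)))
    return emitted


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reducer: gather the emitted (key, score) pairs into one lookup table."""
    return dict(pairs)


def split_starts(L: int, workers: int) -> List[List[int]]:
    """Round-robin assignment of start positions 1..L to the mappers."""
    return [list(range(k + 1, L + 1, workers)) for k in range(workers)]


def parallel_scores(haps: List[str], w: int, workers: int) -> Dict[Key, int]:
    emitted: List[Tuple[Key, int]] = []
    for split in split_starts(len(haps[0]), workers):
        # On a Hadoop cluster each split would be processed by a separate map task.
        emitted.extend(map_phase(haps, split, w))
    return reduce_phase(emitted)


haps = ["0011", "0111", "0011", "1010"]  # toy data: 4 haplotypes x 4 SNPs
print(parallel_scores(haps, w=2, workers=2))
```

Because each δ(i, j) depends only on SNP columns i..j, start positions can be handed to mappers independently; the round-robin split is simply one easy way to spread the shorter, truncated windows near the end of the sequence across workers.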

Figure 5. Performance comparison between sequential and MapReduce haplotype block selection (block size = 500 bp): (a) 40 patterns; (b) 80 patterns; (c) 100 patterns; (d) 120 patterns.

Mentions: To assess the performance of the proposed Hadoop MapReduce algorithm, we compared the computational time required to process various sequence data with different numbers of map/reduce operations. The performance of both the sequential and the proposed algorithm depends on the number and length of the patterns. Patil et al. [9] proposed that haplotype blocks reside within 300-bp and 500-bp regions; we therefore assumed block sizes of 300 bp and 500 bp. The block size bounds which diversity scores are computed: for a block size of 500 bp, the scores are {δ(1, 1), δ(1, 2), …, δ(1, 500), δ(2, 2), …, δ(2, 501), δ(3, 3), …, δ(L, L)}, that is, each start position i is paired with every end position j from i up to i + 499 or L, whichever is smaller. Figures 4 and 5 compare the performance of the sequential algorithm and our MapReduce-based algorithm for block sizes of 300 bp and 500 bp, respectively. Computational time increases with the number of patterns and with sequence length, and our algorithm processes 300-bp blocks faster than 500-bp blocks. These results are consistent with the algorithm analysis presented in the previous section.
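To make the index set of diversity scores concrete, the sketch below enumerates every (i, j) pair for a sequence of L positions and a maximum block size w, checks the enumeration against the closed-form count Σ_{i=1}^{L} min(w, L − i + 1) = (L − w + 1)·w + w(w − 1)/2 (for w ≤ L), and estimates a naive scanning cost. It treats each position index as one unit of block size, as the δ listing does, and takes the number of patterns n to be the number of haplotype rows rescanned for every score; both are assumptions, the function names are hypothetical, and the cost model is illustrative rather than the paper's own analysis.

```python
from typing import List, Tuple


def score_index_set(L: int, w: int) -> List[Tuple[int, int]]:
    """All (i, j) with i <= j <= min(i + w - 1, L), ordered delta(1,1), delta(1,2), ..., delta(L,L)."""
    return [(i, j) for i in range(1, L + 1) for j in range(i, min(i + w - 1, L) + 1)]


def num_scores(L: int, w: int) -> int:
    """Closed form for len(score_index_set(L, w))."""
    if w >= L:
        return L * (L + 1) // 2  # every window fits, so all pairs i <= j are scored
    # Starts 1..L-w+1 get w end positions each; the last w-1 starts get w-1, ..., 1.
    return (L - w + 1) * w + w * (w - 1) // 2


def naive_cost(n: int, L: int, w: int) -> int:
    """Characters read if every delta(i, j) rescans n haplotypes over j - i + 1 positions."""
    return n * sum(j - i + 1 for i, j in score_index_set(L, w))
```

For example, `score_index_set(1000, 500)` contains (1, 500) and (2, 501) but not (1, 501), matching the listing above, and `num_scores(1000, 300)` = 255150 is smaller than `num_scores(1000, 500)` = 375250, in line with 300-bp runs finishing sooner. Under this model the cost also grows linearly in n, which fits the observation that more patterns incur a higher computational cost.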
