Construction and analysis of high-density linkage map using high-throughput sequencing data.

Liu D, Ma C, Hong W, Huang L, Liu M, Liu H, Zeng H, Deng D, Xin H, Song J, Xu C, Sun X, Hou X, Wang X, Zheng H - PLoS ONE (2014)

Bottom Line: HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. The singleton rate was less than one-ninth of that generated by JoinMap4.1. It will facilitate genome assembling, comparative genomic analysis, and QTL studies.

Affiliation: Biomarker Technologies Corporation, Beijing, China.

ABSTRACT
Linkage maps enable the study of important biological questions. The construction of high-density linkage maps has become more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the explosion in marker numbers and the genotyping errors in NGS data challenge both the computational efficiency of linkage analysis methods and the quality of the maps they produce. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows that HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/.


Figure 1. Modules of the HighMap algorithm. A: The single-linkage clustering algorithm was used to partition the marker loci into linkage groups based on a pairwise modified independence LOD score for the recombination frequency. B and B': The ordering module combines Gibbs sampling, spatial sampling, and simulated annealing to order markers and estimate map distances. C: The error correction module identified singletons according to the parental contribution of genotypes and eliminated them from the data using a k-nearest neighbor algorithm. To order markers correctly, ordering and error correction were carried out iteratively. D: Heat maps and haplotype maps were constructed to evaluate map quality.
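
The grouping step in panel A can be illustrated with a minimal sketch. It is not HighMap's implementation: the record does not give the formula of the "modified independence LOD score", so the sketch substitutes the standard two-point LOD for phase-known 0/1 genotypes (backcross-like coding). The function names, the LOD threshold of 3.0, and the toy genotypes are all assumptions made for illustration. Markers are joined into one group whenever any pair linking them reaches the threshold, which is what single-linkage clustering amounts to.

```python
import math
from itertools import combinations


def two_point_lod(a, b):
    """Standard two-point LOD for two 0/1 genotype vectors.

    Assumption: phase-known, backcross-like coding. This is NOT the
    modified independence LOD used by HighMap, whose formula is not
    given in this record.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    rec = sum(1 for x, y in pairs if x != y)
    r = min(rec / n, 0.5)  # recombination fraction, capped at 0.5
    log_lik = 0.0
    if rec > 0:
        log_lik += rec * math.log10(r)
    if n - rec > 0:
        log_lik += (n - rec) * math.log10(1 - r)
    # Compare against the null hypothesis of free recombination (r = 0.5).
    return log_lik - n * math.log10(0.5)


def linkage_groups(genotypes, lod_threshold=3.0):
    """Single-linkage clustering of markers via union-find."""
    names = list(genotypes)
    parent = {m: m for m in names}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in combinations(names, 2):
        if two_point_lod(genotypes[a], genotypes[b]) >= lod_threshold:
            parent[find(a)] = find(b)

    groups = {}
    for m in names:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: 20 individuals, two unlinked pairs of markers.
toy = {
    "m1": [0] * 10 + [1] * 10,
    "m2": [0] * 10 + [1] * 9 + [0],   # one recombinant relative to m1
    "m3": [0, 1] * 10,
    "m4": [1, 1] + [0, 1] * 9,        # one recombinant relative to m3
}
print(linkage_groups(toy))
```

On this toy data the two tightly linked pairs end up in separate groups, because every cross-pair comparison has a recombination fraction of at least 0.5 and therefore a LOD of 0.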

Mentions: Here we report a new strategy, iterative ordering and error correction, for constructing high-density genetic maps. Following the error correction strategy of SMOOTH [22], we used a k-nearest neighbor algorithm to correct genotyping errors and impute missing genotypes [34]. We employed an enhanced version of the Gibbs sampling, spatial sampling and simulated annealing (GSS) algorithm [27], [35] to order markers. The GSS marker ordering algorithm is computationally efficient [27], but it generates inflated map distances and has unstable map quality, especially for data with a high rate of genotyping errors. To ensure stable map quality, we enhanced GSS by using the summation of adjacent recombination fractions (SARF) as the objective function and adopted a blocked Gibbs sampler, after trying different Gibbs sampling methods and different objective functions in simulated annealing. HighMap consists of four modules, designed for linkage grouping, marker ordering, genotyping error correction and map evaluation, respectively (Figure 1). The map evaluation module provides heat maps and haplotype maps for intuitive displays of map quality [36].
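
The SARF objective named above can be made concrete with a short sketch. It only shows what the objective measures: the sum of recombination fractions between neighbouring markers in a candidate order, which is smallest when the order follows the true marker positions. The simple recombinant-count estimator, the function names, and the toy genotypes are assumptions for illustration, and the exhaustive search stands in for the GSS optimisation, which is not reproduced here and would be needed for any realistic number of markers.

```python
from itertools import permutations


def recombination_fraction(a, b):
    """Fraction of individuals whose 0/1 genotypes differ between two markers.

    Assumption: complete, phase-known data, so a plain recombinant count
    is used instead of a multipoint likelihood estimate.
    """
    rec = sum(1 for x, y in zip(a, b) if x != y)
    return rec / len(a)


def sarf(order, genotypes):
    """Summation of adjacent recombination fractions for a marker order."""
    return sum(
        recombination_fraction(genotypes[m], genotypes[n])
        for m, n in zip(order, order[1:])
    )


def best_order_bruteforce(genotypes):
    """Exhaustive search for the minimum-SARF order (small marker sets only)."""
    return min(permutations(sorted(genotypes)), key=lambda o: sarf(o, genotypes))


# Hypothetical toy data: 10 individuals, four markers lying in the order
# A-B-C-D, with one additional recombinant between each adjacent pair.
toy_markers = {
    "A": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "B": [1, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "C": [1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "D": [1, 0, 0, 0, 0, 0, 1, 1, 1, 0],
}

print(sarf(("A", "B", "C", "D"), toy_markers))  # true order
print(sarf(("A", "C", "B", "D"), toy_markers))  # B and C swapped
print(best_order_bruteforce(toy_markers))
```

A misplaced marker lengthens the path through the map, so the swapped order scores higher than the true one. An order and its reverse always score the same, so the objective fixes the order only up to orientation.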

