Phasing for medical sequencing using rare variants and large haplotype reference panels.

Sharp K, Kretzschmar W, Delaneau O, Marchini J - Bioinformatics (2016)

Bottom Line: Compared to other approaches that do not explicitly use rare variants, we obtain significant gains in phasing accuracy, less variation over phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we are able to reduce switch error rates by up to 50% when phasing samples sequenced at high coverage. These results represent a proof of concept that rare variant sharing patterns can be utilized to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset.

Affiliation: Department of Statistics, University of Oxford, Oxford, UK.


Fig. 3. Properties of using rare variants for state selection. (a) Effect on switch error rate of varying the minimum minor allele count used for selecting individuals from whom to copy in SHAPEITR. Horizontal axes: minimum minor allele count (bottom) and corresponding frequency in the panel (top) used for selection. Solid lines: mean switch error rates for SHAPEITR; dashed and dash-dot lines: mean switch error rates for SHAPEIT2 with and without MCMC, respectively. Colours (red and blue) indicate which of the two numbers of copying states, K = 400 or K = 800, was used. In both cases, errors refer to phasing the whole of chromosome 20 and were averaged over both trio parents and 20 runs. (b) Distribution of maximum allele counts used for matching in a single window when choosing K = 400 copying states. Horizontal axis: maximum minor allele count (bottom) and corresponding frequency in the reference panel (top) of a site used for matching. Vertical axis: frequency averaged over both trio parents and 20 different runs. Each bar represents a bin whose width corresponds to an allele count of 20.
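
The switch error rate reported in the figure can be illustrated with a minimal sketch. It assumes the usual definition (the fraction of consecutive heterozygous-site pairs whose relative phase differs between the true and the estimated haplotype); the function name and the input encoding are hypothetical and are not taken from the article.

```python
from typing import Sequence


def switch_error_rate(true_hap: Sequence[int],
                      est_hap: Sequence[int],
                      is_het: Sequence[bool]) -> float:
    """Fraction of neighbouring heterozygous sites with inconsistent phase.

    true_hap / est_hap: alleles (0 or 1) on the first haplotype of the
    true and estimated phasing. is_het: True where the genotype is
    heterozygous. This encoding is an assumption made for illustration.
    """
    # Phase is only defined at heterozygous sites, so drop the rest.
    het_sites = [i for i, het in enumerate(is_het) if het]
    if len(het_sites) < 2:
        return 0.0

    # At each het site, does the estimated first haplotype match the truth?
    agree = [true_hap[i] == est_hap[i] for i in het_sites]

    # A switch is a change in agreement between neighbouring het sites.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(het_sites) - 1)
```

Note that an estimate that is the exact complement of the truth at every heterozygous site has no switches under this definition, because only the relative phase between neighbouring sites is compared.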

Mentions: The method of copying state selection used by SHAPEITR is based on the premise that alleles shared between a reference haplotype and an unphased genotype are more phase-informative the rarer they are. Figure 3a supports this premise. As the minimum allele count used for selecting copying states in SHAPEITR (solid lines) is increased from 1 to 20, the improvement in accuracy from using SHAPEITR is steadily eroded. Performance does remain better than that of SHAPEIT2 without MCMC, which reflects a much better initial choice of copying states. However, the results for K = 400 indicate that, when SHAPEIT2 uses MCMC iterations to update this choice, copying states chosen based on sharing of rare alleles with a minor allele count of ∼6 or greater are already no more informative of phase than the Hamming distance metric employed by SHAPEIT2. As expected, performance for K = 800 is more robust to loss of information from the lowest frequency alleles; typically the number of sites used by our algorithm for copy state selection is greater for larger K. While, on average, the sites corresponding to higher frequency alleles are less informative, their greater number (for K = 800) gives greater coverage within the window.
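
The idea of preferring haplotypes that share the rarest alleles can be sketched as below. This is only an illustration under stated assumptions, not the SHAPEITR implementation: it assumes allele 1 is the minor allele at every site, visits sites in order of increasing minor allele count within a single window, and collects reference haplotypes carrying an allele that the sample also carries until K are found. The function name, the `min_mac` parameter and the data layout are hypothetical.

```python
from typing import List, Sequence


def select_copying_states(ref_haps: Sequence[Sequence[int]],
                          genotype: Sequence[int],
                          k: int,
                          min_mac: int = 1) -> List[int]:
    """Pick up to k reference haplotypes that share rare alleles with a sample.

    ref_haps: one 0/1 allele vector per reference haplotype (1 = minor allele,
    an assumption). genotype: minor allele dosage (0, 1 or 2) per site.
    min_mac: smallest minor allele count a site may have to be used.
    """
    n_sites = len(genotype)
    # Minor allele count of each site in the reference panel.
    mac = [sum(hap[s] for hap in ref_haps) for s in range(n_sites)]

    # Sites where the sample carries the minor allele, rarest first.
    usable = sorted(
        (s for s in range(n_sites) if genotype[s] > 0 and mac[s] >= min_mac),
        key=lambda s: mac[s],
    )

    chosen: List[int] = []
    seen = set()
    for s in usable:
        for idx, hap in enumerate(ref_haps):
            if hap[s] == 1 and idx not in seen:
                seen.add(idx)
                chosen.append(idx)
                if len(chosen) == k:
                    return chosen
    # Fewer than k haplotypes share any usable allele with the sample.
    return chosen
```

Raising `min_mac` in this sketch mimics the experiment in Figure 3a: the rarest shared alleles are ignored, so selection falls back on more common sites. A larger `k` draws on more sites before the list is full, which is consistent with the remark that more sites are typically used for larger K.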

