PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling.
Identifying segments in the genomes of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data are used for numerous applications, including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has proven computationally difficult. To overcome this, many state-of-the-art methods estimate the probability of IBD between each pair of haplotypes separately. While computationally efficient, these methods fail to leverage the clique structure of IBD, resulting in less powerful IBD identification, especially for small IBD segments.
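The abstract does not describe how PIGS exploits clique structure, so the following is only a toy illustration of the idea, not the PIGS algorithm. At a single locus, IBD is transitive: if haplotypes A and B are IBD and B and C are IBD, then A and C are too, so a consistent IBD configuration groups haplotypes into cliques. Where PIGS samples IBD graphs probabilistically, this sketch exhaustively enumerates every clique-consistent configuration of a handful of haplotypes. The weighting scheme (treating each pairwise probability as an independent factor), the function names `set_partitions` and `clique_consistent_ibd`, and the input probabilities are all hypothetical.

```python
from itertools import combinations
from math import prod


def set_partitions(items):
    """Yield every split of `items` into disjoint, non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing cluster in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or place it in a new cluster of its own.
        yield [[first]] + partition


def clique_consistent_ibd(pair_prob, haplotypes):
    """Rescore pairwise IBD probabilities so every IBD call is transitive.

    pair_prob maps (a, b) -> independently estimated P(a, b IBD) at one locus.
    Each set partition of the haplotypes is a candidate IBD configuration in
    which every cluster is a clique. Its (hypothetical) weight is the product
    of p for pairs inside a cluster and 1 - p for pairs split across clusters.
    """
    posterior = {pair: 0.0 for pair in pair_prob}
    total = 0.0
    for partition in set_partitions(list(haplotypes)):
        cluster_of = {h: k for k, cluster in enumerate(partition) for h in cluster}
        together = {pair: cluster_of[pair[0]] == cluster_of[pair[1]] for pair in pair_prob}
        weight = prod(p if together[pair] else 1.0 - p for pair, p in pair_prob.items())
        total += weight
        for pair in pair_prob:
            if together[pair]:
                posterior[pair] += weight
    # Normalize so the configuration weights sum to one.
    return {pair: w / total for pair, w in posterior.items()}


haps = ["A", "B", "C"]
# Strong pairwise evidence for A~B and B~C, weak evidence for A~C
# (e.g. a short segment); values are made up for illustration.
pairwise = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.2}
assert set(pairwise) == set(combinations(haps, 2))  # every pair is scored
for pair, p in clique_consistent_ibd(pairwise, haps).items():
    print(pair, round(p, 3))
# ('A', 'B') 0.741
# ('B', 'C') 0.741
# ('A', 'C') 0.519
```

In this made-up example, the weakly supported pair (A, C) rises from 0.2 to about 0.52 because A and B, and B and C, are both likely IBD. The number of configurations grows with the Bell number of the haplotype count, so exact enumeration like this is only feasible for a few haplotypes.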
As shown in Figure 7, when PIGS and Refined IBD calls are compared at 0.5 centimorgans, PIGS identifies 10% more segments than Refined IBD. When DASH and EMI are applied to the segments called by each method, PIGS input yields increases of 8% and 7%, respectively. Both DASH and EMI improve the power of both PIGS and Refined IBD to detect IBD for use in association studies, regardless of segment size. DASH and EMI seem to perform similarly in boosting power when called segments are longer than 0.8 centimorgans, but EMI appears to have the advantage for shorter segments. For example, for PIGS input the difference between EMI and DASH is 8% at 0.5 centimorgans but only 0.8% at 0.8 centimorgans. Across all segment sizes, PIGS, P-DASH, and P-EMI show increases of 4%, 3%, and 2.5%, respectively, over their Refined IBD counterparts. These increases are more modest than in the simulated data, most likely because without sequencing data we are underpowered to detect small IBD segments, even when trio-phased genotypes are available.