Detection of identity by descent using next-generation whole genome sequencing data.

Su SY, Kasberger J, Baranzini S, Byerley W, Liao W, Oksenberg J, Sherr E, Jorgenson E - BMC Bioinformatics (2012)

Bottom Line: We find that GERMLINE has slightly higher power than fastIBD for detecting IBD segments using sequencing data, but also has a much higher false positive rate. We further quantify the effect of variant density, conditional on genetic map length, on the power to resolve IBD segments. These investigations into IBD resolution may help guide the design of future next-generation sequencing studies that utilize IBD, including family-based association studies, association studies in admixed populations, and homozygosity mapping studies.


Affiliation: Ernest Gallo Clinic and Research Center, University of California San Francisco, 5858 Horton St., Suite 200, Emeryville, CA 94608, USA. shuyisu@gallo.ucsf.edu

ABSTRACT

Background: Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic association studies. Various methods for detecting IBD, including those implemented in the software programs fastIBD and GERMLINE, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data are becoming increasingly available, enabling the comprehensive analysis of genomes, including identifying rare variants. These sequencing data may provide an opportunity to detect IBD with higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were previously undetectable with sparser genetic data.

Results: Here, we investigate how different levels of variant coverage in sequencing and microarray genotype data influence the resolution at which IBD can be detected. The data include microarray genotype data from the WTCCC study, denser genotype data from the HapMap Project, low-coverage sequencing data from the 1000 Genomes Project, and deep-coverage complete genome data from our own projects. With high power (78%), we can detect segments of length 0.4 cM or larger using fastIBD and GERMLINE in sequencing data. This compares to similar power to detect segments of length 1.0 cM or larger with microarray genotype data. We find that GERMLINE has slightly higher power than fastIBD for detecting IBD segments using sequencing data, but also has a much higher false positive rate.

Conclusion: We further quantify the effect of variant density, conditional on genetic map length, on the power to resolve IBD segments. These investigations into IBD resolution may help guide the design of future next-generation sequencing studies that utilize IBD, including family-based association studies, association studies in admixed populations, and homozygosity mapping studies.


Figure 7: Illustration of the construction of composite segments. Each line represents the chromosome sequence of an individual. Each colored circle represents a consecutive sequence of size 0.02 cM, which may contain multiple SNPs. A composite segment of size 0.2 cM is composed of 10 consecutive segments of size 0.02 cM from 10 different individuals. To create a composite segment of size 0.4 cM, two composite segments of size 0.2 cM are constructed and merged. A similar procedure is used to create composite segments of size 0.6, 1, and 2 cM, where three, five, and ten small composite segments are constructed and merged, respectively.

Mentions: Construction of composite individuals for assessing the false positive rate. We then investigated the rate of falsely detecting IBD when no IBD is present through a second simulation. We started by constructing a composite chromosomal segment for each of 10 simulated individuals [14]. To do this, we selected 100 individuals from each of the WTCCC and 1000 Genomes datasets. Composite chromosome segments of length 0.2 cM were constructed by copying 10 consecutive regions of length 0.02 cM from 10 different individuals (Figure 7). To create segments of 0.4, 0.6, 1, and 2 cM, we combined 2, 3, 5, and 10 consecutive composite segments. In this way, we generated 10 simulated individuals with specific chromosomal segments that are not IBD. In this simulation setting, we expect that any pair of these 10 individuals with composite chromosomal segments is unlikely to share any part of that segment longer than 0.02 cM. Thus, detecting an IBD segment longer than 0.02 cM among these individuals can be considered a false positive.
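
The composite construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`choose_donors`, `build_composite`), the representation of the panel as one allele list per individual, and the convention that variant positions are given in cM from the start of the segment are all hypothetical. The sketch draws 10 different donors for each 0.2 cM unit and copies each donor's alleles over its 0.02 cM block; it makes no attempt to prevent the same donor from appearing in adjacent blocks of neighboring units.

```python
import random

BLOCK_CM = 0.02        # length of one copied block, as stated in the text
BLOCKS_PER_UNIT = 10   # ten blocks make one 0.2 cM composite segment


def choose_donors(n_individuals, total_cm, rng):
    """Pick one donor index per 0.02 cM block (hypothetical helper)."""
    n_blocks = round(total_cm / BLOCK_CM)
    donors = []
    for _ in range(0, n_blocks, BLOCKS_PER_UNIT):
        # Sampling without replacement gives 10 different individuals per unit.
        donors.extend(rng.sample(range(n_individuals), BLOCKS_PER_UNIT))
    return donors[:n_blocks]


def build_composite(haplotypes, positions_cm, total_cm, rng):
    """Copy each block of the segment from its donor (hypothetical helper).

    haplotypes   -- one allele list per individual, all of equal length
    positions_cm -- position of each variant in cM from the segment start
    """
    donors = choose_donors(len(haplotypes), total_cm, rng)
    composite = []
    for site, pos in enumerate(positions_cm):
        block = int(pos / BLOCK_CM)
        if block < len(donors):  # ignore variants beyond the segment end
            composite.append(haplotypes[donors[block]][site])
    return composite, donors


# Toy panel (assumption): individual i carries allele value i at every site,
# so the composite shows directly which donor supplied each block.
rng = random.Random(0)
positions = [(k + 0.5) * 0.005 for k in range(80)]  # 4 variants per block
panel = [[i] * len(positions) for i in range(100)]

composite, donors = build_composite(panel, positions, 0.4, rng)

# Ten simulated individuals, each with an independently drawn 0.4 cM composite.
simulated = [build_composite(panel, positions, 0.4, rng)[0] for _ in range(10)]
```

Under this setup, any IBD segment longer than 0.02 cM that a detection program reports between two entries of `simulated` would count toward the false positive rate, following the reasoning in the passage above.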

