High resolution discovery and confirmation of copy number variants in 90 Yoruba Nigerians.

Matsuzaki H, Wang PH, Hu J, Rava R, Fu GK - Genome Biol. (2009)

Bottom Line: We used custom high density oligonucleotide arrays in whole-genome scans at approximately 200-bp resolution, and followed up with a localized CNV typing array at resolutions as close as 10 bp, to confirm regions from the initial genome scans, and to detect the occurrence of sample-level events at shorter CNV regions identified in recent whole-genome sequencing studies. Event frequencies were noticeably higher at shorter regions (< 1 kb) compared to longer CNVs (> 1 kb). As new shorter CNVs are discovered through whole-genome sequencing, high resolution microarrays offer a cost-effective means to detect the occurrence of events at these regions in large numbers of individuals in order to gain biological insights beyond the initial discovery.

Affiliation: Affymetrix, Inc, 3420 Central Expressway, Santa Clara, CA 95051, USA. hajime_matsuzaki@affymetrix.com

ABSTRACT

Background: Copy number variants (CNVs) account for a large proportion of genetic variation in the genome. The initial discoveries of long (> 100 kb) CNVs in normal healthy individuals were made on BAC arrays and low resolution oligonucleotide arrays. Subsequent studies that used higher resolution microarrays and SNP genotyping arrays detected the presence of large numbers of CNVs that are < 100 kb, with median lengths of approximately 10 kb. More recently, whole genome sequencing of individuals has revealed an abundance of shorter CNVs with lengths < 1 kb.

Results: We used custom high density oligonucleotide arrays in whole-genome scans at approximately 200-bp resolution, and followed up with a localized CNV typing array at resolutions as close as 10 bp, to confirm regions from the initial genome scans, and to detect the occurrence of sample-level events at shorter CNV regions identified in recent whole-genome sequencing studies. We surveyed 90 Yoruba Nigerians from the HapMap Project, and uncovered approximately 2,700 potentially novel CNVs not previously reported in the literature having a median length of approximately 3 kb. We generated sample-level event calls in the 90 Yoruba at nearly 9,000 regions, including approximately 2,500 regions having a median length of just approximately 200 bp that represent the union of CNVs independently discovered through whole-genome sequencing of two individuals of Western European descent. Event frequencies were noticeably higher at shorter regions < 1 kb compared to longer CNVs (> 1 kb).

Conclusions: As new shorter CNVs are discovered through whole-genome sequencing, high resolution microarrays offer a cost-effective means to detect the occurrence of events at these regions in large numbers of individuals in order to gain biological insights beyond the initial discovery.


Figure 8: Breakdown of event occurrence tallies by region lengths. Panels correspond to confirmed CNVs from our work, and regions discovered by whole-genome sequencing as summarized in Table 4. Box-plots show medians and interquartile ranges, with whiskers extending to maximum or minimum values within 1.5 times the 75th or 25th percentiles, respectively. The width of boxes is proportional to the number of regions.
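
The box-plot summary described in the caption can be sketched in a few lines. This is a minimal illustration, assuming the conventional Tukey rule in which whiskers reach the most extreme values lying within 1.5 times the interquartile range beyond the 25th and 75th percentiles; the caption is terser, and the authors' exact plotting settings are not given here. The function name `box_stats`, the `BoxStats` container, and the choice of quartile interpolation are hypothetical.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class BoxStats:
    q1: float
    median: float
    q3: float
    whisker_low: float
    whisker_high: float
    outliers: list


def box_stats(values):
    """Summarize per-region event tallies as box-plot statistics.

    Assumes Tukey-style whiskers (1.5 x IQR beyond the quartiles) and
    the 'inclusive' quartile method; both are illustrative choices.
    Requires at least two values.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return BoxStats(q1, median, q3, inside[0], inside[-1], outliers)


# Made-up event tallies for ten regions, for illustration only.
example = box_stats([0, 1, 2, 3, 4, 5, 6, 7, 8, 30])
```

With these made-up tallies the single region with 30 events falls outside the upper fence and would be drawn as an outlier rather than extending the whisker.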

Mentions: A large majority (> 77%) of the shorter CNVs that were discovered by sequencing individuals of Western European descent had at least one observed event in the Yoruba (Table 4). Based on detected events across the 90 Yoruba, the median lengths were 190 bp and 240 bp in the Levy_only and Wheeler_only groups, respectively (Table 4), and the length distributions of these regions were skewed toward the 100-bp cutoff (Figure 5). Bearing in mind that observed frequencies may be underestimated due to missed event calls, as suggested by the trio analysis above, the three groups of sequencing-derived regions (Levy_only, Wheeler_only, and Levy+Wheeler) had noticeably higher event frequencies than the 6,368 confirmed CNVs from our work, as measured by average events per region or by cumulative events in the 90 Yoruba (Table 4, Figure 6). However, a subset of 1,107 confirmed CNVs from our work with lengths < 1 kb had similarly high event frequencies and cumulative events, resembling the Levy_only group (Figure 6). The cumulative event curves are distinctly different between the Levy_only and Wheeler_only groups, with the Levy+Wheeler curve intermediate between the two. Increasing the specificity of event calls (lowering false-positive events at the expense of sensitivity) noticeably lowered event frequencies in the Levy_only group, and to a lesser degree in the < 1 kb confirmed CNVs from our work, but the Levy+Wheeler and Wheeler_only groups maintained high relative event frequencies (Figure 7). Loss events occurred more often than gain events at the confirmed CNVs, but to a lesser degree in the Wheeler_only group, and even less so in the Levy_only and Levy+Wheeler groups (Table 4). For comparison, in previous studies the loss:gain ratios in Yoruba were 6.3, 3.5, 2.5, 0.9, and 0.9 in the McCarroll et al. [14], Korbel et al. [31], Wang et al. [15], Perry et al. [13], and Kidd et al. [30] studies, respectively.
In total, we generated sample-level event calls in the 90 Yoruba at nearly 9,000 regions (approximately 4% of the genome), including > 3,300 shorter regions (< 1 kb). A breakdown of event occurrence by region length shows that event frequencies were higher in subsets of shorter (< 1 kb) CNVs from both our work and the Levy et al. [18] and Wheeler et al. [19] studies (Figure 8).
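
The comparison of event frequencies between shorter and longer regions amounts to averaging per-region event counts within length classes. The following is a minimal sketch under stated assumptions, not the authors' analysis code: the function name `mean_events_by_length_class`, the `(length_bp, events)` input layout, and the example numbers are all hypothetical, and a region of exactly 1,000 bp is arbitrarily placed in the longer class.

```python
def mean_events_by_length_class(regions, cutoff_bp=1000):
    """Average events per region for short (< cutoff) and long regions.

    `regions` is an iterable of (length_bp, events) pairs, where `events`
    is the number of samples with a gain or loss call at that region.
    Returns None for a class that contains no regions.
    """
    totals = {"short": [0, 0], "long": [0, 0]}  # [sum of events, region count]
    for length_bp, events in regions:
        key = "short" if length_bp < cutoff_bp else "long"
        totals[key][0] += events
        totals[key][1] += 1
    return {k: (s / n if n else None) for k, (s, n) in totals.items()}


# Made-up regions for illustration only; these are not values from Table 4.
toy_regions = [(190, 12), (240, 8), (3000, 2), (12000, 4)]
toy_means = mean_events_by_length_class(toy_regions)
```

Applied to real sample-level calls, the same grouping could be repeated per source (confirmed CNVs, Levy_only, Wheeler_only, Levy+Wheeler) to reproduce the kind of panel-wise breakdown that Figure 8 describes.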


