Personalized copy number and segmental duplication maps using next-generation sequencing.

Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, Sahinalp SC, Gibbs RA, Eichler EE - Nat. Genet. (2009)

Bottom Line: We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73-87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10^-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.


Affiliation: Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA.

ABSTRACT
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73-87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10^-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.



Figure 4: Correlation between computational and experimental copy number for NA18507 vs. JDW. We computed the copy number for each shared (gray) and individual-specific (blue or orange) duplication interval based on the depth of coverage of WGS reads aligned against the human reference assembly (build35). Based on these computational estimates of copy number, we calculated a predicted log2 copy-number ratio for each autosomal duplication interval >20 kbp in length (and with less than 80% total common repeat content). These values were plotted against the experimental log2 ratios determined by oligonucleotide arrayCGH. The vertical red lines indicate the threshold used for the validated calls (see Supplementary Note).
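
The calculation described in the legend can be illustrated with a minimal sketch. It assumes that absolute copy number is estimated by scaling an interval's mean read depth by the mean depth over regions known to be diploid; the exact normalization used by the authors is not given in this excerpt. The function names and example depths are hypothetical.

```python
import math

def estimate_copy_number(interval_depth, diploid_depth):
    """Estimate absolute copy number from mean read depth.

    Assumption (not from the source): depth scales linearly with copy
    number, and diploid_depth is the mean depth over 2-copy regions.
    """
    if diploid_depth <= 0:
        raise ValueError("diploid_depth must be positive")
    return 2.0 * interval_depth / diploid_depth

def predicted_log2_ratio(cn_test, cn_reference):
    """Predicted log2 copy-number ratio between two genomes."""
    if cn_test <= 0 or cn_reference <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(cn_test / cn_reference)

# Hypothetical depths for one duplication interval in two genomes.
cn_a = estimate_copy_number(interval_depth=60.0, diploid_depth=20.0)
cn_b = estimate_copy_number(interval_depth=45.0, diploid_depth=30.0)
ratio = predicted_log2_ratio(cn_a, cn_b)
print(cn_a, cn_b, ratio)
```

Normalizing each genome by its own diploid depth before taking the ratio keeps differences in overall sequencing coverage from appearing as copy-number differences.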

Mentions: Irrespective of the next-generation sequencing (NGS) platform, the pattern of read depth was remarkably reproducible for 48% of the shared duplications (44,711/94,070; Supplementary Figure 4). However, among the remaining 52% of duplications, read depth did not correlate between individuals. This suggests that shared duplications show the greatest extremes of copy-number variation between individuals (Supplementary Figure 5). Using absolute estimates of copy number, we calculated an in silico log2 ratio for each of the three genome-wide comparisons and compared it to the experimental values as determined by arrayCGH (Figure 4, Supplementary Figure 6). Overall, we found a positive correlation with copy-number predictions (R² ≈ 0.52–0.63, depending on the pairwise comparison). We note that the ability of arrayCGH to discriminate absolute differences diminishes as the duplication copy number increases (ref. 14).
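
As a rough illustration of how such a correlation could be scored, the sketch below computes the squared Pearson coefficient between predicted and experimental log2 ratios. This is an assumption about the statistic: the excerpt reports R² but does not say how it was calculated. The function name and the data values are hypothetical and are not taken from the study.

```python
import math

def pearson_r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / math.sqrt(sxx * syy)
    return r * r

# Hypothetical values: in silico log2 ratios and arrayCGH log2 ratios
# for the same five duplication intervals.
predicted = [-1.0, -0.5, 0.0, 0.5, 1.0]
observed = [-0.6, -0.4, 0.1, 0.2, 0.7]
r2 = pearson_r_squared(predicted, observed)
print(round(r2, 3))
```

Because R² measures linear association only, a high value is compatible with array ratios that are systematically compressed relative to the predictions, which is consistent with the authors' note about arrayCGH at high copy number.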

