Comment on "genomic hypomethylation in the human germline associates with selective structural mutability in the human genome".

Watson CT, Garg P, Sharp AJ - PLoS Genet. (2013)

Affiliation: Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

AUTOMATICALLY GENERATED EXCERPT

The distribution of CNVs in mammalian genomes is nonrandom, and several sequence features have been associated with CNV breakpoints and regions of high structural mutability... Based on an analysis of DNA methylation patterns in human sperm, Li et al. recently reported a significant relationship between CNVs and hypomethylation in the male germline, leading to the suggestion that DNA hypomethylation plays a causative role in the generation of structural variation... Given the potentially profound implications of this report for the study of human disease, we read the findings of Li et al. with great interest... However, after systematically reanalyzing the relationship between CNVs and DNA methylation patterns in sperm, we have identified several cryptic confounders in the data that we believe seriously undermine the conclusions of Li et al... They then applied two independent methods to estimate germline DNA methylation within each window: (i) directly using published whole genome 15× bisulfite sequencing of sperm DNA and a second low coverage 2.5× dataset, and (ii) indirectly by calculating a Methylation Index (MI) based on the relative occurrence of C>T SNPs defined by the HapMap project... Indeed, after removing all 100 kb windows that contain satellites or contain >99th percentile by LINE, SINE, LTR, or total repeat content, we observed that in every dataset analyzed, enrichments for CNVs in windows with the lowest 1% mean methylation either significantly diminished or disappeared completely (Figure 1b)... We next considered the influence of problems associated with mapping reduced-complexity bisulfite reads in duplicated regions of the genome... Thus, by measuring the relative occurrence of C>T SNPs within CpG dinucleotides (termed “mSNPs”), it is possible to draw inferences about the ancestral methylation state of a region...
However, SNP-based studies of structural variation are often compromised due to the fact that many CNV regions show significantly reduced SNP density compared to the genome average (median density of HapMap SNPs within HapMap CNVs is 1 per 1,087 bp, compared to 1 per 738 bp genome-wide)... This stems largely from the fact that ∼98% of HapMap SNP assays map uniquely within the genome, resulting in markedly reduced SNP density in duplicated portions of the genome, precisely those regions that are also enriched for CNVs... As a result, there is a strong confounding relationship between CNV regions and low SNP density that renders the use of a SNP-based MI inherently flawed for studies of structural variation... Taking a more direct approach, we used published 15× sperm bisulfite sequencing data to calculate mean methylation per base both within and flanking 5,360 nonredundant HapMap CNVs <20 kb in size (mean CNV size 3,789 bp) (Figure 2a)... Although we observed a small decrease in methylation levels within CNVs compared to flanking regions, overall CNV regions have consistently high levels of methylation (mean 69%) that are only slightly lower than the genome average (70%)... In summary, we identify multiple strong confounders in the study of Li et al. that in our opinion cast serious doubt on the notion that germline hypomethylation is causally related to structural mutability.
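To put the two SNP densities quoted in the excerpt on a common scale, the short sketch below converts them into expected SNP counts per window. It assumes the 100 kb window size mentioned in the excerpt; the function and variable names are illustrative and do not come from the article.

```python
# Convert the quoted median SNP densities into expected SNPs per window.
# Assumption: 100 kb windows, as mentioned in the excerpt.
WINDOW_BP = 100_000


def snps_per_window(bp_per_snp, window_bp=WINDOW_BP):
    """Expected number of SNPs in a window, given one SNP every `bp_per_snp` bases."""
    return window_bp / bp_per_snp


cnv_snps = snps_per_window(1087)     # within HapMap CNVs: 1 SNP per 1,087 bp
genome_snps = snps_per_window(738)   # genome-wide: 1 SNP per 738 bp

# Fractional shortfall of SNPs in CNV regions relative to the genome-wide density.
relative_deficit = 1 - cnv_snps / genome_snps

print(f"CNV regions: {cnv_snps:.1f} SNPs per window")
print(f"Genome-wide: {genome_snps:.1f} SNPs per window")
print(f"Relative deficit: {relative_deficit:.1%}")
```

Under these assumptions, CNV regions carry roughly a third fewer SNPs per window than the genome as a whole, which is the density gap the authors identify as a confounder for any SNP-based index.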

Figure 1. Multiple strong confounders contribute to artifactual associations between CNVs and hypomethylation.
(a) Hypomethylated regions of the human genome are highly enriched for satellite repeats. We observed a strong enrichment for satellite repeats in regions of the genome <1st percentile of mean methylation level. Satellites comprise a mean of 16.6% of the hypomethylated windows, compared to only 0.26% in the rest of the genome (∼64-fold enrichment, p = 1.4×10⁻²⁹, Mann-Whitney Rank Sum Test). Previous analysis has shown that satellites tend to be strongly hypomethylated in human sperm [10]. Furthermore, given their highly repetitive and dynamic nature, loci rich in satellites are enriched for CNVs (51.7% of windows containing satellites overlap HapMap CNVs [7] compared to 20.5% in the rest of the genome), creating an inherent confounder between CNVs and hypomethylation.
(b) No enrichment for CNVs in hypomethylated regions after removal of confounding genomic features. Li et al. reported significant enrichments for overlap with multiple CNV datasets in “methylation deserts” (those with the lowest 1% mean methylation) and regions of the genome with MI = 0 [9]. However, after excluding regions of extreme repeat content (all windows containing satellite repeats, and those >99th percentile by LINE, SINE, LTR, and total repeat content, n = 1,716), and/or windows in which only a minority of CpGs were sampled (n = 430), all reported CNV enrichments reduce significantly and in most cases disappear entirely. Dashed grey line represents equal prevalence of CNVs between hypomethylated regions compared with the rest of the genome.
(c) Bisulfite reads within “methylation deserts” preferentially map to CpG islands/shores. We observed that windows scored as “methylation deserts” by Li et al. (those with the lowest 1% mean methylation) show a strong bias for bisulfite reads to be mapped within ±2 kb of CGIs. As CGIs, especially those associated with the promoters of expressed genes, are typically unmethylated, this creates an underestimate of the mean methylation value in the wider region. Data shown represent fraction of CpGs per window with at least one overlapping read that map within ±2 kb of CGIs, after first excluding all windows containing satellite repeats, or those >99th percentile based on LINE, SINE, LTR, or total repeat content.
(d) A huge reduction in SNP density in windows with MI = 0. We observed a massively reduced density of HapMap SNPs in windows with MI = 0 (mean, 25; median, 13) compared to the genome average (mean, 143; median, 137). As mSNPs represent only 8.2% of all SNPs in the genome and the formula used by Li et al. to calculate MI reports MI = 0 when no mSNPs are present, the use of a methylation index based on SNP content is inherently biased to score windows containing only a small number of SNPs as MI = 0. Because of stringent quality filtering, ∼98% of HapMap SNP assays map uniquely within the genome [20]. Therefore, a significant negative correlation exists between SNP density and segmental duplications (r = −0.337, p < 10⁻³²³), a fraction of the genome that is highly enriched for structural variation [2], [3], [7].
(e) No enrichment for CNVs in regions with MI = 0 after removal of windows with low SNP density. Li et al. reported that windows with MI = 0 are enriched for CNVs identified in several different studies [9]. However, power calculations (Figure S4) show that at least 28 SNPs per window are required to achieve a <10% false discovery rate for MI = 0. After excluding windows containing <28 SNPs (n = 811), all enrichments for CNVs in the remaining regions with MI = 0 disappear, indicating that the conclusions of Li et al. are likely artifactual, resulting from low SNP density in many CNV regions.
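The bias described in panels (d) and (e) can be illustrated with a back-of-the-envelope model. The sketch below is not the authors' power calculation (Figure S4). It assumes, purely for illustration, that each SNP in a window is independently an mSNP with probability 0.082 (the genome-wide fraction quoted above), and asks how often a window with n SNPs would contain no mSNPs at all and therefore be scored MI = 0 by chance. All names are hypothetical.

```python
# Simplified model (an assumption, not the authors' method): each SNP is
# independently an mSNP with probability equal to the genome-wide fraction.
MSNP_FRACTION = 0.082


def prob_mi_zero_by_chance(n_snps, msnp_fraction=MSNP_FRACTION):
    """Probability that a window with n_snps SNPs contains no mSNPs at all."""
    return (1.0 - msnp_fraction) ** n_snps


def min_snps_for_rate(max_rate, msnp_fraction=MSNP_FRACTION):
    """Smallest SNP count for which the chance of zero mSNPs falls below max_rate."""
    n = 0
    while prob_mi_zero_by_chance(n, msnp_fraction) >= max_rate:
        n += 1
    return n


for n in (13, 25, 28, 137):
    print(f"{n:>3} SNPs: P(no mSNPs) = {prob_mi_zero_by_chance(n):.4f}")

print("SNPs needed for <10% chance of MI = 0:", min_snps_for_rate(0.10))
```

Under this simplified model, a window with the median 13 SNPs seen in MI = 0 windows has about a one-in-three chance of containing no mSNPs, and the chance only falls below 10% in the high twenties of SNPs per window, close to the threshold of 28 that the authors report from their own power calculations. The exact cutoff depends on the unrounded mSNP fraction and on how the false discovery rate is defined, so the agreement should be read as approximate.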
Mentions: We first investigated repeat content within “methylation deserts” identified by Li et al. We observed a strong enrichment for common repeats in windows with the lowest 1% methylation (Figure 1a), with 30% of windows defined as “methylation deserts” containing >99th percentile of total repeat content. In particular, we noted a massive enrichment for satellite repeats within these “methylation deserts.” Satellites comprise 16.6% of sequence in hypomethylated windows, compared to only 0.26% in the rest of the genome, corresponding to a 64-fold enrichment (p = 1.4×10⁻²⁹, Mann-Whitney Rank Sum Test). Importantly, as noted by Molaro et al. [10], the vast majority of pericentromeric satellites and many other subtypes of common repeat are hypomethylated specifically in sperm. Due to their repetitive and nonunique nature, pericentromeres and regions with extreme repeat content are also hotspots for structural variation [15], [16], representing a strong confounder in any analysis of CNVs and hypomethylation. Indeed, after removing all 100 kb windows that contain satellites or contain >99th percentile by LINE, SINE, LTR, or total repeat content, we observed that in every dataset analyzed, enrichments for CNVs in windows with the lowest 1% mean methylation either significantly diminished or disappeared completely (Figure 1b).
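The filtering logic described above can be sketched in a few lines. The example below checks the quoted fold enrichment from the two reported percentages, then shows on a small synthetic set of windows how an apparent CNV enrichment in hypomethylated windows can vanish once satellite-containing and extreme-repeat windows are dropped. The `Window` record, its fields, and the toy windows are hypothetical and chosen only to illustrate the mechanism; they are not the authors' data or code.

```python
from dataclasses import dataclass


@dataclass
class Window:
    hypomethylated: bool   # lowest 1% of mean methylation ("methylation desert")
    has_satellite: bool    # window contains satellite repeats
    extreme_repeat: bool   # >99th percentile by LINE, SINE, LTR, or total repeats
    overlaps_cnv: bool     # window overlaps at least one CNV


def cnv_fraction(windows):
    """Fraction of windows overlapping a CNV (assumes a non-empty list)."""
    return sum(w.overlaps_cnv for w in windows) / len(windows)


def cnv_enrichment(windows):
    """CNV prevalence in hypomethylated windows relative to all other windows."""
    hypo = [w for w in windows if w.hypomethylated]
    rest = [w for w in windows if not w.hypomethylated]
    return cnv_fraction(hypo) / cnv_fraction(rest)


def drop_confounded(windows):
    """Remove windows that contain satellites or have extreme repeat content."""
    return [w for w in windows if not (w.has_satellite or w.extreme_repeat)]


# Fold enrichment of satellite sequence, from the percentages quoted in the text.
satellite_fold = 16.6 / 0.26

# Synthetic toy windows (not real data): the excess of CNVs among hypomethylated
# windows is carried entirely by the two satellite-containing windows.
toy = (
    [Window(True, True, False, True)] * 2
    + [Window(True, False, False, True), Window(True, False, False, False)]
    + [Window(False, False, False, True)] * 4
    + [Window(False, False, False, False)] * 4
)

before = cnv_enrichment(toy)
after = cnv_enrichment(drop_confounded(toy))
print(f"Satellite fold enrichment: {satellite_fold:.1f}")
print(f"CNV enrichment before filtering: {before:.2f}, after: {after:.2f}")
```

A ratio of 1.0 after filtering corresponds to the dashed grey line in Figure 1b, where CNVs are equally prevalent in hypomethylated windows and in the rest of the genome.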

