Single molecule-level detection and long read-based phasing of epigenetic variations in bacterial methylomes.

Beaulaurier J, Zhang XS, Zhu S, Sebra R, Rosenbluh C, Deikus G, Shen N, Munera D, Waldor MK, Chess A, Blaser MJ, Schadt EE, Fang G - Nat Commun (2015)

Bottom Line: Here, we present SMALR (single-molecule modification analysis of long reads), a novel framework for single molecule-level detection and phasing of DNA methylation. Using seven bacterial strains, we show that SMALR yields significantly improved resolution and reveals distinct types of epigenetic heterogeneity. SMALR is a powerful new tool that enables de novo detection of epigenetic heterogeneity and empowers investigation of its functions in bacterial populations.


Affiliation: Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York 10029, USA.

ABSTRACT
Beyond its role in host defense, bacterial DNA methylation also plays important roles in the regulation of gene expression, virulence and antibiotic resistance. Bacterial cells in a clonal population can generate epigenetic heterogeneity to increase population-level phenotypic plasticity. Single molecule, real-time (SMRT) sequencing enables the detection of N6-methyladenine and N4-methylcytosine, two major types of DNA modifications comprising the bacterial methylome. However, existing SMRT sequencing-based methods for studying bacterial methylomes rely on a population-level consensus that lacks the single-cell resolution required to observe epigenetic heterogeneity. Here, we present SMALR (single-molecule modification analysis of long reads), a novel framework for single molecule-level detection and phasing of DNA methylation. Using seven bacterial strains, we show that SMALR yields significantly improved resolution and reveals distinct types of epigenetic heterogeneity. SMALR is a powerful new tool that enables de novo detection of epigenetic heterogeneity and empowers investigation of its functions in bacterial populations.


Figure 4: SMP score distributions reveal distinct types of epigenetic heterogeneity. (a) Single molecule, pooled (SMP) distribution for H. pylori J99 motif 5′-GATC and its corresponding IPD-shuffled control. The identical unimodal distributions suggest a fully active MTase (as expected). (b) SMP distributions for H. pylori J99 motif 5′-GWCAY and its corresponding WGA control. The major peak around SMP≈0 and minor peak around SMP≈2 suggest that the mostly inactive MTase targeting 5′-GWCAY, M.Hpy99XXI, is methylating 5′-GWCAY in a small fraction of cells. Methylated molecules with SMP scores>2 have an FDR<0.2%. (c) SMP distributions of 5′-TCAN6TRG/5′-CYAN6TGA in H. pylori J99 and its corresponding IPD-shuffled control. The major peak around SMP≈2 and minor peak around SMP≈0 indicate that the normally active MTase, Hpy99XXII, is inactive in a small fraction of cells. Non-methylated molecules with SMP scores<0 have an FDR<1.3%. (d) High-accuracy sequencing with Illumina MiSeq and read-level analysis of insertion/deletion calls show significant variation in the lengths of two specific homopolymers in the coding sequences of M.Hpy99XXI and S.Hpy99XXII. The high percentage of deletions in these two genes stands apart from the deletion rates found in five other C/G homopolymers from H. pylori J99 and E. coli K12, suggesting that the variation is not simply due to lower sequencing accuracy in homopolymer regions. (e) SMP distributions of 5′-TCNNGA in H. pylori J99 and its corresponding IPD-shuffled control. The SMP scores suggest MTase behaviour similar to that of Hpy99XXII. Non-methylated molecules with SMP scores<0 have an FDR<1.6%. (f) SMP distribution for the C. salexigens motif 5′-RGATCY. The major peak near SMP≈0.9 indicates that the IPDs sampled for each molecule reflect a mixture of both non-methylated (IPD≈0) and methylated (IPD≈2) motif sites, suggesting stochastic methylation as the primary source of epigenetic heterogeneity for this motif.
© Copyright Policy - open-access


We first tested the SMP method on an H. pylori J99 isolate that was sequenced using libraries with long (∼20 kb) DNA inserts. We targeted the 4-mer 5′-GATC motif, where >95% of sites are expected to be methylated based on its SMSN distribution (Fig. 3a), and calculated an SMP score for each long read containing at least ten GATC sites. The distribution of SMP scores for 5′-GATC was compared with a control distribution of SMP scores calculated after randomly shuffling IPD values between molecules (Fig. 4a). No bimodality is present in the 5′-GATC SMP distribution, and it is nearly identical to the IPD-shuffled SMP distribution, suggesting that the MTase responsible for targeting the 5′-GATC motif in H. pylori J99 (M.Hpy99VI) is constitutively active. Through false-discovery rate (FDR) estimation (Methods), we found that only 0.07% of the molecules with at least ten 5′-GATC sites had evidence of non-methylation (maximum FDR=1%). This level of non-methylation is consistent with several other motifs for which we expect to observe near-universal methylation activity (5′-CATG, 5′-GANTC and 5′-GAGG), suggesting that the small number of non-methylated molecules may have originated from transiently hemi-methylated regions directly behind the DNA replication fork. Furthermore, phase variation of M.Hpy99VI was considered unlikely, as no significant sequence variation was observed in its coding sequence (Supplementary Fig. 10).
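
The comparison described above can be illustrated with a small Python sketch. It is not the SMALR implementation: the exact SMP formula is defined in the paper's Methods, which are not reproduced here, so the sketch assumes that a molecule's pooled score is simply the mean of the normalized IPD values at its motif sites. The function names, the toy simulation, and the 0.5 cutoff are all hypothetical. The point is only to show why shuffling IPD values between molecules gives a useful control: it preserves the overall mix of IPD values but destroys any molecule-level grouping, so a minor peak that is present in the native scores and absent from the shuffled ones reflects real molecule-to-molecule differences.

```python
import random
from statistics import mean

MIN_SITES = 10  # the text requires at least 10 motif sites per read


def smp_scores(molecules, min_sites=MIN_SITES):
    """Return one pooled score per molecule that has enough motif sites.

    `molecules` is a list of lists; each inner list holds the normalized IPD
    values observed at the motif sites of one long read. Assumption: the
    pooled score is the plain mean of those values.
    """
    return [mean(ipds) for ipds in molecules if len(ipds) >= min_sites]


def shuffle_between_molecules(molecules, rng):
    """Build a control by permuting IPD values across all molecules.

    Every molecule keeps its original number of sites, but the values it
    holds are drawn from the shuffled pool of all molecules.
    """
    pool = [ipd for ipds in molecules for ipd in ipds]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for ipds in molecules:
        shuffled.append(pool[start:start + len(ipds)])
        start += len(ipds)
    return shuffled


def simulate(n_methylated, n_unmethylated, rng, sites=12):
    """Toy data: methylated sites near IPD 2, non-methylated sites near 0."""
    def draw(center):
        return [rng.gauss(center, 0.5) for _ in range(sites)]
    return ([draw(2.0) for _ in range(n_methylated)]
            + [draw(0.0) for _ in range(n_unmethylated)])


rng = random.Random(0)
molecules = simulate(n_methylated=180, n_unmethylated=20, rng=rng)
native = smp_scores(molecules)
control = smp_scores(shuffle_between_molecules(molecules, rng))

# Count molecules in the low-score tail (hypothetical cutoff of 0.5).
low_native = sum(score < 0.5 for score in native)
low_control = sum(score < 0.5 for score in control)
print(f"native molecules with score < 0.5:  {low_native}")
print(f"control molecules with score < 0.5: {low_control}")
```

In this toy setting a subpopulation of non-methylated molecules shows up as a low-score tail in the native scores but not in the shuffled control. For a constitutively active MTase such as the one targeting 5′-GATC, the two distributions would instead look alike, which is the pattern reported for Fig. 4a.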

