Distinct modes of regulation by chromatin encoded through nucleosome positioning signals.

Field Y, Kaplan N, Fondufe-Mittendorf Y, Moore IK, Sharon E, Lubling Y, Widom J, Segal E - PLoS Comput. Biol. (2008)

Bottom Line: The detailed positions of nucleosomes profoundly impact gene regulation and are partly encoded by the genomic DNA sequence. We find that Poly(dA:dT) tracts are an important component of these nucleosome positioning signals and that their nucleosome-disfavoring action results in large nucleosome depletion over them and over their flanking regions and enhances the accessibility of transcription factors to their cognate sites. Our results suggest that the yeast genome may utilize these nucleosome positioning signals to regulate gene expression with different transcriptional noise and activation kinetics and DNA replication with different origin efficiency.

Affiliation: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel.

ABSTRACT
The detailed positions of nucleosomes profoundly impact gene regulation and are partly encoded by the genomic DNA sequence. However, less is known about the functional consequences of this encoding. Here, we address this question using a genome-wide map of approximately 380,000 yeast nucleosomes that we sequenced in their entirety. Utilizing the high resolution of our map, we refine our understanding of how nucleosome organizations are encoded by the DNA sequence and demonstrate that the genomic sequence is highly predictive of the in vivo nucleosome organization, even across new nucleosome-bound sequences that we isolated from fly and human. We find that Poly(dA:dT) tracts are an important component of these nucleosome positioning signals and that their nucleosome-disfavoring action results in large nucleosome depletion over them and over their flanking regions and enhances the accessibility of transcription factors to their cognate sites. Our results suggest that the yeast genome may utilize these nucleosome positioning signals to regulate gene expression with different transcriptional noise and activation kinetics and DNA replication with different origin efficiency. These distinct functions may be achieved by encoding both relatively closed (nucleosome-covered) chromatin organizations over some factor binding sites, where factors must compete with nucleosomes for DNA access, and relatively open (nucleosome-depleted) organizations over other factor sites, where factors bind without competition.

Figure 2 (pcbi-1000216-g002): Nucleosome positioning signals in genomic sequence. (A) Fraction (normalized, see Methods) of AA/AT/TA/TT and, separately, CC/CG/GC/GG dinucleotides at each position of our center-aligned nucleosome-bound sequences with length 146–148, showing ∼10 bp periodicity of these dinucleotide sets. (B) Many 5-mers are enriched in linker or nucleosome regions. Shown is the distribution of (log base 2) ratios between the frequency of 5-mers in linker regions and in nucleosomal DNA regions for all 5-mers (green line), and for the 32 5-mers composed exclusively of either G/C (red bars) or A/T (blue bars) nucleotides. Linkers are taken as contiguous non-repetitive regions of lengths 50–500 bp that are not covered by any nucleosome read in our data. (C) Illustration of the key features of our probabilistic nucleosome–DNA interaction model, including the periodic dinucleotide patterns preferred within the nucleosome, and the 5-mers preferred in linkers. (D) Our model classifies linkers from nucleosomal DNA with high accuracy. Shown is the fraction of all measured nucleosomes that our model correctly classifies as nucleosomes (y-axis; true positive rate) against the fraction of all measured linkers that our model incorrectly classifies as nucleosomes (x-axis; false positive rate), for each possible threshold on the minimum score above which our model classifies a region as nucleosomal. The score of each measured nucleosome or linker is the mean score that our model assigns in the region that is within 20 bp from the center of the nucleosome or linker, respectively. Scores of the model are assigned using a cross validation scheme, in which every measured nucleosome or linker on a given chromosome is assigned a score using a model that was trained from the data of all other chromosomes. Linkers are defined as contiguous non-repetitive regions of lengths 50–500 bp that are not covered by any nucleosome in our data. Results are shown for separating these 8,017 linkers from nucleosomes with various levels of occupancy (1, 2, 4, 8, and 16), where the occupancy of a nucleosome is defined by the number of nucleosome reads whose center is within 20 bp of its own center. The numbers of nucleosomes in the classification groups are 84,410 (occupancy 1), 69,703 (occupancy 2), 38,787 (occupancy 4), 12,076 (occupancy 8), and 1,601 (occupancy 16). (E) Shown is the combined nucleosome fold depletion over all homopolymeric tracts of A or T (Poly(dA:dT) elements) of length k, for k = 5, 6, 7, …, and for Poly(dA:dT) elements with exactly 0, 2, 4, or 6 base substitutions (mismatches). Each graph is trimmed at a length K at which there are fewer than 10 elements, and the fold depletion at this final point is computed over all elements whose length is at least K. The combined fold depletion of a set of genomic elements (y-axis) is the ratio between their expected and observed nucleosome coverage, where the expected coverage is the average coverage of any basepair according to our data, and the observed coverage is the average coverage of a basepair from the set (see Methods). The number of underlying elements at various points in the graph is indicated (N). See Figure S4 for a graph of all possible mismatches and showing the number of elements at all points.
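The combined fold depletion defined for panel (E) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the half-open interval convention, and the toy inputs are assumptions made for illustration, and the tract finder handles only the zero-mismatch case.

```python
import re
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open (start, end); an assumed convention


def find_homopolymer_tracts(seq: str, min_len: int = 5) -> List[Interval]:
    """Locate perfect runs of A or of T with length >= min_len (no mismatches)."""
    pattern = re.compile(f"A{{{min_len},}}|T{{{min_len},}}")
    return [m.span() for m in pattern.finditer(seq.upper())]


def combined_fold_depletion(coverage: Sequence[float],
                            elements: Iterable[Interval]) -> float:
    """Ratio of expected to observed nucleosome coverage over a set of elements.

    expected = mean coverage of any basepair in the data
    observed = mean coverage of a basepair inside the element set
    """
    if len(coverage) == 0:
        raise ValueError("coverage track is empty")
    expected = sum(coverage) / len(coverage)

    covered_sum = 0.0
    covered_bp = 0
    for start, end in elements:
        covered_sum += sum(coverage[start:end])
        covered_bp += end - start
    if covered_bp == 0:
        raise ValueError("element set covers no basepairs")

    observed = covered_sum / covered_bp
    # A set with zero observed coverage is infinitely depleted.
    return float("inf") if observed == 0 else expected / observed


if __name__ == "__main__":
    # Toy coverage track: a dip of five basepairs in otherwise uniform coverage.
    toy_coverage = [4] * 10 + [1] * 5 + [4] * 5
    print(combined_fold_depletion(toy_coverage, [(10, 15)]))
    print(find_homopolymer_tracts("GCAAAAAGCTTTTTTGC"))
```

A value above 1 means the elements carry less nucleosome coverage than an average basepair. Pooling basepairs across the whole set, rather than averaging per-element ratios, follows the caption's wording ("average coverage of a basepair from the set").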

Mentions: Regarding the periodic component, several studies [2], [3], [14], [15] characterized the nucleosomes' intrinsic sequence preferences primarily by ∼10 bp periodicities of specific dinucleotides along the nucleosome length, thought to facilitate the sharp bending of DNA around the nucleosome [16]. We find similar periodicities in our new large nucleosome collection, demonstrating that these periodic dinucleotides are important genome-wide (Figure 2A and Figure S2). These same periodicities also arise in H2A.Z-containing nucleosomes [13], and in every in vivo and in vitro nucleosome collection obtained by direct sequencing from any organism [2], [11], [15], [17]–[19]. Moreover, these periodicities are also present in yeast transcription start sites (Figure 3), worm introns, 5′ and 3′ UTRs [20], human CpG dinucleotides not in CpG islands [21], and HIV integration sites in human [22].
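A position-wise dinucleotide profile of the kind plotted in Figure 2A could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name and the integer center-alignment rule are hypothetical, and the normalization the caption refers to (see Methods) is not reproduced, so the output is a raw fraction per position.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable

# Dinucleotide sets named in the Figure 2A caption.
AT_DINUCS: FrozenSet[str] = frozenset({"AA", "AT", "TA", "TT"})
GC_DINUCS: FrozenSet[str] = frozenset({"CC", "CG", "GC", "GG"})


def dinucleotide_profile(seqs: Iterable[str],
                         dinucs: FrozenSet[str] = AT_DINUCS) -> Dict[int, float]:
    """Fraction of sequences carrying a dinucleotide from `dinucs` at each
    offset from the sequence center.

    Sequences of slightly different lengths are aligned on len(seq) // 2,
    which is an assumed alignment rule, not necessarily the authors' one.
    """
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for seq in seqs:
        seq = seq.upper()
        center = len(seq) // 2
        for i in range(len(seq) - 1):
            offset = i - center          # dinucleotide start relative to center
            totals[offset] += 1
            if seq[i:i + 2] in dinucs:
                hits[offset] += 1
    return {off: hits[off] / totals[off] for off in sorted(totals)}


if __name__ == "__main__":
    toy = ["AATTGC", "GCAATT"]           # toy input, far shorter than 146-148 bp
    for offset, fraction in dinucleotide_profile(toy).items():
        print(offset, fraction)
```

On real nucleosome-bound sequences, a ∼10 bp periodicity would appear as regularly spaced peaks in this profile, with the `GC_DINUCS` profile expected to peak out of phase with the `AT_DINUCS` one.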

