Genomic sequence is highly predictive of local nucleosome depletion.

Yuan GC, Liu JS - PLoS Comput. Biol. (2007)

Bottom Line: This new approach has significantly improved the prediction accuracy. Regulatory elements are enriched in low N-score regions. While our model is derived from yeast data, the N-score pattern computed from this model agrees well with recent high-resolution protein-binding data in human.


Affiliation: Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America. gcyuan@jimmy.harvard.edu

ABSTRACT
The regulation of DNA accessibility through nucleosome positioning is important for transcription control. Computational models have been developed to predict genome-wide nucleosome positions from DNA sequences, but these models consider only nucleosome sequences, which may have limited their power. We developed a statistical multi-resolution approach to identify a sequence signature, called the N-score, that distinguishes nucleosome-binding DNA from non-nucleosome DNA. This new approach has significantly improved the prediction accuracy. The sequence information is highly predictive for local nucleosome enrichment or depletion, whereas predictions of the exact positions are only modestly more accurate than a null model, suggesting the importance of other regulatory factors in fine-tuning the nucleosome positions. The N-score in promoter regions is negatively correlated with gene expression levels. Regulatory elements are enriched in low N-score regions. While our model is derived from yeast data, the N-score pattern computed from this model agrees well with recent high-resolution protein-binding data in human.

Figure 5: Comparison of the Accuracies of the Predicted Non–Chromosome III Nucleosome Positions Obtained from Segal's [7] and Our Model. (A) False negative error rates; (B) false positive error rates. “Random” refers to a random permutation of predicted nucleosomes. “Trivial” means every base pair coordinate is predicted as a nucleosome position. “70k” or “47k” refers to the number of predicted nucleosome positions involved in the comparison. For Segal's model, the top-ranked nucleosomes were selected. Our model predicts a total of 47,000 non–chromosome III nucleosome positions.

Mentions: In previous studies [7,25,26], the authors quantified their prediction accuracy as the fraction of experimentally verified nucleosomes in a particular genomic region that are correctly predicted by their models, which is equivalent to one minus the false negative error rate. They observed that their model predictions are significantly better than random guessing (i.e., randomly sampling the same number of positions as their model did). We obtained the chromosomal coordinates of the top 70,000 predicted nucleosome positions (of which 1,822 are on chromosome III) from Dr. Segal. The performance of each model was evaluated against the non–chromosome III nucleosome positions in [6], because long linkers from chromosome III were used for training the N-score model. With a 35 bp prediction accuracy cutoff (i.e., a correctly predicted site has to be within 35 bp of a true site), the predicted nucleosome positions from Segal's method have a false negative rate of 0.56, compared to 0.66 for random guessing (Figure 5A). Our model has a lower false negative rate of 0.52 even though it predicted fewer nucleosomes (47,113 in total). For this smaller number of predictions, the false negative rate of random guessing is 0.75. For a fair comparison, we ranked Segal's predicted nucleosome positions by their predicted probabilities and selected the 47,113 top-ranked predictions. The false negative rate of their method increased to 0.69 because of this reduction in the total number of predictions.
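
The evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the use of a single coordinate axis (one chromosome at a time), the inclusive reading of "within 35 bp", and the uniform sampling used for the random baseline are all assumptions made for illustration.

```python
import bisect
import random


def false_negative_rate(true_sites, predicted_sites, cutoff=35):
    """Fraction of true sites with no predicted site within `cutoff` bp.

    Assumption: "within 35 bp" is read as |true - predicted| <= cutoff,
    and all coordinates lie on the same chromosome.
    """
    if not true_sites:
        raise ValueError("true_sites must not be empty")
    predicted = sorted(predicted_sites)
    missed = 0
    for site in true_sites:
        # Locate the first prediction at or beyond (site - cutoff);
        # the true site is matched only if that prediction is <= site + cutoff.
        i = bisect.bisect_left(predicted, site - cutoff)
        if i == len(predicted) or predicted[i] > site + cutoff:
            missed += 1
    return missed / len(true_sites)


def random_baseline_fn_rate(true_sites, n_predictions, region_length,
                            cutoff=35, trials=100, seed=0):
    """Average false negative rate of random guessing.

    Assumption: each trial draws `n_predictions` distinct coordinates
    uniformly from [0, region_length); the paper's exact randomization
    procedure may differ.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        guesses = rng.sample(range(region_length), n_predictions)
        total += false_negative_rate(true_sites, guesses, cutoff)
    return total / trials
```

With toy inputs such as true sites `[100, 200, 500]` and predictions `[130, 236, 1000]`, only the first true site has a prediction within 35 bp, so two of the three true sites count as false negatives. Calling `random_baseline_fn_rate` with the same number of predictions as a model gives the kind of baseline against which the reported rates are compared, and shows why the baseline worsens when fewer positions are predicted.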

