Genomic sequence is highly predictive of local nucleosome depletion.

Yuan GC, Liu JS - PLoS Comput. Biol. (2007)

Bottom Line: This new approach has significantly improved the prediction accuracy. Regulatory elements are enriched in low N-score regions. While our model is derived from yeast data, the N-score pattern computed from this model agrees well with recent high-resolution protein-binding data in human.

Affiliation: Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America. gcyuan@jimmy.harvard.edu

ABSTRACT
The regulation of DNA accessibility through nucleosome positioning is important for transcription control. Computational models have been developed to predict genome-wide nucleosome positions from DNA sequences, but these models consider only nucleosome sequences, which may have limited their power. We developed a statistical multi-resolution approach to identify a sequence signature, called the N-score, that distinguishes nucleosome-binding DNA from non-nucleosome DNA. This new approach has significantly improved the prediction accuracy. The sequence information is highly predictive of local nucleosome enrichment or depletion, whereas predictions of the exact positions are only modestly more accurate than a random model, suggesting the importance of other regulatory factors in fine-tuning the nucleosome positions. The N-score in promoter regions is negatively correlated with gene expression levels. Regulatory elements are enriched in low N-score regions. While our model is derived from yeast data, the N-score pattern computed from this model agrees well with recent high-resolution protein-binding data in human.

Figure 1. Comparison of the Performance of the Nucleosome Scores from Different Models. “This model” refers to the N-score in this paper; “Segal” refers to the apparent free energy score in Segal et al. [7]; “Segal new” refers to a modified version of Segal's model, whose modified apparent free energy score is the log-ratio of the likelihoods of the nucleosome model and the linker model; “Ioshikhes” refers to the NPS score in Ioshikhes et al. [25]; “Ioshikhes new” is the same as “Ioshikhes,” except that the NPS pattern was recalculated from the training nucleosome sequences; “Peckham” refers to the support vector machine-generated discriminant score using the method in Peckham et al. [26]. (A) Cross-validation of model performance in discriminating nucleosome from linker sequences. The plotted ROC curves represent the average performance over five independent rounds of 2-fold cross-validation. (B) Model performance in discriminating nucleosome-enriched from nucleosome-depleted probes in Pokholok et al. [4]. The nucleosome scores for (B) are averaged over 300 bp windows.
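The caption defines the modified apparent free energy score as a log-ratio of likelihoods under a nucleosome model and a linker model, and panel (B) averages scores over 300 bp windows. The sketch below illustrates both steps under loudly stated assumptions: the two models are hypothetical first-order Markov chains fitted with pseudocounts (not the position-specific model of Segal et al.), the helper names (`fit_markov1`, `log_ratio_score`, `window_average`) are invented for illustration, and sequences are assumed to contain only A, C, G and T.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
# Hypothetical model type: (log initial-base probabilities, log transition probabilities).
Model = Tuple[Dict[str, float], Dict[Tuple[str, str], float]]


def fit_markov1(seqs: Sequence[str], pseudo: float = 1.0) -> Model:
    """Fit an assumed first-order Markov chain with additive pseudocounts."""
    init = Counter()
    trans = Counter()
    for s in seqs:
        init[s[0]] += 1
        for a, b in zip(s, s[1:]):
            trans[(a, b)] += 1
    total = sum(init.values()) + 4 * pseudo
    log_init = {a: math.log((init[a] + pseudo) / total) for a in BASES}
    log_trans = {}
    for a in BASES:
        row = sum(trans[(a, b)] for b in BASES) + 4 * pseudo
        for b in BASES:
            log_trans[(a, b)] = math.log((trans[(a, b)] + pseudo) / row)
    return log_init, log_trans


def log_likelihood(seq: str, model: Model) -> float:
    """Log-probability of seq under the Markov chain (A/C/G/T only)."""
    log_init, log_trans = model
    ll = log_init[seq[0]]
    for a, b in zip(seq, seq[1:]):
        ll += log_trans[(a, b)]
    return ll


def log_ratio_score(seq: str, nuc: Model, linker: Model) -> float:
    """log P(seq | nucleosome) - log P(seq | linker); positive favours nucleosome."""
    return log_likelihood(seq, nuc) - log_likelihood(seq, linker)


def window_average(values: Sequence[float], window: int) -> List[float]:
    """Mean of every full run of `window` consecutive values, via prefix sums."""
    if not 0 < window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(values) - window + 1)]
```

To mimic panel (B), one could score each nucleosome-sized window along a sequence with `log_ratio_score` and pass the per-position scores to `window_average(scores, 300)`. Only the 300 bp window comes from the caption; the per-position scoring scheme is an assumption.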

Mentions: By fixing a cutoff value for the N-score, we can classify any nucleosome-sized sequence as either a nucleosome or a linker sequence. We compared the performance of different models using the 5 × 2-fold cross-validation method recommended by Dietterich [28]. The dataset described in the previous section (199 nucleosome and 296 linker sequences) was randomly partitioned into two subsets of equal size, with the same proportion of positives and negatives in each subset. Each subset in turn served as the training subset, with the other reserved for testing. A receiver operating characteristic (ROC) curve was obtained for the testing subset by varying the N-score cutoff, and the ROC-score, defined as the area under the ROC curve, was used to measure overall model performance. This 2-fold cross-validation procedure was repeated five times independently. Figure 1A shows the average ROC curve of our method, which has an ROC-score of 0.84.
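The evaluation protocol described above (stratified random halving, swapping training and test roles, repeating five times, scoring each test half by the area under the ROC curve) can be sketched as follows. This is an illustration under assumptions, not the authors' code: `fit_gc_scorer` is a hypothetical toy scorer based on GC content and is NOT the N-score, the synthetic sequences and the 147 bp length are invented for the demo, and summarizing the ten folds by their mean ROC-score is an assumption about how the curves were averaged.

```python
import random
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str], float]
Fitter = Callable[[List[str], List[int]], Scorer]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-score via the Mann-Whitney identity.

    Fraction of (positive, negative) pairs where the positive scores higher,
    ties counting one half; this equals the trapezoidal area under the ROC
    curve traced by sweeping every cutoff.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_halves(labels: Sequence[int],
                      rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split indices into two halves with (nearly) equal class proportions."""
    half_a: List[int] = []
    half_b: List[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        mid = len(idx) // 2
        half_a += idx[:mid]
        half_b += idx[mid:]
    return half_a, half_b


def five_by_two_cv(seqs: Sequence[str], labels: Sequence[int], fit: Fitter,
                   repeats: int = 5, seed: int = 0) -> List[float]:
    """Return the 2 * repeats test-half ROC-scores of repeated stratified 2-fold CV."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        a, b = stratified_halves(labels, rng)
        for train, test in ((a, b), (b, a)):  # each half trains exactly once
            score = fit([seqs[i] for i in train], [labels[i] for i in train])
            aucs.append(roc_auc([score(seqs[i]) for i in test],
                                [labels[i] for i in test]))
    return aucs


def fit_gc_scorer(train_seqs: List[str], train_labels: List[int]) -> Scorer:
    """Hypothetical toy scorer (NOT the N-score): GC content, signed by training data."""
    def gc(s: str) -> float:
        return (s.count("G") + s.count("C")) / len(s)
    pos = [gc(s) for s, y in zip(train_seqs, train_labels) if y == 1]
    neg = [gc(s) for s, y in zip(train_seqs, train_labels) if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda s: sign * gc(s)


def random_seq(rng: random.Random, length: int, gc: float) -> str:
    """Synthetic sequence in which each base is G or C with probability gc."""
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic demo data with the class sizes from the text (199 vs 296);
    # 147 bp is assumed as a nucleosome-sized length.
    seqs = ([random_seq(rng, 147, 0.6) for _ in range(199)]
            + [random_seq(rng, 147, 0.4) for _ in range(296)])
    labels = [1] * 199 + [0] * 296
    aucs = five_by_two_cv(seqs, labels, fit_gc_scorer)
    print(f"mean ROC-score over {len(aucs)} folds: {sum(aucs) / len(aucs):.3f}")
```

With 199 positives the halves cannot be exactly equal, so `stratified_halves` puts 99 positives in one half and 100 in the other; the Mann-Whitney form is used because it avoids explicitly enumerating cutoffs while giving the same area.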