Statistical methods for detecting periodic fragments in DNA sequence data.

Epps J, Ying H, Huttley GA - Biol. Direct (2011)

Bottom Line: The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the presence of eroded periodicity. The autocorrelation method was identified as poorly suited for use with the blockwise bootstrap. To facilitate study of the biological significance of the identified regions, the genomic coordinates are available as Additional files 1, 2, and 3 in a format suitable for visualisation as tracks on popular genome browsers.


Affiliation: School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, NSW 2052, Australia. j.epps@unsw.edu.au

ABSTRACT

Background: Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed.

Results: We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~21% and ~19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS).
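As a rough illustration of two of the measures compared above, the sketch below computes an autocorrelation at a given lag and a Fourier sum evaluated directly at an integer period. The function names are hypothetical, and the IPDFT formula used here (the Fourier sum at frequency 1/p for integer p) is an assumed reading of the name rather than the authors' exact definition. The Hybrid measure is not shown because the abstract does not define it.

```python
import cmath
import math


def autocorrelation(x, lag):
    """Unnormalised autocorrelation of the numeric sequence x at one lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def ipdft_power(x, period):
    """Squared magnitude of the Fourier sum of x at frequency 1/period.

    Assumption: this is how the integer period DFT is interpreted here.
    """
    total = sum(v * cmath.exp(-2j * math.pi * n / period)
                for n, v in enumerate(x))
    return abs(total) ** 2


# Toy numeric signal (hypothetical): a 1 every 10 positions, 200 positions long.
x = [1 if n % 10 == 0 else 0 for n in range(200)]

for p in (9, 10, 11):
    print(p, autocorrelation(x, p), round(ipdft_power(x, p), 3))
```

On this toy signal both measures peak at period 10 among the three periods tried. Real sequence data would be far noisier, which is where the choice of measure and of significance procedure starts to matter.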

Conclusions: For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the presence of eroded periodicity. The autocorrelation method was identified as poorly suited for use with the blockwise bootstrap. Application of our methods to the genomes of two model organisms revealed that a striking proportion of the yeast and mouse genomes is spanned by NPS. Despite their markedly different sizes, roughly equivalent proportions (19-21%) of the genomes lie within period-10 spans of the NPS dinucleotides {AA, TT, TA}. The biological significance of these regions remains to be demonstrated. To facilitate this, the genomic coordinates are available as Additional files 1, 2, and 3 in a format suitable for visualisation as tracks on popular genome browsers.


Figure 1: Overview of methods for estimating periodic signals from sequence data. In this work, both synthetic and real data are employed after a symbolic to numeric conversion. The smaller arrow connecting Synthetic data with Sequence represents a possible connection but in this study we directly synthesized the numeric data. The methods applied to exploratory or confirmatory period estimation are indicated above these elements. The embedded blockwise bootstrap (BWB) methods are embedded Autocorrelation, embedded Hybrid and embedded integer period discrete Fourier transform (IPDFT).
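The symbolic to numeric conversion mentioned in the caption can be sketched as follows. This is a minimal example that assumes a simple 0/1 indicator encoding of the NPS dinucleotides {AA, TT, TA} named in the abstract. The function name and the encoding are illustrative assumptions, not necessarily the conversion used in the study.

```python
def dinucleotide_indicator(seq, motifs=("AA", "TT", "TA")):
    """Return a 0/1 list marking positions where a dinucleotide in motifs starts.

    Assumption: an indicator encoding; other numeric mappings are possible.
    """
    seq = seq.upper()
    return [1 if seq[i:i + 2] in motifs else 0 for i in range(len(seq) - 1)]


# Hypothetical short sequence, for illustration only.
signal = dinucleotide_indicator("AATTGCTAGG")
print(signal)  # [1, 0, 1, 0, 0, 0, 1, 0, 0]
```

The resulting list has one entry per dinucleotide start position, so it is one element shorter than the input sequence, and it can be passed to any numeric period estimator.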

Mentions: To date, while various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail. Most studies employ period estimation techniques explicitly or implicitly for one of two purposes: exploratory or confirmatory. The conceptual relationship between these techniques and the methodological approach taken in this work is shown in Figure 1. Exploratory period estimation seeks to discover the existence of dominant periodic components in a sequence. Confirmatory period estimation seeks to detect the strength of a given (e.g. putatively dominant) periodic component and determine its significance relative to the remaining sequence components (see for example the analysis in Table 1 below). Confirmatory period estimation can be used in an exploratory manner (e.g. by exhaustive testing of relevant period candidates); however, using exploratory techniques in a confirmatory manner may lead to erroneous attribution of significance to a particular periodic component [10,11].
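The distinction between the two purposes can be illustrated with a small sketch. The exploratory function picks the strongest period among candidates. The confirmatory function fixes the period in advance and estimates a p-value by resampling blocks of the signal with replacement. All names, the power statistic, the block length of 7, and the number of resamples are illustrative assumptions, not the settings or exact procedure used by the authors.

```python
import cmath
import math
import random


def period_power(x, period):
    """Squared magnitude of the Fourier sum of x at frequency 1/period."""
    total = sum(v * cmath.exp(-2j * math.pi * n / period)
                for n, v in enumerate(x))
    return abs(total) ** 2


def exploratory_period(x, candidates):
    """Exploratory use: report the candidate period with the largest power."""
    return max(candidates, key=lambda p: period_power(x, p))


def blockwise_resample(x, block_len, rng):
    """Cut x into consecutive blocks, then draw blocks with replacement."""
    blocks = [x[i:i + block_len]
              for i in range(0, len(x) - block_len + 1, block_len)]
    out = []
    for _ in blocks:
        out.extend(rng.choice(blocks))
    return out


def confirmatory_p_value(x, period, block_len, n_resamples=200, seed=0):
    """Confirmatory use: test one a priori period against resampled signals."""
    rng = random.Random(seed)
    observed = period_power(x, period)
    hits = sum(
        period_power(blockwise_resample(x, block_len, rng), period) >= observed
        for _ in range(n_resamples)
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Toy signal (hypothetical): period-10 square wave, 210 positions long.
x = [1 if n % 10 < 5 else 0 for n in range(210)]

print(exploratory_period(x, range(2, 21)))
print(confirmatory_p_value(x, 10, block_len=7))
```

The block length here is deliberately not a multiple of the tested period, so that resampling scrambles the phase of the periodic component while keeping short-range structure within each block. In practice the block length is a tuning choice that would need to be justified for the data at hand.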

