Local Renyi entropic profiles of DNA sequences.

Vinga S, Almeida JS - BMC Bioinformatics (2007)

Bottom Line: Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. The new methodology enables two results. In particular, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region.


Affiliation: Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal. svinga@kdbio.inesc-id.pt

ABSTRACT

Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored therein was based on the Rényi entropy of a probability density function (pdf) estimated with the Parzen window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles, using the new pdf estimations to refine the density estimation of motifs.
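The background above can be made concrete with a small sketch. The code below maps a DNA string to CGR coordinates and evaluates the quadratic (order 2) Rényi entropy of a Gaussian Parzen window estimate, using the standard closed form in which the integral of the squared density reduces to a double sum of Gaussians with variance 2σ². The corner assignment (A, C, G, T on the unit square), the function names, and the choice of a Gaussian kernel with α = 2 are assumptions made for illustration; this is not the authors' Matlab code, and it does not implement the fractal kernel.

```python
import math

# Assumed corner convention for the unit square; other assignments are possible.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR coordinates of each position, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = 0.5 * (x + cx), 0.5 * (y + cy)
        points.append((x, y))
    return points


def renyi_quadratic_entropy(points, sigma):
    """Quadratic Renyi entropy of a 2D Gaussian Parzen estimate (O(n^2) pairs)."""
    n = len(points)
    var = 2.0 * sigma ** 2  # variance of the convolution of two kernels
    norm = 1.0 / (2.0 * math.pi * var)
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            sq_dist = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-sq_dist / (2.0 * var))
    return -math.log(total / n ** 2)


if __name__ == "__main__":
    repetitive = cgr_points("A" * 40)
    mixed = cgr_points("ACGT" * 10)
    print(renyi_quadratic_entropy(repetitive, 0.1))
    print(renyi_quadratic_entropy(mixed, 0.1))
```

In this sketch a homopolymer collapses towards one corner of the map and yields a lower entropy than a sequence whose points spread over several quadrants, which is the sense in which the measure reflects randomness.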

Results: The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables, and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/.
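The link between a position, a scale, and over-representation can be illustrated with a much simpler quantity than the entropic profile itself. The sketch below takes a 1-based position, extracts the segment of each length L that ends there, counts its overlapping occurrences in the whole sequence, and divides by the count expected under a uniform, independent base model. The function names and the uniform background model are assumptions chosen for illustration; the kernel-based profile described in the abstract is not reproduced here.

```python
def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def suffix_enrichment(seq, pos, max_len):
    """Observed/expected count of each L-mer ending at 1-based position pos.

    Expected counts assume uniform, independent bases (an assumption):
    (N - L + 1) / 4**L for a sequence of length N.
    """
    profile = {}
    for length in range(1, max_len + 1):
        if length > pos:
            break  # not enough sequence before pos for this length
        motif = seq[pos - length:pos]
        observed = count_overlapping(seq, motif)
        expected = (len(seq) - length + 1) / 4 ** length
        profile[length] = observed / expected
    return profile


if __name__ == "__main__":
    for length, ratio in suffix_enrichment("ACGTACGTACGT", 4, 4).items():
        print(length, round(ratio, 2))
```

Ratios well above 1 indicate over-representation and ratios below 1 indicate under-representation; how the ratio changes with L gives a rough indication of the scale of the conserved segment ending at that position.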

Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.


Figure 6: Entropic profile (EP) for sequence Hi, the complete genome of H. influenzae. a) and b) Analysis of position 36532 (from the beginning of replication). c) and d) Detail of the EP for positions 36200 to 38200 and 36500 to 36600. The highest peaks in the EP correspond to the uptake signal sequence (USS+) 5'-AAGTGCGGT-3', its reverse complement (USS-) 5'-ACCGCACTT-3', and related motifs such as AGTGCGGT and AAGTGCGG. The Chi sites are neither particularly well conserved nor overexpressed [24] and are therefore not easily detected with this approach.
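The strand relationship between the motifs named in the caption can be checked mechanically. The following minimal sketch computes the reverse complement of the USS- sequence and tests whether the two related motifs are contained in it; the helper name is hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    uss_minus = "ACCGCACTT"
    forward = reverse_complement(uss_minus)
    print(forward)
    # The related motifs from the caption, checked as substrings.
    for motif in ("AGTGCGGT", "AAGTGCGG"):
        print(motif, motif in forward)
```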

Mentions: When analyzing the genome of H. influenzae and studying one particular position where the motif 5'-GGTGGTGG-3' ends (in the example, p = 36532), Figure 6 is obtained.

