Genome-scale computational analysis of DNA curvature and repeats in Arabidopsis and rice uncovers plant-specific genomic properties.

Masoudi-Nejad A, Movahedi S, Jáuregui R - BMC Genomics (2011)

Bottom Line: By analyzing tandem repeats across the genome, we found that frequencies of repeats are higher in regions adjacent to those with high curvature value. Each CpG island appears in a local minimal curvature region, and CpG islands usually do not appear in the centromere or regions with high repeat frequency. This study represents the first systematic genome-scale analysis of DNA curvature, CpG islands and tandem repeats at the DNA sequence level in plant genomes, and finds that not all of the chromosomes in plants follow the same rules common to other eukaryote organisms, suggesting that some of these genomic properties might be considered as specific to plants.


Affiliation: Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics and COE in Biomathematics, University of Tehran, Iran. amasoudin@ibb.ut.ac.ir

ABSTRACT

Background: Due to its overarching role in genome function, sequence-dependent DNA curvature continues to attract great attention. The DNA double helix is not a rigid cylinder, but presents both curvature and flexibility in different regions, depending on the sequence. More in-depth knowledge of the various orders of complexity of genomic DNA structure has allowed the design of sophisticated bioinformatics tools for its analysis and manipulation, which, in turn, have yielded a better understanding of the genome itself. Curved DNA is involved in many biologically important processes, such as transcription initiation and termination, recombination, DNA replication, and nucleosome positioning. CpG islands and tandem repeats also play significant roles in the dynamics and evolution of genomes.

Results: In this study, we analyzed the relationship between these three structural features within rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) genomes. A genome-scale prediction of curvature distribution in rice and Arabidopsis indicated that most of the chromosomes of both genomes have maximal chromosomal DNA curvature adjacent to the centromeric region. By analyzing tandem repeats across the genome, we found that frequencies of repeats are higher in regions adjacent to those with high curvature value. Further analysis of CpG islands shows a clear interdependence between curvature value, repeat frequencies and CpG islands. Each CpG island appears in a local minimal curvature region, and CpG islands usually do not appear in the centromere or regions with high repeat frequency. A statistical evaluation demonstrates the significance and non-randomness of these features.

Conclusions: This study represents the first systematic genome-scale analysis of DNA curvature, CpG islands and tandem repeats at the DNA sequence level in plant genomes, and finds that not all of the chromosomes in plants follow the same rules common to other eukaryote organisms, suggesting that some of these genomic properties might be considered as specific to plants.



Figure 8: Chromosomal curvature signal. Signal of the curvature value before (top) and after (bottom) applying the signal processing algorithm. Locations of maximal curvature values are marked by a blue arrow.

Mentions: Our method slides a window along a given signal so that each window covers only a fragment of the signal, containing some high and low perturbations. In each window, we determine the extreme points by a simple scan in O(n) time. A point whose value is higher than both its predecessor and its successor is called a maximal point, and one whose value is lower than both is called a minimal point; each such point is collected as an apex value. Then, in each window, two base lines are defined, one for positive and one for negative apex values, and these base lines are used to construct two new coordinates for the signal's peak values. These new coordinates are suitable for exaggerating the difference between low and high perturbations. To describe the method, we focus first on positive values: if the values of the positive peaks form the set Sp = {P1, P2, ..., Pn}, the mean value (Mp) of the set serves as the base line of the positive apexes. Subtracting Mp from each member gives a new set of positive apexes, new_Sp = {P1-Mp, P2-Mp, ..., Pn-Mp}. Applying the exponential function {e^x | x is a member of new_Sp} then emphasizes high apex values and reduces low apex values. This change of coordinates is a type of kernel function, as used in statistical machine learning approaches (such as support vector machines). Through this change, the low perturbations, which have negative values in new_Sp, are projected onto small values, whereas the high perturbations, which have positive values, are mapped to exponentially higher values. The negative apex values Sn = {N1, N2, ..., Nm} are analyzed in the same way, except that the exponential function is changed to {-e^(-x) | x is a member of new_Sn}. The details of the algorithm are presented below. Figure 8 shows the curvature signals before and after applying the algorithm.
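
The per-window transformation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (find_apexes, enhance_window, enhance_signal), the use of non-overlapping windows, and the reading of "positive" apexes as local maxima and "negative" apexes as local minima are all assumptions made for illustration; the text does not specify the window size, the step, or how non-apex points are treated.

```python
import math


def find_apexes(window):
    """Single O(n) pass: indices of strict local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(window) - 1):
        prev, cur, nxt = window[i - 1], window[i], window[i + 1]
        if cur > prev and cur > nxt:
            maxima.append(i)
        elif cur < prev and cur < nxt:
            minima.append(i)
    return maxima, minima


def enhance_window(window):
    """Map each apex index to its exponentially rescaled value.

    Assumption: maxima play the role of Sp and minima the role of Sn.
    """
    maxima, minima = find_apexes(window)
    enhanced = {}
    if maxima:
        mp = sum(window[i] for i in maxima) / len(maxima)  # base line Mp
        for i in maxima:
            enhanced[i] = math.exp(window[i] - mp)  # e^x, x in new_Sp
    if minima:
        mn = sum(window[i] for i in minima) / len(minima)  # base line Mn
        for i in minima:
            enhanced[i] = -math.exp(-(window[i] - mn))  # -e^(-x), x in new_Sn
    return enhanced


def enhance_signal(signal, window_size):
    """Apply enhance_window to consecutive, non-overlapping windows.

    Assumption: the step equals the window size; apexes that fall exactly
    on a window boundary are not detected in this simplified version.
    """
    enhanced = {}
    for start in range(0, len(signal), window_size):
        window = signal[start:start + window_size]
        for i, value in enhance_window(window).items():
            enhanced[start + i] = value
    return enhanced
```

For the window [0.0, 2.0, 1.0, 4.0, 0.0], the two maxima (2.0 and 4.0) have mean Mp = 3.0, so the higher peak is mapped to e^1 and the lower one to e^(-1), which widens the gap between them as the text describes.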

