Lineage-specific conserved noncoding sequences of plant genomes: their possible role in nucleosome positioning.
Bottom Line: Many studies on conserved noncoding sequences (CNSs) have found that CNSs are enriched significantly in regulatory sequence elements. A drop in A+T content near the border of CNSs was observed, and CNS regions showed a higher nucleosome occupancy probability. These CNSs are candidate regulatory elements, which are expected to define lineage-specific features of various plant groups.
Many studies on conserved noncoding sequences (CNSs) have found that CNSs are enriched significantly in regulatory sequence elements. We conducted whole-genome analysis on plant CNSs to identify lineage-specific CNSs in eudicots, monocots, angiosperms, and vascular plants, based on the premise that lineage-specific CNSs define lineage-specific characters and functions in groups of organisms. We identified 27 eudicot, 204 monocot, 6,536 grass, 19 angiosperm, and 2 vascular plant lineage-specific CNSs (lengths ranging from 16 to 1,517 bp) that presumably originated in their respective common ancestors. A stronger constraint on the CNSs located in the untranslated regions was observed. The CNSs were often flanked by genes involved in transcription regulation. A drop in A+T content near the border of CNSs was observed, and CNS regions showed a higher nucleosome occupancy probability. These CNSs are candidate regulatory elements, which are expected to define lineage-specific features of various plant groups.
Mentions: We searched for eudicot lineage-specific CNSs by analyzing the genome sequences of the following seven eudicot species: Ar. thaliana, Brassica rapa, Populus trichocarpa, Ricinus communis, V. vinifera, Cucumis sativus, and Aquilegia coerulea. To determine grass-specific CNSs, genome sequences of O. sativa, Brac. distachyon, So. bicolor, and Setaria italica were compared. The genome sequence of M. acuminata was also analyzed, in addition to the four grass species mentioned above, to determine monocot-specific CNSs. Note that, for each group, we included the most basal species sequenced so far, assuming that if a CNS is present in the most diverged species, it is highly likely to be found in the more closely related species within the group. The most basal eudicot species used in this study is Aq. coerulea, which diverged about 120 Ma (Anderson et al. 2005) from the rest of the eudicot species used here. Musa acuminata is considered the basal monocot species, which diverged from grasses about 115 Ma (D’Hont et al. 2012). The other species used in the study are Selaginella moellendorffii, which diverged from angiosperms about 400 Ma (Banks et al. 2011); Physcomitrella patens, which diverged from vascular plants 450 Ma (Rensing et al. 2007); and Chlamydomonas reinhardtii, which diverged from land plants more than 1,000 Ma (Heckman et al. 2001). A total of 15 species (see fig. 3 for their phylogenetic relationship) were used with the expectation of finding the group-specific CNSs in this study.
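The species sampling described above can be illustrated with a minimal presence/absence sketch. The group memberships follow the text (with abbreviated genus names expanded), but the classification rule is an assumption made for illustration: a CNS is assigned to a group only if it is detected in every member of that group, including the most basal one, and in no other species. The function name `lineage_of` and the strict set-equality test are hypothetical and are not taken from the study's actual pipeline.

```python
# Illustrative sketch only: the species sets follow the text, while the
# "present in all members, absent elsewhere" rule is an assumed simplification.
EUDICOTS = frozenset({
    "Arabidopsis thaliana", "Brassica rapa", "Populus trichocarpa",
    "Ricinus communis", "Vitis vinifera", "Cucumis sativus",
    "Aquilegia coerulea",  # most basal eudicot in the sample
})
GRASSES = frozenset({
    "Oryza sativa", "Brachypodium distachyon",
    "Sorghum bicolor", "Setaria italica",
})
MONOCOTS = GRASSES | {"Musa acuminata"}  # Musa is the basal monocot
ANGIOSPERMS = EUDICOTS | MONOCOTS
VASCULAR_PLANTS = ANGIOSPERMS | {"Selaginella moellendorffii"}
ALL_SPECIES = VASCULAR_PLANTS | {
    "Physcomitrella patens", "Chlamydomonas reinhardtii",
}

GROUPS = {
    "eudicot": EUDICOTS,
    "grass": GRASSES,
    "monocot": MONOCOTS,
    "angiosperm": ANGIOSPERMS,
    "vascular plant": VASCULAR_PLANTS,
}


def lineage_of(present_in):
    """Return the group whose members exactly match the species carrying the CNS.

    Returns None when the presence pattern matches no group, for example when
    the CNS is missing from the basal species or also occurs in an outgroup.
    """
    present = frozenset(present_in)
    for name, members in GROUPS.items():
        if present == members:
            return name
    return None
```

Under this rule, a CNS found in the four grasses but not in Musa acuminata would be labeled grass-specific, whereas one that is missing from Aquilegia coerulea would not be labeled eudicot-specific. A real analysis would also need alignment thresholds and some tolerance for assembly gaps, which this sketch omits.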