Identification of novel motif patterns to decipher the promoter architecture of co-expressed genes in Arabidopsis thaliana.

López Y, Patil A, Nakai K - BMC Syst Biol (2013)

Bottom Line: The discovered PS-specific patterns were tested in the entire A. thaliana genome, correctly identifying 77.8%, 81.2%, 70.8% and 53.7% of genes expressed in petal differentiation, synergid cells, root hair and trichome, respectively, as well as 88.4% of housekeeping genes. Based on these findings, we conclude that the positioning and orientation of transcription factor binding sites at specific distances from the translation start site is a reliable measure for distinguishing promoters of genes expressed in different A. thaliana structures from background genomic promoters. Our method can be used to predict novel motifs and decipher a similar promoter architecture for genes co-expressed in A. thaliana under different conditions.

ABSTRACT

Background: Understanding the mechanisms of transcriptional regulation remains a challenge for molecular biologists in the post-genome era. It is hypothesized that the regulatory regions of genes expressed in the same tissue or cell type share a similar structure. Although several studies have analyzed the promoters of genes expressed in specific metazoan tissues or cells, little research has been done in plants. Hence, finding specific patterns of motifs that explain the promoter architecture of co-expressed genes in plants could shed light on the mechanisms of their transcription.

Results: We identified novel patterns of sets of motifs in the promoters of genes co-expressed in four different plant structures (PSs) and in the entire plant of Arabidopsis thaliana. Sets of genes expressed in four PSs (flower, seed, root, shoot) and housekeeping genes expressed in the entire plant were taken from a database of co-expressed genes in A. thaliana. PS-specific motifs were predicted using three motif-discovery algorithms; to the best of our knowledge, 8 of these motifs are novel. A support vector machine was trained using, as features, the average upstream distance of the identified motifs' binding sites from the translation start site, computed for binding sites on both strands. The correctly classified promoters per PS were used to construct specific patterns of sets of motifs that describe the promoter architecture of those co-expressed genes. The discovered PS-specific patterns were tested in the entire A. thaliana genome, correctly identifying 77.8%, 81.2%, 70.8% and 53.7% of genes expressed in petal differentiation, synergid cells, root hair and trichome, respectively, as well as 88.4% of housekeeping genes.

Conclusions: We present five patterns of sets of motifs that describe the promoter architecture of co-expressed genes in five PSs and that can be used to predict such genes from the entire A. thaliana genome. Based on these findings, we conclude that the positioning and orientation of transcription factor binding sites at specific distances from the translation start site is a reliable measure for distinguishing promoters of genes expressed in different A. thaliana structures from background genomic promoters. Our method can be used to predict novel motifs and decipher a similar promoter architecture for genes co-expressed in A. thaliana under different conditions.
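To make the SVM input described in the Results more concrete, here is a minimal Python sketch of one promoter's feature vector: for each motif, the average upstream distance of its sites from the translation start site, computed separately for the forward and reverse strands. It rests on several assumptions that are not taken from the study: each promoter string ends immediately 5' of the ATG, motifs are given as exact consensus strings (the study used position frequency matrices), a match's distance is measured from its leftmost base, and an absent motif is encoded as 0.0. The helper names `strand_distances` and `feature_vector` are hypothetical; the resulting vectors could then be passed to any SVM implementation.

```python
import re
from statistics import mean

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _hits(promoter: str, site: str) -> list[int]:
    """0-based start positions of (possibly overlapping) exact matches."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", promoter)]


def strand_distances(promoter: str, site: str) -> tuple[list[int], list[int]]:
    """Upstream distances (bp) of a site's matches on each strand.

    Assumption: `promoter` ends immediately 5' of the translation start
    site, so its last base is 1 bp upstream. A match's distance is taken
    from its leftmost base in forward-strand coordinates.
    """
    n = len(promoter)
    forward = [n - i for i in _hits(promoter, site)]
    # Reverse-strand sites appear as the reverse complement on the forward
    # strand; palindromic sites (e.g. ACGT) are therefore counted on both.
    reverse = [n - i for i in _hits(promoter, revcomp(site))]
    return forward, reverse


def feature_vector(promoter: str, sites: list[str], missing: float = 0.0) -> list[float]:
    """Two features per motif: mean forward- and reverse-strand distance."""
    features: list[float] = []
    for site in sites:
        forward, reverse = strand_distances(promoter, site)
        features.append(float(mean(forward)) if forward else missing)
        features.append(float(mean(reverse)) if reverse else missing)
    return features


# Toy promoter: TGAC starts 15 bp upstream; its reverse complement GTCA starts 6 bp upstream.
print(feature_vector("TTGACTTTTTGTCAAA", ["TGAC", "CCCC"]))  # prints [15.0, 6.0, 0.0, 0.0]
```

Keeping the two strands as separate features is what lets a classifier pick up orientation as well as position, which is the property the Conclusions single out.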

Figure 5: Logos of the over-represented motifs in shoot. For each motif, its group specificity score and a comment are included. A known motif is also depicted with an E-value from the STAMP web application [24], a description of the TF that binds to it and its reference.

Mentions: The motif-prediction process identified 142 flower-specific, 183 seed-specific, 171 root-specific, 142 shoot-specific and 141 whole plant-specific motifs (see table 1). To remove redundant motifs, each position frequency matrix was converted to a k-mer frequency vector, and these vectors were used to build a distance matrix based on Pearson correlation distance. This matrix was then used to cluster each group of PS-specific motifs by average-linkage hierarchical clustering. The optimal number of clusters per PS was 6, 3, 5, 4 and 2 for flower, seed, root, shoot and whole plant, respectively (see table 1). Hereafter, for simplicity, the whole plant is also referred to as a PS. The group specificity score of each motif (a measure of how well a motif targets the promoter regions in which it was found) [15] was computed, and the motif with the smallest score in each cluster was chosen for further analysis. The selected motifs were then compared with plant cis-acting regulatory elements in the PLACE database [13]. Motifs that matched an element with a p-value below 0.001 were regarded as known; all others were regarded as novel. To make the comparison as stringent as possible, we used this strict threshold, which equals the one successfully used to validate the motif comparison algorithm TOMTOM (see the additional data file in [16]). As a result, motif Rt_1 (see Figure 1) matched ACIIPVPAL2 (a motif known to play a key role in vascular tissue, whose primary component, xylem, is usually located toward the interior of roots), motif Sd_1 (see Figure 2) matched ACGTSEED3 (an "ACGT motif" related to seed expression) and motif Plt_1 (see Figure 3) matched INTRONLOWER (a motif involved in 3' intron-exon splice junctions in plants).
In contrast, flower-specific motifs Flw_1, Flw_2, Flw_3 and Flw_5 (see Figure 4), root-specific motifs Rt_2 and Rt_4 (see Figure 1), seed-specific motif Sd_2 (see Figure 2) and shoot-specific motif Sht_2 (see Figure 5) did not significantly match any known cis-acting regulatory element in the PLACE database and thus represent potentially new regulatory motifs in plants. We also compared our predicted motifs with motifs previously reported in A. thaliana [12]: motif Plt_2 (see Figure 3) matched Motif_8, motif Rt_3 (see Figure 1) matched Motif_3 and motif Sd_1 (see Figure 2) matched Motif_11 (see Figure 1 in [12]), all with p-values below 0.001. In addition, we compared our 8 novel motifs with those stored in the JASPAR database [17] and found that all of them significantly matched motifs from other organisms (see table 2).
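The redundancy-removal step in the Mentions paragraph (position frequency matrix to k-mer frequency vector, Pearson correlation distance, average-linkage hierarchical clustering, then one representative per cluster) can be sketched in plain Python as follows. Several details are assumptions rather than the authors' procedure: a matrix is converted to a k-mer vector by summing, over every window of width k, the probability of each k-mer under the matrix columns; k = 2 is used only to keep the toy example small; the number of clusters is passed in rather than optimized; and the group specificity scores are hypothetical inputs instead of being computed as in [15].

```python
from itertools import product
from math import prod, sqrt
from statistics import mean

Column = dict[str, float]  # base -> probability for one matrix position


def kmer_vector(pfm: list[Column], k: int = 2) -> list[float]:
    """Normalized expected k-mer frequencies of a PFM (assumes len(pfm) >= k)."""
    vec = []
    for kmer in product("ACGT", repeat=k):
        # Sum the k-mer's probability over every window of width k.
        vec.append(sum(
            prod(pfm[start + j][base] for j, base in enumerate(kmer))
            for start in range(len(pfm) - k + 1)
        ))
    total = sum(vec)
    return [v / total for v in vec]


def pearson_distance(x: list[float], y: list[float]) -> float:
    """1 - Pearson correlation (assumes neither vector is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(dist: list[list[float]], n_clusters: int) -> list[list[int]]:
    """Agglomerative clustering that merges the pair with the lowest mean distance."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def representatives(clusters: list[list[int]], scores: list[float]) -> list[int]:
    """Pick the motif with the smallest group specificity score in each cluster."""
    return [min(cluster, key=lambda i: scores[i]) for cluster in clusters]


def column(base: str, p: float = 0.85) -> Column:
    """Toy column favoring one base; the other three share the remaining mass."""
    rest = (1.0 - p) / 3
    return {b: (p if b == base else rest) for b in "ACGT"}


motifs = [
    [column(b) for b in "ACGT"],
    [column(b) for b in "ACGT"],  # duplicate of the first toy motif
    [column(b) for b in "TTTT"],
]
vectors = [kmer_vector(m) for m in motifs]
dist = [[pearson_distance(u, v) for v in vectors] for u in vectors]
clusters = average_linkage(dist, n_clusters=2)
hypothetical_scores = [1e-5, 1e-8, 1e-3]  # made-up scores for illustration only
print(clusters, representatives(clusters, hypothetical_scores))  # prints [[0, 1], [2]] [1, 2]
```

The two identical toy matrices collapse into one cluster, and only the member with the smaller (hypothetical) score is kept, mirroring how one motif per cluster was carried forward to the PLACE comparison. For real data, a library routine for average-linkage clustering on a correlation distance would be preferable to this quadratic loop.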

