Limits...
Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs.

Zhao Y, Hayashida M, Cao Y, Hwang J, Akutsu T - BMC Bioinformatics (2015)

Bottom Line: Many tree structures are found in nature and organisms.The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

View Article: PubMed Central - PubMed

Affiliation: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan. tyoyo@kuicr.kyoto-u.ac.jp.

ABSTRACT

Background: Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees.

Results: Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.

Conclusions: We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

Show MeSH
Illustration of RNA tree representation.(a) An artificial RNA secondary structure. (b) The tree representation of (a) in which paired bases are extracted by following the sequence order from 5’ to 3’, where bases in loops are removed. Base pairs belonging to the same secondary structure are filled in the same color.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4419412&req=5

Fig12: Illustration of RNA tree representation.(a) An artificial RNA secondary structure. (b) The tree representation of (a) in which paired bases are extracted by following the sequence order from 5’ to 3’, where bases in loops are removed. Base pairs belonging to the same secondary structure are filled in the same color.

Mentions: In addition, 24 RNA secondary structures belonging to distinct RNA families were taken from the Rfam database [21], as shown in Additional file 1: Table S1 on our supplementary web site; these were transformed into rooted ordered trees. For this, one sequence was selected from multiple sequence alignments of each RNA family, as our method requires edge labels, i.e., bases. RNA secondary structures consist of base pairs with hydrogen bonds, and group binding, such as bulges and hairpin loops (as seen in Figure 12 (a)). There are several representations of trees for RNA secondary structures. An RNA secondary structure can be represented as an ordered rooted tree, by labeling the vertices with unpaired loops and the edges with paired bases [22]; this structure can be represented as an ordered rooted tree by labeling the vertices with hairpin loops, internal loops, bulges, and paired bases [7,23]. Chen and Zhang represented an RNA secondary structure using a paired base and a leaf, corresponding to an internal vertex and an unpaired base, respectively [8]. In our implementation, the representation by [8] was modified by eliminated vertices other than those corresponding to paired bases; in addition, the vertex labeled tree was transformed into an edge labeled tree in a manner similar to the glycans. Figure 12 illustrates the transformation, wherein a paired base was transformed into an edge, labeled with its base pair. It is noted that the edges in this representation are ordered by following the 5’-3’ direction of the RNA sequence.Figure 12


Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs.

Zhao Y, Hayashida M, Cao Y, Hwang J, Akutsu T - BMC Bioinformatics (2015)

Illustration of RNA tree representation.(a) An artificial RNA secondary structure. (b) The tree representation of (a) in which paired bases are extracted by following the sequence order from 5’ to 3’, where bases in loops are removed. Base pairs belonging to the same secondary structure are filled in the same color.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4419412&req=5

Fig12: Illustration of RNA tree representation.(a) An artificial RNA secondary structure. (b) The tree representation of (a) in which paired bases are extracted by following the sequence order from 5’ to 3’, where bases in loops are removed. Base pairs belonging to the same secondary structure are filled in the same color.
Mentions: In addition, 24 RNA secondary structures belonging to distinct RNA families were taken from the Rfam database [21], as shown in Additional file 1: Table S1 on our supplementary web site; these were transformed into rooted ordered trees. For this, one sequence was selected from multiple sequence alignments of each RNA family, as our method requires edge labels, i.e., bases. RNA secondary structures consist of base pairs with hydrogen bonds, and group binding, such as bulges and hairpin loops (as seen in Figure 12 (a)). There are several representations of trees for RNA secondary structures. An RNA secondary structure can be represented as an ordered rooted tree, by labeling the vertices with unpaired loops and the edges with paired bases [22]; this structure can be represented as an ordered rooted tree by labeling the vertices with hairpin loops, internal loops, bulges, and paired bases [7,23]. Chen and Zhang represented an RNA secondary structure using a paired base and a leaf, corresponding to an internal vertex and an unpaired base, respectively [8]. In our implementation, the representation by [8] was modified by eliminated vertices other than those corresponding to paired bases; in addition, the vertex labeled tree was transformed into an edge labeled tree in a manner similar to the glycans. Figure 12 illustrates the transformation, wherein a paired base was transformed into an edge, labeled with its base pair. It is noted that the edges in this representation are ordered by following the 5’-3’ direction of the RNA sequence.Figure 12

Bottom Line: Many tree structures are found in nature and organisms.The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

View Article: PubMed Central - PubMed

Affiliation: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan. tyoyo@kuicr.kyoto-u.ac.jp.

ABSTRACT

Background: Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees.

Results: Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.

Conclusions: We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

Show MeSH