Limits...
Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs.

Zhao Y, Hayashida M, Cao Y, Hwang J, Akutsu T - BMC Bioinformatics (2015)

Bottom Line: Many tree structures are found in nature and organisms.The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

View Article: PubMed Central - PubMed

Affiliation: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan. tyoyo@kuicr.kyoto-u.ac.jp.

ABSTRACT

Background: Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees.

Results: Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.

Conclusions: We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

Show MeSH
Results of subtrees corresponding to nonterminal symbols in the minimum SEOTG for ordered rooted trees of RNAs from RF00002 and RF00008. Subtrees corresponding to the same nonterminal symbol are filled in the same color.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4419412&req=5

Fig14: Results of subtrees corresponding to nonterminal symbols in the minimum SEOTG for ordered rooted trees of RNAs from RF00002 and RF00008. Subtrees corresponding to the same nonterminal symbol are filled in the same color.

Mentions: The proposed method MinSEOTGMul was applied to the RNA secondary structures and their several combinations. Table 2 shows the minimum number of nonterminal symbols of SEOTG for RNA ordered trees. In all cases, the minimum number of nonterminal symbols for multiple trees of RNA was lower than the sum of minimum numbers for its single trees, similar to the glycans. The execution time and the memory usage by the IP solver for multiple trees in our experiments were between two seconds and six hours, and between 260 Mbytes and 37 Gbytes, respectively. Figure 14 shows the subtrees corresponding to nonterminal symbols in the minimum SEOTG for ordered rooted trees of the RNA secondary structures RF00002 and RF00008. Figure 15 shows the original RNA secondary structures of RF00002 and RF00008, and the base pairs corresponding to nonterminal symbols in the minimum SEOTG. We also observed the hierarchical structure of the nonterminal symbols, colored blue and brown.Table 2


Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs.

Zhao Y, Hayashida M, Cao Y, Hwang J, Akutsu T - BMC Bioinformatics (2015)

Results of subtrees corresponding to nonterminal symbols in the minimum SEOTG for ordered rooted trees of RNAs from RF00002 and RF00008. Subtrees corresponding to the same nonterminal symbol are filled in the same color.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4419412&req=5

Fig14: Results of subtrees corresponding to nonterminal symbols in the minimum SEOTG for ordered rooted trees of RNAs from RF00002 and RF00008. Subtrees corresponding to the same nonterminal symbol are filled in the same color.
Mentions: The proposed method MinSEOTGMul was applied to the RNA secondary structures and their several combinations. Table 2 shows the minimum number of nonterminal symbols of SEOTG for RNA ordered trees. In all cases, the minimum number of nonterminal symbols for multiple trees of RNA was lower than the sum of minimum numbers for its single trees, similar to the glycans. The execution time and the memory usage by the IP solver for multiple trees in our experiments were between two seconds and six hours, and between 260 Mbytes and 37 Gbytes, respectively. Figure 14 shows the subtrees corresponding to nonterminal symbols in the minimum SEOTG for ordered rooted trees of the RNA secondary structures RF00002 and RF00008. Figure 15 shows the original RNA secondary structures of RF00002 and RF00008, and the base pairs corresponding to nonterminal symbols in the minimum SEOTG. We also observed the hierarchical structure of the nonterminal symbols, colored blue and brown.Table 2

Bottom Line: Many tree structures are found in nature and organisms.The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

View Article: PubMed Central - PubMed

Affiliation: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan. tyoyo@kuicr.kyoto-u.ac.jp.

ABSTRACT

Background: Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees.

Results: Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs.

Conclusions: We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.

Show MeSH