Efficient mining of interesting patterns in large biological sequences.

Rashid MM, Karim MR, Jeong BS, Choi HJ - Genomics Inform (2012)

Bottom Line: So far, in most approaches, the number of occurrences is a major measure for determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. Experimental results show that our approach can find interesting patterns within an acceptable computation time.


Affiliation: Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.

ABSTRACT
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure for determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interestingness measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
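
The idea that a rare pattern can still be informative when its observed support exceeds the prior expectation can be illustrated with a small sketch. This is not the measure proposed in the paper, which is not defined in this excerpt. It assumes, purely for illustration, that the prior expectation comes from an independent-base model estimated from the data itself; the function name `observed_vs_expected` and the toy sequences are hypothetical.

```python
from collections import Counter
from math import prod


def observed_vs_expected(sequences, pattern):
    """Return (observed, expected) occurrence counts of a contiguous pattern.

    Assumption (not from the paper): the expectation treats bases as
    independent, with probabilities taken from the overall base composition.
    """
    k = len(pattern)
    base_counts = Counter("".join(sequences))
    total_bases = sum(base_counts.values())

    # Probability of the pattern at one window under the independence model.
    p_pattern = prod(base_counts[b] / total_bases for b in pattern)

    # Number of length-k windows across all sequences.
    windows = sum(max(len(s) - k + 1, 0) for s in sequences)

    # Observed number of windows that match the pattern exactly.
    observed = sum(
        1
        for s in sequences
        for i in range(len(s) - k + 1)
        if s[i:i + k] == pattern
    )
    return observed, p_pattern * windows


if __name__ == "__main__":
    toy = ["ATCGTGAT", "CATCGTTA"]  # hypothetical sequences
    obs, exp = observed_vs_expected(toy, "ATCG")
    print(f"observed={obs}, expected={exp:.4f}")
```

A pattern whose observed count is far above the expected count would be a candidate "surprising" pattern under this simplified model, even if its absolute count is small.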




Figure 3: Index-based fixed-length spanning tree.

Mentions: The spanning tree shown in Fig. 3 is constructed from the database in Table 1. We built a fixed-length spanning tree using the method suggested by Zerin et al. [13], but store the sequence ID and the starting position in each leaf node as a variable-length array. Once the tree is constructed as in Fig. 3, traversing it yields the contiguous subsequences of length 4 that satisfy the minimum information gain threshold and the minimum confidence threshold. The resulting length-4 surprising contiguous patterns are 〈ATCG〉, 〈TCGT〉, 〈TGAT〉, 〈CGTG〉, 〈CGTT〉, 〈CATC〉, and 〈GTGA〉, shown in Fig. 4.
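
The structure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation or the exact method of Zerin et al. [13]: it assumes a plain nested-dictionary tree of fixed depth k whose leaves hold a growable list of (sequence ID, starting position) pairs. The names `build_fixed_length_index`, `iter_patterns`, and `support`, the 0-based positions, and the toy database are all hypothetical (Table 1 is not reproduced here). Filtering by the information gain and confidence thresholds is omitted because those measures are not defined in this excerpt.

```python
def build_fixed_length_index(database, k=4):
    """Build a depth-k tree over all contiguous length-k windows.

    database maps a sequence ID to its sequence string. Internal nodes are
    dicts keyed by symbol; each leaf is a list of (sequence ID, start) pairs.
    """
    root = {}
    for seq_id, seq in database.items():
        for start in range(len(seq) - k + 1):
            node = root
            # Walk (or create) the path for the first k-1 symbols.
            for symbol in seq[start:start + k - 1]:
                node = node.setdefault(symbol, {})
            # The k-th symbol points at the leaf's occurrence list.
            leaf = node.setdefault(seq[start + k - 1], [])
            leaf.append((seq_id, start))
    return root


def iter_patterns(node, prefix=""):
    """Yield (pattern, occurrences) for every leaf, in lexicographic order."""
    if isinstance(node, list):
        yield prefix, node
        return
    for symbol in sorted(node):
        yield from iter_patterns(node[symbol], prefix + symbol)


def support(occurrences):
    """Number of distinct sequences that contain the pattern."""
    return len({seq_id for seq_id, _ in occurrences})


if __name__ == "__main__":
    toy_db = {"S1": "ATCGTG", "S2": "CATCGT"}  # hypothetical database
    index = build_fixed_length_index(toy_db, k=4)
    for pattern, occ in iter_patterns(index):
        print(pattern, occ, "support =", support(occ))
```

Keeping the occurrence list in the leaf means that a single traversal of the tree returns every length-k pattern together with where it occurs, so support-style counts can be computed without rescanning the sequences.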

