Limits...
A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

Xue Y, Liao Z, Li M, Luo J, Kuang Q, Hu X, Li T - Comput Math Methods Med (2015)

Bottom Line: In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods.Finally, experiments were implemented on gene and synthetic datasets.Results demonstrated the effectiveness and efficiency of this method.

View Article: PubMed Central - PubMed

Affiliation: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China.

ABSTRACT
Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms which are unable to reveal OPSMs entirely in NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequence (ACS) between every two row sequences, and therefore all deep OPSMs will not be missed. Then, an improved data structure for prefix tree was used to store and traverse ACS, and Apriori principle was employed to efficiently mine the frequent sequential pattern. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method.

No MeSH data available.


© Copyright Policy - open-access
Related In: Results  -  Collection


getmorefigures.php?uid=PMC4464847&req=5

Mentions: All common subsequence (ACS) [17] is a variation from the traditional longest common subsequence (LCS). LCS is a classical problem with a goal to determine LCS from a set of sequences (generally two sequences). Wang [17] proposed this new method to calculate the similarity between two sequences. Different from the previous LCS method, this method calculates the similarity based on the number of all common sequences between the two sequences. calACS [13] is a new method to calculate the number of ACS between sequences A and B. We improved calACS to obtain all common subsequences between two sequences. The pseudocode of the improved calACS algorithm is shown as Algorithm 1.


A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

Xue Y, Liao Z, Li M, Luo J, Kuang Q, Hu X, Li T - Comput Math Methods Med (2015)

© Copyright Policy - open-access
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC4464847&req=5

Mentions: All common subsequence (ACS) [17] is a variation from the traditional longest common subsequence (LCS). LCS is a classical problem with a goal to determine LCS from a set of sequences (generally two sequences). Wang [17] proposed this new method to calculate the similarity between two sequences. Different from the previous LCS method, this method calculates the similarity based on the number of all common sequences between the two sequences. calACS [13] is a new method to calculate the number of ACS between sequences A and B. We improved calACS to obtain all common subsequences between two sequences. The pseudocode of the improved calACS algorithm is shown as Algorithm 1.

Bottom Line: In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods.Finally, experiments were implemented on gene and synthetic datasets.Results demonstrated the effectiveness and efficiency of this method.

View Article: PubMed Central - PubMed

Affiliation: Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China.

ABSTRACT
Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms which are unable to reveal OPSMs entirely in NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequence (ACS) between every two row sequences, and therefore all deep OPSMs will not be missed. Then, an improved data structure for prefix tree was used to store and traverse ACS, and Apriori principle was employed to efficiently mine the frequent sequential pattern. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method.

No MeSH data available.