A Practical and Scalable Tool to Find Overlaps between Sequences.

Rachid MH, Malluhi Q - Biomed Res Int (2015)

Bottom Line: The paper demonstrates an efficient construction of this time-efficient and space-economical tree data structure. Experimental evaluation indicates superior results in terms of space and time over existing solutions. Results also show that the proposed technique is highly scalable in a parallel execution environment.


Affiliation: KINDI Lab for Computing Research, Qatar University, P.O. Box 2713, Doha, Qatar.

ABSTRACT
The evolution of next-generation sequencing technology increases the demand for efficient solutions, in terms of space and time, for several bioinformatics problems. This paper presents a practical and easy-to-implement solution for one of these problems, namely, the all-pairs suffix-prefix problem, using a compact prefix tree. The paper demonstrates an efficient construction of this time-efficient and space-economical tree data structure. The paper also presents techniques for parallel implementations of the proposed solution. Experimental evaluation indicates superior results in terms of space and time over existing solutions. Results also show that the proposed technique is highly scalable in a parallel execution environment.
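
To make the problem statement concrete, here is a minimal sketch of the all-pairs suffix-prefix problem: for every ordered pair of distinct strings, find the longest proper suffix of the first that is also a prefix of the second. It uses a plain, uncompressed prefix trie rather than the compact prefix tree described in the paper, and it makes no attempt to match the paper's time or space bounds. The function names and the `min_overlap` parameter are assumptions made for illustration.

```python
def build_prefix_trie(strings):
    """Build an uncompressed prefix trie (illustrative only).

    Each node stores the ids of all strings whose prefix passes through it.
    """
    root = {"children": {}, "ids": set()}
    for idx, s in enumerate(strings):
        node = root
        for ch in s:
            node = node["children"].setdefault(ch, {"children": {}, "ids": set()})
            node["ids"].add(idx)
    return root


def all_pairs_suffix_prefix(strings, min_overlap=1):
    """Return {(i, j): length} for the longest proper suffix of strings[i]
    that is a prefix of strings[j], for i != j.

    Pairs with no overlap of at least min_overlap characters are omitted.
    """
    root = build_prefix_trie(strings)
    best = {}
    for i, s in enumerate(strings):
        # Try proper suffixes from longest to shortest, so the first
        # match recorded for a pair is its longest overlap.
        for start in range(1, len(s) - min_overlap + 1):
            node = root
            for ch in s[start:]:
                node = node["children"].get(ch)
                if node is None:
                    break  # this suffix is not a prefix of any string
            else:
                # The whole suffix was matched: every string passing
                # through this node has the suffix as a prefix.
                length = len(s) - start
                for j in node["ids"]:
                    if j != i and (i, j) not in best:
                        best[(i, j)] = length
    return best


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA", "GGAACG"]
    for (i, j), length in sorted(all_pairs_suffix_prefix(reads, min_overlap=2).items()):
        print(f"{reads[i]} -> {reads[j]}: overlap {length}")
```

This sketch walks the trie once per suffix, so its running time grows with the sum of squared string lengths. It is meant only to show what the output of an all-pairs suffix-prefix computation looks like, not how a scalable tool would compute it.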



Figure 6: Space comparison between 3 different solutions, running on machine A, with real data. Copyright policy: open access.

Mentions: Using real data, SGA, Readjoiner, and SOF were tested on machine A using 4 threads (the maximum number of threads on machine A). We ignore other solutions since they either do not support multithreading or are remarkably slow. Time and space consumption are shown in Figures 5 and 6. In most cases, SOF had the best performance when using multithreading. The prefiltering time for Readjoiner is excluded from these results. Both SOF and Readjoiner performed much better than SGA. We attribute the impressive performance and low space requirement of Readjoiner on the Atta cephalotes data set to the low number of strings in that data set: Readjoiner finds the distinct prefixes that can be candidates for suffix-prefix matches, and the cost of this procedure depends on the number of strings in the data set.

