Limits...
Efficient sequential and parallel algorithms for planted motif search.

Nicolae M, Rajasekaran S - BMC Bioinformatics (2014)

Bottom Line: The PMS problem is NP-complete.We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances.PMS8 can solve instances not solved by any previous algorithms.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.

ABSTRACT

Background: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.

Results: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/.

Conclusions: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.

Show MeSH

Related in: MedlinePlus

The effect of the “sort rows by size” speedup on runtime. Comparison between PMS8 and PMS8 without sorting rows by size (see the section on Speedup techniques). Note that, in both versions, all the other speedups are enabled. The runtimes are for single core execution, averaged over 5 random instances.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3924400&req=5

Figure 4: The effect of the “sort rows by size” speedup on runtime. Comparison between PMS8 and PMS8 without sorting rows by size (see the section on Speedup techniques). Note that, in both versions, all the other speedups are enabled. The runtimes are for single core execution, averaged over 5 random instances.

Mentions: In Figure 4 we show the effect of the first speedup technique (sorthing rows by size) on the runtime. Note that all other speedups are enabled, only sorting rows by size is varied. The figure shows that sorting the rows by size improves the speed of PMS8 by 25% to 50%.


Efficient sequential and parallel algorithms for planted motif search.

Nicolae M, Rajasekaran S - BMC Bioinformatics (2014)

The effect of the “sort rows by size” speedup on runtime. Comparison between PMS8 and PMS8 without sorting rows by size (see the section on Speedup techniques). Note that, in both versions, all the other speedups are enabled. The runtimes are for single core execution, averaged over 5 random instances.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3924400&req=5

Figure 4: The effect of the “sort rows by size” speedup on runtime. Comparison between PMS8 and PMS8 without sorting rows by size (see the section on Speedup techniques). Note that, in both versions, all the other speedups are enabled. The runtimes are for single core execution, averaged over 5 random instances.
Mentions: In Figure 4 we show the effect of the first speedup technique (sorthing rows by size) on the runtime. Note that all other speedups are enabled, only sorting rows by size is varied. The figure shows that sorting the rows by size improves the speed of PMS8 by 25% to 50%.

Bottom Line: The PMS problem is NP-complete.We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances.PMS8 can solve instances not solved by any previous algorithms.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.

ABSTRACT

Background: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.

Results: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/.

Conclusions: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.

Show MeSH
Related in: MedlinePlus