Efficient sequential and parallel algorithms for planted motif search.
Bottom Line:
The PMS problem is NP-complete.We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances.PMS8 can solve instances not solved by any previous algorithms.
Affiliation: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.
ABSTRACT
Show MeSH
Background: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. Results: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. Conclusions: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms. Related in: MedlinePlus |
Related In:
Results -
Collection
License getmorefigures.php?uid=PMC3924400&req=5
Mentions: Case 2) For all i,1≤i≤3 we have ni<di. We construct M as shown in Figure 2. For columns k of type N0,N2 and N3 we set M[ k]=T1[ k]. For columns of type N1 we set M[ k]=T2[ k]. For any i,1≤i≤3 the following applies. If ni+n4≤di then the Hamming distance between M and Ti is less than di regardless of what characters we choose for M in the columns of type N4. On the other hand, if ni+n4>di then M and Ti have to match in at least ni+n4-di columns of type N4. Thus, we pick max(0,ni+n4-di) columns of type N4 and for each such column k we set M[ k]=Ti[ k]. Now we prove that we actually have enough columns to make the above choices, in other words . This is equivalent to the following conditions being true: |
View Article: PubMed Central - HTML - PubMed
Affiliation: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.
Background: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.
Results: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/.
Conclusions: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.