Limits...
Efficient sequential and parallel algorithms for planted motif search.

Nicolae M, Rajasekaran S - BMC Bioinformatics (2014)

Bottom Line: The PMS problem is NP-complete.We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances.PMS8 can solve instances not solved by any previous algorithms.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.

ABSTRACT

Background: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.

Results: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/.

Conclusions: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.

Show MeSH

Related in: MedlinePlus

Proof of theorem 1, case 2. Proof of theorem 1, case 2: ni<di for all i, 1≤i≤3. The top 3 rows represent the input l-mers. The last row shows a common neighbor M. In any column, identical colors represents matches, different colors represent mismatches.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3924400&req=5

Figure 2: Proof of theorem 1, case 2. Proof of theorem 1, case 2: ni<di for all i, 1≤i≤3. The top 3 rows represent the input l-mers. The last row shows a common neighbor M. In any column, identical colors represents matches, different colors represent mismatches.

Mentions: Case 2) For all i,1≤i≤3 we have ni<di. We construct M as shown in Figure 2. For columns k of type N0,N2 and N3 we set M[ k]=T1[ k]. For columns of type N1 we set M[ k]=T2[ k]. For any i,1≤i≤3 the following applies. If ni+n4≤di then the Hamming distance between M and Ti is less than di regardless of what characters we choose for M in the columns of type N4. On the other hand, if ni+n4>di then M and Ti have to match in at least ni+n4-di columns of type N4. Thus, we pick max(0,ni+n4-di) columns of type N4 and for each such column k we set M[ k]=Ti[ k]. Now we prove that we actually have enough columns to make the above choices, in other words . This is equivalent to the following conditions being true:


Efficient sequential and parallel algorithms for planted motif search.

Nicolae M, Rajasekaran S - BMC Bioinformatics (2014)

Proof of theorem 1, case 2. Proof of theorem 1, case 2: ni<di for all i, 1≤i≤3. The top 3 rows represent the input l-mers. The last row shows a common neighbor M. In any column, identical colors represents matches, different colors represent mismatches.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3924400&req=5

Figure 2: Proof of theorem 1, case 2. Proof of theorem 1, case 2: ni<di for all i, 1≤i≤3. The top 3 rows represent the input l-mers. The last row shows a common neighbor M. In any column, identical colors represents matches, different colors represent mismatches.
Mentions: Case 2) For all i,1≤i≤3 we have ni<di. We construct M as shown in Figure 2. For columns k of type N0,N2 and N3 we set M[ k]=T1[ k]. For columns of type N1 we set M[ k]=T2[ k]. For any i,1≤i≤3 the following applies. If ni+n4≤di then the Hamming distance between M and Ti is less than di regardless of what characters we choose for M in the columns of type N4. On the other hand, if ni+n4>di then M and Ti have to match in at least ni+n4-di columns of type N4. Thus, we pick max(0,ni+n4-di) columns of type N4 and for each such column k we set M[ k]=Ti[ k]. Now we prove that we actually have enough columns to make the above choices, in other words . This is equivalent to the following conditions being true:

Bottom Line: The PMS problem is NP-complete.We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances.PMS8 can solve instances not solved by any previous algorithms.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.

ABSTRACT

Background: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.

Results: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/.

Conclusions: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.

Show MeSH
Related in: MedlinePlus