A niched Pareto genetic algorithm for finding variable length regulatory motifs in DNA sequences

View Article: PubMed Central

ABSTRACT

Transcription factor binding sites, also called motifs, are short, recurring patterns in DNA sequences that are presumed to have a biological function. Identifying motifs in the promoter regions of genes is an important and unsolved problem, particularly in eukaryotic genomes. In this paper, we present a niched Pareto genetic algorithm to identify regulatory motifs. The approach maximizes two objectives: the motif length and the consensus similarity score. A longer motif is less likely to be a false motif. The similarity score represents a motif's probability of conservation in a given set of sequences. We represent a candidate motif as a combination of its length and its starting position in each sequence of the co-regulated genes, which enables the algorithm to identify multiple motifs of variable length. We applied this approach to various data sets, and the results show that it can find multiple motifs of variable length in co-regulated genes.
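The encoding and the two objectives described in the abstract can be illustrated with a minimal Python sketch. The names `Candidate`, `instances`, `similarity`, `objectives`, and `dominates` are hypothetical, and the similarity formula used here (the mean frequency of the most common nucleotide per aligned column) is an assumption chosen for illustration; the abstract does not give the paper's exact consensus similarity score.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A candidate motif: one length plus one start position per sequence."""
    length: int
    starts: Tuple[int, ...]


def instances(cand: Candidate, seqs: List[str]) -> List[str]:
    """Cut the motif instance out of each sequence."""
    return [s[p:p + cand.length] for s, p in zip(seqs, cand.starts)]


def similarity(cand: Candidate, seqs: List[str]) -> float:
    """Assumed score: mean share of the dominant nucleotide per column."""
    sites = instances(cand, seqs)
    total = 0.0
    for column in zip(*sites):
        total += Counter(column).most_common(1)[0][1] / len(column)
    return total / cand.length


def objectives(cand: Candidate, seqs: List[str]) -> Tuple[int, float]:
    """Both objectives are maximized: motif length and similarity."""
    return (cand.length, similarity(cand, seqs))


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance: a is no worse everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))
```

For example, with `seqs = ["ACGTACGT", "TTACGTAA", "GGACGAGG"]`, the candidate `Candidate(4, (0, 2, 2))` selects the instances `ACGT`, `ACGT`, and `ACGA`. Because a single length is shared across all start positions, candidates of different lengths can coexist in one population, which is what lets the two objectives trade off against each other.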




Fig. 1: Flowchart of the niched Pareto genetic algorithm

Mentions: Figure 1 illustrates the working of the niched Pareto GA. The key components of the algorithm are initialization, selection, crossover and mutation, insertion and evaluation, and finish. The initialization step covers the representation of motifs using a suitable encoding scheme and the initialization of the population. The selection step selects suitable candidate motifs from the current population for reproduction. The crossover and mutation step generates new offspring and adapts them to environmental influences. The fitness of the newly generated offspring is evaluated using an objective fitness function, and the fit offspring are inserted into the population. During each generation of the evolutionary process, each member of the population is evaluated by the objective fitness function. The evolutionary process stops when the stopping criteria are satisfied.
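The steps above can be sketched as a small Python loop. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the similarity formula, the tournament and niche parameters (`t_dom`, `radius`), the one-point crossover, the ±1 mutations, the rule that a child is inserted only if it dominates an existing member, and the fixed generation count used as the stopping criterion are all hypothetical choices. It also assumes `max_len` does not exceed the shortest sequence length.

```python
import random
from collections import Counter


def score(cand, seqs):
    """Return (motif length, similarity); both are maximized."""
    length, starts = cand
    sites = [s[p:p + length] for s, p in zip(seqs, starts)]
    agree = sum(Counter(col).most_common(1)[0][1] for col in zip(*sites))
    return (length, agree / (length * len(seqs)))


def dominates(a, b):
    return a != b and all(x >= y for x, y in zip(a, b))


def random_candidate(seqs, min_len, max_len, rng):
    length = rng.randint(min_len, max_len)
    return (length, tuple(rng.randrange(len(s) - length + 1) for s in seqs))


def repair(length, starts, seqs, min_len, max_len):
    """Clamp length and starts so every site fits inside its sequence."""
    length = max(min_len, min(max_len, length))
    return (length, tuple(max(0, min(p, len(s) - length))
                          for s, p in zip(seqs, starts)))


def select(pop, seqs, rng, t_dom=4, radius=0.05):
    """Pareto domination tournament; ties go to the less crowded niche."""
    a, b = rng.sample(pop, 2)
    sa, sb = score(a, seqs), score(b, seqs)
    comparison = [score(c, seqs) for c in rng.sample(pop, min(t_dom, len(pop)))]
    a_dom = any(dominates(c, sa) for c in comparison)
    b_dom = any(dominates(c, sb) for c in comparison)
    if a_dom != b_dom:
        return b if a_dom else a

    def niche_count(s):  # assumed niche: same length, similarity within radius
        return sum(1 for c in pop if score(c, seqs)[0] == s[0]
                   and abs(score(c, seqs)[1] - s[1]) <= radius)

    return a if niche_count(sa) <= niche_count(sb) else b


def reproduce(p1, p2, seqs, min_len, max_len, rng, p_mut=0.2):
    """One-point crossover over start positions, then small mutations."""
    cut = rng.randrange(1, len(seqs)) if len(seqs) > 1 else 0
    length = rng.choice((p1[0], p2[0]))
    starts = list(p1[1][:cut] + p2[1][cut:])
    if rng.random() < p_mut:
        length += rng.choice((-1, 1))
    if rng.random() < p_mut:
        starts[rng.randrange(len(starts))] += rng.choice((-1, 1))
    return repair(length, starts, seqs, min_len, max_len)


def insert(pop, child, seqs, rng):
    """Insert the child only if it dominates some current member."""
    cs = score(child, seqs)
    beaten = [i for i, c in enumerate(pop) if dominates(cs, score(c, seqs))]
    if beaten:
        pop[rng.choice(beaten)] = child


def pareto_front(pop, seqs):
    scored = {c: score(c, seqs) for c in pop}
    return sorted(c for c, s in scored.items()
                  if not any(dominates(t, s) for t in scored.values()))


def run(seqs, min_len=4, max_len=8, pop_size=30, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [random_candidate(seqs, min_len, max_len, rng) for _ in range(pop_size)]
    for _ in range(generations):  # stopping criterion: fixed generation budget
        child = reproduce(select(pop, seqs, rng), select(pop, seqs, rng),
                          seqs, min_len, max_len, rng)
        insert(pop, child, seqs, rng)
    return pareto_front(pop, seqs)
```

Calling `run(seqs)` on a list of promoter sequences returns the non-dominated candidates as `(length, starts)` pairs, so the result can contain several motifs of different lengths rather than a single best one. Re-scoring candidates on every call keeps the sketch short; a practical version would cache the scores.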

