NestedMICA as an ab initio protein motif discovery tool.

Doğruel M, Down TA, Hubbard TJ - BMC Bioinformatics (2008)

Bottom Line: It uses multi-class sequence background models to represent different "uninteresting" parts of sequences that do not contain motifs of interest. In all the assessment experiments we carried out, its overall motif discovery performance was better than that of MEME. NestedMICA proved itself to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that exist in only a small fraction of sequences.

Affiliation: Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1HH, UK. md5@sanger.ac.uk

ABSTRACT

Background: Discovering overrepresented patterns in amino acid sequences is an important step in identifying functional elements in proteins. We adapted and extended NestedMICA, an ab initio motif finder originally developed for finding transcription factor binding site motifs, to find short protein signals, and compared its performance with that of MEME, another popular protein motif finder. NestedMICA, an open-source protein motif discovery tool written in Java, is driven by a Monte Carlo technique called Nested Sampling. It uses multi-class sequence background models to represent the different "uninteresting" parts of sequences that do not contain motifs of interest. To assess NestedMICA as a protein motif finder, we tested it on synthetic datasets produced by spiking instances of known motifs into a randomly selected set of protein sequences. We also tested NestedMICA on a biologically authentic test set, where we evaluated its performance with respect to varying sequence length.
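Nested Sampling estimates the evidence Z (the integral of the likelihood over the prior) by keeping a set of "live" points, repeatedly discarding the one with the lowest likelihood, and assuming that the prior volume enclosed by the current likelihood contour shrinks by a factor of about e^(-1/N) per step for N live points. The toy sketch below illustrates only that generic loop; it is not NestedMICA's implementation, which samples over motif and background models. The one-parameter model, the Uniform(0, 1) prior and the Gaussian likelihood are hypothetical choices for illustration, and the constrained-prior step uses naive rejection sampling, which is only practical for a problem this small.

```python
import math
import random


def likelihood(theta: float, mu: float = 0.5, sigma: float = 0.1) -> float:
    """Hypothetical toy likelihood: a normalized Gaussian in one parameter."""
    return math.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def nested_sampling(n_live: int = 200, n_iter: int = 1200, seed: int = 1) -> float:
    """Estimate Z = integral of L(theta) under a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    live_l = [likelihood(t) for t in live]
    z = 0.0
    x_prev = 1.0  # prior volume enclosed by the current likelihood contour
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_l.__getitem__)
        l_star = live_l[worst]
        x_i = math.exp(-i / n_live)  # expected volume after discarding the worst point
        z += l_star * (x_prev - x_i)  # likelihood times the shell of prior volume
        x_prev = x_i
        # Replace the worst point with a fresh prior draw restricted to L > L*
        # (plain rejection sampling; real samplers use smarter constrained moves).
        while True:
            t = rng.random()
            lt = likelihood(t)
            if lt > l_star:
                break
        live[worst], live_l[worst] = t, lt
    # Add the contribution of the volume still occupied by the live points.
    z += x_prev * sum(live_l) / n_live
    return z
```

For this likelihood the true evidence is almost exactly 1, so `nested_sampling()` should return a value close to 1; the statistical error shrinks as `n_live` grows.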

Results: In general, NestedMICA recovered most of the short (3 to 9 amino acids long) test protein motifs spiked into a set of sequences at different frequencies. We also showed that it can find multiple motifs simultaneously. In all of our assessment experiments, its overall motif discovery performance was better than that of MEME.

Conclusion: NestedMICA proved itself to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that exist in only a small fraction of sequences.

Availability: NestedMICA is available under the Lesser GPL open-source license from: http://www.sanger.ac.uk/Software/analysis/nmica/

Figure 2: Motifs recovered by NestedMICA and MEME in the single-motif spiking tests, for motif set 1. Motifs in this set were obtained from several Pfam domain entries. For each original test motif used in the spiking tests, the three tested abundance rates are shown in the next column. For the motifs recovered by NestedMICA (fourth column) and MEME (sixth column), the Cartesian distance to the original test motif and the Matthews correlation coefficient (MCC) obtained when the motif is used for sequence scanning are shown. For comparison, the MCC values of the original test motifs are also shown. In the NestedMICA protein sequence logos, hydrophobic residues are shown in orange, polar and hydrophilic residues in green, acidic residues in pink, and basic residues in blue.
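The two evaluation measures in the caption can be sketched as follows. The caption does not say exactly how the Cartesian distance is computed or how sequences are scanned, so the details here are assumptions: motifs of equal length are compared column by column with no offset, each column being a vector of the 20 amino acid probabilities; scanning uses log-odds scores against a uniform 1/20 background with a small pseudocount; and a sequence counts as a hit when its best window reaches a chosen threshold. All function names are hypothetical. MCC itself is the standard Matthews correlation coefficient.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cartesian_distance(pwm_a, pwm_b):
    """Euclidean distance between two equal-length motifs, each a list of
    {residue: probability} columns, compared position by position (no offset)."""
    if len(pwm_a) != len(pwm_b):
        raise ValueError("motifs must have the same length")
    return math.sqrt(sum((col_a.get(aa, 0.0) - col_b.get(aa, 0.0)) ** 2
                         for col_a, col_b in zip(pwm_a, pwm_b)
                         for aa in AMINO_ACIDS))


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_window_score(seq, pwm, background=1 / 20, pseudo=1e-3):
    """Highest log-odds score over all windows of seq against a uniform
    background; `pseudo` keeps residues with zero probability from giving log(0)."""
    width = len(pwm)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(math.log((pwm[j].get(seq[start + j], 0.0) + pseudo) / background)
                    for j in range(width))
        best = max(best, score)
    return best


def scanning_mcc(sequences, spiked_indices, pwm, threshold):
    """Call a sequence positive if its best window scores >= threshold, then
    compare the calls with the known set of spiked sequences."""
    tp = fp = tn = fn = 0
    for i, seq in enumerate(sequences):
        predicted = best_window_score(seq, pwm) >= threshold
        actual = i in spiked_indices
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return mcc(tp, fp, tn, fn)
```

A distance of 0 means the recovered motif matches the original exactly, and an MCC of 1 means scanning separates spiked from unspiked sequences perfectly at the chosen threshold.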

Mentions: We used 3 different motif sets, each containing 7 motifs ranging from 3 to 9 amino acids in length. Instances of each motif (see Figure 2 for motif set 1, and Additional files 1 and 2 for motif sets 2 and 3, respectively) were spiked separately into the cytoplasmic dataset (see Methods). Each of the 21 motifs was inserted into the sequences at three different frequencies (10%, 20% and 30%), allowing us to test the motif discovery software under different conditions of motif abundance. In general, the performance of both NestedMICA and MEME increased with the abundance rate of the inserted motif.
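The spiking design described above can be sketched as below. This is an illustrative reconstruction, not the authors' code: the motif representation (a list of per-position residue probabilities), choosing the spiked sequences by sampling without replacement, and inserting one instance at a uniformly random position (rather than overwriting residues) are all assumptions; the paper's Methods section describes the actual procedure. The names `sample_instance` and `spike` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_instance(pwm, rng):
    """Draw one motif instance, column by column, from a motif given as a
    list of {residue: probability} dictionaries (a position weight matrix)."""
    return "".join(rng.choices(list(col), weights=list(col.values()))[0] for col in pwm)


def spike(sequences, pwm, rate, seed=0):
    """Insert one sampled motif instance into round(rate * n) randomly chosen
    sequences. Returns the new sequences and the set of spiked indices."""
    rng = random.Random(seed)
    n_spiked = round(rate * len(sequences))
    chosen = set(rng.sample(range(len(sequences)), n_spiked))
    result = []
    for i, seq in enumerate(sequences):
        if i in chosen:
            instance = sample_instance(pwm, rng)
            pos = rng.randint(0, len(seq))  # any position, including either end
            seq = seq[:pos] + instance + seq[pos:]
        result.append(seq)
    return result, chosen


# Example: spike a fixed 4-residue motif into 20% of 100 random sequences.
bg_rng = random.Random(42)
background = ["".join(bg_rng.choice(AMINO_ACIDS) for _ in range(50)) for _ in range(100)]
motif = [{aa: 1.0} for aa in "DEAD"]
spiked, spiked_idx = spike(background, motif, rate=0.2, seed=7)
```

Running the same call with `rate` set to 0.1, 0.2 and 0.3 mirrors the three abundance conditions; the returned index set is the ground truth against which a motif finder's predictions can be scored.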

