RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences.

Rubino F, Attimonelli M - BMC Bioinformatics (2009)

Bottom Line: The use of REB in PPNEMA updating and in the PPNEMA "characterize your sequence" option clearly demonstrates the power of the method. REB can also be used to rapidly solve other bioinformatics problems in which a new sequence must be added to a pre-existing cluster. The statistical tests carried out here show the flexibility of the method.


Affiliation: Department of Biochemistry and Molecular Biology E. Quagliariello - Bari, 70126, Italy. rubino.francesco@gmail.com

ABSTRACT

Background: One of the most frequent uses of bioinformatics tools is the functional characterization of a newly produced nucleotide sequence (a query sequence) by running BLAST or FASTA against a set of sequences (the subject sequences). However, in some specific contexts it is useful to compare the query sequence against a cluster such as a multialignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by applying regular expression rules to a set of MAs given as input. The REB workflow consists of (i) the definition of a dataset of multialignments; (ii) the association of each MA with a pattern, defined by applying regular expression rules; and (iii) the automatic characterization of a submitted biosequence according to the function of the sequences described by the pattern that best matches the query sequence.

Results: An application of this algorithm is used in the "characterize your sequence" tool available in the PPNEMA resource. PPNEMA is a resource of Ribosomal Cistron sequences from various species, grouped according to nematode genera. It allows the retrieval of plant nematode multialigned sequences or the classification of new nematode rDNA sequences by applying REB. The same algorithm also supports automatic updating of the PPNEMA database. The present paper gives examples of the use of REB within PPNEMA.

Conclusion: The use of REB in PPNEMA updating and in the PPNEMA "characterize your sequence" option clearly demonstrates the power of the method. REB can also be used to rapidly solve other bioinformatics problems in which a new sequence must be added to a pre-existing cluster. The statistical tests carried out here show the flexibility of the method.
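
The three-step workflow in the Background (multialignment, pattern, best match) can be illustrated with a minimal Python sketch. The abstract does not state the regular expression rules REB actually applies, so the column rule used here is an assumption made only for illustration: a conserved column becomes a literal, a variable column becomes a character class, and a column containing a gap becomes optional. The function names (`column_to_regex`, `ma_to_pattern`, `best_matching_cluster`) and the "longest matched stretch" scoring are likewise hypothetical, not taken from the paper.

```python
import re


def column_to_regex(column):
    """Turn one alignment column into a regex fragment.

    Assumed rule (not the paper's): one residue -> literal,
    several residues -> character class, any gap -> optional.
    """
    residues = sorted(set(column) - {"-"})
    if not residues:
        return ""  # all-gap column contributes nothing
    if len(residues) == 1:
        fragment = re.escape(residues[0])
    else:
        fragment = "[" + "".join(re.escape(r) for r in residues) + "]"
    if "-" in column:
        fragment += "?"
    return fragment


def ma_to_pattern(alignment):
    """Build one regex pattern from a multialignment (equal-length strings)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    # zip(*alignment) iterates over the alignment column by column.
    return "".join(column_to_regex(col) for col in zip(*alignment))


def best_matching_cluster(query, patterns):
    """Return the cluster whose pattern matches the longest stretch of the query.

    'patterns' maps a cluster name to its regex; None means no pattern matched.
    """
    best_name, best_len = None, 0
    for name, pattern in patterns.items():
        match = re.search(pattern, query)
        if match and len(match.group(0)) > best_len:
            best_name, best_len = name, len(match.group(0))
    return best_name


# Toy data, invented for the example.
toy_ma = ["ACGT-A", "ACCTTA", "ACGTTA"]
toy_patterns = {"genusA": ma_to_pattern(toy_ma), "genusB": "TTTT"}
print(toy_patterns["genusA"])
print(best_matching_cluster("GGACCTTAGG", toy_patterns))
```

A real pattern set would hold one entry per MA in the dataset, and the label returned for the best-matching pattern is what would be used to characterize the submitted sequence.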


Figure 5: (a) Selectivity (TP/(TP+FN)) trend and (b) fraction of false positives, FP/(TP+FP), versus step values at various window lengths in the RegExpBlasting application to PPNEMA updating (TP, true positives; FP, false positives; FN, false negatives). Window length 20 clearly shows the best performance, independent of step value.
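
The two ratios plotted in Figure 5 follow directly from the counts named in the caption. The sketch below is a plain restatement of those formulas in Python; the function names and the example counts are invented for illustration and are not results from the paper. Note that the caption labels TP/(TP+FN) "selectivity", while the accompanying text refers to the same panel as "sensitivity".

```python
def tp_over_tp_plus_fn(tp, fn):
    """Panel (a): TP / (TP + FN), the share of true members that were recovered."""
    total = tp + fn
    if total == 0:
        raise ValueError("TP + FN must be greater than zero")
    return tp / total


def fp_over_tp_plus_fp(tp, fp):
    """Panel (b): FP / (TP + FP), the share of reported hits that are wrong."""
    total = tp + fp
    if total == 0:
        raise ValueError("TP + FP must be greater than zero")
    return fp / total


# Hypothetical counts, chosen only to show the arithmetic.
print(tp_over_tp_plus_fn(tp=45, fn=5))   # 45 / 50
print(fp_over_tp_plus_fp(tp=45, fp=15))  # 15 / 60
```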

Mentions: Figure 5a reports the sensitivity of the method (TP/AP) versus step s for three different window values. Within the same w value, step s clearly has no influence. The shorter the window, the higher the sensitivity, but also the higher the number of FP (Figure 5b), which shows a minimal but still detectable variation when the step changes. Thus, once the minimal length (minl = 60) and window length (w = 20) had been fixed, we observed the effect of various cut-off (cf) values and steps on selectivity and sensitivity. Figure 6a shows that step changes have no influence on the number of positive results, whereas the cut-off value plays a decisive role: when cf is higher than 0.85, the number of TP is drastically reduced. Thus, although the number of FP could be reduced by increasing the cf value, it is more convenient to fix it at 0.70 and work with step values not higher than 15.
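
To make the roles of the four parameters concrete, here is a small Python sketch that uses the recommended values (minl = 60, w = 20, cf = 0.70, s = 15). This excerpt does not describe how REB scores a query against the cut-off, so the sketch assumes that the score is the fraction of windows that match; that scoring rule, the function names, and the `matches_window` callback are all hypothetical.

```python
def window_starts(seq_len, w=20, s=15):
    """Start offsets of the length-w windows taken every s positions."""
    if seq_len < w:
        return []
    return list(range(0, seq_len - w + 1, s))


def fraction_matching(query, matches_window, w=20, s=15, minl=60):
    """Fraction of windows accepted by matches_window (assumed scoring rule).

    Returns None when the query is shorter than the minimal length minl.
    """
    if len(query) < minl:
        return None
    starts = window_starts(len(query), w, s)
    hits = sum(1 for i in starts if matches_window(query[i:i + w]))
    return hits / len(starts)


def passes_cutoff(query, matches_window, cf=0.70, w=20, s=15, minl=60):
    """Accept the query only if its score reaches the cut-off cf."""
    score = fraction_matching(query, matches_window, w, s, minl)
    return score is not None and score >= cf


# Toy predicate standing in for a real pattern test: reject windows containing 'N'.
def no_ambiguity(window):
    return "N" not in window


toy_query = "A" * 40 + "N" + "A" * 19  # length 60, one ambiguous base at index 40
print(window_starts(len(toy_query)))
print(fraction_matching(toy_query, no_ambiguity))
print(passes_cutoff(toy_query, no_ambiguity))
```

With w = 20 and s = 15, consecutive windows overlap by five positions, so under this assumed scoring a single bad position can fall into at most two windows.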

