Limits...
XRate: a fast prototyping, training and annotation tool for phylo-grammars.

Klosterman PS, Uzilov AV, Bendaña YR, Bradley RK, Chao S, Kosiol C, Goldman N, Holmes I - BMC Bioinformatics (2006)

Bottom Line: The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source.We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures.Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Bioengineering, University of California, Berkeley CA, USA. petek@accesscom.com

ABSTRACT

Background: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists.

Results: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures.

Conclusion: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.

Show MeSH
Example Stockholm-format input file for the protein secondary structure grammar (see figure 4). The alignment is of the pancreatic hormone family.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC1622757&req=5

Figure 5: Example Stockholm-format input file for the protein secondary structure grammar (see figure 4). The alignment is of the pancreatic hormone family.

Mentions: An example of usage for this grammar follows. We also show an alignment from HOMSTRAD, too small to predict secondary structure with any confidence, but useful for illustrative purposes (see figure 5). Suppose we want to: (1) read in this alignment from a file named ' pp. stk'; (2) load a point substitution matrix from a file named 'dart/data/prot.eg' (this is an amino-acid matrix distributed with xrate; the filename path assumes that the DART package was downloaded to the current working directory); (3) use the above point substitution matrix to estimate a phylo-genetic tree (by neighbor-joining followed by EM on the branch lengths); (4) load the PROT3 model from a file named 'dart/data/prot3.eg' (again, this is distributed with xrate); and (5) use the PROT3 model to predict secondary structure classes for this protein family, printing the annotated alignment to the standard output. The following command-line syntax achieves this:


XRate: a fast prototyping, training and annotation tool for phylo-grammars.

Klosterman PS, Uzilov AV, Bendaña YR, Bradley RK, Chao S, Kosiol C, Goldman N, Holmes I - BMC Bioinformatics (2006)

Example Stockholm-format input file for the protein secondary structure grammar (see figure 4). The alignment is of the pancreatic hormone family.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC1622757&req=5

Figure 5: Example Stockholm-format input file for the protein secondary structure grammar (see figure 4). The alignment is of the pancreatic hormone family.
Mentions: An example of usage for this grammar follows. We also show an alignment from HOMSTRAD, too small to predict secondary structure with any confidence, but useful for illustrative purposes (see figure 5). Suppose we want to: (1) read in this alignment from a file named ' pp. stk'; (2) load a point substitution matrix from a file named 'dart/data/prot.eg' (this is an amino-acid matrix distributed with xrate; the filename path assumes that the DART package was downloaded to the current working directory); (3) use the above point substitution matrix to estimate a phylo-genetic tree (by neighbor-joining followed by EM on the branch lengths); (4) load the PROT3 model from a file named 'dart/data/prot3.eg' (again, this is distributed with xrate); and (5) use the PROT3 model to predict secondary structure classes for this protein family, printing the annotated alignment to the standard output. The following command-line syntax achieves this:

Bottom Line: The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source.We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures.Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Bioengineering, University of California, Berkeley CA, USA. petek@accesscom.com

ABSTRACT

Background: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists.

Results: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures.

Conclusion: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.

Show MeSH