XRate: a fast prototyping, training and annotation tool for phylo-grammars.

Klosterman PS, Uzilov AV, Bendaña YR, Bradley RK, Chao S, Kosiol C, Goldman N, Holmes I - BMC Bioinformatics (2006)

Bottom Line: The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.

Affiliation: Department of Bioengineering, University of California, Berkeley CA, USA. petek@accesscom.com

ABSTRACT

Background: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists.

Results: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures.

Conclusion: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.


Figure 4: An excerpt from an xgram-format grammar reproducing the protein secondary structure phylo-HMM of Goldman, Thorne and Jones. This excerpt shows only the transformation rules, and omits the alphabet and chain definitions. Three separate Markov chains for amino acid substitution are used (and are assumed to be defined elsewhere in the file): alpha_col denotes an amino acid in an alpha helix (annotated with character H), beta_col denotes an amino acid in a beta sheet (annotated with character E) and loop_col denotes an amino acid in a loop region (annotated with character L).

Mentions: The PROT3 phylo-grammar has state labels for the three secondary structure classes: alpha-helix (H), beta-sheet (E) and loop (L). An excerpt of the grammar is shown in Figure 4.
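
The structure described in the Figure 4 caption, three substitution chains each tied to one annotation character, can be illustrated with a small Viterbi decoder over alignment columns. The sketch below is written in Python rather than xgram format and is not xrate's implementation: the per-column log-likelihoods, the transition probabilities and the names viterbi_annotate and sticky_log_trans are all hypothetical, standing in for quantities that a phylo-HMM would derive from the alignment, the tree and the trained chains.

```python
import math

# Annotation characters from the Figure 4 caption:
# H = alpha helix, E = beta sheet, L = loop.
STATES = ("H", "E", "L")


def sticky_log_trans(stay=0.8):
    """Hypothetical transition matrix: stay with probability `stay`,
    otherwise switch to one of the other two states with equal probability."""
    switch = (1.0 - stay) / (len(STATES) - 1)
    return {
        p: {s: math.log(stay if p == s else switch) for s in STATES}
        for p in STATES
    }


def viterbi_annotate(column_loglikes, log_trans, log_init):
    """Return the most probable H/E/L label string for a list of columns.

    column_loglikes[i][s] is the log-likelihood of alignment column i under
    the substitution chain attached to state s (assumed to be precomputed;
    here the values are simply supplied by the caller).
    """
    if not column_loglikes:
        return ""
    # Best log-score of any path ending in each state at the first column.
    score = {s: log_init[s] + column_loglikes[0][s] for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at column i + 1
    for col in column_loglikes[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + col[s]
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Made-up example: three columns that fit the helix chain best,
# followed by three columns that fit the loop chain best.
helix_like = {"H": -1.0, "E": -5.0, "L": -5.0}
loop_like = {"H": -5.0, "E": -5.0, "L": -1.0}
columns = [helix_like] * 3 + [loop_like] * 3
uniform_init = {s: math.log(1.0 / len(STATES)) for s in STATES}

annotation = viterbi_annotate(columns, sticky_log_trans(), uniform_init)
print(annotation)  # HHHLLL
```

Each character of the returned string annotates one alignment column, mirroring how the caption ties alpha_col, beta_col and loop_col to H, E and L. In a real phylo-HMM the column scores would come from evaluating each chain on the phylogenetic tree instead of being supplied by hand.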

