Limits...
INDELible: a flexible simulator of biological sequence evolution.

Fletcher W, Yang Z - Mol. Biol. Evol. (2009)

Bottom Line: Indels are simulated under several models of indel-length distribution.The program implements a rich repertoire of substitution models, including the general unrestricted model and nonstationary nonhomogeneous models of nucleotide substitution, mixture, and partition models that account for heterogeneity among sites, and codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches.With its many unique features, INDELible should be useful for evaluating the performance of many inference methods, including those for multiple sequence alignment, phylogenetic tree inference, and ancestral sequence, or genome reconstruction.

View Article: PubMed Central - PubMed

Affiliation: Department of Genetics, Evolution and Environment and Centre for Mathematics and Physics in the Life Sciences and Experimental Biology, University College London, London, UK.

ABSTRACT
Many methods exist for reconstructing phylogenies from molecular sequence data, but few phylogenies are known and can be used to check their efficacy. Simulation remains the most important approach to testing the accuracy and robustness of phylogenetic inference methods. However, current simulation programs are limited, especially concerning realistic models for simulating insertions and deletions. We implement a portable and flexible application, named INDELible, for generating nucleotide, amino acid and codon sequence data by simulating insertions and deletions (indels) as well as substitutions. Indels are simulated under several models of indel-length distribution. The program implements a rich repertoire of substitution models, including the general unrestricted model and nonstationary nonhomogeneous models of nucleotide substitution, mixture, and partition models that account for heterogeneity among sites, and codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches. With its many unique features, INDELible should be useful for evaluating the performance of many inference methods, including those for multiple sequence alignment, phylogenetic tree inference, and ancestral sequence, or genome reconstruction.

Show MeSH
An example input file for INDELible. The substitution model has been set to HKY + Γ with a transition–transversion rate ratio of κ = 2, stationary base frequencies of 0.4 (T), 0.3 (C), 0.2 (A), and 0.1 (G), and continuous gamma rate variation with shape parameter α = 1. Insertions and deletions have both been set to have an instantaneous rate of 0.1 (relative to an average substitution rate of 1) and the same geometric length distribution with a mean length of 4. Then, the phylogeny with branch lengths is specified. In the simulations for the speed tests, a 32-taxa, symmetric, strictly bifurcating tree with all branch lengths equal to 0.1 is used instead. This simulation creates 100 replicate data sets each containing one partition with a randomly created root sequence of 1,000 bases.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2712615&req=5

fig3: An example input file for INDELible. The substitution model has been set to HKY + Γ with a transition–transversion rate ratio of κ = 2, stationary base frequencies of 0.4 (T), 0.3 (C), 0.2 (A), and 0.1 (G), and continuous gamma rate variation with shape parameter α = 1. Insertions and deletions have both been set to have an instantaneous rate of 0.1 (relative to an average substitution rate of 1) and the same geometric length distribution with a mean length of 4. Then, the phylogeny with branch lengths is specified. In the simulations for the speed tests, a 32-taxa, symmetric, strictly bifurcating tree with all branch lengths equal to 0.1 is used instead. This simulation creates 100 replicate data sets each containing one partition with a randomly created root sequence of 1,000 bases.

Mentions: Speed comparison between DAWG and INDELible, with and without continuous gamma rate heterogeneity. The basic simulation model is specified by the settings in figure 3, whereas one factor is varied to see its impact in each plot. INDELible1 and INDELible2 refer to INDELible simulation under methods 1 and 2, respectively. The tests were carried out on a SunFire Opteron X4600M2 server running Linux.


INDELible: a flexible simulator of biological sequence evolution.

Fletcher W, Yang Z - Mol. Biol. Evol. (2009)

An example input file for INDELible. The substitution model has been set to HKY + Γ with a transition–transversion rate ratio of κ = 2, stationary base frequencies of 0.4 (T), 0.3 (C), 0.2 (A), and 0.1 (G), and continuous gamma rate variation with shape parameter α = 1. Insertions and deletions have both been set to have an instantaneous rate of 0.1 (relative to an average substitution rate of 1) and the same geometric length distribution with a mean length of 4. Then, the phylogeny with branch lengths is specified. In the simulations for the speed tests, a 32-taxa, symmetric, strictly bifurcating tree with all branch lengths equal to 0.1 is used instead. This simulation creates 100 replicate data sets each containing one partition with a randomly created root sequence of 1,000 bases.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2712615&req=5

fig3: An example input file for INDELible. The substitution model has been set to HKY + Γ with a transition–transversion rate ratio of κ = 2, stationary base frequencies of 0.4 (T), 0.3 (C), 0.2 (A), and 0.1 (G), and continuous gamma rate variation with shape parameter α = 1. Insertions and deletions have both been set to have an instantaneous rate of 0.1 (relative to an average substitution rate of 1) and the same geometric length distribution with a mean length of 4. Then, the phylogeny with branch lengths is specified. In the simulations for the speed tests, a 32-taxa, symmetric, strictly bifurcating tree with all branch lengths equal to 0.1 is used instead. This simulation creates 100 replicate data sets each containing one partition with a randomly created root sequence of 1,000 bases.
Mentions: Speed comparison between DAWG and INDELible, with and without continuous gamma rate heterogeneity. The basic simulation model is specified by the settings in figure 3, whereas one factor is varied to see its impact in each plot. INDELible1 and INDELible2 refer to INDELible simulation under methods 1 and 2, respectively. The tests were carried out on a SunFire Opteron X4600M2 server running Linux.

Bottom Line: Indels are simulated under several models of indel-length distribution.The program implements a rich repertoire of substitution models, including the general unrestricted model and nonstationary nonhomogeneous models of nucleotide substitution, mixture, and partition models that account for heterogeneity among sites, and codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches.With its many unique features, INDELible should be useful for evaluating the performance of many inference methods, including those for multiple sequence alignment, phylogenetic tree inference, and ancestral sequence, or genome reconstruction.

View Article: PubMed Central - PubMed

Affiliation: Department of Genetics, Evolution and Environment and Centre for Mathematics and Physics in the Life Sciences and Experimental Biology, University College London, London, UK.

ABSTRACT
Many methods exist for reconstructing phylogenies from molecular sequence data, but few phylogenies are known and can be used to check their efficacy. Simulation remains the most important approach to testing the accuracy and robustness of phylogenetic inference methods. However, current simulation programs are limited, especially concerning realistic models for simulating insertions and deletions. We implement a portable and flexible application, named INDELible, for generating nucleotide, amino acid and codon sequence data by simulating insertions and deletions (indels) as well as substitutions. Indels are simulated under several models of indel-length distribution. The program implements a rich repertoire of substitution models, including the general unrestricted model and nonstationary nonhomogeneous models of nucleotide substitution, mixture, and partition models that account for heterogeneity among sites, and codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches. With its many unique features, INDELible should be useful for evaluating the performance of many inference methods, including those for multiple sequence alignment, phylogenetic tree inference, and ancestral sequence, or genome reconstruction.

Show MeSH