INDELible: a flexible simulator of biological sequence evolution.

Fletcher W, Yang Z - Mol. Biol. Evol. (2009)

Bottom Line: Indels are simulated under several models of indel-length distribution. The program implements a rich repertoire of substitution models, including the general unrestricted model and nonstationary nonhomogeneous models of nucleotide substitution, mixture, and partition models that account for heterogeneity among sites, and codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches. With its many unique features, INDELible should be useful for evaluating the performance of many inference methods, including those for multiple sequence alignment, phylogenetic tree inference, and ancestral sequence, or genome reconstruction.


Affiliation: Department of Genetics, Evolution and Environment and Centre for Mathematics and Physics in the Life Sciences and Experimental Biology, University College London, London, UK.

ABSTRACT
Many methods exist for reconstructing phylogenies from molecular sequence data, but few phylogenies are known and can be used to check their efficacy. Simulation remains the most important approach to testing the accuracy and robustness of phylogenetic inference methods. However, current simulation programs are limited, especially concerning realistic models for simulating insertions and deletions. We implement a portable and flexible application, named INDELible, for generating nucleotide, amino acid and codon sequence data by simulating insertions and deletions (indels) as well as substitutions. Indels are simulated under several models of indel-length distribution. The program implements a rich repertoire of substitution models, including the general unrestricted model and nonstationary nonhomogeneous models of nucleotide substitution, mixture, and partition models that account for heterogeneity among sites, and codon models that allow the nonsynonymous/synonymous substitution rate ratio to vary among sites and branches. With its many unique features, INDELible should be useful for evaluating the performance of many inference methods, including those for multiple sequence alignment, phylogenetic tree inference, and ancestral sequence, or genome reconstruction.


Figure 1: The Lavalette distribution of indel length plotted for different values of the maximum indel length M, with a = 0.5 fixed (see eq. 5). Note that u can take integer values 1, 2, …, M only.

The third model is the Lavalette distribution, by which the probability for size u is

$$\Pr(u) \propto \left[\frac{uM}{M - u + 1}\right]^{-a}, \qquad u = 1, 2, \ldots, M, \tag{5}$$

where a is a parameter and M is the maximum indel size (Lavalette 1996; Popescu et al. 1997; Popescu 2003). The proportionality constant is determined such that the probabilities sum to 1. This model was first proposed to explain the distribution of journal impact factors. It has two desirable features. First, the mean and variance are finite because of the maximum length M. Second, it can approximate the Zipf distribution arbitrarily well by the use of a large M. This is because, apart from the normalizing constants, the two distributions differ only by the factor $\phi = [M/(M - u + 1)]^{-a}$, which is ≈ 1 when M ≫ u. Figure 1 shows the distribution for a few different values of M.
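As a minimal sketch of eq. 5 (not INDELible's own code), the normalized Lavalette probabilities can be computed by evaluating the unnormalized weight for each u = 1, …, M and dividing by the sum. The function names `lavalette_pmf`, `zipf_correction_factor`, and `sample_indel_length` are hypothetical and introduced only for illustration. Sampling by a weighted draw over the full table is an assumption, not necessarily how the program samples.

```python
import random


def lavalette_pmf(a, M):
    """Return [Pr(u=1), ..., Pr(u=M)] for the Lavalette distribution (eq. 5)."""
    if M < 1:
        raise ValueError("M must be a positive integer")
    # Unnormalized weight for each length u: [u*M / (M - u + 1)]^(-a)
    weights = [(u * M / (M - u + 1)) ** (-a) for u in range(1, M + 1)]
    total = sum(weights)  # proportionality constant so probabilities sum to 1
    return [w / total for w in weights]


def zipf_correction_factor(u, a, M):
    """Factor phi = [M / (M - u + 1)]^(-a) separating eq. 5 from a power law u^(-a)."""
    return (M / (M - u + 1)) ** (-a)


def sample_indel_length(a, M, rng):
    """Draw one indel length from the Lavalette distribution (illustrative only)."""
    pmf = lavalette_pmf(a, M)
    return rng.choices(range(1, M + 1), weights=pmf, k=1)[0]


if __name__ == "__main__":
    a = 0.5  # the value fixed in Figure 1
    for M in (10, 100, 1000):
        pmf = lavalette_pmf(a, M)
        print(f"M={M}: Pr(1)={pmf[0]:.4f}, Pr(M)={pmf[-1]:.6f}, "
              f"phi(u=5)={zipf_correction_factor(5, a, M):.4f}")
    print("sampled length:", sample_indel_length(a, 100, random.Random(1)))
```

For a fixed small u, the factor ϕ moves toward 1 as M grows, which is the sense in which the distribution approaches a Zipf power law. At u = M the factor equals M^(−a) regardless of how large M is, so the two distributions still differ in the extreme tail.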

