Cross-species gene-family fluctuations reveal the dynamics of horizontal transfers.

Grilli J, Romano M, Bassetti F, Cosentino Lagomarsino M - Nucleic Acids Res. (2014)

Bottom Line: To elucidate the links between these processes and the cross-species gene-family statistics, we perform a large-scale data analysis of the cross-species variability of gene-family abundance (the number of members of the family found on a given genome). Analysis and model, combined, show a quantitative link between cross-species family abundance statistics and horizontal transfer dynamics, which can be used to analyze genome 'flux'. Groups of families with different values of the abundance variability index correspond to genome sub-parts having different plasticity in terms of the level of horizontal exchange allowed by natural selection.

Affiliation: Dipartimento di Fisica e Astronomia "G. Galilei", Università di Padova, Via Marzolo 8, I-35131 Padova, Italy.

Figure 1: Definition and main predictions of the model. (A) Sketch of the model. For each genome representative of a species (top), the model describes the dynamics of each gene family separately over a set of species, represented here (for each species) as the Venn diagram of the elements of the family. Following stochastic binary ‘collision’ events between species (middle), genes of a given family are exchanged between species and, over the same time scale, can be duplicated or lost in each species. (B) In the absence of gene duplications, the model predicts a Poisson distribution for the family abundance profile. Symbols (green diamonds) are the steady-state abundance histogram from a simulation of 1000 species, with pd = 0, ph = 0.01 and an initial abundance of 30 for all species. The dashed line is the analytical prediction (a Poisson distribution with average 30). (C) In the presence of duplications, the dispersion of the steady-state abundance histogram increases. Symbols (red triangles) represent the steady-state abundance histogram from a simulation of 1000 species, with pd = 0.009, ph = 0.001 and an initial abundance of 30 for all species. The dashed line is the analytical prediction, a negative binomial distribution, valid in the limit of small pd and ph (see text). The inset shows that in both cases (same symbols as above) the analytical estimates (dashed lines) capture well the scaling of the variance of the abundance profile with the average abundance found in simulations (symbols).

Mentions: We first discuss the stochastic model (Figure 1A), since its results are useful for introducing the data analysis. The model describes minimal dynamics of duplication/loss and inter-species HGT, and formulates a minimal informed expectation for the family abundance profile. The model only describes events that are visible on the representative genome of the species (because they are fixed), and recapitulates the action of selection in the rates pd, ph and pl. Importantly, when compared to data, the model only describes inter-species events, and thus the ‘duplication’ move is an intra-species family expansion that includes duplication as well as intra-species horizontal transfers. For simplicity, we will mainly refer to the move as duplication in the description of the model, and explicitly address the question when dealing with the data. Finally, we assume independence between gene families. Thanks to this last condition, the gene abundance V_i of a single family across all species i = 1, ..., N can be described separately from the others. Note, however, that when matching the model with empirical data, the effective rates are allowed to vary from family to family, giving rise to the observed diversity between families; hence this simplifying assumption is not restrictive. Model time maps to evolutionary time in a complex way. In comparing with data, we will assume that the observed species have had time to reach a steady state where the gene-family abundance distributions are roughly invariant (i.e. that the stationary abundance distribution is the empirically relevant quantity). The main observable is the family abundance profile, the distribution of the family population V. Using mean-field kinetic equations similar to Boltzmann equations (26), it is possible to estimate the stationary-state value of all moments of V. Processes of the type considered have already been applied in various interdisciplinary contexts (27–30).
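
As an illustration of the duplication-free case in Figure 1B, the following is a minimal sketch and not the authors' simulation code. The excerpt does not give the exact collision rule, so the sketch assumes the simplest gene-conserving exchange: at each step every gene independently leaves its species with probability `p_h` and lands in a species chosen uniformly at random. The function name `simulate_hopping`, its parameters and this hopping rule are all assumptions made for illustration. Under this rule the stationary abundances are multinomial, which for many species is close to a Poisson distribution whose mean equals the initial abundance.

```python
import numpy as np


def simulate_hopping(n_species=1000, v0=30, p_h=0.01, n_steps=1000, seed=0):
    """Hypothetical gene-conserving exchange model (illustrative assumption).

    Each gene of the family independently leaves its species with
    probability p_h per step and lands in a uniformly random species.
    No duplication or loss is included (the pd = 0 case).
    """
    rng = np.random.default_rng(seed)
    # All species start with the same abundance, as in the figure caption.
    v = np.full(n_species, v0, dtype=np.int64)
    uniform = np.full(n_species, 1.0 / n_species)
    for _ in range(n_steps):
        # Number of genes leaving each species in this step.
        leaving = rng.binomial(v, p_h)
        # Redistribute all departing genes uniformly over the species.
        arriving = rng.multinomial(leaving.sum(), uniform)
        v = v - leaving + arriving
    return v


if __name__ == "__main__":
    abundance = simulate_hopping()
    mean = abundance.mean()
    fano = abundance.var() / mean  # close to 1 for a Poisson-like profile
    print(f"mean abundance = {mean:.2f}, variance/mean = {fano:.2f}")
```

For a Poisson distribution the variance equals the mean, so a variance-to-mean ratio near 1 in this sketch is consistent with the Poisson prediction quoted in the caption. Reproducing the broader, negative-binomial profile of Figure 1C would require the duplication and loss moves, whose exact form is not specified in this excerpt.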

