Inference and characterization of horizontally transferred gene families using stochastic mapping.

Cohen O, Pupko T - Mol. Biol. Evol. (2009)

Bottom Line: Our novel methodology allows us to infer and quantify horizontal gene transfer (HGT) events. This enables us to rank various gene families and lineages according to their propensity to undergo gains and losses. Applying our methodology to 4,873 gene families shows that: 1) the novel mixture models describe the observed variability in gene-family content among microbes significantly better than previous models; 2) the stochastic mapping approach enables accurate inference of gain and loss events based on simulations; 3) at least 34% of the gene families analyzed are inferred to have experienced HGT at least once during their evolution; and 4) gene families that were inferred to experience HGT are both enriched and depleted with respect to specific functional categories.


Affiliation: Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.

ABSTRACT
Macrogenomic events, in which genes are gained and lost, play a pivotal evolutionary role in microbial evolution. Nevertheless, probabilistic-evolutionary models describing such events and methods for their robust inference are considerably less developed than existing methodologies for analyzing site-specific sequence evolution. Here, we present a novel method for the inference of gains and losses of gene families. First, we develop probabilistic-evolutionary models describing the dynamics of gene-family content, which are more biologically realistic than previously suggested models. In our likelihood-based models, gains and losses are represented by transitions between presence and absence, given an underlying phylogeny. We employ a mixture-model approach in which we allow both the gain rate and the loss rate to vary among gene families. Second, we use these models together with the analytic implementation of stochastic mapping to infer branch-specific events. Our novel methodology allows us to infer and quantify horizontal gene transfer (HGT) events. This enables us to rank various gene families and lineages according to their propensity to undergo gains and losses. Applying our methodology to 4,873 gene families shows that: 1) the novel mixture models describe the observed variability in gene-family content among microbes significantly better than previous models; 2) The stochastic mapping approach enables accurate inference of gain and loss events based on simulations; 3) At least 34% of the gene families analyzed are inferred to have experienced HGT at least once during their evolution; and 4) Gene families that were inferred to experience HGT are both enriched and depleted with respect to specific functional categories.
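The presence/absence formulation in the abstract can be illustrated with a two-state continuous-time Markov chain. The following is a minimal sketch, not the authors' implementation: the function name `transition_probs` and the example rate values are hypothetical, and it covers only a single branch of length t with one gain rate and one loss rate (no phylogeny, no mixture).

```python
import math


def transition_probs(gain, loss, t):
    """Transition matrix P(t) for a two-state chain (0 = absent, 1 = present).

    Assumes the rate matrix Q = [[-gain, gain], [loss, -loss]] and uses the
    closed-form solution of P(t) = exp(Q * t) for a two-state chain.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    pi_absent = loss / total    # stationary probability of absence
    pi_present = gain / total   # stationary probability of presence
    return [
        [pi_absent + pi_present * decay, pi_present * (1.0 - decay)],
        [pi_absent * (1.0 - decay), pi_present + pi_absent * decay],
    ]


if __name__ == "__main__":
    # Hypothetical rates and branch length, chosen only for illustration.
    p = transition_probs(gain=0.3, loss=1.2, t=0.5)
    print("P(absent -> present) =", p[0][1])
    print("P(present -> absent) =", p[1][0])
```

Here `p[0][1]` is the probability that a gene family absent at the start of the branch is present at its end (a net gain), and `p[1][0]` is the corresponding probability of a net loss.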


Figure 2: The empirical distributions of gain and loss rates. The empirical distributions of gain rates (red) and loss rates (blue) were computed for all 4,873 COG gene families. The bins denoted by the symbols “†” and “‡” represent the loss rate of the 63 gene families that are present in all species and the loss rate of the 288 gene families that are present only in the three eukaryotes, respectively.

Mentions: Several probabilistic models in increasing order of complexity were implemented. The simplest (M1 + Γ) assumes a single rate matrix (one free parameter for the gain rate and one for the loss rate). An additional free parameter is used to model rate variability among gene families. On a data set of 4,873 gene families across 66 species, the maximal log likelihood obtained under this model was −91,962.8. This model, however, assumes that the ratio of gain to loss rates is the same across all gene families, although biological intuition suggests that some gene families tend to be either gained or lost significantly more than others. Indeed, the mixture model (MM1), which allows the gain-to-loss ratio to vary across gene families, fits the data significantly better than M1 + Γ, with differences in maximal log likelihood on the order of dozens of units (table 1). The justification for MMs that allow for independent distributions of gain and loss rates is evident in figure 2, where the empirical distributions of gain and loss rates of the COG gene families are presented (the computations of the gain and loss rates are based on eq. 7 above).
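The contrast between the two model families can be sketched in a few lines. This is an illustration under stated assumptions, not the parameterization used in the paper: the function names, the rate multipliers, and the gain and loss category values below are all hypothetical. The point is only that scaling one rate matrix by a shared multiplier leaves the gain-to-loss ratio fixed, whereas pairing independently chosen gain and loss categories lets the ratio differ between mixture components.

```python
def single_matrix_with_rate_classes(gain, loss, rate_multipliers):
    """(gain, loss) pairs when one rate matrix is scaled by shared multipliers.

    Mimics the idea behind a single-matrix model with among-family rate
    variation: every class rescales gain and loss by the same factor.
    """
    return [(gain * r, loss * r) for r in rate_multipliers]


def independent_mixture(gain_classes, loss_classes):
    """(gain, loss) pairs when gain and loss classes vary independently."""
    return [(g, l) for g in gain_classes for l in loss_classes]


def gain_loss_ratios(components):
    """Distinct gain-to-loss ratios (rounded) across the given components."""
    return sorted({round(g / l, 6) for g, l in components})


if __name__ == "__main__":
    # All numeric values are hypothetical and chosen only for illustration.
    scaled = single_matrix_with_rate_classes(0.2, 0.8, [0.5, 1.0, 2.0])
    mixed = independent_mixture([0.1, 1.0], [0.5, 2.0])
    print("single matrix, scaled:", gain_loss_ratios(scaled))
    print("independent mixture:  ", gain_loss_ratios(mixed))
```

With the scaled single matrix, every component shares one ratio; with independent categories, each pairing can yield its own ratio, which is the extra flexibility the passage above attributes to the mixture models.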

