Networks of gene sharing among 329 proteobacterial genomes reveal differences in lateral gene transfer frequency at different phylogenetic depths.

Kloesges T, Popa O, Martin W, Dagan T - Mol. Biol. Evol. (2010)

Bottom Line: The network of shared proteins reveals modularity structure that does not correspond to current classification schemes. Using a minimal lateral network approach, we compared LGT rates at different phylogenetic depths. Hence, our results indicate that the rate of gene acquisition per protein family is similar at the level of species (by recombination) and at the level of classes (by LGT).


Affiliation: Institute of Botany III, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany.

ABSTRACT
Lateral gene transfer (LGT) is an important mechanism of natural variation among prokaryotes. Over the full course of evolution, most or all of the genes resident in a given prokaryotic genome have been affected by LGT, yet the frequency of LGT can vary greatly across genes and across prokaryotic groups. The proteobacteria are among the most diverse of prokaryotic taxa. The prevalence of LGT in their genome evolution calls for the application of network-based methods instead of tree-based methods to investigate the relationships among these species. Here, we report networks that capture both vertical and horizontal components of evolutionary history among 1,207,272 proteins distributed across 329 sequenced proteobacterial genomes. The network of shared proteins reveals modularity structure that does not correspond to current classification schemes. On the basis of shared protein-coding genes, the five classes of proteobacteria fall into two main modules, one including the alpha-, delta-, and epsilonproteobacteria and the other including beta- and gammaproteobacteria. The first module is stable over different protein identity thresholds. The second shows more plasticity with regard to the sequence conservation of proteins sampled, with the gammaproteobacteria showing the most chameleon-like evolutionary characteristics within the present sample. Using a minimal lateral network approach, we compared LGT rates at different phylogenetic depths. In general, gene evolution by LGT within proteobacteria is very common. At least one LGT event was inferred to have occurred in at least 75% of the protein families. The average LGT rate at the species and class depth is about one LGT event per protein family, the rate doubling at the phylum level to an average of two LGT events per protein family. Hence, our results indicate that the rate of gene acquisition per protein family is similar at the level of species (by recombination) and at the level of classes (by LGT). 
The frequency of LGT per genome strongly depends on the species lifestyle, with endosymbionts showing far lower LGT frequencies than free-living species. Moreover, the nature of the transferred genes suggests that gene transfer in proteobacteria is frequently mediated by conjugation.


Figure 5: Properties of the minimal LGT networks at the phylum and class scales. Properties are shown for a randomly selected replicate. The coefficient of variation for the whole data set was ∼2% (table 3). (A–D) Distribution of connectivity, the number of neighbors at a distance of one edge from each vertex, in the minimal lateral network (MLN). (E–H) Probability density function (PDF) of edge weight in the lateral component of the MLN.

The MLN reconstructed for all proteobacteria using T30 protein families, the RNA reference tree, and the LGT7 model contains 657 nodes in total: 329 external nodes (operational taxonomic units [OTUs]) and 328 internal nodes (hypothetical taxonomic units [HTUs]), connected by 51,762 lateral edges (fig. 4). For protein families that have undergone more than one LGT, the number of lateral edges in the MLN exceeds the minimum number of LGTs required to account for the gene distribution. Hence, to address LGT network properties for the MLN, 1,000 rMLNs were generated in which the number of lateral edges and the minimum number of LGTs for genes transferred more than once correspond exactly. Lateral edge frequency and edge weight distribution are similar among the rMLNs. The number of lateral edges in the rMLNs is 3,345 ± 73 on average (coefficient of variation = 2%). The connectivity (number of lateral edges per node) ranges from 0 to a maximum of 344–384 across the rMLNs, with a mean between 100 and 102 and a median between 85 and 91 (table 3). The connectivity distribution is semi-exponential, with very few highly connected nodes (fig. 5A). Bigger genomes are generally more highly connected than smaller genomes, yet genome size explains only 16% of the variation in connectivity (P < 0.01, using Spearman correlation; Zar 1999).
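The two summary statistics used above, connectivity and the coefficient of variation, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy node and edge lists, and the toy replicate edge counts are all hypothetical, and connectivity is assumed here to mean the number of distinct neighbors one lateral edge away, as in the figure 5 caption.

```python
from statistics import mean, median, stdev


def connectivity(nodes, lateral_edges):
    """Count the distinct one-edge neighbors of every node.

    Nodes without any lateral edge are kept with a connectivity of 0.
    """
    neighbors = {node: set() for node in nodes}
    for u, v in lateral_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical toy network: three genomes (OTUs) and two ancestors (HTUs).
toy_nodes = ["otu1", "otu2", "otu3", "htu1", "htu2"]
toy_edges = [("otu1", "otu2"), ("otu1", "htu1"), ("otu2", "htu1"), ("otu1", "otu3")]

degrees = connectivity(toy_nodes, toy_edges)
print(degrees)
print("mean:", mean(degrees.values()), "median:", median(degrees.values()))

# Hypothetical lateral-edge counts from three replicate networks.
toy_edge_counts = [3300, 3350, 3400]
print("CV:", round(coefficient_of_variation(toy_edge_counts), 4))
```

Passing the full node list keeps nodes without lateral edges in the result, so the minimum connectivity of 0 reported above is represented rather than silently dropped.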

