Identifying currents in the gene pool for bacterial populations using an integrative approach.

Tang J, Hanage WP, Fraser C, Corander J - PLoS Comput. Biol. (2009)

Bottom Line: However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis.


Affiliation: Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland. jing.tang@helsinki.fi

ABSTRACT
The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in the BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.


pcbi-1000455-g004: Testing gene flow structure accuracy for and . Graph similarity was measured by the Hamming distance and coded as a gray-scale image. Paper-white cells represent the scenarios where the partition and the gene flow structure in Figure 2 are both correctly identified by BAPS.

Mentions: We may look further into the gene flow graph prediction only if the genetic structure (i.e. the true partition) is correctly identified. We used the Hamming distance as a measure of gene flow structure accuracy, and the result is shown in Figure 4. The gene flow graph structure can be satisfactorily discovered when and . However, a negative correlation between and was also noticeable. This result suggests that if admixture affects a population through a small proportion of strains, then the chances of its correct estimation by BAPS 5.2 are high. In contrast, admixture that occurred in most of the strains is more likely to be ascribed to variation arising within the population by mutation. These observations are in harmony with the investigation in [3] of the effect of recombination intensity on the emergence of distinct populations for a bacterial species. Extensive levels of recombination will act as a cohesive force keeping populations together as a large gene pool, which consequently prevents the statistical detection of the recombination in terms of such a population genetic model as investigated here. This is entirely reasonable, because no substantial genetic population boundaries will exist under such circumstances, and consequently, recombinations over population boundaries are not meaningfully defined, let alone detectable by a statistical model. Moreover, if certain parts of the data are too weak for reliable admixture inferences due to very small population cardinalities in the genetic mixture estimate, it is possible to leave the admixture coefficients undetermined for them using the option available in BAPS, as discussed in [14]. The extensive simulation study performed by [30] showed that the BAPS inferences about the genetic structure were generally sensible from a phylogenetic perspective, even in the presence of recombination events, provided that the data are at least reasonably informative. With very weakly informative molecular data, it cannot be expected that any detailed statistical population genetic model would provide highly accurate estimates of the population characteristics.
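
To illustrate the accuracy measure, the following is a minimal sketch of a Hamming distance between a true and an estimated gene flow graph. It assumes that each graph is reduced to the presence or absence of directed edges between populations; the text does not state how the graphs were encoded, so this encoding, the function name `hamming_distance`, and the three-population example are illustrative assumptions rather than the procedure used in BAPS.

```python
def hamming_distance(true_edges, estimated_edges):
    """Count the directed edges that appear in exactly one of the two graphs.

    Each graph is given as a collection of (source, target) population pairs.
    Assumption: edges are treated as present/absent, ignoring any edge weights.
    """
    return len(set(true_edges) ^ set(estimated_edges))


# Hypothetical example with three populations labelled 0, 1 and 2.
true_graph = {(0, 1), (1, 2)}
estimated_graph = {(0, 1), (2, 1)}  # one edge recovered with the wrong direction

distance = hamming_distance(true_graph, estimated_graph)
print(distance)
```

A distance of zero means that the estimated graph has exactly the same set of directed edges as the true one; under this encoding, an edge recovered with the wrong direction counts as two mismatches.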

