Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations.

Corander J, Marttinen P, Sirén J, Tang J - BMC Bioinformatics (2008)

Bottom Line: Also, alleles representing a different ancestry compared to the average observed genomic positions can be tracked for the sampled individuals, and a priori specified hypotheses about genetic population structure can be directly compared using Bayes' theorem. In particular, analysis of a single dataset can now be spread over multiple computers using a script interface to the software. The Bayesian modelling methods introduced in this article represent an array of enhanced tools for learning the genetic structure of populations.


Affiliation: Department of Mathematics, Fänriksgatan 3B, Åbo Akademi University, Åbo, Finland. jukka.corander@abo.fi

ABSTRACT

Background: During the most recent decade many Bayesian statistical models and software for answering questions related to the genetic structure underlying population samples have appeared in the scientific literature. Most of these methods utilize molecular markers for the inferences, while some are also capable of handling DNA sequence data. In a number of earlier works, we have introduced an array of statistical methods for population genetic inference that are implemented in the software BAPS. However, the complexity of biological problems related to genetic structure analysis keeps increasing such that in many cases the current methods may provide either inappropriate or insufficient solutions.

Results: We discuss the necessity of enhancing the statistical approaches to face the challenges posed by the ever-increasing amounts of molecular data generated by scientists over a wide range of research areas, and we introduce an array of new statistical tools implemented in the most recent version of BAPS. With these methods it is possible, e.g., to fit genetic mixture models using user-specified numbers of clusters and to estimate levels of admixture under a genetic linkage model. Also, alleles representing a different ancestry compared to the average observed genomic positions can be tracked for the sampled individuals, and a priori specified hypotheses about genetic population structure can be directly compared using Bayes' theorem. In general, we have further improved the computational characteristics of the algorithms behind the methods implemented in BAPS, facilitating the analysis of large and complex datasets. In particular, analysis of a single dataset can now be spread over multiple computers using a script interface to the software.

Conclusion: The Bayesian modelling methods introduced in this article represent an array of enhanced tools for learning the genetic structure of populations. Their implementations in the BAPS software are designed to meet the increasing need for analyzing large-scale population genetics data. The software is freely downloadable for Windows, Linux and Mac OS X systems at http://web.abo.fi/fak/mnf//mate/jc/software/baps.html.


Figure 1: Posterior probabilities of the origins of alleles for an admixed individual from the population labelled C/D, who was assigned into the cluster with green label in the genetic mixture analysis. The posterior probabilities are only shown for the alleles where the log_e Bayes factor for an ancestry deviating from the origin labelled green exceeds the default threshold (2.30). For simplicity of the visualization, the genotype data are assumed ordered, such that the lower and upper panels correspond to chromosome 1 and 2, respectively.

Mentions: To complement the picture that an admixture analysis paints of past events in a population, we introduce here a simple statistical tool for discovering alleles with a deviating ancestry, given the results of a previously estimated genetic mixture model. Our approach uses Bayes factors combined with predictive likelihoods to compare the evidence for alternative ancestral sources at each marker locus observed for a particular individual (examples are provided in Figures 1 and 2). In the implementation of this tool, the user can set the level of evidence regarded as conclusive for deviating ancestry; the default threshold is chosen according to the categories advocated in the theoretical literature [26]. Because the tool treats the data from each locus separately, it serves primarily as an exploratory method. In particular, for studies of bacterial populations based on DNA sequences from multiple genes, more detailed analyses are possible, for instance using the model introduced in [29].
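The per-locus comparison described above can be illustrated with a minimal Python sketch. It is not the BAPS implementation: the article does not give the exact predictive likelihood, so the sketch assumes a standard Dirichlet-multinomial posterior predictive with a symmetric prior (`alpha`), and all function names, cluster labels, and allele counts are hypothetical. For each alternative cluster it computes the natural-log Bayes factor of that origin against the assigned cluster and keeps the origins that exceed the default threshold of 2.30 from Figure 1 (close to log_e 10, i.e. a Bayes factor of about 10).

```python
import math


def predictive_prob(allele, counts, n_alleles, alpha=1.0):
    """Posterior predictive probability of `allele` in one cluster.

    Assumes a symmetric Dirichlet(alpha) prior over `n_alleles` allele types
    (an illustrative choice, not taken from the article).
    """
    total = sum(counts.values())
    return (counts.get(allele, 0) + alpha) / (total + n_alleles * alpha)


def log_bayes_factors(allele, cluster_counts, assigned, n_alleles, alpha=1.0):
    """Natural-log Bayes factor of each alternative origin vs. the assigned cluster."""
    base = math.log(predictive_prob(allele, cluster_counts[assigned], n_alleles, alpha))
    return {
        cluster: math.log(predictive_prob(allele, counts, n_alleles, alpha)) - base
        for cluster, counts in cluster_counts.items()
        if cluster != assigned
    }


def deviating_origins(allele, cluster_counts, assigned, n_alleles,
                      threshold=2.30, alpha=1.0):
    """Alternative origins whose log Bayes factor exceeds `threshold`."""
    lbf = log_bayes_factors(allele, cluster_counts, assigned, n_alleles, alpha)
    return {cluster: value for cluster, value in lbf.items() if value > threshold}


# Hypothetical allele counts at a single biallelic locus for two clusters.
cluster_counts = {
    "green": {"A": 48, "B": 1},
    "red": {"A": 5, "B": 44},
}

# An individual assigned to "green" carries allele "B", which is rare there.
flagged = deviating_origins("B", cluster_counts, assigned="green", n_alleles=2)
for cluster, value in flagged.items():
    print(f"allele B: origin {cluster} favoured, log BF = {value:.2f}")
```

As a simplification, the sketch does not remove the individual's own alleles from the counts of the assigned cluster before computing the predictive probability. Each locus is scored independently, which mirrors the exploratory, locus-by-locus character of the tool noted in the text.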

