Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration.

Gavryushkina A, Welch D, Stadler T, Drummond AJ - PLoS Comput. Biol. (2014)

Bottom Line: We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature.


Affiliation: Department of Computer Science, University of Auckland, Auckland, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.

ABSTRACT
Phylogenetic analyses that include fossils or molecular sequences sampled through time require models that allow one sample to be a direct ancestor of another sample. Because previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after sampling; in particular, we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also show that sampled ancestor birth-death models where every sample comes from a different time point are non-identifiable and thus require one parameter to be known in order to infer the other parameters. We apply our phylogenetic inference accounting for sampled ancestors to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/CompEvol/sampled-ancestors).
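
To make the idea of a sampled ancestor tree concrete, the following is a minimal Python sketch of a tree in which a sampled individual can sit on a lineage that continues below it. The `Node` class, its fields, and `count_sampled_ancestors` are hypothetical names chosen for illustration; this is not how the BEAST2 package represents trees internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a toy sampled ancestor tree (hypothetical illustrative class)."""
    height: float                      # time before the present
    label: Optional[str] = None        # set only for sampled individuals
    children: List["Node"] = field(default_factory=list)

    @property
    def is_sampled(self) -> bool:
        return self.label is not None

    @property
    def is_sampled_ancestor(self) -> bool:
        # A sampled individual whose lineage continues: exactly one child.
        return self.is_sampled and len(self.children) == 1


def count_sampled_ancestors(root: Node) -> int:
    """Count sampled nodes that are direct ancestors of other samples."""
    stack, total = [root], 0
    while stack:
        node = stack.pop()
        total += int(node.is_sampled_ancestor)
        stack.extend(node.children)
    return total


# Fossil "F" (height 2.0) is a direct ancestor of extant sample "A";
# extant sample "B" is an ordinary tip.
tree = Node(height=5.0, children=[
    Node(height=2.0, label="F", children=[Node(height=0.0, label="A")]),
    Node(height=0.0, label="B"),
])
n_sampled_ancestors = count_sampled_ancestors(tree)
```

In a tool that assumes all samples are tips, "F" would have to hang off its own branch; here it is allowed to lie directly on the lineage leading to "A".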

Figure 7 (pcbi-1003919-g007): Divergence time estimates for the bear dataset. The estimates are obtained from the analyses with DPPDiv [30] (left bars with blue dots) and BEAST2 (right bars with red dots) implementations of the fossilised birth-death model, which give the same results. The bars are 95% HPD intervals and the dots are mean estimates. The node numbering follows the original analysis [30]: nodes 1 and 2 represent the most recent common ancestors of the bear clade and two outgroups (gray wolf and spotted seal). Node 3 is the most recent common ancestor of all living bear species and nodes 4-9 are the divergence times within the bear clade.
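
For readers unfamiliar with HPD intervals, the sketch below shows one common way to estimate a highest posterior density interval from MCMC samples: sort the samples and take the narrowest window that covers the requested fraction of them. The function name `hpd_interval` and the toy input are assumptions for illustration; the source does not say how its intervals were computed, and this estimator is only appropriate for unimodal posteriors.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    # Number of consecutive sorted samples the interval must cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k samples over the sorted list; keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Toy input (not data from the paper): the two middle values are closest.
toy_interval = hpd_interval([0.0, 10.0, 11.0, 30.0], mass=0.5)
```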

Mentions: We ran two analyses of the bear dataset originally analysed in [30], one with BEAST2 and one with the DPPDiv implementation by Heath et al., under the same model. The tree topology relating all living bear species and two outgroup species is fixed in the analyses, and we estimate the divergence times and three tree model parameters (the net diversification rate, the turnover rate, and the sampling proportion), since the sampling probability was fixed to one in the inference. The estimates are the same in both analyses, as expected. The estimated divergence times are shown in Figure 7. The median estimate and 95% HPD interval were 0.027 per million years and [0.002, 0.058] for the net diversification rate; 0.51 and [0.1, 0.9] for the turnover rate; and 0.77 and [0.46, 0.98] for the sampling proportion. Most of the fossil samples were estimated to be direct ancestors of extant species or other fossil species: the median estimate of the number of sampled ancestors was 22, with 95% HPD interval [17, 24].
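
The parameter symbols did not survive in the text above, so the following sketch rests on an assumption: that the net diversification rate, turnover rate, and sampling proportion are defined as d = λ − μ, ν = μ/λ, and s = ψ/(μ + ψ), where λ, μ, and ψ are the birth, death, and sampling rates. This is a common parameterisation of the fossilised birth-death model, but it is not confirmed by this excerpt. Under that assumption, the reported medians can be converted to per-lineage rates. The function name `fbd_rates` is hypothetical.

```python
from typing import NamedTuple


class Rates(NamedTuple):
    birth: float     # lambda (assumed symbol)
    death: float     # mu (assumed symbol)
    sampling: float  # psi (assumed symbol)


def fbd_rates(d: float, nu: float, s: float) -> Rates:
    """Invert the ASSUMED definitions d = birth - death,
    nu = death / birth, s = sampling / (death + sampling)."""
    if not 0.0 <= nu < 1.0:
        raise ValueError("turnover must be in [0, 1)")
    if not 0.0 <= s < 1.0:
        raise ValueError("sampling proportion must be in [0, 1)")
    birth = d / (1.0 - nu)
    death = nu * birth
    sampling = s * death / (1.0 - s)
    return Rates(birth, death, sampling)


# Median estimates reported for the bear dataset.
rates = fbd_rates(d=0.027, nu=0.51, s=0.77)
```

Plugging in marginal medians gives only a rough point illustration; posterior summaries of the rates themselves would need the transformation applied to each MCMC sample.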

