Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration.

Gavryushkina A, Welch D, Stadler T, Drummond AJ - PLoS Comput. Biol. (2014)

Bottom Line: We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature.


Affiliation: Department of Computer Science, University of Auckland, Auckland, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.

ABSTRACT
Phylogenetic analyses that include fossils or molecular sequences sampled through time require models that allow one sample to be a direct ancestor of another sample. As previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after sampling; in particular, we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also show that sampled ancestor birth-death models where every sample comes from a different time point are non-identifiable and thus require one parameter to be known in order to infer other parameters. We apply our phylogenetic inference accounting for sampled ancestors to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/CompEvol/sampled-ancestors).

Figure 1 (pcbi-1003919-g001): Full tree versus reconstructed tree. A full tree produced by the sampled ancestor birth-death process is shown on the left and a reconstructed tree on the right. The sampled nodes are indicated by dots labeled by letters A through H. Nodes A, B and D are sampled ancestors. The reconstructed tree is represented by a sampled ancestor tree, which consists of a ranked tree topology together with the node ages. In the reconstructed tree the root is a sampled node. In the skyline model, birth-death parameters vary from interval to interval. There are two intervals in this figure, bounded by the time of origin, the parameter shift time, and the present time. One set of four parameters applies in the first interval and a second set of four in the other. There are additional sampling attempts at two time points, each with its own sampling probability. (The mathematical symbols of the original caption were lost in extraction.)

Mentions: An important characteristic of the models we consider here is incomplete sampling, i.e., we only observe a part of the tree produced by the process. Consider a birth-death process that starts at some point in time (the time of origin) with one lineage, after which each existing lineage may bifurcate or go extinct. Further, the lineages are randomly sampled through time. An example of a full tree produced by such a process is shown in Figure 1 on the left. We have information only about the portion of the process that produces the samples, shown as labeled nodes, and do not observe the full tree. Thus we consider only the subtree relating to the sample, which is called the reconstructed tree (or the sampled tree) and is shown on the right of Figure 1.
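
The process described above can be illustrated with a small forward simulation. The following is a minimal sketch, not the BEAST2 implementation: it assumes constant rates (no skyline intervals), no extra sampling attempts at fixed times, and a simple Gillespie-style event loop. The function name and all parameter names (birth_rate, death_rate, sampling_rate, removal_prob, origin_time, max_lineages) are hypothetical and chosen for illustration only. A sample is flagged as a sampled ancestor as soon as a later sample descends from it.

```python
import random


def simulate_sampled_ancestors(birth_rate, death_rate, sampling_rate,
                               removal_prob, origin_time, seed=None,
                               max_lineages=10_000):
    """Simulate a constant-rate birth-death-sampling process (illustrative).

    Returns (sample_times, is_ancestor): the time since origin of every
    sample, and a flag telling whether a later sample descends from it.
    """
    per_lineage_rate = birth_rate + death_rate + sampling_rate
    if per_lineage_rate <= 0:
        raise ValueError("at least one rate must be positive")

    rng = random.Random(seed)
    # Each live lineage stores the ids of samples on its ancestral path.
    lineages = [[]]            # the process starts with a single lineage
    sample_times = []
    is_ancestor = []
    t = 0.0

    # max_lineages is a safety cap (an assumption) against runaway growth.
    while lineages and len(lineages) < max_lineages:
        # Waiting time to the next event over all live lineages.
        t += rng.expovariate(per_lineage_rate * len(lineages))
        if t >= origin_time:
            break              # reached the present
        i = rng.randrange(len(lineages))
        u = rng.random() * per_lineage_rate

        if u < birth_rate:
            # Bifurcation: the new lineage inherits the ancestral samples.
            lineages.append(list(lineages[i]))
        elif u < birth_rate + death_rate:
            # Extinction: drop lineage i (swap with last, then pop).
            lineages[i] = lineages[-1]
            lineages.pop()
        else:
            # Sampling: every earlier sample on this path is now a
            # direct ancestor of a sampled individual.
            for s in lineages[i]:
                is_ancestor[s] = True
            sample_id = len(sample_times)
            sample_times.append(t)
            is_ancestor.append(False)
            if rng.random() < removal_prob:
                # Sampled individual is removed from the process.
                lineages[i] = lineages[-1]
                lineages.pop()
            else:
                # Sampled individual stays and may leave sampled descendants.
                lineages[i].append(sample_id)

    return sample_times, is_ancestor


if __name__ == "__main__":
    times, flags = simulate_sampled_ancestors(
        birth_rate=1.0, death_rate=0.5, sampling_rate=0.5,
        removal_prob=0.3, origin_time=4.0, seed=1)
    print(len(times), "samples,", sum(flags), "sampled ancestors")
```

With removal_prob set to 1, every sampled individual leaves the process at sampling, so no sample can be flagged as an ancestor; this corresponds to the assumption of earlier tools that all samples are tips. Lower values of removal_prob let sampled lineages persist, which is what produces sampled ancestors in this sketch.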

