Characterization of the uncertainty of divergence time estimation under relaxed molecular clock models using multiple loci.

Zhu T, Dos Reis M, Yang Z - Syst. Biol. (2014)

Bottom Line: Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data is used in the analysis. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci (L) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/L, and the limit is independent of the number of sites in the sequence alignment. Our results suggest that, with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way to improve the precision of posterior time estimation.
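
The predicted convergence can be written schematically as below. This is only a sketch of the stated 1/L rate, not the paper's exact expression: the symbols w_L (posterior 95% CI width with L loci), w_infinity (its infinite-loci limit) and c (a positive constant that depends on the node and the priors) are notation assumed here for illustration.

```latex
w_L^2 \;\approx\; w_\infty^2 + \frac{c}{L}, \qquad L \to \infty
```

Under this form, the squared width plotted against 1/L should fall on a straight line whose intercept is the irreducible uncertainty left when the number of loci is unlimited.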


Affiliation: Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.


Figure 4: The finite-sites theory applied to the analysis of genomic sequence data from six primate species (Fig. 3). The square of the posterior 95% CI width for each of the 5 node ages is plotted against the reciprocal of the number of loci, sampled at random from 7947 protein-coding genes (with only the third codon positions used).
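
The kind of plot described in this caption can be explored with a straight-line fit of the squared CI width against the reciprocal of the number of loci. The sketch below is a minimal illustration, not the authors' analysis: the function name `fit_line`, the loci counts and the generating values are all hypothetical, and the data are synthetic rather than taken from the paper.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical illustration data, NOT values from the paper:
# squared CI widths generated exactly from limit^2 + c / (number of loci).
loci = [10, 20, 50, 100, 200]
assumed_limit_sq = 0.04   # assumed squared CI width with unlimited loci
assumed_c = 0.5           # assumed slope of the 1/loci term
w_squared = [assumed_limit_sq + assumed_c / n_loci for n_loci in loci]

# Regress squared width on the reciprocal of the number of loci.
inv_loci = [1.0 / n_loci for n_loci in loci]
intercept, slope = fit_line(inv_loci, w_squared)

# The intercept estimates the squared CI width at 1/loci = 0.
limit_width = intercept ** 0.5
print(f"intercept={intercept:.4f} slope={slope:.4f} limit width={limit_width:.4f}")
```

Because the synthetic data lie exactly on a line, the fit recovers the generating intercept and slope up to rounding. With real posterior widths, points at small numbers of loci may fall below the line, so they would normally be excluded before extrapolating to the intercept.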

Mentions: In Figure 4 we plot the posterior uncertainty (measured by the squared 95% CI width) for the five divergence times in the phylogeny against the reciprocal of the number of loci. For all nodes but one, the squared CI width shows a strong linear relationship with the reciprocal of the number of loci as long as 10 or more loci are used, consistent with our predictions. For the remaining node, the linear relationship holds well only for much larger numbers of loci. With few loci, before the asymptotics become reliable, the posterior CI width is smaller than the value predicted from the straight line (see the plot for that node in Fig. 4). As in the simulation for three species, the asymptotic theory becomes reliable at smaller numbers of loci if the node has a less informative prior calibration. dos Reis and Yang (2013) suggested using the 95% prior interval width divided by the prior mean as a measure of prior or calibration precision. This measure is 0.91, 0.37, 0.90, 1.02, and 0.56 for the five node ages, respectively, with the second node (0.37) having the most precise calibration.
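
The calibration-precision measure attributed to dos Reis and Yang (2013), the 95% prior interval width divided by the prior mean, is simple to compute once the prior quantiles are known. The sketch below is illustrative only: the function names and the uniform age priors are assumptions made here (the calibrations actually used in the paper are not given in this excerpt), while the five reported values are copied from the text above.

```python
def calibration_precision(q_lower, q_upper, prior_mean):
    """95% prior interval width divided by the prior mean.

    Smaller values indicate a more precise (more informative) calibration.
    """
    if prior_mean <= 0:
        raise ValueError("prior mean must be positive for a node age")
    return (q_upper - q_lower) / prior_mean


def uniform_precision(a, b):
    """Precision measure for a hypothetical Uniform(a, b) age prior."""
    width = b - a
    q_lower = a + 0.025 * width   # 2.5% quantile of Uniform(a, b)
    q_upper = a + 0.975 * width   # 97.5% quantile of Uniform(a, b)
    return calibration_precision(q_lower, q_upper, (a + b) / 2.0)


# Two hypothetical calibrations with the same mean but different spread.
narrow = uniform_precision(6.0, 10.0)
wide = uniform_precision(2.0, 14.0)

# Values reported in the text for the five node ages, in the order listed.
# The node labels are missing from this excerpt, so positions are used.
reported = [0.91, 0.37, 0.90, 1.02, 0.56]
most_precise = min(range(len(reported)), key=reported.__getitem__)

print(f"narrow={narrow:.3f} wide={wide:.3f}")
print(f"most precise calibration: node at position {most_precise + 1}")
```

Dividing by the prior mean makes the measure scale-free, so calibrations on nodes of very different ages can be compared directly.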

