Impact of the partitioning scheme on divergence times inferred from Mammalian genomic data sets.

Voloch CM, Schrago CG - Evol. Bioinform. Online (2012)

Bottom Line: The effect of the partitioning scheme on divergence time estimates has generally been ignored. After drawing divergence time inferences using the uncorrelated relaxed clock in BEAST, we compared the age estimates between the partitioning schemes. Our results show that, in general, both schemes resulted in similar chronological estimates; however, the concatenated data sets were more efficient than the partitioned ones in attaining suitable effective sample sizes.


Affiliation: Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.

ABSTRACT
Data partitioning has long been regarded as an important parameter for phylogenetic inference. Dividing heterogeneous multigene data sets into partitions with similar substitution patterns is known to improve the performance of probabilistic phylogenetic methods. However, the effect of the partitioning scheme on divergence time estimates has generally been ignored. To investigate the impact of data partitioning on the estimation of divergence times, we constructed two genomic data sets. The first comprised 15 nuclear genes (50,928 bp) selected from the OrthoMam database; the second was composed of complete mitochondrial genomes. We studied two partitioning schemes: concatenated supermatrices and partitioned gene analysis. We also measured the impact of taxonomic sampling on the estimates. After drawing divergence time inferences using the uncorrelated relaxed clock in BEAST, we compared the age estimates between the partitioning schemes. Our results show that, in general, both schemes resulted in similar chronological estimates; however, the concatenated data sets were more efficient than the partitioned ones in attaining suitable effective sample sizes.
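The abstract judges the two schemes partly by how efficiently their MCMC runs reach suitable effective sample sizes (ESS). To show what ESS measures, the sketch below estimates it from a single parameter trace. It uses a simple autocorrelation-based estimator, ESS = N / (1 + 2 Σ ρ_k), and truncates the sum at the first non-positive autocorrelation.

Several assumptions apply:
- The traces are synthetic.
- The function name is hypothetical.
- The estimator is not necessarily the one BEAST or Tracer implements.

```python
import random


def effective_sample_size(trace):
    """Autocorrelation-based ESS estimate for a 1-D MCMC trace (illustrative).

    ESS = N / (1 + 2 * sum_k rho_k), where rho_k is the lag-k sample
    autocorrelation; the sum stops at the first lag whose estimated
    autocorrelation is not positive (a simple truncation rule, assumed here).
    """
    n = len(trace)
    if n < 3:
        raise ValueError("trace too short")
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    var = sum(c * c for c in centred) / n
    if var == 0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n // 2):
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# Synthetic traces: independent draws vs. a strongly autocorrelated AR(1) chain.
rng = random.Random(1)
independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
autocorrelated = [0.0]
for _ in range(1999):
    autocorrelated.append(0.95 * autocorrelated[-1] + rng.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_autocorrelated = effective_sample_size(autocorrelated)
print(round(ess_independent), round(ess_autocorrelated))
```

Both traces hold 2,000 samples, yet the autocorrelated chain yields far fewer effective samples. This is why a poorly mixing analysis needs longer runs to pass a given ESS threshold; ESS ≥ 200 is a widely used rule of thumb for BEAST runs.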


Figure 3. Linear regressions between the means of the prior and posterior distributions of the node ages of the phylogenies in Figure 1. Notes: The solid green lines are reference lines with a slope equal to 1. The dashed blue lines represent the regression between the prior and posterior means under the partitioned scheme, whereas the dashed red lines represent the same regression under the concatenated scheme. All coefficients of determination (r²) are significant at P < 0.001.
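The caption describes regressing posterior node-age means on prior node-age means and comparing each fitted line with a reference line of slope 1. The sketch below fits an ordinary least-squares line and reports its intercept, slope and r².

It rests on these assumptions:
- Prior means sit on the x axis and posterior means on the y axis; the figure's axis assignment is not stated in the caption.
- The function name is hypothetical.
- The example values are made up, not data from the study.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared) for a simple linear regression.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical prior and posterior node-age means; not values from the study.
prior_means = [10.0, 20.0, 30.0, 40.0, 50.0]
posterior_means = [0.5 * age + 3.0 for age in prior_means]

intercept, slope, r_squared = ols_fit(prior_means, posterior_means)
print(intercept, slope, r_squared)  # → 3.0 0.5 1.0
```

The example fits perfectly (r² = 1), yet its slope is far from the identity line's slope of 1. A strong, significant correlation therefore does not by itself mean that posterior ages match the priors. The slope and intercept relative to the slope-1 reference line carry that information.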

Mentions: In the nuclear data sets, the prior and posterior means of node ages were strongly correlated (Fig. 3A–C). However, as taxon sampling increased, the difference between the prior and posterior means of the chronological estimates grew larger. Under the smallest taxonomic arrangement, the posterior means of the concatenated scheme were more similar to their priors than to the posterior means of the partitioned scheme. Likewise, the posterior means of the partitioned scheme were closer to their priors than to the concatenated posterior means (Fig. 3A). In the second nuclear taxonomic composition, by contrast, the slopes of the regression lines showed that the posterior divergence time means of the two partitioning schemes were more similar to each other than to their respective priors (Fig. 3B). This pattern was even more pronounced in the most species-rich nuclear arrangement (Fig. 3C).
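The comparisons in this paragraph ask whether one scheme's posterior means lie closer to their priors or to the other scheme's posterior means. These comparisons can also be summarised node by node. The sketch below does so with mean absolute differences between the three sets of node-age means.

This distance summary is an illustrative alternative, not the analysis reported here, which compared regression slopes. All function names and values are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length lists of node ages."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def compare_schemes(prior, concatenated, partitioned):
    """Pairwise distances between prior and posterior node-age means."""
    return {
        "concat_vs_prior": mean_abs_diff(concatenated, prior),
        "partitioned_vs_prior": mean_abs_diff(partitioned, prior),
        "concat_vs_partitioned": mean_abs_diff(concatenated, partitioned),
    }


def posteriors_agree_more_than_with_priors(d):
    """True if the two schemes' posterior means are closer to each other than
    either is to the prior means (analogous to the pattern described for
    Fig. 3B-C); False corresponds to the prior-dominated case (Fig. 3A)."""
    return d["concat_vs_partitioned"] < min(d["concat_vs_prior"], d["partitioned_vs_prior"])


# Hypothetical node-age means for three nodes; not values from the study.
prior = [10.0, 20.0, 30.0]
data_driven = compare_schemes(prior, [15.0, 26.0, 41.0], [16.0, 25.0, 40.0])
prior_driven = compare_schemes(prior, [11.0, 21.0, 31.0], [14.0, 16.0, 35.0])
print(posteriors_agree_more_than_with_priors(data_driven))   # → True
print(posteriors_agree_more_than_with_priors(prior_driven))  # → False
```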

