Comparison of mode estimation methods and application in molecular clock analysis.

Hedges SB, Shah P - BMC Bioinformatics (2003)

Bottom Line: In those cases, the mode is a better estimator of the overall time of divergence than the mean or median. We determined that bootstrapping reduces the variance of both mode estimators. Application of the different methods to real data sets yielded results that were concordant with the simulations.


Affiliation: NASA Astrobiology Institute and Department of Biology, Pennsylvania State University, 208 Mueller Laboratory, University Park, PA 16802-5301, USA. sbh1@psu.edu

ABSTRACT

Background: Distributions of time estimates in molecular clock studies are sometimes skewed or contain outliers. In those cases, the mode is a better estimator of the overall time of divergence than the mean or median. However, different methods are available for estimating the mode. We compared these methods in simulations to determine their strengths and weaknesses and further assessed their performance when applied to real data sets from a molecular clock study.

Results: We found that the half-range mode and robust parametric mode methods have a lower bias than other mode methods under a diversity of conditions. However, the half-range mode suffers from a relatively high variance and the robust parametric mode is more susceptible to bias by outliers. We determined that bootstrapping reduces the variance of both mode estimators. Application of the different methods to real data sets yielded results that were concordant with the simulations.

Conclusion: Because the half-range mode is a simple and fast method, and produced less bias overall in our simulations, we recommend the bootstrapped version of it as a general-purpose mode estimator and suggest a bootstrap method for obtaining the standard error and 95% confidence interval of the mode.



Figure 6: Application of mode estimation methods to published data sets. The data are divergence time estimates (millions of years ago) from a molecular clock study of fungi and plants [8]. Both graphs include the histogram distribution, the actual data points plotted in a horizontal line, and positions of the various estimates of central tendency (Table 2). The two recommended mode estimators are highlighted in bold. (a) Archiascomycetes versus other Ascomycota (n = 70 constant rate proteins), (b) Hemiascomycetes versus filamentous Ascomycetes (n = 48).

Mentions: Analysis of the published molecular clock data for fungi and plants (Figure 6, Table 2) showed that the mean was higher than the median in most cases, indicating asymmetric distributions and supporting the use of the mode. Although the true modes are not known in these cases, some patterns were evident. Among the mode estimates, those using SPM often were the lowest and appeared to visually underestimate the center of the distribution. Of the remaining mode estimators, HRM and RPM were 10.1% different from each other, on average, across the five sample data sets (Table 2). HRM-BMO (mode of the bootstrapped modes) and RPM-BMO averaged 7.2% different, and HRM-BME (mean of the bootstrapped modes) and RPM-BME were only 1.4% different. A greater difference was observed between HRM and its bootstrapped estimate (HRM-BME) than between RPM and RPM-BME. All of these results from analyses of real data were consistent with the simulation results (Figure 5).
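The bootstrapped half-range mode discussed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common formulation of the half-range mode (repeatedly keep the densest window whose width is half the current range, breaking ties by taking the first such window, then average the remaining points), and it assumes a simple percentile interval for the 95% confidence interval. The function names `half_range_mode` and `bootstrap_mode`, the tie-breaking rule, and the defaults `n_boot` and `seed` are all hypothetical choices made for illustration.

```python
import random
import statistics


def half_range_mode(values):
    """Sketch of a half-range mode: shrink to the densest half-range window.

    Assumption: ties between equally dense windows go to the first window.
    """
    data = sorted(values)
    while len(data) > 2:
        width = (data[-1] - data[0]) / 2.0
        if width == 0:
            break  # all remaining values are identical
        best_lo, best_hi = 0, 0  # window as [lo, hi) indices into data
        hi = 0
        for lo in range(len(data)):
            # Extend the window while points lie within `width` of data[lo].
            while hi < len(data) and data[hi] - data[lo] <= width:
                hi += 1
            if hi - lo > best_hi - best_lo:
                best_lo, best_hi = lo, hi
        data = data[best_lo:best_hi]
    return statistics.mean(data)


def bootstrap_mode(values, n_boot=1000, seed=0):
    """Return (mean of bootstrapped modes, standard error, 95% percentile CI).

    Assumption: the standard error is the standard deviation of the
    bootstrapped modes, and the interval is a plain percentile interval.
    """
    rng = random.Random(seed)
    values = list(values)
    modes = sorted(
        half_range_mode(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    bme = statistics.mean(modes)   # analogous to HRM-BME in the text
    se = statistics.stdev(modes)
    lower = modes[int(0.025 * n_boot)]
    upper = modes[int(0.975 * n_boot) - 1]
    return bme, se, (lower, upper)


if __name__ == "__main__":
    # Made-up, right-skewed values with two outliers (not data from the study).
    sample = [9.9, 10.0, 10.1, 10.2, 10.3, 11.0, 12.5, 50.0, 80.0]
    print("mean:", statistics.mean(sample))
    print("median:", statistics.median(sample))
    print("half-range mode:", half_range_mode(sample))
    print("bootstrap (BME, SE, 95% CI):", bootstrap_mode(sample))
```

On skewed input such as the made-up sample, the mean is pulled toward the outliers while the half-range mode stays inside the dense cluster, which is the behavior that motivates using the mode for skewed divergence time estimates. Averaging over bootstrap replicates trades a small amount of extra computation for a less variable estimate.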
