Comparison of mode estimation methods and application in molecular clock analysis.

Hedges SB, Shah P - BMC Bioinformatics (2003)

Bottom Line: When distributions of time estimates are skewed or contain outliers, the mode is a better estimator of the overall time of divergence than the mean or median. We determined that bootstrapping reduces the variance of both mode estimators (the half-range mode and the robust parametric mode). Application of the different methods to real data sets yielded results that were concordant with the simulations.


Affiliation: NASA Astrobiology Institute and Department of Biology, Pennsylvania State University, 208 Mueller Laboratory, University Park, PA 16802-5301, U.S.A. sbh1@psu.edu

ABSTRACT

Background: Distributions of time estimates in molecular clock studies are sometimes skewed or contain outliers. In those cases, the mode is a better estimator of the overall time of divergence than the mean or median. However, different methods are available for estimating the mode. We compared these methods in simulations to determine their strengths and weaknesses and further assessed their performance when applied to real data sets from a molecular clock study.

Results: We found that the half-range mode and robust parametric mode methods have a lower bias than other mode methods under a variety of conditions. However, the half-range mode suffers from a relatively high variance, and the robust parametric mode is more susceptible to bias from outliers. We determined that bootstrapping reduces the variance of both mode estimators. Application of the different methods to real data sets yielded results that were concordant with the simulations.
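The half-range mode idea can be sketched as repeated narrowing onto the densest part of the data. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: it uses a window half as wide as the current range, breaks ties in point count in favor of the narrowest window, and stops when two or fewer points (or only identical values) remain, returning their mean. These tie-breaking and stopping rules are assumptions and may differ from the published definition. The helper `bias_and_variance` is hypothetical; it mimics, in miniature, the kind of simulation described above by drawing normal samples with a known mode and reporting an estimator's bias and variance.

```python
import random
import statistics


def half_range_mode(data):
    """Simplified half-range mode (HRM) sketch.

    Assumptions (not from the source): window width = half the current range,
    ties in point count go to the narrowest window, and the loop stops when at
    most two points (or only identical values) remain.
    """
    xs = sorted(data)
    if not xs:
        raise ValueError("need at least one value")
    while len(xs) > 2:
        width = (xs[-1] - xs[0]) / 2.0
        if width == 0:
            break  # all remaining values are identical
        best = (0, 1)  # (start, stop) indices of the densest window so far
        best_count, best_span = 0, float("inf")
        j = 0
        for i in range(len(xs)):
            # Advance j past every value inside [xs[i], xs[i] + width].
            while j < len(xs) and xs[j] <= xs[i] + width:
                j += 1
            count, span = j - i, xs[j - 1] - xs[i]
            if count > best_count or (count == best_count and span < best_span):
                best, best_count, best_span = (i, j), count, span
        if best_count == len(xs):
            break  # rounding kept every point; stop rather than loop forever
        xs = xs[best[0]:best[1]]
    return sum(xs) / len(xs)


def bias_and_variance(estimator, true_mode=0.0, n=100, reps=200, seed=1):
    """Hypothetical helper: bias and variance of `estimator` on N(true_mode, 1) samples."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(true_mode, 1.0) for _ in range(n)]) for _ in range(reps)
    ]
    return statistics.fmean(estimates) - true_mode, statistics.variance(estimates)


print(half_range_mode([1, 2, 2, 2, 3, 10]))  # 2.0
print(bias_and_variance(half_range_mode))
print(bias_and_variance(statistics.median))
```

Comparing the last two printed pairs gives a rough feel for a bias/variance comparison between estimators; the exact numbers depend on the seed, the sample size, and the assumed HRM rules.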

Conclusion: Because the half-range mode is a simple and fast method, and produced less bias overall in our simulations, we recommend the bootstrapped version of it as a general-purpose mode estimator and suggest a bootstrap method for obtaining the standard error and 95% confidence interval of the mode.
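A bootstrap for the standard error and 95% confidence interval of the mode can be sketched as follows. This is a percentile-bootstrap sketch under assumptions, not the authors' procedure: the bootstrapped mode is taken as the mean of the replicate estimates, the standard error as their standard deviation, and the interval as their 2.5th and 97.5th percentiles. The compact `half_range_mode` repeats the simplified HRM with assumed tie and stopping rules, and `bootstrap_mode` is a hypothetical name.

```python
import random
import statistics


def half_range_mode(data):
    """Simplified half-range mode sketch (assumed tie and stopping rules)."""
    xs = sorted(data)
    if not xs:
        raise ValueError("need at least one value")
    while len(xs) > 2:
        width = (xs[-1] - xs[0]) / 2.0
        if width == 0:
            break  # all remaining values are identical
        best, best_key, j = (0, 1), (0, 0.0), 0
        for i in range(len(xs)):
            while j < len(xs) and xs[j] <= xs[i] + width:
                j += 1
            key = (j - i, -(xs[j - 1] - xs[i]))  # more points first, then narrower
            if key > best_key:
                best, best_key = (i, j), key
        if best[1] - best[0] == len(xs):
            break  # guard against a window that fails to shrink
        xs = xs[best[0]:best[1]]
    return sum(xs) / len(xs)


def bootstrap_mode(data, estimator=half_range_mode, n_boot=1000, seed=0):
    """Percentile-bootstrap sketch: returns (estimate, standard error, 95% CI).

    Assumption: the bootstrapped mode is the mean of the replicate estimates.
    """
    rng = random.Random(seed)
    reps = [estimator(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    cuts = statistics.quantiles(reps, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.fmean(reps), statistics.stdev(reps), (cuts[0], cuts[-1])


rng = random.Random(42)
# Hypothetical divergence-time estimates (Myr): a main cluster plus a few outliers.
times = [rng.gauss(100.0, 5.0) for _ in range(80)] + [rng.gauss(160.0, 5.0) for _ in range(10)]
estimate, se, (lo, hi) = bootstrap_mode(times, n_boot=500)
print(f"mode ~ {estimate:.1f} Myr, SE {se:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Passing a different `estimator` (for example `statistics.median`) reuses the same resampling loop, which makes it easy to compare how much bootstrapping changes each statistic.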



Figure 1: A normal distribution with outliers, showing the relative positions of the mean, median, and mode. In this case, the outliers (contaminants) are normally distributed, are centered at twice the distance between the true mode and the 99th percentile of the original normal distribution, and account for 20% of the total data points.
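The contaminated distribution in Figure 1 can be simulated roughly as below. This is a sketch under two labeled assumptions: the contaminant center is read as mode + 2 × (q99 − mode), where q99 is the 99th percentile of the original normal, and the contaminants share the original standard deviation, which the caption does not state. `contaminated_normal` is a hypothetical helper.

```python
import random
import statistics
from statistics import NormalDist


def contaminated_normal(n=1000, frac_out=0.2, mode=0.0, sd=1.0, seed=0):
    """Sample N(mode, sd) plus a fraction of normally distributed contaminants.

    Assumptions: contaminant center = mode + 2 * (q99 - mode), and the
    contaminants use the same standard deviation as the original normal.
    """
    rng = random.Random(seed)
    q99 = NormalDist(mode, sd).inv_cdf(0.99)
    center = mode + 2.0 * (q99 - mode)
    n_out = round(frac_out * n)
    clean = [rng.gauss(mode, sd) for _ in range(n - n_out)]
    outliers = [rng.gauss(center, sd) for _ in range(n_out)]
    return clean + outliers, center


sample, center = contaminated_normal()
print(f"contaminant center ~ {center:.2f}")
print(f"mean   = {statistics.fmean(sample):.2f}")
print(f"median = {statistics.median(sample):.2f}")
print("true mode ~ 0 (peak of the uncontaminated component)")
```

With these settings the mean is pulled furthest toward the contaminants and the median less so, while the peak of the distribution stays essentially at the original mode.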

It is not uncommon in many fields to encounter data distributions that are skewed or contain outliers. In such cases, the arithmetic mean may not be an appropriate statistic for the location (center) of the data. The median and the mode are alternative statistics that are less biased by such values. The median is the value that, in the ordered data, has an equal number of data points on either side, whereas the mode is the value at the peak of the distribution (Figure 1). Of these statistics, the mode is the least biased by outliers and contaminants [1-3]; it is used commonly in astronomy [4,5] and occasionally in other fields, including biology [6-10]. However, the mode is more difficult to calculate than the mean or median, and this has limited its wider application.
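A toy illustration of how the three statistics respond to a single outlier. The values are hypothetical, and `statistics.mode` simply returns the most frequent value, so it only works for data with exact repeats; continuous estimates such as divergence times need a dedicated mode estimator.

```python
import statistics

# Hypothetical divergence-time estimates (Myr) with one outlier (60).
times = [10, 11, 12, 12, 12, 13, 14, 60]

print(statistics.mean(times))    # 18
print(statistics.median(times))  # 12.0
print(statistics.mode(times))    # 12

# Without the outlier, all three statistics agree.
clean = times[:-1]
print(statistics.mean(clean), statistics.median(clean), statistics.mode(clean))  # 12 12 12
```

One outlier shifts the mean from 12 to 18, while the median and the mode stay at 12.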

