The Lambert Way to Gaussianize Heavy-Tailed Data with the Inverse of Tukey's h Transformation as a Special Case.
Bottom Line:
For X being Gaussian it reduces to Tukey's h distribution. Parameters can be estimated by maximum likelihood, and applications to S&P 500 log-returns demonstrate the usefulness of the presented methodology. The R package LambertW implements most of the introduced methodology and is publicly available on CRAN.
Affiliation: Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
ABSTRACT
I present a parametric, bijective transformation to generate heavy-tailed versions of arbitrary random variables. The tail behavior of this heavy-tail Lambert W × F_X random variable depends on a tail parameter δ ≥ 0: for δ = 0, Y ≡ X; for δ > 0, Y has heavier tails than X. For X being Gaussian it reduces to Tukey's h distribution. The Lambert W function provides an explicit inverse transformation, which can thus remove heavy tails from observed data. It also provides closed-form expressions for the cumulative distribution function (cdf) and probability density function (pdf). As a special case, these yield analytic expressions for Tukey's h pdf and cdf. Parameters can be estimated by maximum likelihood, and applications to S&P 500 log-returns demonstrate the usefulness of the presented methodology. The R package LambertW implements most of the introduced methodology and is publicly available on CRAN.
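The transformation and its Lambert W inverse can be sketched in a few lines. The following is a minimal illustration, not the LambertW package's implementation: it assumes the input has already been standardized (location 0, scale 1), takes the heavy-tail map to be z = u · exp(δ/2 · u²), and uses a small hand-written Newton solver for the principal branch of the Lambert W function. The function names lambert_w0, heavy_tail, and gaussianize are hypothetical and chosen for this example only.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch of Lambert W for x >= 0: solves w * exp(w) = x by Newton's method."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    # log(1 + x) is an upper bound on W(x) for x >= 0, so Newton descends monotonically.
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return max(w, 0.0)


def heavy_tail(u, delta):
    """Map a standardized value u to z = u * exp(delta/2 * u^2); delta = 0 is the identity."""
    return u * math.exp(0.5 * delta * u * u)


def gaussianize(z, delta):
    """Invert heavy_tail: u = sign(z) * sqrt(W(delta * z^2) / delta)."""
    if delta == 0:
        return z
    return math.copysign(math.sqrt(lambert_w0(delta * z * z) / delta), z)


if __name__ == "__main__":
    delta = 0.2  # illustrative tail parameter, not an estimate from the paper
    for u in (-2.0, -0.5, 0.0, 1.5, 3.0):
        z = heavy_tail(u, delta)
        print(f"u = {u:5.2f}  z = {z:9.4f}  recovered u = {gaussianize(z, delta):7.4f}")
```

The inverse follows from squaring the forward map: δz² = (δu²) · exp(δu²), so W(δz²) = δu², and the sign of u equals the sign of z. For real data the location and scale would be estimated jointly with δ, which this sketch leaves out.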
Figure 7(a) shows the S&P 500 log-returns with a total of N = 2,780 daily observations (R package MASS, dataset SP500). Table 3(b) confirms the heavy tails (sample kurtosis 7.70) but also indicates negative skewness (−0.296). As the sample skewness is very sensitive to outliers, we fit a distribution that allows skewness and test for symmetry. For the double-tail Lambert W × Gaussian this means testing H0: δℓ = δr = δ versus H1: δℓ ≠ δr. Using the likelihood expression in (28), we can use a likelihood ratio test with one degree of freedom (3 versus 4 parameters). The log-likelihood of the double-tail fit (Table 4(a)) equals −3606.0 = −2972.27 + (−633.73) (input log-likelihood + penalty), while the symmetric δ fit gives −3606.56 = −2971.47 + (−635.09). Here the symmetric fit gives a transformed sample that is more Gaussian, but it pays a greater penalty for transforming the data. Comparing twice their difference to a χ² distribution with one degree of freedom gives a P-value of 0.29. For comparison, a skew-t fit [51], with location c, scale s, shape α, and ν degrees of freedom (function st.mle in the R package sn), also yields a nonsignificant shape estimate α (Table 4(b)). Thus neither fit can reject symmetry.
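The likelihood ratio arithmetic in this passage can be checked directly. The sketch below uses only the four log-likelihood components quoted in the text. It relies on the identity that, for a χ² variable with one degree of freedom, P(X > x) = erfc(√(x/2)), which avoids any dependency beyond the Python standard library. The variable and function names are illustrative.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Log-likelihood = input log-likelihood + penalty, values as quoted in the text.
loglik_double_tail = -2972.27 + (-633.73)  # 4 parameters (separate left and right delta)
loglik_symmetric = -2971.47 + (-635.09)    # 3 parameters (single delta)

# Likelihood ratio statistic: twice the gain of the larger model over the nested one.
lr_stat = 2.0 * (loglik_double_tail - loglik_symmetric)
p_value = chi2_sf_1df(lr_stat)

print(f"LR statistic = {lr_stat:.2f}, P-value = {p_value:.2f}")
```

The statistic comes out at about 1.12, and the resulting P-value rounds to the 0.29 reported above.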