Detecting Genetic Interactions for Quantitative Traits Using m-Spacing Entropy Measure.
Bottom Line:
Information gain based on an entropy measure has previously been successful in identifying genetic associations with binary traits, and the information gain can be obtained for any phenotype distribution. Here, we show its use to successfully identify the main effect, as well as the genetic interactions, associated with a quantitative trait.
View Article:
PubMed Central - PubMed
Affiliation: Department of Physiology and Biophysics, Eulji University, Daejeon, Republic of Korea.
ABSTRACT
A number of statistical methods for detecting gene-gene interactions have been developed in genetic association studies with binary traits. However, many phenotype measures are intrinsically quantitative, and categorizing continuous traits may not always be straightforward and meaningful. The association of gene-gene interactions with the observed distribution of such phenotypes needs to be investigated directly, without categorization. Information gain based on an entropy measure has previously been successful in identifying genetic associations with binary traits. We extend the usefulness of this information gain by proposing a nonparametric method for evaluating the conditional entropy of a quantitative phenotype associated with a given genotype. Hence, the information gain can be obtained for any phenotype distribution. Because no functional form, such as Gaussian, is assumed for the distribution of the trait as a whole or for a given genotype, this method is expected to be robust enough to be applied to any phenotypic association data. Here, we show its use to successfully identify the main effect, as well as the genetic interactions, associated with a quantitative trait.
Mentions: The estimator in (4) has both n and m as parameters. In genetic association studies, a sample size n of several hundred is common. However, when the conditional entropy is estimated, a minor allele may have a much smaller number of samples corresponding to it. Moreover, the choice of the sample-spacing m affects the resulting entropy estimate. An entropy estimation scheme is therefore required that is independent of the number of samples and does not demand the choice of a particular sample-spacing. To illustrate this requirement, an ensemble of 3,000 sets of random deviates from N(0, 1²) was generated for each data point in Figure 1, where the mean and standard deviation of the estimates are plotted for each ensemble. In the left panel of Figure 1, m is fixed at 10 and 20 while n is varied. The analytic formula for the entropy of a normal distribution is [20]

(5) H = ln(σ√(2πe)),

where e is Euler's number. The value calculated from (5) is marked on the vertical axis with a horizontal arrow, with the corresponding σ above it. The obvious n-dependence of the estimator can be seen in this plot: the estimate approaches the analytic value as n increases, with √n-consistency, as expected [24]. In Figure 1(b), n is fixed at 400 while m is varied. In this plot, the estimated entropy again changes in value throughout the possible range of m, and the estimated value is always smaller than the analytically calculated value. Therefore, assigning a particular value to m, such as the typical choice of m = √n [25], would not be appropriate in this sampling range. Because of these n- and m-dependences, the estimator in (4) may need to be modified.
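The n- and m-dependence described above can be reproduced with a short simulation. The sketch below assumes the fixed-m estimator (4) has the per-m form that appears inside the modified estimator (6), i.e., the mean of ln[(n/m)(X(k+m) − X(k))] with the bias correction −ψ(m) + ln m; the function names and the tie guard are our illustrative choices, not the authors' code.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(m):
    """psi(m) for a positive integer m: psi(m) = H_{m-1} - gamma."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, m))

def mspacing_entropy(x, m):
    """Bias-corrected m-spacing entropy estimate for one fixed m:
    (1/(n-m)) * sum_k ln[(n/m)(X_(k+m) - X_(k))] - psi(m) + ln m.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    spacings = np.maximum(xs[m:] - xs[:-m], 1e-12)  # guard against ties
    return float(np.mean(np.log(n / m * spacings)) - digamma_int(m) + np.log(m))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=400)
analytic = 0.5 * np.log(2 * np.pi * np.e)  # H = ln(sigma*sqrt(2*pi*e)) with sigma = 1
print(analytic)                 # ≈ 1.4189
print(mspacing_entropy(x, 10))  # estimate varies with m and n, as in Figure 1
print(mspacing_entropy(x, 20))
```

Repeating this over an ensemble of draws, as in Figure 1, shows how the estimate drifts with both n (at fixed m) and m (at fixed n).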
Therefore, we modify the entropy estimator in (4) as follows:

(6) H_⟨m⟩,n = (1/(n−1)) ∑_{m=1}^{n−1} [ (1/(n−m)) ∑_{k=1}^{n−m} ln((n/m)(X_{n,k+m} − X_{n,k})) − Γ′(m)/Γ(m) + ln m ].

In this modification, the entropy estimator is averaged over the possible values of m for each n, which is denoted by ⟨m⟩. This estimator is used to plot the entropy versus the number of samples in Figure 2. Over a wide range of n, it yields very stable values, in contrast to Figure 1(a). The increase in the extremely small-n range should be within tolerable error in a genome-wide association application, because the contribution of such a minor allele to the conditional entropy is suppressed by the weighting factor of the marginal probability, which is proportional to the number of corresponding samples. Analytically obtained entropy values for N(0, σ²), with three different σ's, are marked on the right-hand vertical axis. Regardless of the value of σ, the difference Δ between the analytically obtained value and the value given by the estimator stays essentially the same. Since the association study measures the difference between the entropy and the corresponding conditional entropy, this stability is a more critical issue than the absolute value of the estimates, so compensating for Δ is unnecessary as long as it is stable. Furthermore, the underestimation of the entropy shown in the plot should have little effect on the association strength. Hence, an entropy estimator has been set up that is practically independent of n without the need to find a proper sample-spacing.
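The averaged estimator (6) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name and the tie guard are assumptions, and ψ(m) = Γ′(m)/Γ(m) is computed exactly for integer m as the harmonic sum H_{m−1} − γ, so no special-function library is needed.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def entropy_avg_m(x):
    """Entropy estimate averaged over all spacings m = 1..n-1, in the
    spirit of equation (6): the per-m bias-corrected m-spacing estimate
    is computed for every admissible m and then averaged.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    psi = -EULER_GAMMA  # psi(1); updated incrementally to psi(m)
    total = 0.0
    for m in range(1, n):
        sp = np.maximum(xs[m:] - xs[:-m], 1e-12)   # guard against ties
        total += np.mean(np.log(n / m * sp)) - psi + np.log(m)
        psi += 1.0 / m                             # psi(m+1) = psi(m) + 1/m
    return total / (n - 1)

rng = np.random.default_rng(1)
est_200 = entropy_avg_m(rng.normal(0.0, 1.0, size=200))
est_400 = entropy_avg_m(rng.normal(0.0, 1.0, size=400))
print(est_200, est_400)  # similar values across n, as Figure 2 suggests
```

The double summation makes this O(n²) per genotype group, which is still cheap for the sample sizes (hundreds) typical of association studies.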