Identification of grouped rare and common variants via penalized logistic regression.
Bottom Line: In spite of the success of genome-wide association studies in finding many common variants associated with disease, these variants seem to explain only a small proportion of the estimated heritability. We propose a penalized regression method (PeRC) that analyzes all genes at once, allowing grouping of all (rare and common) variants within a gene, along with subgrouping of the rare variants, thus borrowing strength from both rare and common variants within the same gene. The method can incorporate either a burden-based weighting of the rare variants or one in which the weights are data driven.
Affiliation: Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne NE1 3BZ, United Kingdom. firstname.lastname@example.org
Mentions: The first term groups the rare and common variants within our region of interest, the second and third terms correspond to the elastic net and promote sparsity of the individual common variants and the groups of rare variants, while the final term prevents the coefficients for the rare variants from becoming too large and promotes a small amount of sparsity in the rare variants. If , then when , corresponds to a sparse group lasso, and corresponds to the elastic net. We realize that using a ridge penalty with a group penalty may be slightly redundant as the sparse group lasso gives an elastic net fit within each nonzero group, but this addition makes PeRC more flexible in terms of the analyses one can perform. To perform a burden-based procedure, we simply set , and force for all r. We could also force (to give a proportion of rare variants on a scale of 0–2 rather than a count), but we found that this did not perform as well in practice. We will refer to the weighted procedure as PeRCW and the burden procedure as PeRCB. For our weighted analysis, we keep λ4 constant at 0.5 to maintain the shape of that part of the penalty function (although we could choose instead to make it slightly higher to encourage more sparsity). We place a weight on each group dependent on its size, for example, max, where is the total number of common variants in the group plus one to account for the rare group coefficient. This prevents the preferential selection of large groups solely for their ability to explain a larger proportion of phenotype variance due to increased degrees of freedom. Additionally, we can assign individual weights to the penalty terms for each variable. For instance, we may choose to penalize the common variants based on their MAF and set equal to , which results in when , as implemented in the software Mendel [Zhou et al., 44, 2011]. This downweights the penalty of less common variants relative to more common variants. 
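The burden-based collapsing described above can be illustrated with a minimal sketch. It assumes additive 0/1/2 minor-allele coding and assumes that a variant counts as rare when its sample MAF falls below the threshold τ; the function names and the default value of tau are hypothetical and are not taken from the PeRC software. The sketch returns both the rare-allele count per individual and the alternative proportion on a 0–2 scale mentioned in the text.

```python
from typing import List, Sequence, Tuple


def minor_allele_freq(column: Sequence[int]) -> float:
    """Sample MAF of one variant from 0/1/2 minor-allele counts."""
    return sum(column) / (2 * len(column))


def collapse_rare(
    genotypes: List[List[int]], tau: float = 0.01
) -> Tuple[List[int], List[int], List[int], List[float]]:
    """Split variants by MAF and collapse the rare ones into a burden score.

    genotypes[i][j] is the minor-allele count of individual i at variant j.
    tau is an assumed rare/common cut-off (hypothetical default).
    Returns (common_indices, rare_indices, burden_counts, burden_proportions).
    """
    n_variants = len(genotypes[0])
    mafs = [
        minor_allele_freq([row[j] for row in genotypes])
        for j in range(n_variants)
    ]
    rare = [j for j, p in enumerate(mafs) if p < tau]
    common = [j for j, p in enumerate(mafs) if p >= tau]

    # Burden as a count: total rare minor alleles carried by each individual.
    counts = [sum(row[j] for j in rare) for row in genotypes]

    # Burden as a proportion: the count divided by the number of rare
    # variants, which lies between 0 and 2 under 0/1/2 coding.
    proportions = [c / len(rare) if rare else 0.0 for c in counts]
    return common, rare, counts, proportions
```

In this sketch the collapsed count (or proportion) would enter the model as a single covariate for the gene, alongside the individual common variants.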
We also place a weight of similar form on the rare group coefficient, where the MAF is replaced by the average MAF of the variants in the rare group or, in the case of the burden procedure, by the MAF of the collapsed locus. With weights and unstandardized genotypes, the method preferentially selects mostly common variants. For each rare variant, we place weight to allow little penalty to be placed on very rare variants with frequencies much smaller than τ. After some experimentation, we have currently set for the burden procedure, and for the weighted procedure, where κ is a penalty strength to be determined by permutation testing for both the weighted and burden tests. This choice of penalty parameters allows for strong grouping, leading to heavy group sparsity and intermediate sparsity within a selected gene, yet has the ability to select a wide range of causal gene configurations. The log likelihood is maximized using cyclic coordinate ascent. See Figure 1 for a pictorial representation of the resulting penalty function shapes.
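To illustrate cyclic coordinate ascent on a penalized logistic log likelihood, the following is a minimal sketch under simplifying assumptions. It uses only a weighted lasso penalty (the group and ridge terms of the PeRC penalty are omitted), and each one-dimensional update maximizes a quadratic lower bound of the log likelihood based on the bound p(1 - p) ≤ 1/4, so that no update can decrease the penalized objective. All function names and parameters are hypothetical and do not describe the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t, returning 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_objective(X, y, intercept, beta, lam, weights) -> float:
    """Logistic log likelihood minus the weighted lasso penalty."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = intercept + sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(eta)) computed in a stable way
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total - lam * sum(w * abs(b) for w, b in zip(weights, beta))


def fit_lasso_logistic(
    X: List[List[float]],
    y: Sequence[int],
    lam: float,
    weights: Optional[Sequence[float]] = None,
    n_sweeps: int = 200,
    tol: float = 1e-8,
) -> Tuple[float, List[float]]:
    """Cycle through the coefficients, updating one at a time."""
    n, p = len(X), len(X[0])
    w = list(weights) if weights is not None else [1.0] * p
    intercept, beta = 0.0, [0.0] * p
    eta = [0.0] * n  # current linear predictors
    # Curvature bound per coordinate: sum_i x_ij^2 / 4
    curv = [sum(row[j] ** 2 for row in X) / 4.0 for j in range(p)]
    for _ in range(n_sweeps):
        # Unpenalized intercept step using the same 1/4 curvature bound.
        g0 = sum(yi - sigmoid(e) for yi, e in zip(y, eta))
        step = g0 / (n / 4.0)
        intercept += step
        eta = [e + step for e in eta]
        max_change = abs(step)
        for j in range(p):
            if curv[j] == 0.0:
                continue  # variant is constant zero; nothing to update
            g = sum(X[i][j] * (y[i] - sigmoid(eta[i])) for i in range(n))
            new = soft_threshold(curv[j] * beta[j] + g, lam * w[j]) / curv[j]
            delta = new - beta[j]
            if delta != 0.0:
                eta = [eta[i] + delta * X[i][j] for i in range(n)]
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return intercept, beta
```

The per-variable `weights` argument plays the role of the individual penalty weights discussed above; a larger `lam` drives more coefficients exactly to zero.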