Abstract
Proc. 50th Annu. Allerton Conf. Communication, Control, and Computing, Monticello, IL, Oct. 1-5, 2012, pp. 494-501

Group model selection is the problem of determining a small subset of groups
of predictors (e.g., the expression data of genes) that are responsible for
the majority of the variation in a response variable (e.g., the malignancy of a
tumor). This paper focuses on group model selection in high-dimensional linear
models, in which the number of predictors far exceeds the number of samples of
the response variable. Existing works on high-dimensional group model selection
either require the number of samples of the response variable to be
significantly larger than the total number of predictors contributing to the
response or impose restrictive statistical priors on the predictors and/or
nonzero regression coefficients. This paper provides a comprehensive
understanding of a low-complexity approach to group model selection that avoids
some of these limitations. The proposed approach, termed Group Thresholding
(GroTh), is based on thresholding of marginal correlations of groups of
predictors with the response variable and is reminiscent of existing
thresholding-based approaches in the literature. The most important
contribution of the paper in this regard is relating the performance of GroTh
to a polynomial-time verifiable property of the predictors for the general case
of arbitrary (random or deterministic) predictors and arbitrary nonzero
regression coefficients.
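
The thresholding idea described above can be illustrated with a minimal sketch. The abstract does not specify the exact group statistic, the normalization of the predictors, or how the threshold is chosen. The code below therefore assumes that each group is scored by the Euclidean norm of its marginal correlations with the response, and that the number of groups to keep, k, is known in advance. The function name group_threshold and its parameters are hypothetical, chosen only for illustration.

```python
import math


def group_threshold(X, y, groups, k):
    """Return the k groups with the largest marginal-correlation score.

    X      : list of n rows, each a list of p predictor values
    y      : list of n response values
    groups : list of lists of column indices, one list per group
    k      : number of groups to keep (assumed known in this sketch)
    """
    n = len(X)
    scores = []
    for cols in groups:
        # Marginal correlation of each column in the group with the response
        corr = [sum(X[i][j] * y[i] for i in range(n)) for j in cols]
        # Assumption: score a group by the Euclidean norm of those correlations
        scores.append(math.sqrt(sum(c * c for c in corr)))
    # Keep the k highest-scoring groups
    ranked = sorted(range(len(groups)), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:k]), scores


# Toy example with more predictors (4) than samples (2).
# The response depends only on the first group (columns 0 and 1).
X = [[1.0, 0.0, 0.1, 0.0],
     [0.0, 1.0, 0.0, 0.1]]
y = [1.0, 2.0]
selected, scores = group_threshold(X, y, [[0, 1], [2, 3]], k=1)
print(selected)  # [0]
```

Each group requires only one pass over its columns, so the cost is linear in the size of the predictor matrix, which is in keeping with the low-complexity character the abstract attributes to the approach.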