This article is concerned with feature screening and variable selection for varying coefficient models with ultrahigh-dimensional covariates. We propose a new feature screening procedure for these models based on the conditional correlation coefficient. We systematically study the theoretical properties of the proposed procedure and establish its sure screening property and ranking consistency. To enhance the finite sample performance of the proposed procedure, we further develop an iterative feature screening procedure. Monte Carlo simulation studies are conducted to examine the performance of the proposed procedures. In practice, we advocate a two-stage approach for varying coefficient models: (a) reduce the ultrahigh dimensionality using the proposed procedure and (b) apply regularization methods to the dimension-reduced varying coefficient model to make statistical inferences on the coefficient functions. We illustrate the proposed two-stage approach with a real data example. Supplementary materials for this article are available online.
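As a rough illustration of screening by a conditional correlation utility, the sketch below ranks predictors by a kernel-smoothed squared conditional correlation between each predictor and the response given the index variable. The function name, grid, and Gaussian-kernel bandwidth are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def conditional_corr_screen(X, y, u, d, bandwidth=0.1, n_grid=20):
    """Rank predictors by an averaged squared conditional correlation.

    Sketch only: for each predictor x_j, estimate rho(x_j, y | U = u0)
    at grid points with Nadaraya-Watson kernel weights and average the
    squared values over the grid; keep the d top-ranked predictors.
    """
    n, p = X.shape
    grid = np.linspace(u.min(), u.max(), n_grid)
    util = np.zeros(p)
    for u0 in grid:
        # Gaussian kernel weights centered at u0, normalized to sum to 1
        w = np.exp(-0.5 * ((u - u0) / bandwidth) ** 2)
        w /= w.sum()
        xm = w @ X                        # conditional means E[x_j | U = u0]
        ym = w @ y
        xc, yc = X - xm, y - ym
        cov = w @ (xc * yc[:, None])      # conditional covariances
        vx = w @ (xc ** 2)                # conditional variances
        vy = w @ (yc ** 2)
        rho = cov / np.sqrt(vx * vy + 1e-12)
        util += rho ** 2
    util /= n_grid
    return np.argsort(util)[::-1][:d]     # indices of the top-d predictors
```

Because the correlation is computed locally in the index variable, a predictor whose coefficient changes sign across the support (and hence has near-zero marginal correlation with the response) can still receive a large utility.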
This work is concerned with marginal sure independence feature screening for ultrahigh-dimensional discriminant analysis. The response variable is categorical in discriminant analysis. This enables us to use the conditional distribution function to construct a new index for feature screening. In this article, we propose a marginal feature screening procedure based on the empirical conditional distribution function. We establish the sure screening and ranking consistency properties for the proposed procedure without assuming any moment condition on the predictors. The proposed procedure enjoys several appealing merits. First, it is model-free in that its implementation does not require specification of a regression model. Second, it is robust to heavy-tailed distributions of predictors and the presence of potential outliers. Third, it allows the categorical response to have a diverging number of classes in the order of O(n^κ) with some κ ≥ 0. We assess the finite sample property of the proposed procedure by Monte Carlo simulation studies and numerical comparison. We further illustrate the proposed methodology by empirical analyses of two real-life datasets. Supplementary materials for this article are available online.
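A minimal sketch of this style of model-free screening: for each predictor, compare the class-conditional empirical distribution functions with the pooled empirical distribution function, averaged over the observed points. The utility below is an illustrative mean-variance-type index built from empirical CDFs; the function name and details are assumptions, not the paper's code.

```python
import numpy as np

def ecdf_screen(X, y, d):
    """Screen predictors for a categorical response via empirical CDFs.

    Sketch only: the utility for predictor j is
        sum_r p_r * mean_i (F_r(x_ij) - F(x_ij))^2,
    where F_r is the empirical CDF within class r and F is the pooled
    empirical CDF. No moment condition on X is used, only ranks.
    """
    n, p = X.shape
    classes, counts = np.unique(y, return_counts=True)
    probs = counts / n
    util = np.zeros(p)
    for j in range(p):
        x = X[:, j]
        # pooled empirical CDF evaluated at each observation
        F = (x[:, None] <= x[None, :]).mean(axis=0)
        for c, pr in zip(classes, probs):
            xc = x[y == c]
            # class-conditional empirical CDF at the same points
            Fc = (xc[:, None] <= x[None, :]).mean(axis=0)
            util[j] += pr * np.mean((Fc - F) ** 2)
    return np.argsort(util)[::-1][:d]     # indices of the top-d predictors
```

Since the index depends on the data only through empirical CDFs, it is invariant to monotone transformations of each predictor, which is what makes it robust to heavy tails and outliers.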
We establish necessary and sufficient conditions for consistent root reconstruction in continuous-time Markov models with countable state space on bounded-height trees. Here a root state estimator is said to be consistent if the probability that it returns the true root state converges to 1 as the number of leaves tends to infinity. We also derive quantitative bounds on the error of reconstruction. Our results answer a question of Gascuel and Steel [GS10] and have implications for ancestral sequence reconstruction in a classical evolutionary model of nucleotide insertion and deletion [TKF91].
This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming a sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
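The estimator described above can be sketched in a few lines: take the leading principal components of the sample covariance as the factor part, soft-threshold the off-diagonal entries of the residual (principal orthogonal complement) covariance, and add the two pieces back together. This is an illustrative sketch with a constant threshold, not the authors' adaptive implementation.

```python
import numpy as np

def poet(Y, K, tau):
    """POET-style covariance estimator (sketch with a constant threshold).

    Y   : n x p data matrix, rows are observations.
    K   : number of leading principal components kept as factors.
    tau : soft-threshold applied to the residual covariance off-diagonals.
    """
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    S = Yc.T @ Yc / n                       # sample covariance
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][:K]        # leading K eigenpairs
    low_rank = (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T
    R = S - low_rank                        # principal orthogonal complement
    # soft-threshold the off-diagonal residual entries, keep the diagonal
    thresholded = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(thresholded, np.diag(R))
    return low_rank + thresholded
```

With `tau = 0` this reduces to the sample covariance matrix, and with `K = 0` it reduces to a plain thresholding estimator, which mirrors the special cases listed in the abstract.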
The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results that are presented provide sufficient conditions for the asymptotic normality of the proposed estimators along with a consistent estimator for their finite-dimensional covariance matrices. These sufficient conditions allow the number of variables to exceed the sample size and the presence of many small non-zero coefficients. Our methods and theory apply to interval estimation of a preconceived regression coefficient or contrast as well as simultaneous interval estimation of many regression coefficients. Moreover, the method proposed turns the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients, which can be used to select variables after proper thresholding. The simulation results that are presented demonstrate the accuracy of the coverage probability of the confidence intervals proposed as well as other desirable properties, strongly supporting the theoretical results.
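To convey the flavor of the one-step bias-correction idea behind such intervals, the toy sketch below starts from a soft-thresholded initial estimator, corrects each coordinate with the raw column as the score vector, and forms a normal-approximation 95% interval. This is a heavily simplified illustration: it requires p < n and a near-orthogonal design, whereas the paper's construction uses carefully chosen low-dimensional projections that work when p exceeds n. All names are hypothetical.

```python
import numpy as np

def debiased_ci(X, y):
    """Toy de-biased 95% confidence intervals for regression coefficients.

    Sketch only: the initial sparse estimate is soft-thresholded OLS,
    and the score vector for coordinate j is simply the column x_j,
    which is adequate only for near-orthogonal designs with p < n.
    """
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    lam = np.sqrt(2.0 * np.log(p) / n)
    beta_sparse = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)
    resid = y - X @ beta_sparse
    sigma = np.std(resid, ddof=1)           # crude noise-level estimate
    col_norms2 = (X ** 2).sum(axis=0)
    # one-step correction: beta_j + x_j' (y - X beta_sparse) / ||x_j||^2
    beta_deb = beta_sparse + (X.T @ resid) / col_norms2
    half = 1.96 * sigma / np.sqrt(col_norms2)
    return beta_deb - half, beta_deb + half
```

The corrected estimates form the "approximate Gaussian sequence" mentioned in the abstract: each coordinate is approximately normal around the true coefficient, so interval estimation and thresholding-based variable selection both follow from standard normal calculations.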