Adaptive varying-coefficient linear model

Article in Journal of the Royal Statistical Society 65(1):57-80 · February 2003
Abstract
Varying-coefficient linear models arise from multivariate nonparametric regression, nonlinear time series modelling and forecasting, functional data analysis, longitudinal data analysis, and others. It has been a common practice to assume that the varying coefficients are functions of a given variable which is often called an index. A frequently asked question is which variable should be used as the index. In this paper, we explore the class of the varying-coefficient linear models in which the index is unknown and is estimated as a linear combination of regression and/or other variables. This will enlarge the modelling capacity substantially. We search for the index such that the derived varying-coefficient model provides the best approximation to the underlying unknown multi-dimensional regression function in the least squares sense. The search is implemented through the newly proposed hybrid backfitting algorithm. The core of the algorithm is the alternating iteration between estimating the index through a one-step scheme and estimating coefficient functions through a one-dimensional local linear smoothing. The generalised cross-validation method for choosing bandwidth is efficiently incorporated into the algorithm. The locally significant variables are selected in terms of the combined use of t-statistic and Akaike information criterion. We further extend the algorithm for the models with two indices. Simulation shows that the proposed methodology has appreciable flexibility to model complex multivariate nonlinear structure and is practically feasible with average modern computers. The methods are further illustrated through the Canadian mink-muskrat data in 1925-1994 and the pound/dollar exchange rates in 1974-1983.
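
The coefficient-function step mentioned in the abstract (one-dimensional local linear smoothing for a fixed index) can be illustrated with a minimal sketch. This is not the authors' hybrid backfitting algorithm: the index values z are assumed to be already computed as a linear combination of the covariates, the bandwidth h is fixed by hand rather than chosen by generalised cross-validation, and the function names, the Epanechnikov kernel and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel, zero outside [-1, 1] (kernel choice is an assumption)."""
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)

def local_linear_coefficients(z, X, y, grid, h):
    """Local linear estimates of the coefficient functions at each grid point.

    z    : (n,) index values, e.g. z = covariates @ beta for a FIXED beta
    X    : (n, p) regressors that carry the varying coefficients
           (include a column of ones for an intercept function)
    y    : (n,) responses
    grid : points u0 at which the coefficient functions are evaluated
    h    : bandwidth (fixed here; hypothetical stand-in for a data-driven choice)
    """
    z, y = np.asarray(z, float), np.asarray(y, float)
    X = np.asarray(X, float)
    p = X.shape[1]
    G = np.empty((len(grid), p))
    for k, u0 in enumerate(grid):
        w = epanechnikov((z - u0) / h)            # kernel weights around u0
        # Local model: y_i ~ sum_j (a_j + b_j * (z_i - u0)) * X_ij
        D = np.hstack([X, X * (z - u0)[:, None]])
        sw = np.sqrt(w)
        # Weighted least squares via ordinary least squares on sqrt-weighted rows
        theta, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
        G[k] = theta[:p]                          # a_j estimates g_j(u0)
    return G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    covariates = rng.normal(size=(400, 2))
    beta = np.array([0.6, 0.8])                   # illustrative index direction
    z = covariates @ beta
    X = np.column_stack([np.ones(400), covariates[:, 0]])
    y = np.sin(z) + (1.0 + z**2) * covariates[:, 0] + 0.1 * rng.normal(size=400)
    print(local_linear_coefficients(z, X, y, grid=[-1.0, 0.0, 1.0], h=0.5))
```

In a full procedure this smoothing step would alternate with an update of the index direction, and the bandwidth would be selected from the data; both parts are left out of the sketch.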

  • Chapter
    Searching for an effective dimension reduction space is an important problem in regression, especially for high dimensional data. We propose an adaptive approach based on semiparametric models, which we call the (conditional) minimum average variance estimation (MAVE) method, within quite a general setting. The MAVE method has the following advantages. Most existing methods must undersmooth the nonparametric link function estimator to achieve a faster rate of consistency for the estimator of the parameters (than for that of the nonparametric function). In contrast, a faster consistency rate can be achieved by the MAVE method even without undersmoothing the nonparametric link function estimator. The MAVE method is applicable to a wide range of models, with fewer restrictions on the distribution of the covariates, to the extent that even time series can be included. Because of the faster rate of consistency for the parameter estimators, it is possible for us to estimate the dimension of the space consistently. The relationship of the MAVE method with other methods is also investigated. In particular, a simple outer product gradient estimator is proposed as an initial estimator. In addition to theoretical results, we demonstrate the efficacy of the MAVE method for high dimensional data sets through simulation. Two real data sets are analysed by using the MAVE approach.
  • Article
    A new methodology, which combines a nonparametric method based on the local functional coefficient autoregressive (LFAR) form with chaos theory and a regional method, is proposed for multistep prediction of chaotic time series. The objective of this research study is to improve the performance of long-term forecasting of chaotic time series. To obtain the prediction values of chaotic time series, three steps are involved. Firstly, the original time series is reconstructed in m-dimensional phase space with a time delay τ by using chaos theory. Secondly, the nearest neighbor points are selected by using a local method in the m-dimensional phase space. Thirdly, we use the nearest neighbor points to get an LFAR model. The proposed model’s parameters are selected by a modified generalized cross validation (GCV) criterion. Both simulated data (Lorenz and Mackey-Glass systems) and real data (Sunspot time series) are used to illustrate the performance of the proposed methodology. By detailed investigation and by comparing our results with published research, we find that the LFAR model can effectively fit nonlinear characteristics of chaotic time series by using a simple structure and has excellent performance for multistep forecasting.
  • Thesis
    Functional data analysis (FDA) is a statistical branch that is increasingly being used in many applied scientific fields such as biological experimentation, finance, physics, etc. A reason for this is the use of new data collection technologies that increase the number of observations during a time interval. Functional datasets are realization samples of some random functions which are measurable functions defined on some probability space with values in an infinite dimensional functional space. There are many questions that FDA studies, among which functional linear regression is one of the most studied, both in applications and in methodological development. The objective of this thesis is the study of functional linear regression models when both the covariate X and the response Y are random functions and both of them are time-dependent. In particular we want to address the question of how the history of a random function X influences the current value of another random function Y at any given time t. In order to do this we are mainly interested in three models: the functional concurrent model (FCCM), the functional convolution model (FCVM) and the historical functional linear model. In particular for the FCVM and FCCM we have proposed estimators which are consistent, robust and faster to compute than others already proposed in the literature. Our estimation method in the FCCM extends the Ridge Regression method developed in the classical linear case to the functional data framework. We prove the convergence in probability of this estimator, obtain a rate of convergence and develop an optimal selection procedure for the regularization parameter. The FCVM allows us to study the influence of the history of X on Y in a simple way through the convolution. In this case we use the continuous Fourier transform operator to define an estimator of the functional coefficient. This operator transforms the convolution model into an associated FCCM in the frequency domain. The consistency and rate of convergence of the estimator are derived from the FCCM. The FCVM can be generalized to the historical functional linear model, which is itself a particular case of the fully functional linear model. Thanks to this we have used the Karhunen–Loève estimator of the historical kernel. The related question about the estimation of the covariance operator of the noise in the fully functional linear model is also treated. Finally we use all the aforementioned models to study the interaction between Vapour Pressure Deficit (VPD) and Leaf Elongation Rate (LER) curves. This kind of data is obtained with a high-throughput plant phenotyping platform and is well suited to be studied with FDA methods.
  • Article
    We consider quantile regression incorporating polynomial spline approximation for single-index coefficient models. Compared to mean regression, quantile regression for this class of models is more technically challenging and has not been considered before. We use a check loss minimization approach and employ a projection/orthogonalization technique to deal with the theoretical challenges. Compared to the previously used kernel estimation approach, which was developed for mean regression only, spline estimation is more computationally expedient and directly produces a smooth estimated curve. Simulations and a real data set are used to illustrate the finite sample properties of the proposed estimator.
  • Article
    Nonparametric regression models with locally stationary covariates have received increasing interest in recent years. To relieve the "curse of dimensionality" induced by a large number of covariates, the additive regression model is commonly used. However, in the locally stationary context, to capture the dynamic nature of the regression function, we adopt a flexible varying-coefficient additive model where the regression function has the form $\alpha_{0}\left(u\right)+\sum_{k=1}^{p}\alpha_{k}\left(u\right)\beta_{k}\left(x_{k}\right).$ For this model, we propose a three-step spline estimation method for each univariate nonparametric function, and show its consistency and $L_{2}$ rate of convergence. Furthermore, based upon the three-step estimators, we develop a two-stage penalty procedure to identify pure additive terms and varying-coefficient terms in the varying-coefficient additive model. As expected, we demonstrate that the proposed identification procedure is consistent, and the penalized estimators achieve the same $L_{2}$ rate of convergence as the polynomial spline estimators. Simulation studies are presented to illustrate the finite sample performance of the proposed three-step spline estimation method and two-stage model selection procedure.
  • Article
    We consider a single-index quantile regression model for longitudinal data. Based on generalized estimating equations, an estimation procedure is proposed by taking into account the correlation within subject. Under mild assumptions, we derive the convergence rate of the estimator of the unknown link function and the asymptotic normality of the estimator of the index parameter using the “projection” technique. Since the estimating equations are non-continuous, we further adopt the smoothing approach and show that estimators obtained from the smoothed estimating equations are asymptotically equivalent to those from the unsmoothed estimating equations. It is also shown that the estimator is more efficient when the correlation is correctly specified. Finally, we present numerical examples including simulations and analysis of a lung function data set.
  • Article
    In modern experiments, functional and nonfunctional data are often encountered simultaneously when observations are sampled from random processes and high-dimensional scalar covariates. It is difficult to apply existing methods for model selection and estimation. We propose a new class of partially functional linear models to characterize the regression between a scalar response and covariates of both functional and scalar types. The new approach provides a unified and flexible framework that simultaneously takes into account multiple functional and ultrahigh-dimensional scalar predictors, enables us to identify important features, and offers improved interpretability of the estimators. The underlying processes of the functional predictors are considered to be infinite-dimensional, and one of our contributions is to characterize the effects of regularization on the resulting estimators. We establish the consistency and oracle properties of the proposed method under mild conditions, demonstrate its performance with simulation studies, and illustrate its application using air pollution data.
  • Article
    Motivated by the Singapore Longitudinal Aging Study (SLAS), we propose a Bayesian approach for the estimation of semiparametric varying-coefficient models for longitudinal continuous and cross-sectional binary responses. These models have proved to be more flexible than simple parametric regression models. Our development is a new contribution towards their Bayesian solution, which eases computational complexity. We also consider adapting all kinds of familiar statistical strategies to address the missing data issue in the SLAS. Our simulation results indicate that a Bayesian imputation (BI) approach performs better than complete-case (CC) and available-case (AC) approaches, especially under small sample designs, and may provide more useful results in practice. In the real data analysis for the SLAS, the results for longitudinal outcomes from BI are similar to AC analysis, differing from those with CC analysis.
  • Article
    The single-index varying coefficient model (SIVCM) is a powerful tool for modelling nonlinearity in multivariate estimation, and has been widely used in the literature because it can overcome the well-known “curse of dimensionality”. In this paper, we consider the problem of model detection and estimation for SIVCM. Based on the proposed combined penalization procedure, we can identify the true model structure consistently, and obtain a new semiparametric model, the partially linear single-index varying coefficient model (PLSIVCM). Under appropriate conditions, we demonstrate that the proposed penalized estimators of parametric and nonparametric components of PLSIVCM are consistent, but their asymptotic distributions are not available. Hence, we extend the minimum average variance estimation method to PLSIVCM, and establish the asymptotic normality for the refined estimators of index parameters, constant coefficients and varying coefficient functions, respectively. The finite sample performances of the proposed methods are illustrated by a Monte Carlo simulation study and a real data analysis.
  • Article
    We propose a bivariate quantile regression method for the bivariate varying coefficient model through a directional approach. The varying coefficients are approximated by the B-spline basis and an $L_{2}$ type penalty is imposed to achieve desired smoothness. We develop a multistage estimation procedure based on the Propagation-Separation (PS) approach to borrow information from nearby directions. The PS method is capable of handling the computational complexity raised by simultaneously considering multiple directions to efficiently estimate varying coefficients while guaranteeing certain smoothness along directions. We reformulate the optimization problem and solve it by the Alternating Direction Method of Multipliers (ADMM), which is implemented using R while the core is written in C to speed it up. Simulation studies are conducted to confirm the finite sample performance of our proposed method. A real data set on Diffusion Tensor Imaging (DTI) properties from a clinical study on neurodevelopment is analyzed.
  • Article
    Semiparametric models are an important class of mathematical models for high dimensional biological Big Data in human health and disease. In this paper, we develop an M-type variable selection method based on the Laplace Error Penalty (LEP) function for a class of high dimensional semiparametric models using a shrinkage idea. The proposed procedure can simultaneously select significant covariates with functional coefficients and locally significant variables with parametric coefficients. The LEP function is constructed as an exponential function with two tuning parameters and is infinitely differentiable everywhere except at the origin, so the LEP oracle estimator can be easily obtained. We also propose a computational algorithm adapted to our method. Moreover, due to the robustness of the M-type loss function to outliers in finite samples, our proposed variable selection method is more robust than the ones based on the least squares criterion. Finally, the method is illustrated with numerical simulations.
  • Preprint
    We study the parameter estimation problem for a single-index varying coefficient model in high dimensions. Unlike most existing works that simultaneously estimate the parameters and link functions, based on the generalized Stein's identity, we propose computationally efficient estimators for the high dimensional parameters without estimating the link functions. We consider two different setups where we either estimate each sparse parameter vector individually or estimate the parameters simultaneously as a sparse or low-rank matrix. For all these cases, our estimators are shown to achieve optimal statistical rates of convergence (up to logarithmic terms in the low-rank setting). Moreover, throughout our analysis, we only require the covariate to satisfy certain moment conditions, which is significantly weaker than the Gaussian or elliptically symmetric assumptions that are commonly made in the existing literature. Finally, we conduct extensive numerical experiments to corroborate the theoretical results.
  • Article
    This article establishes a functional coefficient moving average model (FMA) that allows the coefficient of the classical moving average model to adapt with a covariate. The functional coefficient is identified as a ratio of two conditional moments. A local linear estimation technique is used for estimation and the asymptotic properties of the resulting estimator are investigated. Its convergence rate depends on whether the underlying function reaches its boundary or not, and the asymptotic distribution can be nonstandard. A model specification test in the spirit of Härdle-Mammen (1993) is developed to check the stability of the functional coefficient. Simulations have been conducted to study the finite sample performance of our proposed estimator, and the size and the power of the test. Application is made to CPI data from mainland China and to German egg prices to show the efficacy of FMA.
  • Article
    The article considers nonparametric inference for quantile regression models with time-varying coefficients. The errors and covariates of the regression are assumed to belong to a general class of locally stationary processes and are allowed to be cross-dependent. Simultaneous confidence tubes (SCTs) and integrated squared difference tests (ISDTs) are proposed for simultaneous nonparametric inference of the latter models with asymptotically correct coverage probabilities and Type I error rates. Our methodologies are shown to possess certain asymptotically optimal properties. Furthermore, we propose an information criterion that performs consistent model selection for nonparametric quantile regression models of nonstationary time series. For implementation, a wild bootstrap procedure is proposed, which is shown to be robust to the dependent and nonstationary data structure. Our method is applied to studying the asymmetric and time-varying dynamic structures of the U.S. unemployment rate since the 1940s. Supplementary materials for this article are available online.
  • Article
    Estimation of high dimensional covariance matrices is an interesting and important research topic. In this paper, we propose a dynamic structure and develop an estimation procedure for high dimensional covariance matrices. Asymptotic properties are derived to justify the estimation procedure and simulation studies are conducted to demonstrate its performance when the sample size is finite. By exploring a financial application, an empirical study shows that portfolio allocation based on dynamic high dimensional covariance matrices can significantly outperform the market from 1995 to 2014. Our proposed method also outperforms portfolio allocation based on the sample covariance matrix and the portfolio allocation proposed in Fan, Fan and Lv (2008).
  • Article
    In this paper, we propose a new full iteration estimation method for quantile regression (QR) of the single-index model (SIM). The asymptotic properties of the proposed estimator are derived. Furthermore, we propose a variable selection procedure for the QR of SIM by combining the estimation method with the adaptive LASSO penalized method to get sparse estimation of the index parameter. The oracle properties of the variable selection method are established. Simulations with various non-normal errors are conducted to demonstrate the finite sample performance of the estimation method and the variable selection procedure. Furthermore, we illustrate the proposed method by analyzing a real data set.
  • Article
    In this article, motivated by an analysis of the monthly number of tourists visiting Hawaii, we propose a new class of nonparametric seasonal time series models under the framework of the functional coefficient model. The coefficients change over time and consist of the trend and seasonal components to characterize seasonality. A local linear approach is developed to estimate the nonparametric trend and seasonal effect functions. The consistency of the proposed estimators is obtained without specifying the error distribution and the asymptotic normality of the proposed estimators is established under the $\alpha$-mixing conditions. A consistent estimator of the asymptotic variance is also provided. The proposed methodologies are illustrated by two simulated examples and the model is applied to characterizing the seasonality of the monthly number of tourists visiting Hawaii.
  • Article
    We consider model (variable) selection in a semi-parametric time series model with functional coefficients. Variable selection in the semi-parametric model must account for the fact that the parametric part of the model is estimated at a faster convergence rate than the nonparametric component. Our variable selection procedures employ a smoothly clipped absolute deviation penalty function and consist of two steps. The first is to select covariates with functional coefficients that enter in the semi-parametric model. Then, we perform variable selection for variables with parametric coefficients. The asymptotic properties, such as consistency, sparsity and the oracle property of these two-step estimators are established. A Monte Carlo simulation study is conducted to examine the finite sample performance of the proposed estimators and variable selection procedures. Finally, an empirical example exploring the predictability of asset returns demonstrates the practical application of the proposed functional index coefficient autoregressive models and variable selection procedures.
  • Article
    This paper proposes using a functional coefficient regression technique to estimate time-varying betas and alpha in the conditional capital asset pricing model (CAPM). Functional coefficient representation relaxes the strict assumptions regarding the structure of betas and alpha by combining the predictors into an index. Appropriate index variables are selected by applying the smoothly clipped absolute deviation penalty. In such a way, estimation and variable selection can be done simultaneously. Based on the empirical studies, the proposed model performs better than the alternatives in explaining asset returns and we find no strong evidence to reject the conditional CAPM.
  • Article
    There is a long history of using interactions in regression analysis to investigate alterations in covariate effects on response variables. In this article, we aim to address two kinds of new challenges arising from the inclusion of such high-order effects in the regression model for complex data. The first kind concerns a situation where interaction effects of individual covariates are weak but those of combined covariates are strong, and the other kind pertains to the presence of nonlinear interactive effects directed by low-effect covariates. We propose a new class of semiparametric models with varying index coefficients, which enables us to model and assess nonlinear interaction effects between grouped covariates on the response variable. As a result, most of the existing semiparametric regression models are special cases of our proposed models. We develop a numerically stable and computationally fast estimation procedure using both the profile least squares method and local fitting. We establish both estimation consistency and asymptotic normality for the proposed estimators of index coefficients as well as the oracle property for the nonparametric function estimator. In addition, a generalized likelihood ratio test is provided to test for the existence of interaction effects or the existence of nonlinear interaction effects. Our models and estimation methods are illustrated by simulation studies, and by an analysis of child growth data to evaluate alterations in growth rates incurred by mother’s exposures to endocrine disrupting compounds during pregnancy. Supplementary materials for this article are available online.
  • Article
    Nonparametric estimation of probability density functions, both marginal and joint densities, is a very useful tool in statistics. The kernel method is popular and applicable to dependent data, including time series and spatial data. But at least for the joint density, one has had to assume that data are observed at regular time intervals or on a regular grid in space. Though this is not very restrictive in the time series case, it often is in the spatial case. In fact, to a large degree it has precluded applications of nonparametric methods to spatial data because such data often are irregularly positioned over space. In this article, we propose nonparametric kernel estimators for both the marginal and in particular the joint probability density functions for nongridded spatial data. Large sample distributions of the proposed estimators are established under mild conditions, and a new framework of expanding-domain infill asymptotics is suggested to overcome the shortcomings of spatial asymptotics in the existing literature. A practical, reasonable selection of the bandwidths on the basis of cross-validation is also proposed. We demonstrate by both simulations and real data examples of moderate sample size that the proposed methodology is effective and useful in uncovering nonlinear spatial dependence for general, including non-Gaussian, distributions. Supplementary materials for this article are available online.
  • Article
    The varying-coefficient single-index model (VCSIM) with the back-fitting algorithm is very important in statistical modeling, but inference for the single-index functions of the model has not been well developed. There are also few tools available to answer frequently asked questions such as whether the relationship between Y and X is nonlinear. In order to address this issue, we extend the generalized likelihood ratio (GLR) tests to the model, using the estimates obtained by a local linear method and the backfitting technique. We demonstrate that under the null hypotheses the proposed GLR statistics asymptotically follow a rescaled chi-squared distribution, with the scale constants and the degrees of freedom being independent of the nuisance parameters, which is called the Wilks phenomenon. Both simulated and real data examples are used to illustrate our proposed methodology.
  • Article
    In an earlier work, it was shown that the combination of the leave-one-out Cross Validation criterion introduced for nonparametric models and the leave-$n_v$-out Cross Validation criterion used for linear regressors selection has successfully identified the nonlinear and the linear variables of a partially linear model. This technical report deals with the mathematical issues of the proposed two-step procedure. Under certain regularity conditions we prove analytically the consistency of the regressors estimators.
  • Article
    This chapter reviews the literature on variable selection in nonparametric and semiparametric regression models via shrinkage. We highlight recent developments on simultaneous variable selection and estimation through the methods of least absolute shrinkage and selection operator (Lasso), smoothly clipped absolute deviation (SCAD), or their variants, but restrict our attention to nonparametric and semiparametric regression models. In particular, we consider variable selection in additive models, partially linear models, functional/varying coefficient models, single index models, general nonparametric regression models, and semiparametric/nonparametric quantile regression models.
  • Article
    Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional and longitudinal studies. An efficient estimation procedure is developed to iteratively estimate varying coefficient functions, link functions, index parameter vectors, and the covariance function of individual functions. We systematically examine the asymptotic properties of all estimators including the weak convergence of the estimated varying coefficient functions, the asymptotic distribution of the estimated index parameter vectors, and the uniform convergence rate of the estimated covariance function and their spectrum. Simulation studies are carried out to assess the finite-sample performance of the proposed procedure. We apply FVCSIM to investigating the development of white matter diffusivities along the corpus callosum skeleton obtained from Alzheimer’s Disease Neuroimaging Initiative (ADNI) study.
  • Article
    The varying-coefficient single-index model (VCSIM) is a very general and flexible tool for exploring the relationship between a response variable and a set of predictors. Popular special cases include single-index models and varying-coefficient models. In order to estimate the index-coefficient and the nonparametric varying-coefficients in the VCSIM, we propose a two-stage composite quantile regression estimation procedure, which integrates the local linear smoothing method and the information of quantile regressions at a number of conditional quantiles of the response variable. We establish the asymptotic properties of the proposed estimators for the index-coefficient and varying-coefficients when the error is heterogeneous. When compared with the existing mean-regression-based estimation method, our simulation results indicate that our proposed method has comparable performance for normal errors and is more robust for errors with outliers or heavy tails. We illustrate our methodologies with a real example.
  • Article
    We consider an estimating equations approach to parameter estimation in the adaptive varying-coefficient linear quantile model. We propose estimating equations for the index vector of the model in which the unknown nonparametric functions are estimated by minimizing the check loss function, resulting in a profiled approach. The estimating equations have a bias-corrected form that makes undersmoothing of the nonparametric part unnecessary. The estimating equations approach makes it possible to obtain the estimates using a simple fixed-point algorithm. We establish asymptotic properties of the estimator using empirical process theory, with additional complication due to the nuisance nonparametric part. The finite sample performance of the new model is illustrated using simulation studies and a forest fire dataset.
  • We demonstrate that analysis of long series of daily returns should take into account potential long-term variation not only in volatility, but also in parameters that describe asymmetry or tail behaviour. However, it is necessary to use a conditional distribution that is flexible enough, allowing for separate modelling of tail asymmetry and skewness, which requires going beyond the skew-t form. Empirical analysis of 60 years of S&P500 daily returns suggests evidence for tail asymmetry (but not for skewness). Moreover, tail thickness and tail asymmetry are not time-invariant. Tail asymmetry became much stronger at the beginning of the Great Moderation period and weakened after 2005, indicating important differences between the 1987 and the 2008 crashes. This is confirmed by our analysis of out-of-sample density forecasting performance (using LPS and CRPS measures) within two recursive expanding-window experiments covering the events. We also demonstrate consequences of accounting for long-term changes in shape features for risk assessment.
  • Article
    This paper provides a new methodology to analyze unobserved heterogeneity when observed characteristics are modeled nonlinearly. The proposed model builds on varying random coefficients (VRC) that are determined by nonlinear functions of observed regressors and additively separable unobservables. This paper proposes a novel estimator of the VRC density based on weighted sieve minimum distance. The main example of sieve bases are Hermite functions which yield a numerically stable estimation procedure. This paper shows inference results that go beyond what has been shown in ordinary RC models: Only estimation of the joint VRC density is affected by ill-posedness but not that of the varying random slope (VRS) density. We provide in each case the optimal rate of convergence and also establish pointwise limit theory of linear functionals, where a prominent example is the density of potential outcomes. In addition, a multiplier bootstrap procedure is proposed to construct uniform confidence bands. A Monte Carlo study examines finite sample properties of the estimator and shows that it performs well even when the regressors associated to RC are far from being heavy tailed. Finally, the methodology is applied to analyze heterogeneity in income elasticity of demand for housing.
  • Article
    We model conditional market beta and alpha as flexible functions of state variables identified via a formal variable-selection procedure. In the post-1963 sample, the beta of the value premium comoves strongly with unemployment, inflation, and the price–earnings ratio in a countercyclical manner. We also uncover a novel nonlinear dependence of alpha on business conditions: It falls sharply and even becomes negative during severe economic downturns but is positive and flat otherwise. The conditional capital asset pricing model (CAPM) performs better than the unconditional CAPM, but this does not fully explain the value premium. Our findings are consistent with a conditional CAPM with rare disasters.
  • Article
    This article focuses on the modeling of nonlinear interactions between the design and operational variables of a system and the multivariate outside environment in predicting the system's performance. We propose a Sparse Partitioned-Regression (SPR) model that automatically searches for a partition of the environmental variables and fits a sparse regression within each subdivision of the partition, in order to fulfill an optimal criterion. Two optimal criteria are proposed, a penalized and a held-out criterion. We study the theoretical properties of SPR by deriving oracle inequalities to quantify the risks of the penalized and held-out criteria in both prediction and classification problems. An efficient recursive partition algorithm is developed for model estimation. Extensive simulation experiments are conducted to demonstrate the better performance of SPR compared with competing methods. Finally, we present an application of using building design and operational variables, outdoor environmental variables, and their interactions to predict energy consumption based on the Department of Energy's EnergyPlus data sets. SPR produces a high level of prediction accuracy. The result of the application also provides insights into the design, operation, and management of energy-efficient buildings.
  • Article
    This paper considers the problem of extracting a narrow-band signal from a strong chaotic background and puts forward a method that extracts the signal well in simulation. The proposed method is a mixed model (LLVCR) that combines the local linear (LL) model and the varying-coefficient regression model. We first use the LL model to predict the short-term chaotic signal. Since the varying-coefficient model can fit the narrow-band signal well, we combine the two and establish a mixed model to estimate the narrow-band signal in a strong chaotic background. To make estimation simple and effective, we develop an efficient algorithm to select and optimize the parameters of the LLVCR model that are hard to search for exhaustively. In the proposed algorithm, based on the short-term predictability of chaotic motion and its sensitivity to initial conditions, the minimum fitting error criterion is used as the objective function to estimate the parameters of the LLVCR model. In addition, the center frequencies can first be detected from the fitting error of the LL model by using a periodogram. The simulation results show that the LLVCR model and its estimation algorithm have appreciable flexibility to extract the narrow-band signal in different chaotic backgrounds [Lorenz, Henon and Mackey-Glass (M-G) equations].
  • Article
    This paper investigates estimation in a class of single-index varying-coefficient regression models when some covariates are contaminated with measurement errors. A bias-corrected least squares procedure based on the observed data is proposed. By replacing the nonparametric single-index part with a local linear approximation, an iterative algorithm for estimating the index parameter is proposed. More importantly, a special case is identified in which the naive procedure provides consistent estimates for the single-index parameters. Large sample properties of the proposed estimators are established. The finite sample performance of the proposed estimators is evaluated by simulation studies.
  • Article
    Large spatial time-series data with complex structures collected at irregularly spaced sampling locations are prevalent in a wide range of applications. However, econometric and statistical methodology for nonlinear modeling and analysis of such data remains rare. A semiparametric nonlinear regression is thus proposed for modelling the nonlinear relationship between response and covariates. It is location-based and considers both temporal-lag and spatial-neighbouring effects, allowing the data-generating process to be nonstationary over space (but stationary along time) while the spatial sampling grid can be irregular. A semiparametric estimation method is also developed that is computationally feasible and thus enables application in practice. Asymptotic properties of the proposed estimators are established, and numerical simulations are carried out to compare estimates before and after spatial smoothing. An empirical application to housing prices in relation to interest rates in the United States is demonstrated, with a nonlinear threshold structure identified.
  • Article
    In this paper, we propose a composite minimizing average check loss estimation procedure for composite quantile regression (CQR) in the single-index coefficient model (SICM). The asymptotic normalities of the proposed estimators are established, and the asymptotic relative efficiencies (ARE) of the proposed estimators compared with those by least square method are also discussed. We further investigate a variable selection procedure by combining the proposed estimation method with adaptive LASSO penalized method in CQR of SICM. The oracle property of the proposed variable selection method is also established. Simulations with various non-normal errors and one real data application are conducted to assess the finite sample performance of the proposed estimation and variable selection methods.
  • Article
    In this paper, we discuss the estimation of varying coefficient models based on censored data by a wavelet technique when the survival and the censoring times are from a stationary α-mixing sequence. For the wavelet estimator of the varying coefficient functions, the strong uniform convergence rate is derived and asymptotic normality is established under mild conditions. The strong uniform convergence rate we obtain is comparable with the optimal convergence rate of nonparametric estimation in nonparametric models.
  • Article
    We consider conditional quantile estimation in functional index coefficient models for time series data, using regression splines, which gives more complete information on the conditional distribution than the conditional mean model. An important technical aim is to demonstrate the faster rate and asymptotic normality of the parametric part, which is achieved through an orthogonalization approach. For this class of very flexible models, variable selection is an important problem. We use the smoothly clipped absolute deviation (SCAD) penalty to select either the covariates with functional coefficients, or covariates that enter the index, or both. We establish the oracle property of the penalization method under strongly mixing (α-mixing) conditions. Simulations are carried out to investigate the finite-sample performance of estimation and variable selection. A real data analysis is reported to demonstrate the application of the proposed methods.
  • Article
    Motivated by problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Because nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim at seeking a principal one-dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi-dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modelling and estimation procedure in a multi-covariate multi-response problem concerning concrete.
  • Article
    In this paper, a minimizing average check loss estimation (MACLE) procedure is proposed for the single-index coefficient model (SICM) in the framework of quantile regression (QR). The resulting estimators are asymptotically normal and achieve the best convergence rate. Furthermore, a variable selection method is investigated for the QRSICM by combining the MACLE method with the adaptive LASSO penalty, and we also establish the oracle property of the proposed variable selection method. Extensive simulations are conducted to assess the finite sample performance of the proposed estimation and variable selection procedure under various error settings. Finally, we present a real-data application of the proposed approach.
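The check loss that several of the abstracts above minimize can be sketched in a few lines. This is only an illustration of the standard quantile loss, not the MACLE procedure itself; the function names `check_loss`, `average_check_loss` and `best_constant` are hypothetical and do not come from any of the listed papers.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

def average_check_loss(y, q, tau):
    """Average check loss of the residuals y - q for a constant fit q."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(check_loss(y - q, tau)))

def best_constant(y, tau):
    """Minimise the average check loss over the observed values of y.

    For a constant fit the minimiser is a sample tau-quantile, so
    searching over the data points is enough for this toy case.
    """
    y = np.asarray(y, dtype=float)
    losses = [average_check_loss(y, q, tau) for q in y]
    return float(y[int(np.argmin(losses))])
```

With `tau = 0.5` the loss is half the absolute error, so `best_constant` returns a sample median and is unaffected by a single large outlier, which is the robustness these abstracts refer to.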
  • Article
    In this paper, the problems of blind detection and estimation of a harmonic signal in a strong chaotic background are analyzed, and new methods using the local linear (LL) model are put forward. The LL model has been thoroughly researched and successfully applied to fitting and forecasting chaotic signals in many fields. We enlarge the modeling capacity substantially. First, we predict the short-term chaotic signal and obtain the fitting error based on the LL model. We then detect the frequencies from the fitting error by periodogram. A property of the fitting error, which has not been addressed before, is proposed; it ensures that the detected frequencies are similar to those of the harmonic signal. Second, we establish a two-layer LL model to estimate the deterministic harmonic signal in a strong chaotic background. To estimate this simply and effectively, we develop an efficient backfitting algorithm to select and optimize the parameters that are hard to search for exhaustively. In the method, based on the sensitivity of chaotic motion to initial values, the minimum fitting error criterion is used as the objective function to estimate the parameters of the two-layer LL model. Simulation shows that the two-layer LL model and its estimation technique have appreciable flexibility to model the deterministic harmonic signal in different chaotic backgrounds (Lorenz, Henon and Mackey–Glass (M–G) equations). Specifically, the harmonic signal can be extracted well at low SNR, and the developed backfitting algorithm meets the convergence condition within 3–5 iterations.
  • Article
    The paper is concerned with the estimation of a time-varying coefficient time series model, which is used to characterize the nonlinearity and trending phenomenon. We develop the wavelet procedures to estimate the coefficient functions and the error variance. We establish asymptotic properties of the proposed wavelet estimators under the α-mixing conditions and without specifying the error distribution. These results can be used to make asymptotically valid statistical inference. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
  • Article
    In the last two decades, regularization techniques, in particular penalty-based methods, have become very popular in statistical modelling. Driven by technological developments, most approaches have been designed for high-dimensional problems with metric variables, whereas categorical data has largely been neglected. In recent years, however, it has become clear that regularization is also very promising when modelling categorical data. A specific trait of categorical data is that many parameters are typically needed to model the underlying structure. This results in complex estimation problems that call for structured penalties which are tailored to the categorical nature of the data. This article gives a systematic overview of penalty-based methods for categorical data developed so far and highlights some issues where further research is needed. We deal with categorical predictors as well as models for categorical response variables. The primary interest of this article is to give insight into basic properties of and differences between methods that are important with respect to statistical modelling in practice, without going into technical details or extensive discussion of asymptotic properties.
  • Article
    The construction of novel sufficient dimension folding methods for analyzing matrix-valued data is considered. For a matrix-valued predictor, traditional dimension reduction methods fail to preserve the matrix structure. However, dimension folding methods can preserve the data structure and improve estimation accuracy. A folded-outer product of gradient (folded-OPG) ensemble estimator and two refined estimators, the folded-minimum average variance estimation (folded-MAVE) ensemble and the folded-sliced regression (folded-SR) ensemble, are proposed to recover the central dimension folding subspace (CDFS). Owing to the ensemble idea, estimation accuracy is improved for finite samples by repeatedly using the data. A modified cross-validation method is used to determine the structural dimensions of the CDFS. Simulated examples demonstrate the performance of the folded ensemble methods by comparison with existing inverse dimension folding methods. The efficacy of the folded-MAVE ensemble method is also evaluated by comparison with inverse dimension folding methods for analyzing the Standard & Poor’s 500 stock data set.
  • Article
    We propose a generalization of the varying coefficient model for longitudinal data to cases where not only current but also recent past values of the predictor process affect current response. More precisely, the targeted regression coefficient functions of the proposed model have sliding window supports around current time t. A variant of a recently proposed two-step estimation method for varying coefficient models is proposed for estimation in the context of these generalized varying coefficient models, and is found to lead to improvements, especially for the case of additive measurement errors in both response and predictors. The proposed methodology for estimation and inference is also applicable for the case of additive measurement error in the common versions of varying coefficient models that relate only current observations of predictor and response processes to each other. Asymptotic distributions of the proposed estimators are derived, and the model is applied to the problem of predicting protein concentrations in a longitudinal study. Simulation studies demonstrate the efficacy of the proposed estimation procedure.
  • Article
    The varying-coefficient model is an attractive alternative to the additive and other models. One important method for estimating the coefficient functions in this model is the local polynomial fitting approach. In this approach, the choice of bandwidth is crucial. If the unknown curve is spatially homogeneous, a constant bandwidth is sufficient. However, for estimating curves with a more complicated structure, a variable bandwidth is needed. The present article focuses on a variable bandwidth selection procedure, and provides the conditional bias and the conditional variance of the estimator, the convergence rate of the bandwidth, and the asymptotic distribution of its error relative to the theoretical optimal variable bandwidth.
  • One-step Huber estimates in linear models; Efficient estimation and inferences for varying-coefficient models
    • P J Bickel
    • J Fan
    • R Li
    Bickel, P. J. (1975) One-step Huber estimates in linear models. J. Am. Statist. Ass., 70, 428–433. Cai, Z., Fan, J. and Li, R. (2000) Efficient estimation and inferences for varying-coefficient models. J. Am. Statist. Ass., 95, 888–902.
  • Article
    Simple “one-step” versions of Huber’s (M) estimates for the linear model are introduced. Some relevant Monte Carlo results obtained in the Princeton project [1] are singled out and discussed. The large sample behavior of these procedures is examined under very mild regularity conditions.
  • Article
    In this article we investigate a class of single-index coefficient regression models under dependence. This includes many existing models, such as the smooth transition threshold autoregressive (STAR) model of Chan and Tong, the functional-coefficient autoregressive (FAR) model of Chen and Tsay, and the single-index model of Ichimura. Compared to the varying-coefficient model of Hastie and Tibshirani, our model can avoid the curse of dimensionality in multivariate nonparametric estimations. Another advantage of this model is that a threshold variable is chosen automatically. An estimation method is proposed, and the corresponding estimators are shown to be consistent and asymptotically normal. Some simulations and applications are also reported.
  • Article
    Estimating equations have found wide popularity recently in parametric problems, yielding consistent estimators with asymptotically valid inferences obtained via the sandwich formula. Motivated by a problem in nutritional epidemiology, we use estimating equations to derive nonparametric estimators of a “parameter” depending on a predictor. The nonparametric component is estimated via local polynomials with loess or kernel weighting; asymptotic theory is derived for the latter. In keeping with the estimating equation paradigm, variances of the nonparametric function estimate are estimated using the sandwich method, in an automatic fashion, without the need (typical in the literature) to derive asymptotic formulas and plug-in an estimate of a density function. The same philosophy is used in estimating the bias of the nonparametric function; that is, an empirical method is used without deriving asymptotic theory on a case-by-case basis. The methods are applied to a series of examples. The application to nutrition is called “nonparametric calibration” after the term used for studies in that field. Other applications include local polynomial regression for generalized linear models, robust local regression, and local transformations in a latent variable model. Extensions to partially parametric models are discussed.
  • Article
    In this paper we investigate the estimation and testing of the functional coefficient linear models under dependence, which includes the functional coefficient autoregressive model of Chen and Tsay (1993). We use local linear smoothing to estimate the coefficient functions of a functional-coefficient linear model, prove their uniform consistency, and derive their asymptotic distributions in terms of Gaussian processes. From these distributions we can obtain some tests about coefficient functions and the model. Some simulations and a study of real data are reported.
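The local linear smoothing of coefficient functions mentioned above can be sketched as a kernel-weighted least squares fit at a single point. This is a minimal illustration under stated assumptions, not the estimator of any listed paper: the Gaussian kernel, the function name `local_linear_vc` and the argument names are choices made here for the example.

```python
import numpy as np

def local_linear_vc(u0, U, X, y, h):
    """Estimate a(u0) in the model y_i = X_i' a(U_i) + error.

    Each coefficient function is approximated near u0 by
    a_j(u) ~ a_j(u0) + b_j * (u - u0), and the pair (a, b) is fitted
    by weighted least squares with kernel weights K((U_i - u0) / h).
    """
    U = np.asarray(U, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = U - u0
    w = np.exp(-0.5 * (d / h) ** 2)       # Gaussian kernel (assumed choice)
    Z = np.hstack([X, X * d[:, None]])    # local design [X, X * (U - u0)]
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return coef[: X.shape[1]]             # first block estimates a(u0)
```

When the true coefficient functions are exactly linear in the index and there is no noise, the local linear fit reproduces them exactly for any bandwidth `h`, which makes a convenient sanity check before trying the function on noisy data.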
  • Article
    Smoothing splines are well known to provide nice curves which smooth discrete, noisy data. We obtain a practical, effective method for estimating the optimum amount of smoothing from the data. Derivatives can be estimated from the data by differentiating the resulting (nearly) optimally smoothed spline. We consider the model yi=g(ti)+εi, i=1, 2, ..., n, ti∈[0, 1], where g∈W2(m)={f:f, f′, ..., f(m-1) abs. cont., f(m)∈ℒ2[0,1]}, and the {εi} are random errors with Eεi=0, Eεiεj=σ2δij. The error variance σ2 may be unknown. As an estimate of g we take the solution gn, λ to the problem: Find f∈W2(m) to minimize {Mathematical expression}. The function gn, λ is a smoothing polynomial spline of degree 2m-1. The parameter λ controls the tradeoff between the "roughness" of the solution, as measured by {Mathematical expression}, and the infidelity to the data as measured by {Mathematical expression}, and so governs the average square error R(λ; g)=R(λ) defined by {Mathematical expression}. We provide an estimate {Mathematical expression}, called the generalized cross-validation estimate, for the minimizer of R(λ). The estimate {Mathematical expression} is the minimizer of V(λ) defined by {Mathematical expression}, where y=(y1, ..., yn)t and A(λ) is the n×n matrix satisfying (gn, λ (t1), ..., gn, λ (tn))t=A (λ) y. We prove that there exists a sequence of minimizers {Mathematical expression} of EV(λ), such that as the (regular) mesh {ti}i=1n becomes finer, {Mathematical expression}. A Monte Carlo experiment with several smooth g's was tried with m=2, n=50 and several values of σ2, and typical values of {Mathematical expression} were found to be in the range 1.01-1.4. The derivative g′ of g can be estimated by {Mathematical expression}. In the Monte Carlo examples tried, the minimizer of {Mathematical expression} tended to be close to the minimizer of R(λ), so that {Mathematical expression} was also a good value of the smoothing parameter for estimating the derivative.
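A rough sketch of generalized cross-validation for a linear smoother follows. It uses the usual GCV score V(λ) = (1/n)‖(I − A(λ))y‖² / [(1/n) tr(I − A(λ))]², which is assumed here because the formula is elided in the abstract as extracted. The smoother is a discrete second-difference penalty on a regular mesh, used as a stand-in for the smoothing spline; the function names are hypothetical.

```python
import numpy as np

def smoother_matrix(n, lam):
    """A(lam) = (I + lam * D'D)^{-1}, with D the second-difference operator.

    Assumption: a discrete roughness penalty replaces the spline
    penalty of the abstract; lam must be positive.
    """
    D = np.diff(np.eye(n), n=2, axis=0)   # shape (n - 2, n)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def gcv_score(y, A):
    """GCV score of the linear smoother y_hat = A @ y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    resid = y - A @ y
    return float((resid @ resid / n) / ((n - np.trace(A)) / n) ** 2)

def gcv_select(y, lams):
    """Return the candidate lam with the smallest GCV score, plus all scores."""
    scores = [gcv_score(y, smoother_matrix(len(y), lam)) for lam in lams]
    return lams[int(np.argmin(scores))], scores
```

A typical call would pass a grid such as `[0.01, 0.1, 1.0, 10.0, 100.0]` as `lams`. The value `lam = 0` is excluded because it gives A = I and a zero denominator in the score.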
  • Article
    In this article we propose a new class of models for nonlinear time series analysis, investigate properties of the proposed model, and suggest a modeling procedure for building such a model. The proposed modeling procedure makes use of ideas from both parametric and nonparametric statistics. A consistency result is given to support the procedure. For illustration we apply the proposed model and procedure to several data sets and show that the resulting models substantially improve postsample multi-step ahead forecasts over other models.
  • Article
    Fitting local polynomials in nonparametric regression has a number of advantages. The attractive theoretical features are in a partial contradiction to variance properties for random design and to practical experience over a broad range of situations. No upper bound can be given for the conditional variance. The unconditional variance is infinite when using optimal weights with compact support. Properties are better for Gaussian weights. We analyze local polynomials for finite sample size, both theoretically and numerically. It turns out that difficulties arise in sparse regions in the realization of the design, when the realization has locally a small variance and/or a skew empirical distribution. Two small-sample modifications of local polynomials are presented: local increase of bandwidth in sparse regions of the design, and local polynomial ridge regression. Both modifications combine a good finite-sample behavior with the asymptotic advantages of local polynomials.
  • Article
    We consider the estimation of the (k + 1)-dimensional nonparametric component β(t) of the varying-coefficient model Y(t) = X(t)β(t) + ε(t) based on longitudinal observations (Yij, Xi(tij), tij), i = 1, …, n, j = 1, …, ni, where tij is the jth observed design time point t of the ith subject and Yij and Xi(tij) are the real-valued outcome and R^(k+1)-valued covariate vectors of the ith subject at tij. The subjects are independently selected, but the repeated measurements within subject are possibly correlated. Asymptotic distributions are established for a kernel estimate of β(t) that minimizes a local least squares criterion. These asymptotic distributions are used to construct a class of approximate pointwise and simultaneous confidence regions for β(t). Applying these methods to an epidemiological study, we show that our procedures are useful for predicting CD4 (T-helper lymphocytes) cell changes among HIV (human immunodeficiency virus)-infected persons. The finite-sample properties of our procedures are studied through Monte Carlo simulations.
  • Article
    Average derivative functionals of regression are proposed for nonparametric model selection and diagnostics. The functionals are of the integral type, which under certain conditions allows their estimation at the usual parametric rate of √n. We analyze asymptotic properties of the estimators of these functionals, based on kernel regression. These estimators can then be used for assessing the validity of various restrictions imposed on the form of regression. In particular, we show how they could be used to reduce the dimensionality of the model, assess the relative importance of predictors, measure the extent of nonlinearity and nonadditivity, and, under certain conditions, help identify projection directions in projection pursuit models and decide on the number of these directions.
  • Article
    Across the boreal forest of North America, lynx populations undergo 10-year cycles. Analysis of 21 time series from 1821 to the present demonstrates that these fluctuations are generated by nonlinear processes with regulatory delays. Trophic interactions between lynx and hares cause delayed density-dependent regulation of lynx population growth. The nonlinearity, in contrast, appears to arise from phase dependencies in hunting success by lynx through the cycle. Using a combined approach of empirical, statistical, and mathematical modeling, we highlight how shifts in trophic interactions between the lynx and the hare generate the nonlinear process primarily by shifting functional response curves during the increase and the decrease phases.
  • Chapter
    Reviews ways of modelling two interacting populations - either predator-prey, competition, or mutualism.
  • Article
    An improved AIC-based criterion is derived for model selection in general smoothing-based modeling, including semiparametric models and additive models. Examples are provided of applications to goodness-of-fit, smoothing parameter and variable selection in an additive model and semiparametric models, and variable selection in a model with a nonlinear function of linear terms.
  • Local quasi-likelihood estimation is a useful extension of local least squares methods, but its computational cost and algorithmic convergence problems make the procedure less appealing, particularly when it is iteratively used in methods such as the back-fitting algorithm, cross-validation and bootstrapping. A one-step local quasi-likelihood estimator is introduced to overcome the computational drawbacks of the local quasi-likelihood method. We demonstrate that as long as the initial estimators are reasonably good, the one-step estimator has the same asymptotic behaviour as the local quasi-likelihood method. Our simulation shows that the one-step estimator performs at least as well as the local quasi-likelihood method for a wide range of choices of bandwidths. A data-driven bandwidth selector is proposed for the one-step estimator based on the pre-asymptotic substitution method of Fan and Gijbels. It is then demonstrated that the data-driven one-step local quasi-likelihood estimator performs as well as the maximum local quasi-likelihood estimator by using the ideal optimal bandwidth.
  • Many different methods have been proposed to construct nonparametric estimates of a smooth regression function, including local polynomial, (convolution) kernel and smoothing spline estimators. Each of these estimators uses a smoothing parameter to control the amount of smoothing performed on a given data set. In this paper an improved version of a criterion based on the Akaike information criterion (AIC), termed AICC, is derived and examined as a way to choose the smoothing parameter. Unlike plug-in methods, AICC can be used to choose smoothing parameters for any linear smoother, including local quadratic and smoothing spline estimators. The use of AICC avoids the large variability and tendency to undersmooth (compared with the actual minimizer of average squared error) seen when other ‘classical’ approaches (such as generalized cross-validation (GCV) or the AIC) are used to choose the smoothing parameter. Monte Carlo simulations demonstrate that the AICC-based smoothing parameter is competitive with a plug-in method (assuming that one exists) when the plug-in method works well but also performs well when the plug-in approach fails or is unavailable.
  • Article
    A data-based local bandwidth selector is proposed for nonparametric regression by local fitting of polynomials. The estimator, called the empirical-bias bandwidth selector (EBBS), is rather simple and easily allows multivariate predictor variables and estimation of any order derivative of the regression function. EBBS minimizes an estimate of mean squared error consisting of a squared bias term plus a variance term. The variance term used is exact, not asymptotic, though it involves the conditional variance of the response given the predictors, which must be estimated. The bias term is estimated empirically, not from an asymptotic expression. Thus EBBS is similar to the "double smoothing" approach of Härdle, Hall, and Marron and a local bandwidth selector of Schucany, but is developed here for a far wider class of estimation problems than those authors considered. EBBS is tested on simulated data, and its performance seems quite satisfactory. Local polynomial smoothing of a histogram is a highly effective technique for density estimation, and several of the examples involve density estimation by EBBS applied to binned data.
  • Article
    Varying coefficient models are useful extensions of the classical linear models. Under the condition that the coefficient functions possess about the same degrees of smoothness, the model can easily be estimated via simple local regression. This leads to the one-step estimation procedure. In this paper, we consider an extension of the varying coefficient model, called the semivarying-coefficient model. Procedures for estimation of the linear part and the nonparametric part are developed and their associated statistical properties are studied. The proposed methods are illustrated by some simulation studies and a real example.
  • Article
    Regression analysis is one of the most commonly used techniques in statistics. When the dimension of independent variables is high, it is difficult to conduct efficient non-parametric analysis straightforwardly from the data. As an important alternative to the additive and other non-parametric models, varying-coefficient models can reduce the modelling bias and avoid the "curse of dimensionality" significantly. In addition, the coefficient functions can easily be estimated via a simple local regression. Based on local polynomial techniques, we provide the asymptotic distribution for the maximum of the normalized deviations of the estimated coefficient functions away from the true coefficient functions. Using this result and the pre-asymptotic substitution idea for estimating biases and variances, simultaneous confidence bands for the underlying coefficient functions are constructed. An important question in the varying coefficient models is whether an estimated coefficient function is statistically significantly different from zero or a constant. Based on newly derived asymptotic theory, a formal procedure is proposed for testing whether a particular parametric form fits a given data set. Simulated and real-data examples are used to illustrate our techniques.
  • Article
    There is reliable evidence that simple rules used by traders have some predictive value over the future movement of foreign exchange prices. This paper will review some of this evidence and discuss the economic magnitude of this predictability. The profitability of these trading rules will then be analyzed in connection with central bank activity using intervention data from the Federal Reserve. The objective is to find out to what extent foreign exchange predictability can be confined to periods of central bank activity in the foreign exchange market. The results indicate that after removing periods in which the Federal Reserve is active, exchange rate predictability is dramatically reduced.
  • Article
    For the class of single-index models, I construct a semiparametric estimator of coefficients up to a multiplicative constant that exhibits root-n consistency and asymptotic normality. This class of models includes censored and truncated Tobit models, binary choice models, and duration models with unobserved individual heterogeneity and random censoring. I also investigate a weighting scheme that achieves the semiparametric efficiency bound.
  • Article
    In this monograph we have considered a class of autoregressive models whose coefficients are random. The models have special appeal among the non-linear models so far considered in the statistical literature, in that their analysis is quite tractable. It has been possible to find conditions for stationarity and stability, to derive estimates of the unknown parameters, to establish asymptotic properties of these estimates and to obtain tests of certain hypotheses of interest.
  • Article
    Single-index models generalize linear regression. They have applications to a variety of fields, such as discrete choice analysis in econometrics and dose response models in biometrics, where high-dimensional regression models are often employed. Single-index models are similar to the first step of projection pursuit regression, a dimension-reduction method. In both cases the orientation vector can be estimated root-n consistently, even if the unknown univariate function (or nonparametric link function) is assumed to come from a large smoothness class. However, as we show in the present paper, the similarities end there. In particular, the amount of smoothing necessary for root-n consistent orientation estimation is very different in the two cases. We suggest a simple, empirical rule for selecting the bandwidth appropriate to single-index models. This rule is studied in a small simulation study and an application in binary response models.
  • Article
    This paper considers nonparametric estimation in a varying coefficient model with repeated measurements $(Y_{ij}, X_{ij}, t_{ij})$, for $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$, where $X_{ij} = (X_{ij0}, \ldots, X_{ijk})^T$ and $(Y_{ij}, X_{ij}, t_{ij})$ denote the $j$th outcome, covariate and time design points, respectively, of the $i$th subject. The model considered here is $Y_{ij} = X_{ij}^T \beta(t_{ij}) + \epsilon_i(t_{ij})$, where $\beta(t) = (\beta_0(t), \ldots, \beta_k(t))^T$, for $k \ge 0$, are smooth nonparametric functions of interest and $\epsilon_i(t)$ is a zero-mean stochastic process. The measurements are assumed to be independent for different subjects but can be correlated at different time points within each subject. Two nonparametric estimators of $\beta(t)$, namely a smoothing spline and a locally weighted polynomial, are derived for such repeatedly measured data. A crossvalidation criterion is proposed for the selection of the corresponding smoothing parameters. Asymptotic properties, such as consistency, rates of convergence and asymptotic mean squared errors, are established for kernel estimators, a special case of the local polynomials. These asymptotic results give useful insights into the reliability of our general estimation methods. An example of predicting the growth of children born to HIV infected mothers based on gender, HIV status and maternal vitamin A levels shows that this model and the corresponding nonparametric estimators are useful in epidemiological studies.
  • Article
    The paper motivates the use of varying coefficient models for diagnostics in regression models with continuous and factorial covariates. The varying coefficient model, which is restricted only by assumptions concerning the smoothness of the effects, is considered as the alternative to a parametric generalised linear model. Estimation is based on local likelihood smoothing and asymptotic properties of the estimators are derived. For the investigation of the discrepancy between the parametric model and the smooth alternative a simple graphical procedure is proposed. Moreover, tests are derived which allow for overall and componentwise investigation. The componentwise tests may be used to examine interaction effects of continuous and factorial regressors. Keywords: Generalised linear model; Local likelihood; Model diagnostic; Varying coefficient model.
  • Article
    It is often documented, based on autocorrelation, variance ratio, and power spectrum, that exchange rates approximately follow a martingale process. Because these statistics check serial uncorrelatedness rather than martingale difference, they may deliver misleading conclusions in favor of the martingale hypothesis when the test statistics are insignificant. In this paper, we explore whether there exists a gap between serial uncorrelatedness and martingale difference for exchange rate changes, and if so, whether nonlinear time series models admissible in the gap can outperform the martingale model in out-of-sample forecasts. Applying the generalized spectral tests of Hong to five major currencies, we find that the changes of exchange rates are often serially uncorrelated, but there exists strong nonlinearity in conditional mean, in addition to the well-known volatility clustering. To forecast the conditional mean, we consider the linear autoregressive, autoregressive polynomial, artificial neural network, and functional-coefficient models, as well as their combination. The functional coefficient model allows the autoregressive coefficients to depend on investment positions via a moving-average technical trading rule. We evaluate out-of-sample forecasts of these models relative to the martingale model, using four criteria: the mean squared forecast error, the mean absolute forecast error, the mean forecast trading return, and the mean correct forecast direction. White's reality check method is used to avoid data-snooping bias. It is found that suitable nonlinear models, particularly in combination, do have superior predictive ability over the martingale model for some currencies in terms of certain forecast evaluation criteria. Copyright (c) 2003 President and Fellows of Harvard College and the Massachusetts Institute of Technology.
  • Article
    This paper performs tests on several different foreign exchange series using a methodology inspired by technical trading rules. Moving average based rules are used as specification tests on the process for foreign exchange rates. Several models for regime shifts and persistent trends are simulated and compared with results from the actual series. The results show that these simple models cannot capture some aspects of the series studied. Finally, the economic significance of the trading rule results is tested. Returns distributions from the trading rules are compared with returns on risk free assets and returns from the U.S. stock market.
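    To make the idea of a moving average based rule concrete, here is a minimal sketch of one common form: go long when the price is above its trailing moving average and short when it is below. The function names, the window length, and the price series are hypothetical and chosen only for illustration; the cited paper may define its rules differently.

```python
def moving_average_signals(prices, window):
    """Return a position for each date: +1 (long) if the price is above its
    trailing moving average, -1 (short) if below, and 0 on a tie or when
    fewer than `window` observations are available."""
    signals = []
    for t in range(len(prices)):
        if t + 1 < window:
            signals.append(0)  # not enough history yet
            continue
        ma = sum(prices[t + 1 - window:t + 1]) / window
        if prices[t] > ma:
            signals.append(1)
        elif prices[t] < ma:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


def rule_returns(prices, signals):
    """Simple return from holding position signals[t] between t and t + 1."""
    return [signals[t] * (prices[t + 1] - prices[t]) / prices[t]
            for t in range(len(prices) - 1)]


# Hypothetical price series, for illustration only.
prices = [100, 102, 105, 103, 100, 98, 100, 103]
signals = moving_average_signals(prices, window=3)
returns = rule_returns(prices, signals)
```

    The position at date t uses only prices up to and including t, so each return in `returns` is earned on information available before the price move it is applied to.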
  • Article
    The typical generalized linear model for a regression of a response Y on predictors (X,Z) has conditional mean function based upon a linear combination of (X,Z). We generalize these models to have a nonparametric component, replacing the linear combination $\alpha_0^T X + \beta_0^T Z$ by $\eta_0(\alpha_0^T X) + \beta_0^T Z$, where $\eta_0(\cdot)$ is an unknown function. These are the generalized partially linear single-index models. The models also generalize the "single-index" models, which have $\beta_0 = 0$. Using local linear methods, estimates of the unknown parameters $(\alpha_0,\beta_0)$ and the unknown function $\eta_0(\cdot)$ are proposed, and their asymptotic distributions obtained. An example illustrates the algorithms and the models.
  • Article
    The snowshoe hare and the Canadian lynx in the boreal forests of North America show 9- to 11-year density cycles. These are generally assumed to be linked to each other because lynx are specialist predators on hares. Based on time series data for hare and lynx, we show that the dominant dimensional structure of the hare series appears to be three whereas that of the lynx is two. The three-dimensional structure of the hare time series is hypothesized to be due to a three-trophic level model in which the hare may be seen as simultaneously regulated from below and above. The plant species in the hare diet appear compensatory to one another, and the predator species may, likewise, be seen as an internally compensatory guild. The lynx time series are, in contrast, consistent with a model of donor control in which their populations are regulated from below by prey availability. Thus our analysis suggests that the classic view of a symmetric hare-lynx interaction is too simplistic. Specifically, we argue that the classic food chain structure is inappropriate: the hare is influenced by many predators other than the lynx, and the lynx is primarily influenced by the snowshoe hare.
  • Article
    Typically, in many studies in ecology, epidemiology, biomedicine and others, we are confronted with panels of short time-series of which we are interested in obtaining a biologically meaningful grouping. Here, we propose a bootstrap approach to test whether the regression functions or the variances of the error terms in a family of stochastic regression models are the same. Our general setting includes panels of time-series models as a special case. We rigorously justify the use of the test by investigating its asymptotic properties, both theoretically and through simulations. The latter confirm that for finite sample size, bootstrap provides a better approximation than classical asymptotic theory. We then apply the proposed tests to the mink-muskrat data across 81 trapping regions in Canada. Ecologically interpretable groupings are obtained, which serve as a necessary first step before a fuller biological and statistical analysis of the food chain interaction.
  • Article
    Weighted average derivatives are useful parameters for semiparametric index models and nonparametric demand analysis. This paper gives efficiency results for average derivative estimators, including formulating estimators that have high efficiency. The authors derive the efficiency bound for weighted average derivatives of conditional location functionals, such as the conditional mean and median. They also derive the efficiency bound for semiparametric index models, where the location measure depends on indices or linear combinations of the regressors. The authors derive the efficient weight function when the distribution of the regressors is elliptically symmetric. They also discuss how to combine estimators with different known weight functions to achieve efficiency. Copyright 1993 by The Econometric Society.
  • Article
    Regression analysis is one of the most commonly used techniques in statistics. When the dimension of independent variables is high, it is difficult to conduct efficient nonparametric analysis straightforwardly from the data. As an important alternative to the additive and other nonparametric models, varying-coefficient models can reduce the modeling bias and avoid the "curse of dimensionality" significantly. In addition, the coefficient functions can easily be estimated via a simple local regression. Based on local polynomial techniques, we provide the asymptotic distribution for the maximum of the normalized deviations of the estimated coefficient functions away from the true coefficient functions. Using this result and the pre-asymptotic substitution idea for estimating biases and variances, simultaneous confidence bands for the underlying coefficient functions are constructed. An important question in the varying coefficient models is if an estimated coefficient function is statistically significantly different from ...
  • Functional linear models are useful in longitudinal data analysis. They include many classical and recently proposed statistical models for longitudinal data and other functional data. Recently, smoothing spline and kernel methods have been proposed for estimating their coefficient functions nonparametrically but these methods are either intensive in computation or inefficient in performance. To overcome these drawbacks, in this paper, a simple and powerful two-step alternative is proposed. In particular, the implementation of the proposed approach via local polynomial smoothing is discussed. Methods for estimating standard deviations of estimated coefficient functions are also proposed. Some asymptotic results for the local polynomial estimators are established. Two longitudinal data sets, one of which involves time-dependent covariates, are used to demonstrate the proposed approach. Simulation studies show that our two-step approach improves the kernel method proposed in Hoover, et al...
  • Article
    Likelihood ratio theory has had tremendous success in parametric inference, due to the fundamental theory of Wilks. Yet, there is no general applicable approach for nonparametric inferences based on function estimation. Maximum likelihood ratio test statistics in general may not exist in nonparametric function estimation setting. Even if they exist, they are hard to find and cannot be optimal as shown in this paper. We introduce the generalized likelihood statistics to overcome the drawbacks of nonparametric maximum likelihood ratio statistics. A new Wilks phenomenon is unveiled. We demonstrate that a class of the generalized likelihood statistics based on some appropriate nonparametric estimators are asymptotically distribution free and follow $\chi^2$-distributions under null hypotheses for a number of useful hypotheses and a variety of useful models including Gaussian white noise models, nonparametric regression models, varying coefficient models and generalized varying coefficient models. We further demonstrate that generalized likelihood ratio statistics are asymptotically optimal in the sense that they achieve optimal rates of convergence given by Ingster. They can even be adaptively optimal in the sense of Spokoiny by using a simple choice of adaptive smoothing parameter. Our work indicates that the generalized likelihood ratio statistics are indeed general and powerful for nonparametric testing problems based on function estimation.
  • Article
    This paper deals with statistical inferences based on the varying-coefficient models proposed by Hastie and Tibshirani (1993). Local polynomial regression techniques are used to estimate coefficient functions and the asymptotic normality of the resulting estimators is established. The standard error formulas for estimated coefficients are derived and are empirically tested. A goodness-of-fit test technique, based on a nonparametric maximum likelihood ratio type of test, is also proposed to detect whether certain coefficient functions in a varying-coefficient model are constant or whether any covariates are statistically significant in the model. The null distribution of the test is estimated by a conditional bootstrap method. Our estimation techniques involve solving hundreds of local likelihood equations. To reduce computational burden, a one-step Newton-Raphson estimator is proposed and implemented. We show that the resulting one-step procedure can save computational cost in an order of tens without d...
  • Article
    We apply the local linear regression technique for estimation of functional-coefficient regression models for time series data. The models include threshold autoregressive models (Tong 1990) and functional-coefficient autoregressive models (Chen and Tsay 1993) as special cases but with the added advantages such as depicting finer structure of the underlying dynamics and better post-sample forecasting performance. We have also proposed a new bootstrap test for the goodness of fit of models and a bandwidth selector based on newly defined cross-validatory estimation for the expected forecasting errors. The proposed methodology is data-analytic and is of appreciable flexibility to analyze complex and multivariate nonlinear structures without suffering from the "curse of dimensionality". The asymptotic properties of the proposed estimators are investigated under the α-mixing condition. Both simulated and real data examples are used for illustration. Keywords: α-mixing; Asymptotic normalit...
  • Article
    Varying-coefficient models are a useful extension of the classical linear models. The appeal of these models is that the coefficient functions can easily be estimated via a simple local regression. This yields a simple one-step estimation procedure. We show that such a one-step method cannot be optimal when different coefficient functions admit different degrees of smoothness. This drawback can be repaired by using our proposed two-step estimation procedure. The asymptotic mean-squared error for the two-step procedure is obtained and is shown to achieve the optimal rate of convergence. A few simulation studies show that the gain by the two-step procedure can be quite substantial. The methodology is illustrated by an application to an environmental dataset.