Economic modeling in a data-rich environment is often challenging. To allow for enough flexibility and to model heterogeneity, models might have parameters with dimensionality growing with (or even much larger than) the sample size of the data. Learning these high-dimensional parameters requires new methodologies and theories. We consider three important high-dimensional models and propose novel methods for estimation and inference. Empirical applications in economics and finance are also studied.
In Chapter 1, we consider high-dimensional panel data models (large cross sections and long time horizons) with interactive fixed effects and allow the covariate/slope coefficients to vary over time without any restrictions. The parameter of interest is the vector that contains all the covariate effects across time. This vector has dimensionality tending to infinity, potentially much faster than the cross-sectional sample size. We develop methods for estimation of and inference on this high-dimensional vector, i.e., the entire trajectory of time variation in covariate effects. We show that both the consistency of our estimator and the asymptotic accuracy of the proposed inference procedure hold uniformly in time. Our methodology can be applied to several important problems in econometrics, such as constructing confidence bands for the entire path of covariate coefficients across time, testing the time-invariance of slope coefficients, and conducting estimation and inference for patterns of time variation, including structural breaks and regime switching. An important feature of our method is that it provides inference procedures for the time variation in pre-specified components of the slope coefficients while allowing for arbitrary time variation in the other components. Computationally, our procedures do not require any numerical optimization and are very simple to implement. Monte Carlo simulations demonstrate favorable properties of our methods in finite samples. We illustrate our methods through empirical applications in finance and economics.
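As a rough sketch of the Chapter 1 setting, a panel model with interactive fixed effects and time-varying slopes might be written as below. The notation ($y_{it}$, $x_{it}$, $\beta_t$, $\lambda_i$, $f_t$, $u_{it}$) is assumed here for illustration and is not taken from the chapter itself.

$$
y_{it} = x_{it}'\beta_t + \lambda_i' f_t + u_{it}, \qquad i = 1,\dots,n, \quad t = 1,\dots,T.
$$

Under this assumed notation, the parameter of interest is the stacked vector $\beta = (\beta_1', \dots, \beta_T')'$, whose dimension grows with $T$, and testing the time-invariance of slope coefficients corresponds to the null hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_T$.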
In Chapter 2, we consider large factor models with unobserved factors. We formalize the notion of common factors between different groups of variables and propose to use it as a general approach to studying the structure of factors, i.e., which factors drive which variables. The spanning hypothesis, which states that the factors driving one group are spanned by those driving another group, can be studied as a special case of our framework. We develop a statistical procedure for testing hypotheses about the number of common factors. Our inference procedure builds on recent results on the high-dimensional bootstrap and is shown to be valid under the asymptotic framework of large $n$ and large $T$. In Monte Carlo simulations, our procedure performs well in finite samples. As an empirical application, we construct confidence sets for the number of common factors between the macroeconomy and the financial markets.
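One way to write the spanning hypothesis described above is sketched below. The symbols $F_t^{(1)}$, $F_t^{(2)}$, $k_1$, $k_2$, and $A$ are assumptions introduced for illustration, with $F_t^{(g)} \in \mathbb{R}^{k_g}$ denoting the factors that drive group $g$; the chapter's own formalization of common factors may differ.

$$
H_0: \; F_t^{(1)} = A\, F_t^{(2)} \quad \text{for all } t = 1,\dots,T, \text{ for some } A \in \mathbb{R}^{k_1 \times k_2}.
$$

In words, every factor driving the first group is a linear combination of the factors driving the second group.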
Chapter 3 is joint work with Jelena Bradic. We propose a methodology for testing linear hypotheses in high-dimensional linear models. The proposed test imposes no restriction on the size of the model (i.e., model sparsity) or on the loading vector representing the hypothesis. Providing asymptotically valid methods for testing general linear functions of the regression parameters in high dimensions is extremely challenging, especially without making restrictive or unverifiable assumptions on the number of non-zero elements. We propose to test the moment conditions related to a newly designed restructured regression, in which the inputs are transformed and augmented features. These new features incorporate the structure of the null hypothesis directly. The test statistics are constructed in such a way that lack of sparsity in the original model parameter does not present a problem for the theoretical justification of our procedures. We establish asymptotically exact control of the Type I error without imposing any sparsity assumptions on the model parameter or on the vector representing the linear hypothesis. Our method is also shown to achieve certain optimality in detecting deviations from the null hypothesis. We demonstrate the favorable finite-sample performance of the proposed methods through a number of numerical experiments and a real data example.
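For concreteness, the testing problem in Chapter 3 can be sketched as follows. The notation ($y$, $X$, $\beta$, $\varepsilon$, $a$, $g_0$, $p$) is assumed for illustration rather than taken from the chapter, with $a$ standing for the loading vector that represents the hypothesis.

$$
y = X\beta + \varepsilon, \qquad \beta \in \mathbb{R}^{p}, \qquad H_0: \; a'\beta = g_0 \quad \text{versus} \quad H_1: \; a'\beta \neq g_0.
$$

Here $p$ may be much larger than the sample size, and, as described above, neither $\beta$ nor $a$ is required to be sparse.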