Large-Scale Inference of Multivariate Regression for Heavy-Tailed and Asymmetric Data

July 2023

Abstract

Large-scale multivariate regression is a fundamental statistical tool with applications in a wide range of areas. This paper considers the problem of simultaneously testing a large number of general linear hypotheses, encompassing covariate-effect analysis, analysis of variance, and model comparisons. A new challenge that accompanies the overwhelmingly large number of tests is the ubiquitous presence of heavy-tailed and/or highly skewed measurement noise, which is the main reason conventional least-squares-based methods fail. For large-scale multivariate regression, we develop a set of robust inference methods that explore data features, such as heavy-tailedness and skewness, that least squares cannot capture. The new testing procedure is built on data-adaptive Huber regression and a new covariance estimator for the regression estimates. Under mild conditions, we show that our methods produce consistent estimates of the false discovery proportion. Extensive numerical experiments, along with an empirical study in quantitative linguistics, demonstrate the advantage of our proposal over many state-of-the-art methods when the data are generated from heavy-tailed and/or skewed distributions.
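The testing procedure described above rests on Huber regression. As a rough illustration only, the sketch below fits a Huber M-estimator by iteratively reweighted least squares (IRLS) with a fixed robustification parameter `tau`. It does not reproduce the paper's data-adaptive choice of `tau`, its covariance estimator, or its false discovery proportion estimate. The function name `huber_regression`, the value `tau=1.345` (a classical default for unit-scale noise), and the simulated data are assumptions made for this example.

```python
import numpy as np


def huber_regression(X, y, tau, max_iter=200, tol=1e-8):
    """Huber M-estimator fitted by IRLS with a FIXED tau (illustrative sketch).

    Huber loss: r^2 / 2 if |r| <= tau, else tau * |r| - tau^2 / 2.
    Note: the paper chooses tau data-adaptively; here tau is supplied by hand.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the ordinary least-squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(max_iter):
        r = y - X @ beta
        abs_r = np.abs(r)
        # IRLS weights: 1 for small residuals, tau / |r| for large ones,
        # so observations with heavy-tailed errors are down-weighted.
        w = np.ones_like(r)
        big = abs_r > tau
        w[big] = tau / abs_r[big]
        # Weighted least-squares step: solve (X^T W X) beta = X^T W y.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Usage: simulated data with heavy-tailed (Student t, 2 df) noise.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=2, size=n)
beta_hat = huber_regression(X, y, tau=1.345)
print(beta_hat)
```

In a multivariate setting the same fit could be run separately for each response column. With skewed noise, a fixed `tau` shifts the intercept away from the conditional mean, which is one motivation for letting `tau` adapt to the data.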

Type

Journal article

Publication

Statistica Sinica

Youngseok Song

Assistant Professor in the Department of Statistics