# MathSciDoc: An Archive for Mathematician

#### Probability; Statistics Theory and Methods (mathscidoc:2110.28004)

2021.10
In this paper, we prove a central limit theorem (CLT) for the sample canonical correlation coefficients between two high-dimensional random vectors with finite rank correlations. More precisely, consider two random vectors $\widetilde{\mathbf x}=\mathbf x + A \mathbf z$ and $\widetilde{\mathbf y}=\mathbf y + B \mathbf z$, where $\mathbf x \in \mathbb R^p$, $\mathbf y \in \mathbb R^q$ and $\mathbf z\in \mathbb R^r$ are independent random vectors with i.i.d. entries of mean zero and variance one, and $A \in \mathbb R^{p\times r}$ and $B\in \mathbb R^{q\times r}$ are two arbitrary deterministic matrices. Given $n$ samples of $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$, we stack them into two matrices $\mathcal X= X+AZ$ and $\mathcal Y= Y+BZ$, where $X\in \mathbb R^{p\times n}$, $Y\in \mathbb R^{q\times n}$ and $Z\in \mathbb R^{r\times n}$ are random matrices with i.i.d. entries of mean zero and variance one. Let $\widetilde\lambda_1 \ge \widetilde\lambda_2\ge \cdots \ge \widetilde\lambda_{r}$ be the largest $r$ eigenvalues of the sample canonical correlation (SCC) matrix $\mathcal C_{\mathcal X\mathcal Y}=(\mathcal X\mathcal X^\top)^{-1/2}\mathcal X\mathcal Y^\top (\mathcal Y\mathcal Y^\top)^{-1}\mathcal Y \mathcal X^\top (\mathcal X\mathcal X^\top)^{-1/2}$, and let $t_1\ge t_2 \ge \cdots\ge t_r$ be the squares of the population canonical correlation coefficients between $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$. Under certain moment assumptions, we show that there exists a threshold $t_c \in(0, 1)$ such that if $t_i>t_c$, then $\sqrt{n} (\widetilde\lambda_i-\theta_i)$ converges weakly to a centered normal distribution, where $\theta_i$ is a fixed outlier location determined by $t_i$. Our proof uses a self-adjoint linearization of the SCC matrix and a sharp local law on the inverse of the linearized matrix.
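To make the objects in the abstract concrete, the following is a minimal NumPy sketch that simulates the model $\mathcal X = X + AZ$, $\mathcal Y = Y + BZ$ and computes the eigenvalues of the SCC matrix alongside the squared population canonical correlations $t_i$. It is an illustration, not the paper's method: the Gaussian entries, the dimensions, the random choice of $A$ and $B$, and the helper names `inv_sqrt`, `cc_matrix`, and `sorted_eigs` are all assumptions made here. The population values use the covariances implied by the model, namely $I_p + AA^\top$, $I_q + BB^\top$, and cross-covariance $AB^\top$.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def cc_matrix(Sxx, Sxy, Syy):
    """Sxx^{-1/2} Sxy Syy^{-1} Sxy^T Sxx^{-1/2}, the matrix form used in the abstract."""
    W = inv_sqrt(Sxx)
    return W @ Sxy @ np.linalg.solve(Syy, Sxy.T) @ W

def sorted_eigs(M):
    """Eigenvalues of a symmetric matrix in descending order."""
    M = (M + M.T) / 2  # symmetrize against round-off
    return np.sort(np.linalg.eigvalsh(M))[::-1]

# Illustrative sizes only (assumption): p, q, r fixed, n samples.
rng = np.random.default_rng(0)
p, q, r, n = 20, 30, 2, 2000

# Deterministic matrices A, B; here drawn once at random for the demo.
A = rng.standard_normal((p, r))
B = rng.standard_normal((q, r))

# i.i.d. entries with mean zero and variance one (Gaussian is an assumption).
X = rng.standard_normal((p, n))
Y = rng.standard_normal((q, n))
Z = rng.standard_normal((r, n))

calX = X + A @ Z
calY = Y + B @ Z

# Sample canonical correlation matrix and its eigenvalues.
lam = sorted_eigs(cc_matrix(calX @ calX.T, calX @ calY.T, calY @ calY.T))

# Squared population canonical correlations from the model covariances.
t = sorted_eigs(cc_matrix(np.eye(p) + A @ A.T, A @ B.T, np.eye(q) + B @ B.T))

print("top sample eigenvalues:", lam[:r])
print("population t_i:        ", t[:r])
```

Because $AB^\top$ has rank at most $r$, only the first $r$ population values are nonzero, which is why the abstract tracks the largest $r$ sample eigenvalues. The sketch does not compute the threshold $t_c$ or the outlier locations $\theta_i$, since the abstract does not state their formulas.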
@inproceedings{fan2021limiting,