Distributed estimation of principal eigenspaces

Jianqing Fan, Dong Wang, Kaizheng Wang, Ziwei Zhu

Statistics Theory and Methods mathscidoc:1912.43359

The Annals of Statistics, 47, (6), 3009-3031, 2019
Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that account for the most variation in the data. When data are stored across multiple machines, however, communication cost can prohibit computing PCA in a central location, and distributed algorithms for PCA are thus needed. This paper proposes and studies a distributed PCA algorithm: each node machine computes the top K eigenvectors of its local sample covariance and transmits them to the central server; the central server then aggregates the information from all the node machines and conducts a PCA based on the aggregated information. We investigate the bias and variance of the resulting distributed estimator of the top K eigenvectors. In particular, we show that for distributions with symmetric innovation, the empirical top eigenspaces are unbiased, and hence the distributed PCA is unbiased. We derive the …
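The two-round scheme described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes the server aggregates the local eigenspaces by averaging the projection matrices V_m V_m^T and then extracting the top K eigenvectors of the average; function names and dimensions are illustrative.

```python
import numpy as np

def local_top_k_eigenvectors(X, K):
    """One node machine: top-K eigenvectors of the local sample covariance."""
    S = X.T @ X / X.shape[0]           # local d x d sample covariance
    vals, vecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    return vecs[:, -K:]                # d x K matrix of top-K eigenvectors

def distributed_pca(data_splits, K):
    """Central server: aggregate the K-dimensional eigenspaces from all nodes.

    Each machine transmits its d x K eigenvector matrix V_m (only d*K numbers,
    not the raw data); the server averages the projections V_m V_m^T and
    returns the top-K eigenvectors of the average.
    """
    d = data_splits[0].shape[1]
    P_bar = np.zeros((d, d))
    for X in data_splits:
        V = local_top_k_eigenvectors(X, K)
        P_bar += V @ V.T
    P_bar /= len(data_splits)
    vals, vecs = np.linalg.eigh(P_bar)
    return vecs[:, -K:]
```

Note the communication saving: each machine sends a d x K matrix rather than its n x d data block, and one extra eigendecomposition at the server recovers an estimate of the top-K eigenspace.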
@article{fan2019distributed,
  title={Distributed estimation of principal eigenspaces},
  author={Fan, Jianqing and Wang, Dong and Wang, Kaizheng and Zhu, Ziwei},
  journal={The Annals of Statistics},
  volume={47},
  number={6},
  pages={3009--3031},
  year={2019}
}
Jianqing Fan, Dong Wang, Kaizheng Wang, and Ziwei Zhu. Distributed estimation of principal eigenspaces. The Annals of Statistics, 47(6):3009-3031, 2019. http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20191221113927472549919.