Towards Ultrahigh Dimensional Feature Selection for Big Data

Mingkui Tan (South China University of Technology), Ivor W. Tsang (University of Technology Sydney), Li Wang (University of Texas at Arlington)

Numerical Analysis and Scientific Computing; Machine Learning; mathscidoc:1904.25012

Best Paper Award in 2019

Journal of Machine Learning Research, 15, 59, 2014
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Unlike traditional gradient-based approaches, which optimize over all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up training, we solve the MKL subproblems in their primal forms using a modified accelerated proximal gradient approach. This optimization scheme also allows us to develop efficient caching techniques. The feature generating paradigm is guaranteed to converge globally under mild conditions and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets, with tens of millions of data points and O(10^14) features, demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency.
big data, ultrahigh dimensionality, feature selection, nonlinear feature selection, multiple kernel learning, feature generation
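The feature generating paradigm described in the abstract alternates between activating a small group of features and solving a subproblem restricted to the activated set. The sketch below illustrates only that outer loop, under strong simplifying assumptions: squared loss, a ridge least-squares solve standing in for the MKL subproblem, and absolute correlation with the current residual as the activation score. It is not the authors' algorithm, and the function name `feature_generating_selection` and all of its parameters are hypothetical.

```python
import numpy as np


def feature_generating_selection(X, y, group_size=2, max_iters=10,
                                 ridge=1e-3, tol=1e-8):
    """Toy outer loop: activate features in groups, refit on the active set.

    Assumptions (not from the paper): squared loss, ridge regression as the
    restricted subproblem, residual correlation as the activation score.
    """
    n, d = X.shape
    active = []                          # indices of activated features
    is_active = np.zeros(d, dtype=bool)  # mask mirroring `active`
    w = np.zeros(d)
    prev_obj = np.inf

    for _ in range(max_iters):
        k = min(group_size, d - len(active))
        if k == 0:                       # every feature is already active
            break

        # Score inactive features by |x_j^T residual| and activate the top k.
        residual = y - X @ w
        scores = np.abs(X.T @ residual)
        scores[is_active] = -np.inf
        new = np.argsort(scores)[::-1][:k]
        active.extend(int(j) for j in new)
        is_active[new] = True

        # Subproblem on the active features only (ridge stand-in for MKL).
        Xa = X[:, active]
        gram = Xa.T @ Xa + ridge * np.eye(len(active))
        wa = np.linalg.solve(gram, Xa.T @ y)
        w = np.zeros(d)
        w[active] = wa

        # Stop once the regularized objective no longer improves.
        obj = 0.5 * np.sum((y - X @ w) ** 2) + 0.5 * ridge * np.sum(w ** 2)
        if prev_obj - obj < tol:
            break
        prev_obj = obj

    return sorted(active), w


# Usage on synthetic data: 3 informative features out of 50.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 50))
y_demo = X_demo[:, [3, 17, 42]] @ np.array([2.0, -3.0, 1.5])
selected, weights = feature_generating_selection(
    X_demo, y_demo, group_size=1, max_iters=3)
print(selected)
```

Each iteration touches the full feature matrix only once, to compute the scores, while the solve involves just the active columns. That separation is the point of the illustration; the paper's actual subproblem solver and convergence guarantees are not reproduced here.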
Uploaded 2019-04-27 03:23:44 by liwangucsd
@article{mingkui2014towards,
  title={Towards Ultrahigh Dimensional Feature Selection for Big Data},
  author={Mingkui Tan and Ivor W. Tsang and Li Wang},
  url={http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20190427032344602832278},
  journal={Journal of Machine Learning Research},
  volume={15},
  pages={59},
  year={2014},
}
Mingkui Tan, Ivor W. Tsang, and Li Wang. Towards Ultrahigh Dimensional Feature Selection for Big Data. Journal of Machine Learning Research, vol. 15, pp. 59, 2014. http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20190427032344602832278.