Statistical challenges with high dimensionality: Feature selection in knowledge discovery

Jianqing Fan, Runze Li

Statistics Theory and Methods mathscidoc:1912.43267

arXiv preprint math/0602133, 2006.2
Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high-dimensionality arise in diverse fields of sciences and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We demonstrate that for a host of statistical problems, as long as the dimensionality is not excessively large, we can estimate the model parameters as well as if the best model is known in advance. The persistence property in risk minimization is also addressed. The applicability of such a theory and method to diverse statistical problems is demonstrated. Other related problems with high-dimensionality are also discussed.
@inproceedings{jianqing2006statistical,
  title={Statistical challenges with high dimensionality: Feature selection in knowledge discovery},
  author={Jianqing Fan and Runze Li},
  url={http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20191221113358231027827},
  booktitle={arXiv preprint math/0602133},
  year={2006},
}
Jianqing Fan and Runze Li. Statistical challenges with high dimensionality: Feature selection in knowledge discovery. arXiv preprint math/0602133, 2006. http://archive.ymsc.tsinghua.edu.cn/pacm_paperurl/20191221113358231027827.