Principal component analysis (PCA) is one of the most commonly used statistical procedures, with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high-dimensional setting. Under mild technical conditions, we first establish optimal rates of convergence for estimating the principal subspace which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy and an application of Fano's lemma. The rate-optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace which is fully data-driven and can be computed efficiently. It is shown that the estimator attains the optimal rates of convergence simultaneously over a large collection of parameter spaces. A key idea in our construction is a reduction scheme which reduces the sparse PCA problem to a high-dimensional multivariate regression problem. This method is potentially useful for other related problems.
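To make the estimation target concrete, the following is a minimal illustrative sketch (not the procedure of this paper): it estimates the k-dimensional principal subspace by the top-k eigenvectors of the sample covariance under a spiked covariance model, and measures error by the distance between projection matrices, a standard loss for subspace estimation. All function names and parameter values here are assumptions for illustration.

```python
import numpy as np

def principal_subspace(X, k):
    """Top-k eigenvectors of the sample covariance of X (n x p)."""
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:k]]

def subspace_distance(U, V):
    """Frobenius norm of the difference of the projection matrices."""
    return np.linalg.norm(U @ U.T - V @ V.T)

rng = np.random.default_rng(0)
p, n, k = 20, 500, 2
# Spiked covariance: identity plus a strong k-dimensional signal.
U_true = np.eye(p)[:, :k]
cov = np.eye(p) + 10.0 * U_true @ U_true.T
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

U_hat = principal_subspace(X, k)
print(subspace_distance(U_hat, U_true))  # small when n is large relative to p
```

With n much larger than p and a strong spike, the estimated subspace is close to the truth; the high-dimensional regime studied in the paper, where p may exceed n, is precisely where this naive estimator breaks down and sparsity must be exploited.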