The EM algorithm is a widely used tool in maximum-likelihood estimation
in incomplete data problems. Existing theoretical work has focused on
conditions under which the iterates or likelihood values converge, and the
associated rates of convergence. Such guarantees do not distinguish whether
the ultimate fixed point is a near global optimum or a bad local optimum of
the sample likelihood, nor do they relate the obtained fixed point to the global
optima of the idealized population likelihood (obtained in the limit of infinite
data). This paper develops a theoretical framework for quantifying when and
how quickly EM-type iterates converge to a small neighborhood of a given
global optimum of the population likelihood. For correctly specified models,
such a characterization yields rigorous guarantees on the performance of certain
two-stage estimators in which a suitable initial pilot estimator is refined
with iterations of the EM algorithm. Our analysis is divided into two parts:
a treatment of the EM and first-order EM algorithms at the population level,
followed by results that apply to these algorithms on a finite set of samples.
Our conditions allow for a characterization of the region of convergence of
EM-type iterates to a given population fixed point, that is, the region of the
parameter space over which convergence is guaranteed to a point within a
small neighborhood of the specified population fixed point. We verify our
conditions and give tight characterizations of the region of convergence for
three canonical problems of interest: symmetric mixture of two Gaussians,
symmetric mixture of two regressions and linear regression with covariates
missing completely at random.