Expectation-maximization algorithm

An expectation-maximization (EM) algorithm is an algorithm for finding maximum likelihood estimates of parameters in probabilistic models that depend on unobserved (latent) variables. EM alternates between an expectation (E) step, which computes the expected value of the latent variables, and a maximization (M) step, which computes maximum likelihood estimates of the parameters given the data, with the latent variables fixed at their expected values.
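
In the general formulation (a sketch; the notation X for the observed data, Z for the latent variables, and θ for the parameters is an assumption, since this revision gives no formulas), iteration t first forms the expected complete-data log-likelihood and then maximizes it:

    Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\left[ \log p(X, Z \mid \theta) \right]    % E step
    \theta^{(t+1)} = \operatorname*{arg\,max}_{\theta} \; Q(\theta \mid \theta^{(t)})                           % M step

In many models, exponential families in particular, computing Q reduces to computing the expected values of the latent variables or of their sufficient statistics, which is the sense of the informal description above.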

It can be shown that an EM iteration never decreases the observed-data likelihood, and that the only fixed points of the iteration are stationary points of the observed-data likelihood function. In practice, this means that an EM algorithm typically converges to a local maximum of the observed-data likelihood function.
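
A sketch of the standard argument, in the notation introduced above: for any θ, the observed-data log-likelihood decomposes as

    \log p(X \mid \theta) = Q(\theta \mid \theta^{(t)}) + H(\theta \mid \theta^{(t)}),
    \quad\text{where}\quad H(\theta \mid \theta^{(t)}) = -\,\mathbb{E}_{Z \mid X, \theta^{(t)}}\left[ \log p(Z \mid X, \theta) \right].

By Gibbs' inequality, H(θ | θ^(t)) ≥ H(θ^(t) | θ^(t)) for every θ, so any θ that increases Q cannot decrease the observed-data log-likelihood.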

"Expectation-maximization" is a description of a class of related algorithms, not a particular algorithm; EM is a recipe or meta-algorithm which is used to devise particular algorithms. The Baum-Welch algorithm is an example of an EM algorithm applied to hidden Markov models. Another example is the EM algorithm for fitting a mixture density model.

An EM algorithm can also find maximum a posteriori (MAP) estimates, by performing MAP estimation in the M step, rather than maximum likelihood.
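
Concretely, in the notation above and assuming a prior p(θ) (the formula is a standard variant, not spelled out in this revision), the modified M step is:

    \theta^{(t+1)} = \operatorname*{arg\,max}_{\theta} \left[\, Q(\theta \mid \theta^{(t)}) + \log p(\theta) \,\right]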

There are other methods for finding maximum likelihood estimates, such as gradient descent, conjugate gradient, or variants of the Gauss–Newton method.

References

  • Arthur Dempster, Nan Laird, and Donald Rubin. "Maximum likelihood from incomplete data via the EM algorithm". Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
  • Radford Neal and Geoffrey Hinton. "A view of the EM algorithm that justifies incremental, sparse, and other variants". In Michael I. Jordan (editor), Learning in Graphical Models, pp. 355–368. Cambridge, MA: MIT Press, 1999.