The Baum-Welch algorithm is an EM algorithm

Given a stochastic model \( P(v,h|\theta) \), where \( v \) is the ‘visible’ data, \( h \) is the ‘hidden’ data, and \( \theta \) is the parameter(s).

The general EM algorithm consists of two steps: the E (expectation) step and the M (maximization) step.

E step: Calculate the following ‘Q function’, the expectation of the complete-data log-likelihood over the posterior of the hidden data:
\[ Q(\theta|\theta^{(t)})
= E_{\theta^{(t)}}[\log P(v,h|\theta)]
= \sum_h P(h|v, \theta^{(t)}) \log P(v,h|\theta) \]

M step: Find the new parameter(s) \( \theta^{(t+1)} \) that maximize the ‘Q function’
\[ \theta^{(t+1)} = \operatorname{argmax}_{\theta} Q(\theta|\theta^{(t)}) \]
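In code, the alternation of the two steps can be sketched as a generic loop; the function names `e_step` and `m_step` below are placeholders for whatever the concrete model supplies, not part of the algorithm itself:

```python
def em(theta, v, e_step, m_step, n_iter=100):
    """Generic EM loop: alternate between computing the expectations
    that define Q(theta | theta^(t)) and maximizing them."""
    for _ in range(n_iter):
        stats = e_step(v, theta)   # expectations under P(h | v, theta^(t))
        theta = m_step(stats)      # theta^(t+1) = argmax_theta Q(theta | theta^(t))
    return theta
```

For an HMM, `e_step` is the forward-backward computation of expected counts and `m_step` is the count normalization, as worked out below.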

For an HMM, the joint probability of the output sequence (‘visible’ data) \( x \) and the hidden state sequence (‘hidden’ data) \( \pi \) is written as
\[ P(x, \pi | \theta ) = a_{0,\pi_1} \prod_{i} e_{\pi_i}(x_i) a_{\pi_i, \pi_{i+1}}
=\prod_k \prod_{\ell} (a_{k\ell}^{A_{k\ell}(\pi)})
\prod_k \prod_b (e_k(b)^{E_k(b,\pi)})
\]
using the parameters \( \theta = (\{a_{k\ell}\}, \{e_k(b)\}) \), where \( A_{k\ell}(\pi) \) counts the \( k \to \ell \) transitions along the state path \( \pi \) (including the initial transition from the begin state \( 0 \)) and \( E_k(b,\pi) \) counts the emissions of symbol \( b \) from state \( k \) along \( \pi \).
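As a sanity check, both factorizations can be evaluated numerically. The toy HMM below (two emitting states, alphabet {H, T}, all probabilities made up for illustration) counts the begin-state transition \( a_{0,\pi_1} \) in \( A_{0,\pi_1} \), so the path form and the count form agree:

```python
import numpy as np

# Toy HMM: state 0 is the silent begin state, states 1 and 2 emit.
a = np.array([[0.0, 0.5, 0.5],   # a[0,k]: begin -> k
              [0.0, 0.7, 0.3],   # a[1,l]
              [0.0, 0.4, 0.6]])  # a[2,l]
e = {1: {'H': 0.9, 'T': 0.1},    # e_k(b): emission probabilities
     2: {'H': 0.2, 'T': 0.8}}

x  = ['H', 'T', 'T']             # visible output sequence
pi = [1, 1, 2]                   # hidden state path

# Path form: a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}
p_path = a[0, pi[0]]
for i in range(len(x)):
    p_path *= e[pi[i]][x[i]]
    if i + 1 < len(x):
        p_path *= a[pi[i], pi[i + 1]]

# Count form: prod_{k,l} a_{kl}^{A_kl(pi)} * prod_{k,b} e_k(b)^{E_k(b,pi)}
A = np.zeros_like(a)             # A_{kl}(pi): transition counts
A[0, pi[0]] += 1                 # include the begin transition
for i in range(len(pi) - 1):
    A[pi[i], pi[i + 1]] += 1
E = {k: {b: 0 for b in 'HT'} for k in e}   # E_k(b,pi): emission counts
for s, b in zip(pi, x):
    E[s][b] += 1

p_counts = np.prod(a ** A) * np.prod(
    [e[k][b] ** E[k][b] for k in e for b in 'HT'])

print(p_path, p_counts)          # the two factorizations coincide
```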

E step of HMM

The Q function of the HMM is calculated as
\[ \begin{align}
Q(\theta|\theta^{(t)}) &= \sum_\pi P(\pi|x, \theta^{(t)})
\left(
\sum_k \sum_{\ell} A_{k\ell}(\pi) \log{a_{k\ell}}
+\sum_k \sum_b E_k(b,\pi) \log{e_k(b)}
\right) \\
&= \sum_k \sum_{\ell} A_{k\ell}^{(t)} \log{a_{k\ell}}
+ \sum_k \sum_b E_k(b)^{(t)} \log{e_k(b)}
\end{align} \]

using the expected counts obtained from the forward-backward algorithm:
\[ \begin{align}
A_{k\ell}^{(t)} &= E_{\theta^{(t)}}[A_{k\ell}(\pi)]
= \sum_\pi P(\pi|x,\theta^{(t)}) A_{k\ell}(\pi) \\
E_k(b)^{(t)} &= E_{\theta^{(t)}}[E_k(b,\pi)]
= \sum_\pi P(\pi|x,\theta^{(t)}) E_k(b,\pi)
\end{align} \]
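These expectations need not be computed by summing over all paths; the standard identities \( A_{k\ell}^{(t)} = \frac{1}{P(x)} \sum_i f_k(i)\, a_{k\ell}\, e_\ell(x_{i+1})\, b_\ell(i+1) \) and \( E_k(b)^{(t)} = \frac{1}{P(x)} \sum_{i : x_i = b} f_k(i)\, b_k(i) \) express them through the forward variables \( f_k(i) \) and backward variables \( b_k(i) \). A minimal sketch, using the same made-up toy model as above:

```python
import numpy as np

# Toy HMM (made-up numbers): state 0 is the silent begin state.
a = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.7, 0.3],
              [0.0, 0.4, 0.6]])
e = np.array([[0.0, 0.0],        # begin state emits nothing
              [0.9, 0.1],        # e_1(H), e_1(T)
              [0.2, 0.8]])       # e_2(H), e_2(T)
x = [0, 1, 1]                    # observation indices for H, T, T
L, K = len(x), a.shape[0]

# Forward: f[i,k] = P(x_1..x_i, pi_i = k)
f = np.zeros((L, K))
f[0] = a[0] * e[:, x[0]]
for i in range(1, L):
    f[i] = e[:, x[i]] * (f[i - 1] @ a)

# Backward: bwd[i,k] = P(x_{i+1}..x_L | pi_i = k)
bwd = np.zeros((L, K))
bwd[-1] = 1.0
for i in range(L - 2, -1, -1):
    bwd[i] = a @ (e[:, x[i + 1]] * bwd[i + 1])

px = f[-1].sum()                 # P(x | theta^(t))

# Expected transitions: A_kl = sum_i f_k(i) a_kl e_l(x_{i+1}) bwd_l(i+1) / P(x)
A = np.zeros((K, K))
for i in range(L - 1):
    A += np.outer(f[i], e[:, x[i + 1]] * bwd[i + 1]) * a / px
A[0] = a[0] * e[:, x[0]] * bwd[0] / px   # begin-state transitions

# Expected emissions: E_k(b) = sum_{i: x_i = b} f_k(i) bwd_k(i) / P(x)
E = np.zeros((K, 2))
for i in range(L):
    E[:, x[i]] += f[i] * bwd[i] / px

print(A.round(3))
print(E.round(3))
```

The total expected counts sum to the number of transitions and emissions in the sequence, which is a convenient check on the implementation.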

M step of HMM

Under the constraints \( \sum_\ell a_{k\ell} = 1 \) and \( \sum_b e_k(b) = 1 \), \( Q(\theta|\theta^{(t)}) \) is maximized by setting
\[
\theta^{(t+1)} = (\{a_{k\ell}\}^{(t+1)}, \{e_k(b)\}^{(t+1)})
= \left( \left\{ \frac{A_{k\ell}^{(t)}}{\sum_{\ell'} A_{k\ell'}^{(t)}} \right\},
\left\{ \frac{E_k(b)^{(t)}}{\sum_{b'} E_k(b')^{(t)}} \right\} \right)
\]
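In other words, the M step is just a row normalization of the expected count tables. A minimal sketch with made-up counts for a 2-state HMM over a 2-symbol alphabet:

```python
import numpy as np

# Made-up expected counts A_kl^(t) and E_k(b)^(t); rows are states.
A = np.array([[1.2, 0.8],
              [0.5, 1.5]])
E = np.array([[2.1, 0.9],
              [0.4, 1.6]])

a_new = A / A.sum(axis=1, keepdims=True)   # a_kl^(t+1)
e_new = E / E.sum(axis=1, keepdims=True)   # e_k(b)^(t+1)
print(a_new)
print(e_new)
```

Each row of `a_new` and `e_new` sums to one, so the constraints are satisfied by construction; iterating the E and M steps to convergence is the Baum-Welch algorithm.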