The Baum-Welch algorithm is an EM algorithm

Given a stochastic model \( P(v,h|\theta) \), where \( v \) is the ‘visible’ data, \( h \) is the ‘hidden’ data, and \( \theta \) is the parameter(s).

The general EM algorithm consists of two steps: the E (expectation) step and the M (maximization) step.

E step: Calculate the following ‘Q function’, the expectation of the complete-data log-likelihood over the posterior of the hidden data:
\[ Q(\theta|\theta^{(t)})
= E_{\theta^{(t)}}[\log P(v,h|\theta)]
= \sum_h P(h|v, \theta^{(t)}) \log P(v,h|\theta) \]

M step: Find the new parameter(s) \( \theta^{(t+1)} \) that maximize the ‘Q function’
\[ \theta^{(t+1)} = \operatorname{argmax}_{\theta} Q(\theta|\theta^{(t)}) \]
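In code, the alternation of the two steps can be sketched as a generic loop; the function names `e_step` and `m_step` below are placeholders for whatever the concrete model supplies, not part of the algorithm itself:

```python
def em(theta, v, e_step, m_step, n_iter=100):
    """Generic EM loop: alternate between computing the expectations
    that define Q(theta | theta^(t)) and maximizing them."""
    for _ in range(n_iter):
        stats = e_step(v, theta)   # expectations under P(h | v, theta^(t))
        theta = m_step(stats)      # theta^(t+1) = argmax_theta Q(theta | theta^(t))
    return theta
```

For an HMM, `e_step` is the forward-backward computation of expected counts and `m_step` is the count normalization, as worked out below.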

For an HMM, the joint probability of the output sequence (‘visible’ data) \( x \) and the hidden state sequence (‘hidden’ data) \( \pi \) is written as
\[ P(x, \pi | \theta ) = a_{0,\pi_1} \prod_{i} e_{\pi_i}(x_i) a_{\pi_i, \pi_{i+1}}
=\prod_k \prod_{\ell} (a_{k\ell}^{A_{k\ell}(\pi)})
\prod_k \prod_b (e_k(b)^{E_k(b,\pi)})
\]
using the parameters \( \theta = (\{a_{k\ell}\}, \{e_k(b)\}) \), where \( A_{k\ell}(\pi) \) counts the \( k \to \ell \) transitions along the state path \( \pi \) (including the initial transition from the begin state \( 0 \)) and \( E_k(b,\pi) \) counts the emissions of symbol \( b \) from state \( k \) along \( \pi \).
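As a sanity check, both factorizations can be evaluated numerically. The toy HMM below (two emitting states, alphabet {H, T}, all probabilities made up for illustration) counts the begin-state transition \( a_{0,\pi_1} \) in \( A_{0,\pi_1} \), so the path form and the count form agree:

```python
import numpy as np

# Toy HMM: state 0 is the silent begin state, states 1 and 2 emit.
a = np.array([[0.0, 0.5, 0.5],   # a[0,k]: begin -> k
              [0.0, 0.7, 0.3],   # a[1,l]
              [0.0, 0.4, 0.6]])  # a[2,l]
e = {1: {'H': 0.9, 'T': 0.1},    # e_k(b): emission probabilities
     2: {'H': 0.2, 'T': 0.8}}

x  = ['H', 'T', 'T']             # visible output sequence
pi = [1, 1, 2]                   # hidden state path

# Path form: a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}
p_path = a[0, pi[0]]
for i in range(len(x)):
    p_path *= e[pi[i]][x[i]]
    if i + 1 < len(x):
        p_path *= a[pi[i], pi[i + 1]]

# Count form: prod_{k,l} a_{kl}^{A_kl(pi)} * prod_{k,b} e_k(b)^{E_k(b,pi)}
A = np.zeros_like(a)             # A_{kl}(pi): transition counts
A[0, pi[0]] += 1                 # include the begin transition
for i in range(len(pi) - 1):
    A[pi[i], pi[i + 1]] += 1
E = {k: {b: 0 for b in 'HT'} for k in e}   # E_k(b,pi): emission counts
for s, b in zip(pi, x):
    E[s][b] += 1

p_counts = np.prod(a ** A) * np.prod(
    [e[k][b] ** E[k][b] for k in e for b in 'HT'])

print(p_path, p_counts)          # the two factorizations coincide
```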

E step of HMM

The Q function of the HMM is calculated as
\[ \begin{align}
Q(\theta|\theta^{(t)}) &= \sum_\pi P(\pi|x, \theta^{(t)})
\left(
\sum_k \sum_{\ell} A_{k\ell}(\pi) \log{a_{k\ell}}
+\sum_k \sum_b E_k(b,\pi) \log{e_k(b)}
\right) \\
&= \sum_k \sum_{\ell} A_{k\ell}^{(t)} \log{a_{k\ell}}
+ \sum_k \sum_b E_k(b)^{(t)} \log{e_k(b)}
\end{align} \]

using the expected counts obtained from the forward-backward algorithm:
\[ \begin{align}
A_{k\ell}^{(t)} &= E_{\theta^{(t)}}[A_{k\ell}(\pi)]
= \sum_\pi P(\pi|x,\theta^{(t)}) A_{k\ell}(\pi) \\
E_k(b)^{(t)} &= E_{\theta^{(t)}}[E_k(b,\pi)]
= \sum_\pi P(\pi|x,\theta^{(t)}) E_k(b,\pi)
\end{align} \]
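These expectations need not be computed by summing over all paths; the standard identities \( A_{k\ell}^{(t)} = \frac{1}{P(x)} \sum_i f_k(i)\, a_{k\ell}\, e_\ell(x_{i+1})\, b_\ell(i+1) \) and \( E_k(b)^{(t)} = \frac{1}{P(x)} \sum_{i : x_i = b} f_k(i)\, b_k(i) \) express them through the forward variables \( f_k(i) \) and backward variables \( b_k(i) \). A minimal sketch, using the same made-up toy model as above:

```python
import numpy as np

# Toy HMM (made-up numbers): state 0 is the silent begin state.
a = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.7, 0.3],
              [0.0, 0.4, 0.6]])
e = np.array([[0.0, 0.0],        # begin state emits nothing
              [0.9, 0.1],        # e_1(H), e_1(T)
              [0.2, 0.8]])       # e_2(H), e_2(T)
x = [0, 1, 1]                    # observation indices for H, T, T
L, K = len(x), a.shape[0]

# Forward: f[i,k] = P(x_1..x_i, pi_i = k)
f = np.zeros((L, K))
f[0] = a[0] * e[:, x[0]]
for i in range(1, L):
    f[i] = e[:, x[i]] * (f[i - 1] @ a)

# Backward: bwd[i,k] = P(x_{i+1}..x_L | pi_i = k)
bwd = np.zeros((L, K))
bwd[-1] = 1.0
for i in range(L - 2, -1, -1):
    bwd[i] = a @ (e[:, x[i + 1]] * bwd[i + 1])

px = f[-1].sum()                 # P(x | theta^(t))

# Expected transitions: A_kl = sum_i f_k(i) a_kl e_l(x_{i+1}) bwd_l(i+1) / P(x)
A = np.zeros((K, K))
for i in range(L - 1):
    A += np.outer(f[i], e[:, x[i + 1]] * bwd[i + 1]) * a / px
A[0] = a[0] * e[:, x[0]] * bwd[0] / px   # begin-state transitions

# Expected emissions: E_k(b) = sum_{i: x_i = b} f_k(i) bwd_k(i) / P(x)
E = np.zeros((K, 2))
for i in range(L):
    E[:, x[i]] += f[i] * bwd[i] / px

print(A.round(3))
print(E.round(3))
```

The total expected counts sum to the number of transitions and emissions in the sequence, which is a convenient check on the implementation.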

M step of HMM

Under the constraints \( \sum_\ell a_{k\ell} = 1 \) and \( \sum_b e_k(b) = 1 \), \( Q(\theta|\theta^{(t)}) \) is maximized by setting
\[
\theta^{(t+1)} = (\{a_{k\ell}\}^{(t+1)}, \{e_k(b)\}^{(t+1)})
= \left( \left\{ \frac{A_{k\ell}^{(t)}}{\sum_{\ell'} A_{k\ell'}^{(t)}} \right\},
\left\{ \frac{E_k(b)^{(t)}}{\sum_{b'} E_k(b')^{(t)}} \right\} \right)
\]
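In other words, the M step is just a row normalization of the expected count tables. A minimal sketch with made-up counts for a 2-state HMM over a 2-symbol alphabet:

```python
import numpy as np

# Made-up expected counts A_kl^(t) and E_k(b)^(t); rows are states.
A = np.array([[1.2, 0.8],
              [0.5, 1.5]])
E = np.array([[2.1, 0.9],
              [0.4, 1.6]])

a_new = A / A.sum(axis=1, keepdims=True)   # a_kl^(t+1)
e_new = E / E.sum(axis=1, keepdims=True)   # e_k(b)^(t+1)
print(a_new)
print(e_new)
```

Each row of `a_new` and `e_new` sums to one, so the constraints are satisfied by construction; iterating the E and M steps to convergence is the Baum-Welch algorithm.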