# The Baum-Welch algorithm is an EM algorithm

Consider a stochastic model $P(v,h|\theta)$, where $v$ is the ‘visible’ data, $h$ is the ‘hidden’ data, and $\theta$ is the parameter(s).

The general EM algorithm consists of two steps: the E (expectation) step and the M (maximization) step.

E step: Calculate the following ‘Q function’, the expectation of the complete-data log-likelihood over the hidden data under the current parameter estimate $\theta^{(t)}$
$Q(\theta|\theta^{(t)}) = E_{\theta^{(t)}}[\log P(v,h|\theta)] = \sum_h P(h|v, \theta^{(t)}) \log P(v,h|\theta)$

M step: Calculate new parameter(s) $\theta^{(t+1)}$ that maximize the ‘Q function’
$\theta^{(t+1)} = \operatorname{argmax}_{\theta} Q(\theta|\theta^{(t)})$
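
As a concrete illustration of these two steps (separate from the HMM discussed below), here is a minimal Python sketch of EM on a toy model: a mixture of two biased coins, where the hidden data $h$ is which coin produced each sequence of tosses and $\theta$ is the pair of coin biases. All numbers here are invented for illustration, and the mixing weights are fixed at $1/2$ rather than re-estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 20 sequences of 10 tosses each, each sequence from one of two coins.
true_p = np.array([0.3, 0.8])
coins = rng.integers(0, 2, size=20)
heads = rng.binomial(10, true_p[coins])          # number of heads per sequence
n_tosses = 10

p = np.array([0.4, 0.6])                         # initial guess for the coin biases
for _ in range(50):
    # E step: posterior P(h | v, theta^(t)) for each sequence (responsibilities).
    lik = (p[None, :] ** heads[:, None]) * ((1 - p[None, :]) ** (n_tosses - heads[:, None]))
    resp = lik / lik.sum(axis=1, keepdims=True)

    # M step: maximize the Q function; for this model it is a closed-form
    # responsibility-weighted head fraction per coin.
    p = (resp * heads[:, None]).sum(axis=0) / (resp * n_tosses).sum(axis=0)

print(p)   # converges near the true biases (up to label swapping)
```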

For an HMM, the joint probability of the output sequence (the ‘visible’ data) $x$ and the hidden state path (the ‘hidden’ data) $\pi$ is written as
$P(x, \pi | \theta ) = a_{0,\pi_1} \prod_{i} e_{\pi_i}(x_i) a_{\pi_i, \pi_{i+1}} =\prod_k \prod_{\ell} (a_{k\ell}^{A_{k\ell}(\pi)}) \prod_k \prod_b (e_k(b)^{E_k(b,\pi)})$
where $\theta = (\{a_{k\ell}\}, \{e_k(b)\})$ are the transition and emission parameters, $A_{k\ell}(\pi)$ is the number of $k \to \ell$ transitions along the path $\pi$, and $E_k(b,\pi)$ is the number of times state $k$ emits symbol $b$ along $\pi$.
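
As a sanity check of the two factorizations, here is a small Python sketch that evaluates $P(x,\pi|\theta)$ both ways on a made-up two-state HMM. The parameters `a0`, `a`, `e` and the sequences are illustrative assumptions, and the initial transition $a_{0,\pi_1}$ is kept as a separate factor in the count-based form.

```python
import numpy as np

K, B = 2, 2                                  # number of states, alphabet size
a0 = np.array([0.6, 0.4])                    # a_{0,k}: transitions from the begin state
a = np.array([[0.7, 0.3],
              [0.4, 0.6]])                   # a_{k,l}: transition probabilities
e = np.array([[0.9, 0.1],
              [0.2, 0.8]])                   # e_k(b): emission probabilities

x = [0, 1, 1, 0]                             # observed symbols
pi = [0, 1, 1, 0]                            # a hidden state path

# Form 1: walk along the path.
p1 = a0[pi[0]]
for i in range(len(x)):
    p1 *= e[pi[i], x[i]]
    if i + 1 < len(x):
        p1 *= a[pi[i], pi[i + 1]]

# Form 2: parameters raised to the counts A_{kl}(pi) and E_k(b,pi).
A = np.zeros((K, K))
E = np.zeros((K, B))
for i in range(len(x)):
    E[pi[i], x[i]] += 1
    if i + 1 < len(x):
        A[pi[i], pi[i + 1]] += 1
p2 = a0[pi[0]] * np.prod(a ** A) * np.prod(e ** E)

print(p1, p2)                                # the two forms agree
```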

# E step of HMM

The Q function of the HMM is calculated as
\begin{align} Q(\theta|\theta^{(t)}) &= \sum_\pi P(\pi|x, \theta^{(t)}) \left( \sum_k \sum_{\ell} A_{k\ell}(\pi) \log{a_{k\ell}} +\sum_k \sum_b E_k(b,\pi) \log{e_k(b)} \right) \\ &= \sum_k \sum_{\ell} A_{k\ell}^{(t)} \log{a_{k\ell}} + \sum_k \sum_b E_k(b)^{(t)} \log{e_k(b)} \end{align}

using the expected counts obtained from the forward-backward algorithm,
$A_{k\ell}^{(t)} = E_{\theta^{(t)}}[A_{k\ell}(\pi)] = \sum_\pi P(\pi|x,\theta^{(t)}) A_{k\ell}(\pi)$

$E_k(b)^{(t)} = E_{\theta^{(t)}}[E_k(b,\pi)] = \sum_\pi P(\pi|x,\theta^{(t)}) E_k(b,\pi)$
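
Below is a sketch of how these expected counts can be computed with the forward-backward algorithm, reusing the illustrative two-state parameters `a0`, `a`, `e` from the sketch above; the function name `expected_counts` is my own. It handles a single training sequence and uses no scaling, so it is only suitable for short sequences and is not a production implementation.

```python
import numpy as np

def expected_counts(x, a0, a, e):
    K, B = e.shape
    L = len(x)

    # Forward: f[i, k] = P(x_1..x_i, pi_i = k)
    f = np.zeros((L, K))
    f[0] = a0 * e[:, x[0]]
    for i in range(1, L):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)

    # Backward: bwd[i, k] = P(x_{i+1}..x_L | pi_i = k)
    bwd = np.zeros((L, K))
    bwd[-1] = 1.0
    for i in range(L - 2, -1, -1):
        bwd[i] = a @ (e[:, x[i + 1]] * bwd[i + 1])

    px = f[-1].sum()                       # P(x | theta)

    # Expected emission counts: E_k(b) = sum_{i: x_i = b} P(pi_i = k | x)
    E = np.zeros((K, B))
    for i in range(L):
        E[:, x[i]] += f[i] * bwd[i] / px

    # Expected transition counts: A_{kl} = sum_i P(pi_i = k, pi_{i+1} = l | x)
    A = np.zeros((K, K))
    for i in range(L - 1):
        A += np.outer(f[i], e[:, x[i + 1]] * bwd[i + 1]) * a / px

    return A, E, px
```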

# M step of HMM

Under the constraints $\sum_\ell a_{k\ell} = 1$ and $\sum_b e_k(b) = 1$, $Q(\theta|\theta^{(t)})$ is maximized by setting
$\theta^{(t+1)} = (\{a_{k\ell}^{(t+1)}\}, \{e_k(b)^{(t+1)}\}) = \left( \left\{ \frac{A_{k\ell}^{(t)}}{\sum_j A_{kj}^{(t)}} \right\}, \left\{ \frac{E_k(b)^{(t)}}{\sum_{b'} E_k(b')^{(t)}} \right\} \right)$
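
Combining the E and M steps gives one Baum-Welch iteration. The sketch below assumes the `expected_counts` function and the toy parameters `a0`, `a`, `e` from the earlier sketches; the begin-state transitions $a_{0,k}$ are simply kept fixed, since their re-estimation is not covered above.

```python
import numpy as np

def baum_welch_iteration(x, a0, a, e):
    A, E, px = expected_counts(x, a0, a, e)       # E step (sketch above)
    a_new = A / A.sum(axis=1, keepdims=True)      # a_{kl}^(t+1) = A_{kl} / sum_j A_{kj}
    e_new = E / E.sum(axis=1, keepdims=True)      # e_k(b)^(t+1) = E_k(b) / sum_{b'} E_k(b')
    return a_new, e_new, px

# Usage: iterate until log P(x | theta) stops improving.
x = [0, 1, 1, 0, 0, 1]
for _ in range(20):
    a, e, px = baum_welch_iteration(x, a0, a, e)
print(np.log(px))
```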