The Baum-Welch Algorithm is an EM Algorithm

Consider a stochastic model P(v,h|θ), where v is the ‘visible’ (observed) data, h is the ‘hidden’ data, and θ is the parameter(s).

The general EM algorithm consists of two steps: the E (expectation) step and the M (maximization) step.

E step: Calculate the following ‘Q function’

$$Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{\theta^{(t)}}\!\left[\log P(v,h \mid \theta)\right] = \sum_{h} P(h \mid v, \theta^{(t)}) \log P(v,h \mid \theta)$$

M step: Calculate the new parameter(s) θ^(t+1) that maximize the ‘Q function’

$$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$$
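As a concrete illustration of the two steps, here is a minimal EM sketch for a problem simpler than an HMM: a 50/50 mixture of two coins with unknown biases. The toss counts, starting values, and variable names are all hypothetical, invented for illustration.

```python
# Hypothetical data: five groups of 10 tosses each (heads counted per group),
# each group produced by one of two coins with unknown biases p1, p2.
heads = [9, 8, 1, 2, 9]
n = 10

def em_step(p1, p2):
    """One EM iteration for a 50/50 mixture of two binomial coins."""
    h1 = t1 = h2 = t2 = 0.0
    for h in heads:
        # E step: posterior responsibility that this group came from coin 1
        l1 = p1 ** h * (1 - p1) ** (n - h)
        l2 = p2 ** h * (1 - p2) ** (n - h)
        r = l1 / (l1 + l2)
        # accumulate expected head/tail counts for each coin
        h1 += r * h;       t1 += r * (n - h)
        h2 += (1 - r) * h; t2 += (1 - r) * (n - h)
    # M step: re-estimate the biases from the expected counts
    return h1 / (h1 + t1), h2 / (h2 + t2)

p1, p2 = 0.6, 0.5          # initial guesses
for _ in range(20):
    p1, p2 = em_step(p1, p2)
```

Each iteration cannot decrease the data likelihood; after a few iterations p1 and p2 settle near the head frequencies of the two groups of experiments.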

For an HMM, the joint probability of the output sequence x (the ‘visible’ data) and the hidden state sequence π (the ‘hidden’ data) is written as
$$P(x,\pi \mid \theta) = a_{0,\pi_1} \prod_{i} e_{\pi_i}(x_i)\, a_{\pi_i,\pi_{i+1}} = \prod_{k,l} a_{kl}^{A_{kl}(\pi)} \prod_{k} \prod_{b} e_k(b)^{E_k(b,\pi)}$$


using the parameters θ = ({a_kl}, {e_k(b)}), where A_kl(π) counts the transitions from state k to state l and E_k(b,π) counts the emissions of symbol b from state k along the state path π.
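The identity between the path-product form and the count-product form can be checked numerically. The following sketch uses a hypothetical 2-state HMM; note the initial-state factor a_{0,π1} is kept as a separate term here rather than folded into the transition counts.

```python
from collections import Counter

# Hypothetical 2-state HMM over the alphabet {'H', 'T'}
a0 = [0.5, 0.5]                                     # initial probabilities
a  = [[0.9, 0.1], [0.2, 0.8]]                       # transitions a_kl
e  = [{'H': 0.5, 'T': 0.5}, {'H': 0.9, 'T': 0.1}]   # emissions e_k(b)

x  = ['H', 'T', 'H', 'H']
pi = [0, 0, 1, 1]                                   # a fixed hidden state path

# Left-hand side: multiply factors while walking along the path
p_path = a0[pi[0]]
for i in range(len(x)):
    p_path *= e[pi[i]][x[i]]
    if i + 1 < len(x):
        p_path *= a[pi[i]][pi[i + 1]]

# Right-hand side: count transitions A_kl(pi) and emissions E_k(b,pi),
# then take products of parameters raised to those counts
A = Counter(zip(pi, pi[1:]))
E = Counter(zip(pi, x))
p_counts = a0[pi[0]]
for (k, l), c in A.items():
    p_counts *= a[k][l] ** c
for (k, b), c in E.items():
    p_counts *= e[k][b] ** c

assert abs(p_path - p_counts) < 1e-12
```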

E step for the HMM

The Q function of the HMM is calculated as
$$Q(\theta \mid \theta^{(t)}) = \sum_{\pi} P(\pi \mid x, \theta^{(t)}) \left( \sum_{k,l} A_{kl}(\pi) \log a_{kl} + \sum_{k} \sum_{b} E_k(b,\pi) \log e_k(b) \right) = \sum_{k,l} A_{kl}^{(t)} \log a_{kl} + \sum_{k} \sum_{b} E_k^{(t)}(b) \log e_k(b)$$

by using the expected counts obtained from the forward-backward algorithm:

$$A_{kl}^{(t)} = \mathbb{E}_{\theta^{(t)}}\!\left[A_{kl}(\pi)\right] = \sum_{\pi} P(\pi \mid x, \theta^{(t)})\, A_{kl}(\pi)$$

$$E_k^{(t)}(b) = \mathbb{E}_{\theta^{(t)}}\!\left[E_k(b,\pi)\right] = \sum_{\pi} P(\pi \mid x, \theta^{(t)})\, E_k(b,\pi)$$
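These expected counts can be computed without enumerating paths. Below is a minimal forward-backward sketch on a hypothetical 2-state HMM; the numbers and variable names (f, bw, px) are invented for illustration, and no scaling is applied, so it is only suitable for short sequences.

```python
# Hypothetical 2-state HMM and one observed sequence
a0 = [0.5, 0.5]                                     # initial probabilities
a  = [[0.9, 0.1], [0.2, 0.8]]                       # transitions a_kl
e  = [{'H': 0.5, 'T': 0.5}, {'H': 0.9, 'T': 0.1}]   # emissions e_k(b)
x  = ['H', 'T', 'H', 'H']
K, L = 2, len(x)

# Forward: f[i][k] = P(x_1..x_i, pi_i = k)
f = [[a0[k] * e[k][x[0]] for k in range(K)]]
for i in range(1, L):
    f.append([e[k][x[i]] * sum(f[i - 1][j] * a[j][k] for j in range(K))
              for k in range(K)])

# Backward: bw[i][k] = P(x_{i+1}..x_L | pi_i = k)
bw = [[1.0] * K for _ in range(L)]
for i in range(L - 2, -1, -1):
    for k in range(K):
        bw[i][k] = sum(a[k][l] * e[l][x[i + 1]] * bw[i + 1][l]
                       for l in range(K))

px = sum(f[L - 1][k] for k in range(K))   # P(x | theta)

# Expected transition counts A_kl^(t) and emission counts E_k^(t)(b)
A = [[sum(f[i][k] * a[k][l] * e[l][x[i + 1]] * bw[i + 1][l]
          for i in range(L - 1)) / px
      for l in range(K)] for k in range(K)]
E = [{s: sum(f[i][k] * bw[i][k] for i in range(L) if x[i] == s) / px
      for s in 'HT'} for k in range(K)]
```

As a sanity check, the expected emission counts sum to the sequence length L and the expected transition counts sum to L - 1.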

M step for the HMM

Under the constraints Σ_l a_kl = 1 and Σ_b e_k(b) = 1, Q(θ|θ^(t)) is maximized by setting

$$a_{kl}^{(t+1)} = \frac{A_{kl}^{(t)}}{\sum_{j} A_{kj}^{(t)}}, \qquad e_k^{(t+1)}(b) = \frac{E_k^{(t)}(b)}{\sum_{b'} E_k^{(t)}(b')}$$
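The M step is then just a normalization of the expected counts. A minimal sketch, using hypothetical count values:

```python
# Hypothetical expected counts from the E step (2 states, alphabet {'H','T'})
A = [[2.1, 0.4], [0.3, 1.2]]                        # A_kl^(t)
E = [{'H': 1.5, 'T': 1.0}, {'H': 1.3, 'T': 0.2}]    # E_k^(t)(b)

# a_kl^(t+1) = A_kl / sum_j A_kj ; e_k^(t+1)(b) = E_k(b) / sum_b' E_k(b')
a_new = [[A[k][l] / sum(A[k]) for l in range(2)] for k in range(2)]
e_new = [{b: E[k][b] / sum(E[k].values()) for b in 'HT'} for k in range(2)]
```

Each row of a_new and each distribution e_new[k] now sums to one, satisfying the constraints above. In practice, pseudocounts are often added to the expected counts before normalizing to avoid zero probabilities.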