# The Baum-Welch algorithm is an EM algorithm

Consider a stochastic model $P(v,h|\theta)$, where $v$ is the ‘visible’ data, $h$ is the ‘hidden’ data, and $\theta$ is the parameter(s).

The general EM algorithm consists of two steps: the E (expectation) step and the M (maximization) step.

E step: Calculate the following ‘Q function’, the expectation of the complete-data log-likelihood over the hidden data under the current parameter estimate $\theta^{(t)}$
$Q(\theta|\theta^{(t)}) = E_{\theta^{(t)}}[\log P(v,h|\theta)] = \sum_h P(h|v, \theta^{(t)}) \log P(v,h|\theta)$

M step: Calculate new parameter(s) $\theta^{(t+1)}$ that maximize the ‘Q function’
$\theta^{(t+1)} = \operatorname{argmax}_{\theta} Q(\theta|\theta^{(t)})$
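
As a concrete illustration of these two steps (separate from the HMM discussed below), here is a minimal Python sketch of EM on a toy model: a mixture of two biased coins, where the hidden data $h$ is which coin produced each sequence of tosses and $\theta$ is the pair of coin biases. All numbers here are invented for illustration, and the mixing weights are fixed at $1/2$ rather than re-estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 20 sequences of 10 tosses each, each sequence from one of two coins.
true_p = np.array([0.3, 0.8])
coins = rng.integers(0, 2, size=20)
heads = rng.binomial(10, true_p[coins])          # number of heads per sequence
n_tosses = 10

p = np.array([0.4, 0.6])                         # initial guess for the coin biases
for _ in range(50):
    # E step: posterior P(h | v, theta^(t)) for each sequence (responsibilities).
    lik = (p[None, :] ** heads[:, None]) * ((1 - p[None, :]) ** (n_tosses - heads[:, None]))
    resp = lik / lik.sum(axis=1, keepdims=True)

    # M step: maximize the Q function; for this model it is a closed-form
    # responsibility-weighted head fraction per coin.
    p = (resp * heads[:, None]).sum(axis=0) / (resp * n_tosses).sum(axis=0)

print(p)   # converges near the true biases (up to label swapping)
```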

For an HMM, the joint probability of the output sequence (the ‘visible’ data) $x$ and the hidden state path (the ‘hidden’ data) $\pi$ is written as
$P(x, \pi | \theta ) = a_{0,\pi_1} \prod_{i} e_{\pi_i}(x_i) a_{\pi_i, \pi_{i+1}} =\prod_k \prod_{\ell} (a_{k\ell}^{A_{k\ell}(\pi)}) \prod_k \prod_b (e_k(b)^{E_k(b,\pi)})$
where $\theta = (\{a_{k\ell}\}, \{e_k(b)\})$ are the transition and emission parameters, $A_{k\ell}(\pi)$ is the number of $k \to \ell$ transitions along the path $\pi$, and $E_k(b,\pi)$ is the number of times state $k$ emits symbol $b$ along $\pi$.
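
As a sanity check of the two factorizations, here is a small Python sketch that evaluates $P(x,\pi|\theta)$ both ways on a made-up two-state HMM. The parameters `a0`, `a`, `e` and the sequences are illustrative assumptions, and the initial transition $a_{0,\pi_1}$ is kept as a separate factor in the count-based form.

```python
import numpy as np

K, B = 2, 2                                  # number of states, alphabet size
a0 = np.array([0.6, 0.4])                    # a_{0,k}: transitions from the begin state
a = np.array([[0.7, 0.3],
              [0.4, 0.6]])                   # a_{k,l}: transition probabilities
e = np.array([[0.9, 0.1],
              [0.2, 0.8]])                   # e_k(b): emission probabilities

x = [0, 1, 1, 0]                             # observed symbols
pi = [0, 1, 1, 0]                            # a hidden state path

# Form 1: walk along the path.
p1 = a0[pi[0]]
for i in range(len(x)):
    p1 *= e[pi[i], x[i]]
    if i + 1 < len(x):
        p1 *= a[pi[i], pi[i + 1]]

# Form 2: parameters raised to the counts A_{kl}(pi) and E_k(b,pi).
A = np.zeros((K, K))
E = np.zeros((K, B))
for i in range(len(x)):
    E[pi[i], x[i]] += 1
    if i + 1 < len(x):
        A[pi[i], pi[i + 1]] += 1
p2 = a0[pi[0]] * np.prod(a ** A) * np.prod(e ** E)

print(p1, p2)                                # the two forms agree
```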

# E step of HMM

The Q function of the HMM is calculated as
\begin{align} Q(\theta|\theta^{(t)}) &= \sum_\pi P(\pi|x, \theta^{(t)}) \left( \sum_k \sum_{\ell} A_{k\ell}(\pi) \log{a_{k\ell}} +\sum_k \sum_b E_k(b,\pi) \log{e_k(b)} \right) \\ &= \sum_k \sum_{\ell} A_{k\ell}^{(t)} \log{a_{k\ell}} + \sum_k \sum_b E_k(b)^{(t)} \log{e_k(b)} \end{align}

using the expected counts obtained from the forward-backward algorithm,
$A_{k\ell}^{(t)} = E_{\theta^{(t)}}[A_{k\ell}(\pi)] = \sum_\pi P(\pi|x,\theta^{(t)}) A_{k\ell}(\pi)$

$E_k(b)^{(t)} = E_{\theta^{(t)}}[E_k(b,\pi)] = \sum_\pi P(\pi|x,\theta^{(t)}) E_k(b,\pi)$
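
Below is a sketch of how these expected counts can be computed with the forward-backward algorithm, reusing the illustrative two-state parameters `a0`, `a`, `e` from the sketch above; the function name `expected_counts` is my own. It handles a single training sequence and uses no scaling, so it is only suitable for short sequences and is not a production implementation.

```python
import numpy as np

def expected_counts(x, a0, a, e):
    K, B = e.shape
    L = len(x)

    # Forward: f[i, k] = P(x_1..x_i, pi_i = k)
    f = np.zeros((L, K))
    f[0] = a0 * e[:, x[0]]
    for i in range(1, L):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)

    # Backward: bwd[i, k] = P(x_{i+1}..x_L | pi_i = k)
    bwd = np.zeros((L, K))
    bwd[-1] = 1.0
    for i in range(L - 2, -1, -1):
        bwd[i] = a @ (e[:, x[i + 1]] * bwd[i + 1])

    px = f[-1].sum()                       # P(x | theta)

    # Expected emission counts: E_k(b) = sum_{i: x_i = b} P(pi_i = k | x)
    E = np.zeros((K, B))
    for i in range(L):
        E[:, x[i]] += f[i] * bwd[i] / px

    # Expected transition counts: A_{kl} = sum_i P(pi_i = k, pi_{i+1} = l | x)
    A = np.zeros((K, K))
    for i in range(L - 1):
        A += np.outer(f[i], e[:, x[i + 1]] * bwd[i + 1]) * a / px

    return A, E, px
```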

# M step of HMM

Under the constraints $\sum_\ell a_{k\ell} = 1$ and $\sum_b e_k(b) = 1$, $Q(\theta|\theta^{(t)})$ is maximized by setting
$\theta^{(t+1)} = (\{a_{k\ell}^{(t+1)}\}, \{e_k(b)^{(t+1)}\}) = \left( \left\{ \frac{A_{k\ell}^{(t)}}{\sum_j A_{kj}^{(t)}} \right\}, \left\{ \frac{E_k(b)^{(t)}}{\sum_{b'} E_k(b')^{(t)}} \right\} \right)$
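
Combining the E and M steps gives one Baum-Welch iteration. The sketch below assumes the `expected_counts` function and the toy parameters `a0`, `a`, `e` from the earlier sketches; the begin-state transitions $a_{0,k}$ are simply kept fixed, since their re-estimation is not covered above.

```python
import numpy as np

def baum_welch_iteration(x, a0, a, e):
    A, E, px = expected_counts(x, a0, a, e)       # E step (sketch above)
    a_new = A / A.sum(axis=1, keepdims=True)      # a_{kl}^(t+1) = A_{kl} / sum_j A_{kj}
    e_new = E / E.sum(axis=1, keepdims=True)      # e_k(b)^(t+1) = E_k(b) / sum_{b'} E_k(b')
    return a_new, e_new, px

# Usage: iterate until log P(x | theta) stops improving.
x = [0, 1, 1, 0, 0, 1]
for _ in range(20):
    a, e, px = baum_welch_iteration(x, a0, a, e)
print(np.log(px))
```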