Dynamic programing for global pairwise alignment

Problem

Calculate the optimal global alignment of the two sequences, \( x_{1\ldots m} = x_1 \ldots x_m\) and \( y_{1\ldots n} = y_1 \ldots y_n\), with substitution matrix \(s(a,b)\) and gap penelty \(g(k)\) for a gap with the length \(k\).

Definition

\( H_{i,j} \) is the maximum score of the (sub)sequences \( x_{1…i} = x_1 \ldots x_i \) and \(y_{1\ldots j} = y_1 \ldots y_j \).

Pairwise global alignment with linear gap penalty, \(O(N^2)\) (Needleman-Wunsch 1970; \(g(k) = -uk\))

Initialization

\[
H_{0,0} =0, \\
H_{i,0} = -iu \mbox{ for } i=1, \ldots m \\
H_{0,j} = -ju \mbox{ for } j=1, \ldots n \\
\]

Iteration

\[
\mbox{ for } i=1, \ldots m\\
\mbox{ for } j=1, \ldots n\\
H_{i,j}= \min
\left[ \begin{array}{l}
H_{i-1,j-1} + s(a_i,b_j), \\
H_{i-1,j} -u, \\
H_{i,j-1} -u
\end{array} \right.
\]

Pairwise global alignment with general gap penalty, \(O(N^3)\) (Waterman, Smith, & Bayer 1976)

\[ \newcommand{\max}{\mathop{\rm max}\limits}
\mbox{ (1) } H_{i,j} = \max_{a=1\ldots 3} H^a_{i,j},
\] where
\[
\begin{array}{ll}
\mbox{ (2) }H^1_{i,j} &= H_{i-1,j-1} + s(a_i,b_j), \\
\mbox{ (3) }H^2_{i,j} &= \max_{k=1\ldots i} \left( H_{i-k,j} + g(k) \right), \\
\mbox{ (4) }H^3_{i,j} &= \max_{k=1\ldots j} \left( H_{i,j-k} + g(k) \right)
\end{array}
\]

Pairwise global alignment with Affine gap penalty \\

(\(g(k)= -uk – v\)), \(O(N^2)\) (Gotoh, 1982; the NWG)
\[ \newcommand{\max}{\mathop{\rm max}\limits}
\mbox{ (1) } H_{i,j} = \max_{a=1\ldots 3} H^a_{i,j},
\] where
\[
\begin{array}{ll}
\mbox{ (2) }H^1_{i,j} &= H_{i-1,j-1} + s(a_i,b_j), \\
\mbox{ (3) }H^2_{i,j} &= max \left( H_{i-1,j} + g(1), H^2_{i-1,j} -u \right), \\
\mbox{ (4) }H^3_{i,j} &= max \left( H_{i,j-1} + g(1), H^3_{i,j-1} -u \right)
\end{array}
\]

Problem 1 of Prof. Gotoh’s lecture

Derive the NWG algorithm in page 34 from general algorithm in page 33.

answer

By affine gap definition, \(g(k) = -uk -v = g(k-1) – u\) holds.
\[
\newcommand{\max}{\mathop{\rm max}\limits}
\begin{array}{lll}
\mbox{From (3)}\\
&H^2_{i,j}
&= \max_{k=1\ldots i} \left( H_{i-k,j} + g(k) \right) \\
&&= max \left( H_{i-1,j} + g(1), \max_{k=2\ldots i} \left(H_{i-k,j} + g(k)\right) \right) \\
&&= max \left( H_{i-1,j} + g(1), \max_{k=1\ldots i-1} \left(H_{(i-1)-k,j} + g(k-1) – u \right) \right)\\
&&= max \left( H_{i-1,j} + g(1), H^2_{i-1,j} – u \right) \\
\mbox{Similary from (4)}\\
&H^3_{i,j} &= max \left( H_{i,j-1} + g(1), H^3_{i,j-1} – u \right)
\end{array}
\]

Problem 2 of Prof. Gotoh’s lecture

Show the initial conditions for DP matrices of NWG algorithm

answer

\[
H_{0,0} =0, \\
H_{i,0} = -iu -v \mbox{ for } i=1, \ldots m \\
H_{0,j} = -ju -v \mbox{ for } j=1, \ldots n
\]