# Dynamic programing for global pairwise alignment

## Problem

Calculate the optimal global alignment of the two sequences, $x_{1\ldots m} = x_1 \ldots x_m$ and $y_{1\ldots n} = y_1 \ldots y_n$, with substitution matrix $s(a,b)$ and gap penelty $g(k)$ for a gap with the length $k$.

## Definition

$H_{i,j}$ is the maximum score of the (sub)sequences $x_{1…i} = x_1 \ldots x_i$ and $y_{1\ldots j} = y_1 \ldots y_j$.

### Pairwise global alignment with linear gap penalty, $O(N^2)$ (Needleman-Wunsch 1970; $g(k) = -uk$)

#### Initialization

$H_{0,0} =0, \\ H_{i,0} = -iu \mbox{ for } i=1, \ldots m \\ H_{0,j} = -ju \mbox{ for } j=1, \ldots n \\$

#### Iteration

$\mbox{ for } i=1, \ldots m\\ \mbox{ for } j=1, \ldots n\\ H_{i,j}= \min \left[ \begin{array}{l} H_{i-1,j-1} + s(a_i,b_j), \\ H_{i-1,j} -u, \\ H_{i,j-1} -u \end{array} \right.$

### Pairwise global alignment with general gap penalty, $O(N^3)$ (Waterman, Smith, & Bayer 1976)

$\newcommand{\max}{\mathop{\rm max}\limits} \mbox{ (1) } H_{i,j} = \max_{a=1\ldots 3} H^a_{i,j},$ where
$\begin{array}{ll} \mbox{ (2) }H^1_{i,j} &= H_{i-1,j-1} + s(a_i,b_j), \\ \mbox{ (3) }H^2_{i,j} &= \max_{k=1\ldots i} \left( H_{i-k,j} + g(k) \right), \\ \mbox{ (4) }H^3_{i,j} &= \max_{k=1\ldots j} \left( H_{i,j-k} + g(k) \right) \end{array}$

### Pairwise global alignment with Affine gap penalty \\

($g(k)= -uk – v$), $O(N^2)$ (Gotoh, 1982; the NWG)
$\newcommand{\max}{\mathop{\rm max}\limits} \mbox{ (1) } H_{i,j} = \max_{a=1\ldots 3} H^a_{i,j},$ where
$\begin{array}{ll} \mbox{ (2) }H^1_{i,j} &= H_{i-1,j-1} + s(a_i,b_j), \\ \mbox{ (3) }H^2_{i,j} &= max \left( H_{i-1,j} + g(1), H^2_{i-1,j} -u \right), \\ \mbox{ (4) }H^3_{i,j} &= max \left( H_{i,j-1} + g(1), H^3_{i,j-1} -u \right) \end{array}$

### Problem 1 of Prof. Gotoh’s lecture

Derive the NWG algorithm in page 34 from general algorithm in page 33.

By affine gap definition, $g(k) = -uk -v = g(k-1) – u$ holds.
$\newcommand{\max}{\mathop{\rm max}\limits} \begin{array}{lll} \mbox{From (3)}\\ &H^2_{i,j} &= \max_{k=1\ldots i} \left( H_{i-k,j} + g(k) \right) \\ &&= max \left( H_{i-1,j} + g(1), \max_{k=2\ldots i} \left(H_{i-k,j} + g(k)\right) \right) \\ &&= max \left( H_{i-1,j} + g(1), \max_{k=1\ldots i-1} \left(H_{(i-1)-k,j} + g(k-1) – u \right) \right)\\ &&= max \left( H_{i-1,j} + g(1), H^2_{i-1,j} – u \right) \\ \mbox{Similary from (4)}\\ &H^3_{i,j} &= max \left( H_{i,j-1} + g(1), H^3_{i,j-1} – u \right) \end{array}$
$H_{0,0} =0, \\ H_{i,0} = -iu -v \mbox{ for } i=1, \ldots m \\ H_{0,j} = -ju -v \mbox{ for } j=1, \ldots n$