Piecewise Linear Gap(区間線形ギャップ)

General Gap Cost (Waterman et al. 1976)

\[
\begin{eqnarray}
D(i,j) &=& \min
\left[ \begin{array}{l}
D(i-1,j-1) + \delta(a_i,b_j), \\
F(i,j),\\
G(i,j),
\end{array} \right.
\tag{genome.R2.1}\label{genome.R2.1}\\
\\
\mbox{ where } & & F(i,j) = \min_{1 \le k \le i} \{D(i-k,j) + g(k)\},\tag{genome.R2.2}\label{genome.R2.2}\\
& & G(i,j) = \min_{1 \le k \le j} \{D(i,j-k) + g(k)\}.\tag{genome.R2.3}\label{genome.R2.3}
\end{eqnarray}
\]

Affine Gap (Gotoh 1982)

区間線形ギャップ(Piecewise Linear Gap)を議論する前に,その特殊な場合であるアフィンギャップ
\[
g(k) = uk + v
\tag{genome.R2.4}\label{genome.R2.4}
\]
の場合の漸化式をGeneral Gap Costの漸化式から導く.
式\(\eqref{genome.R2.4}\)の\(g(k)\)を,式\(\eqref{genome.R2.2}\),式\(\eqref{genome.R2.3}\)に代入すると,
\[
\begin{eqnarray}
F(i,j) &=& \min_{1 \le k \le i} \{D(i-k,j) + uk + v\},\\
&=& \min \left[ D(i-1,j) + u + v, \min_{2 \le k \le i} \{D(i-k,j) + uk + v\} \right]\\
&=& \min \left[ D(i-1,j) + u + v, \min_{1 \le k \le i-1} \{D((i-1)-k,j) + uk + v + u \} \right]\\
&=& \min \left[ D(i-1,j) + u + v, F(i-1,j) + u \} \right]
\tag{genome.R2.5}\label{genome.R2.5}\\
G(i,j) &=& \min \left[ D(i,j-1) + u + v, G(i,j-1) + u \} \right].
\tag{genome.R2.6}\label{genome.R2.6}
\end{eqnarray}
\]
式\(\eqref{genome.R2.5}\),式\(\eqref{genome.R2.6}\)を式\(\eqref{genome.R2.1}\)に統合すると,
\[
\begin{eqnarray}
D(i,j) &=& \min
\left[ \begin{array}{l}
D(i-1,j-1) + \delta(a_i,b_j), \\
D(i-1,j) + u + v,\\
D(i,j-1) + u + v,
\end{array} \right.
F(i,j) &=& \min
\left[ \begin{array}{l}
D(i-1,j) + u + v, \\
F(i-1,j) + u,\\
\end{array} \right.
G(i,j) &=& \min
\left[ \begin{array}{l}
D(i,j-1) + u + v, \\
G(i,j-1) + u.
\end{array} \right.
\end{eqnarray}
\]

Piecewise Liner Gap (Gotoh 1990)

式\(\eqref{genome.R2.2}\)における\(g(k)\)を\(L\)個の区間,
\((K_0=0,K_1], \ldots, (K_{\alpha-1},K_\alpha], \ldots, (K_{L-1},K_L=\infty]\) , 
毎に以下の線形関数に値をとる区間線形関数とする.
\[
g(k) = g_\alpha(k) \equiv u_{\alpha}k + v_\alpha, \hspace{1cm}
\mbox{ for } k \in (K_{\alpha-1},K_\alpha] \hspace{2mm}(1 \le i \le L)\\
\hspace{2cm} \mbox{ where }
\hspace{1cm} u_\alpha > u_{\alpha+1} \ge 0,
\hspace{5mm} g_\alpha(K_\alpha) = g_{\alpha+1}(K_\alpha),
\hspace{1cm} \mbox{ for} \hspace{0.5cm} 1 \le \alpha \le L-1
\tag{genome.R2.7}\label{genome.R2.7}
\]
ここで\(g_\alpha(k)\)は,\(g(k) = g_\alpha(k)\)となる区間では\(L\)個の線形関数の中で最少となるから,以下が成り立つ.
\[
g_\alpha(k) = \min_\beta g_\beta(k) \hspace{2cm} \mbox{ if } \alpha \in (K_{\alpha-1},K_\alpha]
\tag{genome.R2.8}\label{genome.R2.8}
\]
式\(\eqref{genome.R2.4}\)の\(g(k)\)を,式\(\eqref{genome.R2.2}\)に代入すると(ただし,単純化するために\(i > K_{L-1}\)とする),
\[
\begin{eqnarray}
F(i,j) &=& \min_{1 \le k \le i} \{D(i-k,j) + g(k)\}\\
&=& \min
\left[ \begin{array}{l}
\min_{1 \le k \le K_1}\{D(i-k,j) + g_1(k)\},\\
\ldots\\
\min_{K_{\alpha-1} \le k \le K_\alpha}\{D(i-k,j) + g_\alpha(k)\},\\
\ldots\\
\min_{K_{L-1} \le k \le i}\{D(i-k,j) + g_L(k)\}.
\end{array} \right.
\end{eqnarray}
\]
式\(\eqref{genome.R2.8}\)から,内側minの範囲外では値がより小さい他の行が必ずあるから,内側minの範囲を\([1,i]\)にして良いので,
\[
\begin{eqnarray}
F(i,j) &=& \min_\alpha \left[\min_{1 \le k \le i} \{D(i-k,j) + g_\alpha(k)\}\right],
\end{eqnarray}
\]
同様に,式\(\eqref{genome.R2.7}\)の\(g(k)\)を,式\(\eqref{genome.R2.3}\)に代入すると(同様に\(j > K_{L-1}\)とする),
\[
\begin{eqnarray}
G(i,j) &=& \min_\alpha \left[\min_{1 \le k \le i} \{D(i,j-k) + g_\alpha(k)\}\right],
\end{eqnarray}
\]
ここで, 以下のように定義する.
\[
F_\alpha(i,j) \equiv \min_{1 \le k \le i} \{D(i-k,j) + g_\alpha(k)\}\\
G_\alpha(i,j) \equiv \min_{1 \le k \le j} \{D(i,j-k) + g_\alpha(k)\}
\]
\(F_\alpha(i,j), G_\alpha(i,j)\)に関して以下の漸化式が成り立つ.
\[
\begin{eqnarray}
F_\alpha(i,j) &=& \min
\left[ \begin{array}{l}
D(i-1,j) + u_\alpha + v_\alpha\\
F_\alpha(i-1,j) + u_\alpha.
\end{array} \right.\\
G_\alpha(i,j) &=& \min
\left[ \begin{array}{l}
D(i,j-1) + u_\alpha + v_\alpha\\
G_\alpha(i,j-1) + u_\alpha.
\end{array} \right.
\end{eqnarray}
\]
これを,式\(\eqref{genome.R2.1}\)と統合すると,
\[
\begin{eqnarray}
F_\alpha(i,j) &=& \min
\left[ \begin{array}{l}
D(i-1,j) + u_\alpha + v_\alpha\\
F_\alpha(i-1,j) + u_\alpha.
\end{array} \right.\\
G_\alpha(i,j) &=& \min
\left[ \begin{array}{l}
D(i,j-1) + u_\alpha + v_\alpha\\
G_\alpha(i,j-1) + u_\alpha.
\end{array} \right.\\
D(i,j) &=& \min
\left[ \begin{array}{l}
D(i-1,j-1) + \delta(a_i,b_j), \\
\min_\alpha F_\alpha(i,j),\\
\min_\alpha G_\alpha(i,j),
\end{array} \right.
\end{eqnarray}
\]

Reference
Waterman, M. S., T. F. Smith and W. A. Beyer (1976) Some biological sequence metrics.
Adv. Math. 20, 367-387.
Gotoh, O, An improved algorithm for matching biological sequences, J Mol Biol. 1982 Dec 15;162(3):705-8
Gotoh, O, Optimal sequence alignment allowing for long gaps, Bltn Mathcal Biology (1990) 52: 359. https://doi.org/10.1007/BF02458577