Nussinov Algorithm and its associated SCFG

Nussinov Algorithm for maximum number of possible base-pairs

Initialization:
for \( i=1 \) to \( L \)
\[ M(i,i) = 0, \]
Recurtion:
for \( \ell=1 \) to \( L-1 \), for \( i=1 \) to \( L-\ell\), \( j=i+\ell \)
\[
M(i,j) = \max
\left[ \begin{array}{l}
M(i+1,j) \\
M(i,j-1) \\
M(i+1,j-1) + \delta(i,j)\\
\max_{h} M(i,h) + M(h+1,j)
\end{array} \right.
\]

Traceback of Nussinov Algorithm

初期化: \((1,L)\)をスタックにpushする.
再帰: スタックが空になるまで繰り返す.
— pop \((i,j)\)
— if \(i>=j\) continue;
— else if \(\gamma(i+1,j) = \gamma(i,j)\), \((i+1,j)\) を push;
— else if \(\gamma(i,j-1) = \gamma(i,j)\), \((i,j-1)\) を push;
— else if \(\gamma(i+1,j-1)+\delta_{i,j} = \gamma(i,j)\),
—— 塩基対\((i,j)\)を記録.
—— \((i+1,j-1)\) を push
else for \(k=i+1\) to \(j-1\): if \(\gamma(i,k)+\gamma(k+1,j)=\gamma(i,j)\):
—— \((k+1,j)\) を push
—— \((i,k)\) を push
—— break

Production rules of the SCFG associated to the above Nassinov Algorithm

\[
\begin{eqnarray}
M &\longrightarrow xM & : P(M\rightarrow xM), & x\in\{a,c,g,u\}\\
M &\longrightarrow Mx & : P(M\rightarrow Mx), & x\in\{a,c,g,u\}\\
M &\longrightarrow xM\bar{x} & : P(M\rightarrow xM\bar{x}), &(x,\bar{x})\in \{(a,u),(c,g),(g,c),(u,a)\}\\
M &\longrightarrow MM & : P(M\rightarrow MM)\\
M &\longrightarrow x & : P(M\rightarrow x), &x\in\{a,c,g,u\}
\end{eqnarray}
\]

CYK Algorithm of the SCFG

Initialization:
for \( i=1 \) to \( L-1 \)
\[ \gamma(i,i,M) = P(M \rightarrow x_i) \]
Recurtion:
for \( \ell=1 \) to \( L-1 \), for \( i=1 \) to \( L-\ell\), \( j=i+\ell \)
\[
\gamma(i,j,M) = \max
\left[ \begin{array}{l}
\gamma(i+1,j,M) + \log P(M \rightarrow x_i M) \\
\gamma(i,j-1,M) + \log P(M \rightarrow M x_j) \\
\gamma(i+1,j-1,M) + \log P(M \rightarrow x_i M x_j) \\
\max_{h} \gamma(i,h) + \gamma(h+1,j) + \log P(M \rightarrow MM)
\end{array} \right.
\]

Inside algorithm of the SCFG

Initialization:
for \( i=1 \) to \( L-1 \)
\[ \alpha(i,i,M) = P(M \rightarrow x_i) \]
Recurtion:
for \( \ell=1 \) to \( L-1 \), for \( i=1 \) to \( L-\ell \), \( j=i+\ell \)
\[
\alpha(i,j,M) =
\left[ \begin{array}{l}
\alpha(i+1,j,M) P(M \rightarrow x_i M) \\
+ \alpha(i,j-1,M) P(M \rightarrow M x_j) \\
+ \alpha(i+1,j-1,M) P(M \rightarrow x_i M x_j) \\
+ \sum_{h=1}^{L-1} \alpha(i,h,M) \alpha(h+1,j,M) P(M \rightarrow MM)
\end{array} \right.
\]

Outside Algorithm of the SCFG

Initialization:
\[ \beta(1,L,M) = 1 \]
Recurtion:
for \( \ell=L-1 \) to \( 1 \), for \( i=1 \) to \( L-\ell \), \( j=i+\ell \)
\[
\beta(i,j,M) =
\left[ \begin{array}{l}
\beta(i-1,j,M) P(M \rightarrow x_{i-1} M) \\
+ \beta(i,j+1,M) P(M \rightarrow M x_{j+1}) \\
+ \beta(i-1,j+1,M) P(M \rightarrow x_{i-1} M x_{j+1}) \\
+ \sum_{h=1}^{i-1} \alpha(h,i-1,M) \beta(h,j,M) P(M \rightarrow MM) \\
+ \sum_{h=j+1}^{L} \alpha(j+1,h,M) \beta(i,h,M) P(M \rightarrow MM)
\end{array} \right.
\]

Partition Function on number of base-pairs

2次構造\(\sigma\)に含まれる塩基対の数\(N_\sigma\)を2次構造\(\sigma\)の自由エネルギーの符号を反転した値の乱暴な近似と考えると、2次構造\(\sigma\)の熱力学的確率は以下のボルツマン分布で与えられる。
\[ \begin{eqnarray}
P(\sigma |x) =\frac{1}{Z_N(x,\sigma)} e^{N_\sigma/k_N T}\\
Z_N(x,\sigma) = \sum_{\sigma \in S(x)} e^{N_\sigma/k_N T}
\end{eqnarray} \]
このボルツマン分布に対応するSCFGがどのようになるか、考察せよ