Conditional Log-linear Model (CLLM)

A generative stochastic grammar comprises a set of stochastic production rules.
Given a sequence \(x\), the conditional probability of a parse \(\sigma\), normalized over the set \(\Omega(x)\) of all possible parses of \(x\), is
\[
P(\sigma|x) = \frac{P(x,\sigma)}{\sum_{\sigma' \in \Omega(x)}P(x,\sigma')}
\]
If the grammar is ambiguous, meaning that a structure \(y\) may correspond to multiple parses,
the conditional probability of a structure is obtained by summing over all of its parses:
\[
P(y|x) = \sum_{\sigma \in y} P(\sigma|x)
= \frac{\sum_{\sigma \in y}P(x,\sigma)}{\sum_{\sigma' \in \Omega(x)}P(x,\sigma')}
\]
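As a minimal sketch of the two formulas above, the following Python snippet normalizes joint probabilities over all parses and then sums the parse posteriors that belong to the same structure. The parse names, structure labels, and probability values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical joint probabilities P(x, sigma) for every parse of one fixed
# sequence x. Each parse is tagged with the structure y it yields.
# All names and numbers here are made up for illustration.
joint = {
    "parse_a": ("y1", 0.020),
    "parse_b": ("y1", 0.010),  # the grammar is ambiguous: y1 has two parses
    "parse_c": ("y2", 0.030),
    "parse_d": ("y3", 0.040),
}


def parse_posteriors(joint):
    """P(sigma | x): each joint probability divided by the sum over Omega(x)."""
    total = sum(p for _, p in joint.values())
    return {sigma: p / total for sigma, (_, p) in joint.items()}


def structure_posteriors(joint):
    """P(y | x): sum of P(sigma | x) over all parses sigma that yield y."""
    post = defaultdict(float)
    for sigma, p in parse_posteriors(joint).items():
        post[joint[sigma][0]] += p
    return dict(post)


p_sigma = parse_posteriors(joint)
p_y = structure_posteriors(joint)
```

With an unambiguous grammar each structure would have exactly one parse, and `p_y` would coincide with `p_sigma` up to relabeling.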
Let \(F_i(x,\sigma)\) be the number of occurrences of production rule \(i\) in parse \(\sigma\).
The joint probability of \(x\) and \(\sigma\) is then written as
\[
P(x,\sigma) = \prod_{i=1}^{n} p_i^{F_i(x,\sigma)}
= \exp \left[ \sum_{i=1}^{n} F_i(x,\sigma) \log p_i \right]
= \exp ({\bf w}^T {\bf F}(x,\sigma))
\]
where \(p_i\) is the probability of the \(i\)-th production rule,
and \(w_i = \log p_i\) is regarded as the coefficient of this log-linear model.
The conditional probability can then be rewritten as
\[
P(y|x) = \frac{1}{Z(x)}\sum_{\sigma \in y} \exp ({\bf w}^T {\bf F}(x,\sigma))
\]
where \( Z(x) = \sum_{\sigma' \in \Omega(x)} \exp ({\bf w}^T {\bf F}(x,\sigma')) \) is the partition function of this Boltzmann distribution.
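The rewriting above can be checked numerically. The sketch below assumes a hypothetical three-rule grammar with made-up rule probabilities and rule-count vectors; it confirms that the product form and the exponential form of \(P(x,\sigma)\) agree, and then computes \(P(y|x)\) with the partition function.

```python
import math

# Hypothetical rule probabilities p_i for a 3-rule grammar (illustrative values).
p = [0.5, 0.3, 0.2]
w = [math.log(p_i) for p_i in p]  # w_i = log p_i

# Hypothetical rule-count vectors F(x, sigma) for all parses in Omega(x),
# grouped by the structure y that each parse yields.
parses = {
    "y1": [[2, 1, 0], [1, 1, 1]],
    "y2": [[0, 2, 1]],
}


def joint_product(counts):
    """P(x, sigma) as a product of rule probabilities raised to their counts."""
    return math.prod(p_i ** f_i for p_i, f_i in zip(p, counts))


def joint_loglinear(counts):
    """The same quantity written as exp(w^T F(x, sigma))."""
    return math.exp(sum(w_i * f_i for w_i, f_i in zip(w, counts)))


# Partition function Z(x): sum over every parse in Omega(x).
Z = sum(joint_loglinear(f) for fs in parses.values() for f in fs)

# P(y | x) = (1 / Z(x)) * sum over the parses of y of exp(w^T F(x, sigma)).
posterior = {
    y: sum(joint_loglinear(f) for f in fs) / Z for y, fs in parses.items()
}
```

Because the weights are tied to rule probabilities through \(w_i = \log p_i\), this is still the generative grammar in a different notation; the two functions differ only by floating-point rounding.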

A general log-linear model, viewed as a discriminative model, can be written in the following form.
\[\begin{align}
p(y|x;w) = \frac{\exp{\sum_{j=1}^{J}w_jF_j(x,y)}}{Z(x,w)} \label{eq:general-log-linear}
\end{align} \]
Here the denominator is the following partition function.
\[\begin{align}
Z(x,w) = \sum_{y'}\exp\sum_{j=1}^{J}w_jF_j(x,y')
\end{align}\]
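To find the maximum-likelihood estimate \(\hat{y}^{MLE}\) for given data \(x\), note that the denominator depends only on \(x\) and is common to every \(y\), and that the exponential is monotonically increasing.
It therefore suffices to find the \(y\) that maximizes the linear weighted sum.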
\[\begin{align}
\hat{y}^{MLE} = \operatorname*{argmax}_{y}\,p(y|x;w) = \operatorname*{argmax}_{y}\sum_{j=1}^{J}w_jF_j(x,y) \label{eq:mle_general_llm}
\end{align}\]
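The following sketch illustrates the general log-linear model and the argmax shortcut. The weights, labels, and feature vectors are hypothetical, and computing \(\log Z(x,w)\) with the log-sum-exp trick is an implementation choice made here for numerical stability, not something the derivation requires.

```python
import math

# Hypothetical weights w_j and feature vectors F(x, y) for three candidate
# outputs y of one fixed input x (all values are illustrative).
w = [1.5, -0.5, 2.0]
features = {
    "y1": [1.0, 0.0, 2.0],
    "y2": [0.0, 3.0, 1.0],
    "y3": [2.0, 1.0, 0.0],
}


def score(f):
    """Linear score sum_j w_j * F_j(x, y)."""
    return sum(w_j * f_j for w_j, f_j in zip(w, f))


def log_partition(scores):
    """log Z(x, w), using the log-sum-exp trick to avoid overflow."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


scores = {y: score(f) for y, f in features.items()}
log_Z = log_partition(list(scores.values()))
probs = {y: math.exp(s - log_Z) for y, s in scores.items()}

# Z(x, w) does not depend on y, so maximizing the linear score and
# maximizing p(y | x; w) select the same output.
y_by_score = max(scores, key=scores.get)
y_by_prob = max(probs, key=probs.get)
```

In this toy setting the candidates can be enumerated, so `y_by_score` is found by a plain `max`; the point of the sketch is only that `log_Z` is never needed for decoding.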