Exponential Family (指数分布族)
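Many probability distributions are known to belong to a family called the exponential family.
The general form of a distribution in the exponential family, with parameter \(\theta\) and random variable \(x\), is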
\[
p(x|\theta) = \exp [ \theta\cdot\phi(x) - \psi(\theta) ] \tag{1.1}\label{eq:exp-family}
\]
Here \(\psi(\theta)\) is the normalization term that makes the total probability equal to 1; it is the logarithm of the partition function \(Z(\theta)\).
\[
\psi(\theta) = \log{Z(\theta)}\\
Z(\theta) = \int \exp [ \theta \cdot \phi(x) ] dx
\]
Examples
1-Dimensional Normal Distribution (1次元正規分布)
\[
p(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{(x -\mu)^2}{2\sigma^2} \right\}
\]
\[
\begin{eqnarray}
\log p(x) &=& -\log \left(\sqrt{2\pi}\sigma\right) - \frac{(x -\mu)^2}{2\sigma^2}\\
&=& -\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x -\log \left(\sqrt{2\pi}\sigma\right) - \frac{\mu^2}{2\sigma^2}
\end{eqnarray}
\]
With the following definitions, the distribution takes the exponential family form \eqref{eq:exp-family}:
\[
\phi(x) = (x^2, x)\\
\theta = (-\frac{1}{2\sigma^2}, \frac{\mu}{\sigma^2})\\
\psi(\theta) = \log \left(\sqrt{2\pi}\sigma\right) + \frac{\mu^2}{2\sigma^2}
\]
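The decomposition above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the midpoint-rule integrator are illustrative assumptions. It compares \(\theta\cdot\phi(x) - \psi(\theta)\) with the log of the usual normal density, and compares the closed-form \(\psi(\theta)\) with \(\log Z(\theta)\) obtained by integrating \(\exp[\theta\cdot\phi(x)]\) over a wide finite interval.

```python
import math

def normal_natural_params(mu, sigma):
    """theta = (-1/(2 sigma^2), mu/sigma^2), paired with phi(x) = (x^2, x)."""
    return (-1.0 / (2.0 * sigma**2), mu / sigma**2)

def normal_log_partition(mu, sigma):
    """psi = log(sqrt(2 pi) sigma) + mu^2 / (2 sigma^2)."""
    return math.log(math.sqrt(2.0 * math.pi) * sigma) + mu**2 / (2.0 * sigma**2)

def log_density_exp_family(x, mu, sigma):
    """log p(x) written as theta . phi(x) - psi(theta)."""
    theta = normal_natural_params(mu, sigma)
    phi = (x**2, x)
    return theta[0] * phi[0] + theta[1] * phi[1] - normal_log_partition(mu, sigma)

def log_density_direct(x, mu, sigma):
    """log p(x) taken directly from the normal density formula."""
    return -math.log(math.sqrt(2.0 * math.pi) * sigma) - (x - mu)**2 / (2.0 * sigma**2)

def log_partition_numeric(theta, lo, hi, steps=20000):
    """Approximate log Z(theta) with a midpoint Riemann sum over [lo, hi].

    The finite interval is an assumption: it must be wide enough that the
    integrand is negligible outside it.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += math.exp(theta[0] * x**2 + theta[1] * x)
    return math.log(total * h)

mu, sigma = 1.5, 0.7
theta = normal_natural_params(mu, sigma)
print(log_density_exp_family(0.3, mu, sigma), log_density_direct(0.3, mu, sigma))
print(normal_log_partition(mu, sigma),
      log_partition_numeric(theta, mu - 12 * sigma, mu + 12 * sigma))
```

Each printed pair should agree up to floating-point and integration error.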
Multinomial Distribution (多項分布) of Exponential Family (指数分布族)
Consider a multinomial distribution for a single trial on \(\{0,\ldots,n-1\}\), in which outcome \(i\) has probability \(p_i\). Writing \(\delta(i,x)\) for the Kronecker delta, the logarithm of the probability can be written as:
\[
\begin{eqnarray}
\log p(x) &=& \log \left( \sum_{i=0}^{n-1} \delta(i,x) p_i \right)\\
&=& \sum_{i=0}^{n-1} \delta(i,x) \log p_i\\
&=& \left(1 -\sum_{i=1}^{n-1}\delta(i,x)\right)\log p_0 +\sum_{i=1}^{n-1} \delta(i,x) \log p_i\\
&=& \sum_{i=1}^{n-1} \delta(i,x) \log \frac{p_i}{p_0} + \log p_0
\end{eqnarray}
\]
With the following definitions, where \(i\) runs over \(1,\ldots,n-1\), the distribution takes the exponential family form \eqref{eq:exp-family}:
\[
\phi(x) = \{\delta(i,x)\}_{i=1}^{n-1},\\
\theta = \left\{\log \frac{p_i}{p_0}\right\}_{i=1}^{n-1},\\
\psi(\theta) = -\log p_0 = \log \left[1+\sum_{i=1}^{n-1}\exp(\theta_i) \right].
\]
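As a small illustration of this parametrization, the sketch below maps probabilities \(p_i\) to \(\theta_i = \log(p_i/p_0)\) and back again, and evaluates \(\log p(x)\) as \(\theta\cdot\phi(x) - \psi(\theta)\). The function names and the example probabilities are assumptions made for illustration only.

```python
import math

def natural_params(p):
    """theta_i = log(p_i / p_0) for i = 1..n-1 (p[0] is the reference outcome)."""
    return [math.log(p_i / p[0]) for p_i in p[1:]]

def log_partition(theta):
    """psi(theta) = log(1 + sum_i exp(theta_i)) = -log p_0."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def probs_from_theta(theta):
    """Recover (p_0, ..., p_{n-1}) from theta: p_0 = exp(-psi), p_i = exp(theta_i - psi)."""
    psi = log_partition(theta)
    return [math.exp(-psi)] + [math.exp(t - psi) for t in theta]

def log_prob(x, theta):
    """log p(x) = theta . phi(x) - psi(theta), with phi(x)_i = delta(i, x), i = 1..n-1."""
    dot = sum(t * (1.0 if i == x else 0.0) for i, t in enumerate(theta, start=1))
    return dot - log_partition(theta)

p = [0.2, 0.5, 0.3]          # example probabilities (hypothetical values)
theta = natural_params(p)
print(probs_from_theta(theta))                    # should reproduce p up to rounding
print([math.exp(log_prob(x, theta)) for x in range(len(p))])
```

Note that \(x = 0\) gives \(\phi(x) = 0\), so \(\log p(0) = -\psi(\theta) = \log p_0\), as in the derivation.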
Expectation, covariance of Exponential Family
From now on, we write \(\phi(x)\) simply as \(x\) in the general form of the exponential family, which amounts to a change of random variable and of the underlying probability measure:
\[
p(x|\theta) = \exp [ \theta\cdot x - \psi(\theta) ] \tag{1.2}\label{eq:exp-family2}
\]
Taking the partial derivative of \(\log p(x|\theta)\) with respect to \(\theta_i\) gives
\[
\frac{\partial}{\partial \theta_i} \log p(x|\theta)
= \frac{\partial}{\partial \theta_i} \left\{\theta\cdot x - \psi(\theta)\right\}
= x_i - \frac{\partial}{\partial \theta_i}\psi(\theta)\\
% \frac{\partial^2}{\partial \theta_i \partial \theta_j} \log p(x|\theta)
% = \frac{\partial}{\partial \theta_i} \left\{x_j – \frac{\partial}{\partial \theta_j}\psi(\theta)\right\}
% = – \frac{\partial^2}{\partial \theta_i \theta_j}\psi(\theta)
\]
On the other hand, assuming that differentiation and integration can be interchanged, the expectation of this derivative vanishes:
\[
\begin{eqnarray}
E\left[ \frac{\partial}{\partial \theta_i} \log p(x|\theta) \right]
&=& \int \left\{\frac{\partial}{\partial \theta_i} \log p(x|\theta)\right\} p(x|\theta) dx\\
&=& \int \frac{\partial}{\partial \theta_i} p(x|\theta) dx
= \frac{\partial}{\partial \theta_i} \int p(x|\theta) dx = 0\\
% E\left[ \frac{\partial^2}{\partial \theta_i \partial \theta_j} \log p(x|\theta) \right]
% &=& \int \left\{\frac{\partial}{\partial \theta_j} \log p(x|\theta)\right\} p(x|\theta) dx\\
% &=& \int \frac{\partial^2}{\partial \theta_i \partial \theta_j} p(x|\theta) dx
% = \frac{\partial^2}{\partial \theta_i \partial \theta_j} \int p(x|\theta) dx = 0
\end{eqnarray}
\]
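Therefore, taking the expectation of \(x_i - \frac{\partial}{\partial \theta_i}\psi(\theta)\) and setting it to zero,
\[
E[x_i] = \frac{\partial}{\partial \theta_i} \psi(\theta)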
\]
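This identity can be verified numerically on the multinomial example, where the sufficient statistic is \(\delta(i,x)\) and \(\psi(\theta) = \log\left[1+\sum_i \exp(\theta_i)\right]\). The sketch below is an illustration under assumptions: the function names, the central finite-difference step, and the example \(\theta\) are not from the original text. It compares a finite-difference gradient of \(\psi\) with the expectation of \(\delta(i,x)\) computed by summing over all outcomes.

```python
import math

def log_partition(theta):
    """psi(theta) = log(1 + sum_i exp(theta_i))."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def grad_log_partition(theta, eps=1e-6):
    """Approximate d psi / d theta_i by central finite differences."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((log_partition(up) - log_partition(down)) / (2.0 * eps))
    return grad

def prob(x, theta):
    """p(x | theta) = exp(theta . phi(x) - psi(theta)) for x in 0..n-1."""
    dot = theta[x - 1] if x >= 1 else 0.0
    return math.exp(dot - log_partition(theta))

def expected_sufficient_stats(theta):
    """E[delta(i, x)] for i = 1..n-1, summing explicitly over all outcomes x."""
    n = len(theta) + 1
    return [
        sum((1.0 if i == x else 0.0) * prob(x, theta) for x in range(n))
        for i in range(1, n)
    ]

theta = [0.3, -1.2, 0.8]     # example natural parameters (hypothetical values)
print(grad_log_partition(theta))
print(expected_sufficient_stats(theta))
```

The two printed lists should agree up to the finite-difference error.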