# Exponential Family （指数分布族）

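A family of distributions with parameter $\theta$ is an exponential family if its density can be written, for some function $\phi(x)$ of the data, as
$p(x|\theta) = \exp [ \theta\cdot\phi(x) - \psi(\theta) ] \tag{1.1}\label{eq:exp-family}$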
Here, $\psi(\theta)$ is the normalization term that makes the probabilities sum to 1; it is the logarithm of the partition function.
$\psi(\theta) = \log{Z(\theta)}\\ Z(\theta) = \int \exp [ \theta \cdot \phi(x) ] dx$
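
As a rough numerical illustration of this definition, the sketch below approximates $\psi(\theta)$ by integrating $\exp[\theta\cdot\phi(x)]$ with the trapezoidal rule. The function name `log_partition_numeric`, the integration range, and the choice $\phi(x) = (x^2, x)$, $\theta = (-1/2, 1)$ are assumptions made for this example only; for that choice the integral is Gaussian and $\psi(\theta) = \frac{1}{2}\log 2\pi + \frac{1}{2}$.

```python
import math

def log_partition_numeric(theta, phi, lo, hi, steps=20000):
    """Approximate psi(theta) = log Z(theta), where Z is the integral of
    exp(theta . phi(x)) over [lo, hi], using the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * math.exp(sum(t * f for t, f in zip(theta, phi(x))))
    return math.log(total * h)

# Illustrative choice (an assumption, not fixed by the text):
# phi(x) = (x^2, x), theta = (-1/2, 1); the integrand is negligible outside [-12, 14].
psi = log_partition_numeric((-0.5, 1.0), lambda x: (x * x, x), -12.0, 14.0)
closed_form = 0.5 * math.log(2 * math.pi) + 0.5
```

The truncated range stands in for the integral over the whole real line, so the sketch only makes sense when the integrand decays quickly, as it does here.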

### Examples

#### 1-Dimensional Normal Distribution （１次元正規分布）

$p(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{(x -\mu)^2}{2\sigma^2} \right\}$
$\begin{eqnarray} \log p(x) &=& -\log \left(\sqrt{2\pi}\sigma\right) - \frac{(x -\mu)^2}{2\sigma^2}\\ &=& -\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x -\log \left(\sqrt{2\pi}\sigma\right) - \frac{\mu^2}{2\sigma^2} \end{eqnarray}$
With the following definitions, the distribution takes the exponential family form \eqref{eq:exp-family}:
$\phi(x) = (x^2, x)\\ \theta = (-\frac{1}{2\sigma^2}, \frac{\mu}{\sigma^2})\\ \psi(\theta) = \log \left(\sqrt{2\pi}\sigma\right) + \frac{\mu^2}{2\sigma^2}$
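
The identification above can be checked numerically. The following is a minimal sketch (the function names are hypothetical, chosen for this example) that evaluates the normal density both directly and as $\exp[\theta\cdot\phi(x) - \psi(\theta)]$ with the $\phi$, $\theta$ and $\psi$ given above.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the 1-dimensional normal distribution, written directly."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_natural_params(mu, sigma):
    """theta = (-1/(2 sigma^2), mu/sigma^2), paired with phi(x) = (x^2, x)."""
    return (-1.0 / (2 * sigma ** 2), mu / sigma ** 2)

def normal_log_partition(mu, sigma):
    """psi(theta) = log(sqrt(2 pi) sigma) + mu^2 / (2 sigma^2)."""
    return math.log(math.sqrt(2 * math.pi) * sigma) + mu ** 2 / (2 * sigma ** 2)

def normal_exp_family_pdf(x, mu, sigma):
    """The same density, evaluated as exp(theta . phi(x) - psi(theta))."""
    theta = normal_natural_params(mu, sigma)
    phi = (x ** 2, x)
    dot = sum(t * f for t, f in zip(theta, phi))
    return math.exp(dot - normal_log_partition(mu, sigma))
```

For any $x$, $\mu$ and $\sigma > 0$, the two functions should agree up to floating-point rounding, for example `normal_pdf(0.7, 1.5, 2.0)` and `normal_exp_family_pdf(0.7, 1.5, 2.0)`.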

#### Multinomial Distribution （多項分布）

Consider a multinomial distribution over a single trial (a categorical distribution) on $\{0,\ldots,n-1\}$, in which outcome $i$ has probability $p_i$. Writing $\delta(i,x)$ for the function that is 1 if $i = x$ and 0 otherwise, the logarithm of the probability distribution is:
$\begin{eqnarray} \log p(x) &=& \log \left( \sum_{i=0}^{n-1} \delta(i,x) p_i \right)\\ &=& \sum_{i=0}^{n-1} \delta(i,x) \log p_i\\ &=& \left(1 -\sum_{i=1}^{n-1}\delta(i,x)\right)\log p_0 +\sum_{i=1}^{n-1} \delta(i,x) \log p_i\\ &=& \sum_{i=1}^{n-1} \delta(i,x) \log \frac{p_i}{p_0} + \log p_0 \end{eqnarray}$
With the following definitions, where $i$ runs over $1,\ldots,n-1$, the distribution takes the exponential family form \eqref{eq:exp-family}:
$\phi(x) = \{\delta(i,x)\}_{i=1}^{n-1},\\ \theta = \left\{\log \frac{p_i}{p_0}\right\}_{i=1}^{n-1},\\ \psi(\theta) = -\log p_0 = \log \left[1+\sum_{i=1}^{n-1}\exp(\theta_i) \right].$
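
A small sketch of this parameterization follows; the function names are hypothetical and the probabilities used to exercise it are arbitrary. It maps probabilities $p_0,\ldots,p_{n-1}$ to $\theta$, then recovers each $p_x$ from $\exp[\theta\cdot\phi(x) - \psi(\theta)]$.

```python
import math

def categorical_natural_params(p):
    """theta_i = log(p_i / p_0) for i = 1..n-1 (p_0 is the reference outcome)."""
    return [math.log(pi / p[0]) for pi in p[1:]]

def categorical_log_partition(theta):
    """psi(theta) = log(1 + sum_i exp(theta_i)) = -log p_0."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def categorical_exp_family_prob(x, theta):
    """p(x) = exp(theta . phi(x) - psi(theta)) with phi(x) = (delta(1,x), ..., delta(n-1,x))."""
    phi = [1.0 if i == x else 0.0 for i in range(1, len(theta) + 1)]
    dot = sum(t * f for t, f in zip(theta, phi))
    return math.exp(dot - categorical_log_partition(theta))

# Arbitrary example probabilities (an assumption for illustration only).
p = [0.2, 0.5, 0.3]
theta = categorical_natural_params(p)
recovered = [categorical_exp_family_prob(x, theta) for x in range(len(p))]
```

For $x = 0$ every component of $\phi(x)$ is zero, so the probability reduces to $\exp[-\psi(\theta)] = p_0$, which is why only $n-1$ parameters are needed.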

### Expectation and Covariance of the Exponential Family

From here on, we write $\phi(x)$ simply as $x$ in the general form of the exponential family, which amounts to a change of random variable and of the underlying probability measure:
$p(x|\theta) = \exp [ \theta\cdot x - \psi(\theta) ] \tag{1.2}\label{eq:exp-family2}$
Taking the partial derivative of $\log p(x|\theta)$ with respect to $\theta_i$ gives
$\frac{\partial}{\partial \theta_i} \log p(x|\theta) = \frac{\partial}{\partial \theta_i} \left\{\theta\cdot x - \psi(\theta)\right\} = x_i - \frac{\partial}{\partial \theta_i}\psi(\theta)$
At the same time, assuming that the order of integration and differentiation can be interchanged, and using $\frac{\partial}{\partial \theta_i} \log p = \frac{1}{p}\frac{\partial p}{\partial \theta_i}$, the expectation of this derivative vanishes:
$\begin{eqnarray} E\left[ \frac{\partial}{\partial \theta_i} \log p(x|\theta) \right] &=& \int \left\{\frac{\partial}{\partial \theta_i} \log p(x|\theta)\right\} p(x|\theta) dx\\ &=& \int \frac{\partial}{\partial \theta_i} p(x|\theta) dx = \frac{\partial}{\partial \theta_i} \int p(x|\theta) dx = 0 \end{eqnarray}$
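Therefore, taking the expectation of $x_i - \frac{\partial}{\partial \theta_i}\psi(\theta)$ and setting it to zero,
$E[x_i] = \frac{\partial}{\partial \theta_i} \psi(\theta)$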
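
This relation can be sanity-checked on the multinomial example, where the statistic is $\delta(i,x)$ and its expectation is $p_i$. The sketch below (hypothetical function names, and a finite-difference gradient standing in for the exact derivative) compares a numerical gradient of $\psi(\theta) = \log\left[1+\sum_i \exp(\theta_i)\right]$ with $p_i = \exp[\theta_i - \psi(\theta)]$.

```python
import math

def log_partition(theta):
    """psi(theta) = log(1 + sum_i exp(theta_i)) for the multinomial example."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def grad_log_partition(theta, eps=1e-6):
    """Central finite-difference approximation of d psi / d theta_i."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((log_partition(up) - log_partition(down)) / (2 * eps))
    return grad

def expected_statistics(theta):
    """E[delta(i, x)] = p_i = exp(theta_i - psi(theta)) for i = 1..n-1."""
    psi = log_partition(theta)
    return [math.exp(t - psi) for t in theta]

# Arbitrary parameter values (an assumption for illustration only).
theta = [0.3, -1.2, 0.8]
numeric_grad = grad_log_partition(theta)
expectations = expected_statistics(theta)
```

The two lists should agree to within the finite-difference error, which illustrates the identity on one example rather than proving it.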