Substitution Matrix

Suppose that sequences \(x\) and \(y\) are generated by one of two models:

(1) Random model R

Letter \(a\) occurs independently with some frequency \(q_a\)

Probability of the two sequences

\[ P(x,y|R) = \prod_i q_{x_i} \prod_j q_{y_j} \]

(2) Match model M

An aligned pair of residues \(a\) and \(b\) occurs with joint probability \(p_{ab}\)

Probability of the whole alignment
\[ P(x,y|M) = \prod_i p_{x_i y_i} \]
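
The two likelihoods can be compared on a toy example. The sketch below is illustrative only: the two-letter alphabet, the values in `q` and `p`, and the function names `prob_random` and `prob_match` are all assumptions made for the example, not values from the text. It also assumes an ungapped alignment of equal-length sequences.

```python
import math

# Hypothetical background frequencies q_a for a two-letter alphabet (sum to 1).
q = {"A": 0.5, "B": 0.5}

# Hypothetical joint probabilities p_ab for aligned pairs (sum to 1).
p = {("A", "A"): 0.4, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.4}

def prob_random(x, y, q):
    """P(x, y | R): product of q over every letter of both sequences."""
    return math.prod(q[a] for a in x) * math.prod(q[b] for b in y)

def prob_match(x, y, p):
    """P(x, y | M): product of p over aligned pairs (ungapped alignment)."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return math.prod(p[(a, b)] for a, b in zip(x, y))

x, y = "AAB", "ABB"
print(prob_random(x, y, q))  # 0.5 ** 6
print(prob_match(x, y, p))   # 0.4 * 0.1 * 0.4, up to floating-point rounding
```

With these made-up numbers the match model assigns the pair a slightly higher probability than the random model, because two of the three aligned positions are identities.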
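The ratio of these likelihoods is known as the odds ratio:

\[ \frac{P(x,y|M)}{P(x,y|R)} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_j q_{y_j}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}} \]

(the last step uses the fact that \(x\) and \(y\) have the same length in an ungapped alignment).

For an additive scoring system, take the logarithm of this ratio (the log-odds ratio):

\[ S = \sum_i s(x_i, y_i) \qquad (2.2) \]

where

\[ s(a,b) = \log \left( \frac{p_{ab}}{q_a q_b} \right) \qquad (2.3) \]
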
The scores can be arranged in a matrix, called a score matrix or a substitution matrix.
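
As a minimal sketch of how such a matrix could be built, the code below takes the score of a pair \((a,b)\) to be \(\log(p_{ab} / (q_a q_b))\) and sums the scores over an ungapped alignment. The alphabet, the probabilities in `q` and `p`, and the names `substitution_matrix` and `alignment_score` are hypothetical, chosen only for illustration; the natural logarithm is also an assumption, since any base gives the same matrix up to a constant factor.

```python
import math

# Hypothetical background frequencies q_a (sum to 1).
q = {"A": 0.5, "B": 0.5}

# Hypothetical joint probabilities p_ab for aligned pairs (sum to 1).
p = {("A", "A"): 0.4, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.4}

def substitution_matrix(p, q):
    """Log-odds score for each residue pair: log(p_ab / (q_a * q_b))."""
    return {(a, b): math.log(p_ab / (q[a] * q[b]))
            for (a, b), p_ab in p.items()}

def alignment_score(x, y, s):
    """Additive score of an ungapped alignment: sum of the pair scores."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(s[(a, b)] for a, b in zip(x, y))

s = substitution_matrix(p, q)
for pair, value in sorted(s.items()):
    print(pair, round(value, 3))

print(alignment_score("AAB", "ABB", s))
```

In this toy matrix the identical pairs score positive and the mismatched pairs score negative, since identities are more likely under the match model than under the random model and mismatches are less likely. The total score equals the logarithm of the odds ratio for the whole alignment.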