Substitution Matrix
Suppose that sequences \(x\) and \(y\) are generated by one of two models:
(1) Random model R
Letter \(a\) occurs independently with some frequency \(q_a\)
Probability of the two sequences
\[ P(x,y|R) = \prod_i q_{x_i} \prod_j q_{y_j} \]
(2) Match model M
An aligned pair of residues \(a\) and \(b\) occurs with joint probability \(p_{ab}\)
Probability of the whole alignment
\[ P(x,y|M) = \prod_i p_{x_i y_i} \]
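As a minimal sketch of the two models, the following Python snippet evaluates both likelihoods for a short ungapped alignment. The two-letter alphabet, the frequencies `q`, the joint probabilities `p`, and the function names are made-up illustrative assumptions, not values from any real data set.

```python
import math

# Hypothetical two-letter alphabet with illustrative probabilities.
q = {"A": 0.6, "B": 0.4}                      # background frequencies q_a
p = {("A", "A"): 0.5, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.3}        # joint probabilities p_ab

def likelihood_random(x, y, q):
    """P(x, y | R): every letter of both sequences is drawn independently."""
    return math.prod(q[a] for a in x) * math.prod(q[b] for b in y)

def likelihood_match(x, y, p):
    """P(x, y | M): aligned pairs (x_i, y_i) are drawn from the joint distribution."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment requires equal lengths")
    return math.prod(p[(a, b)] for a, b in zip(x, y))

x, y = "AAB", "ABB"
print(likelihood_random(x, y, q))
print(likelihood_match(x, y, p))
```

With these toy numbers the match model assigns the alignment a slightly higher probability than the random model, which is the comparison the odds ratio makes.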
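The ratio of these likelihoods is known as the odds ratio:
\[ \frac{P(x,y|M)}{P(x,y|R)} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_i q_{y_i}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}} \]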
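For an additive scoring system, take the logarithm of this ratio (the log-odds ratio):
\[ S = \sum_i s(x_i, y_i) \qquad (2.2) \]
where
\[ s(a,b) = \log\left(\frac{p_{ab}}{q_a q_b}\right) \qquad (2.3) \]
is the log-odds score of residues \(a\) and \(b\) occurring as an aligned pair rather than independently.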
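The additivity can be checked numerically: summing the per-pair scores \(\log(p_{ab} / (q_a q_b))\) over an alignment should equal the logarithm of the odds ratio. The sketch below assumes a toy two-letter alphabet with made-up probabilities and uses the natural logarithm; another base would only rescale the scores.

```python
import math

# Illustrative values only (assumptions, not real residue statistics).
q = {"A": 0.6, "B": 0.4}                      # background frequencies q_a
p = {("A", "A"): 0.5, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.3}        # joint probabilities p_ab

def s(a, b):
    """Log-odds score of one aligned pair: log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))

def alignment_score(x, y):
    """Total score S: the sum of per-pair scores over an ungapped alignment."""
    return sum(s(a, b) for a, b in zip(x, y))

x, y = "AAB", "ABB"
S = alignment_score(x, y)

# Odds ratio computed directly as a product over aligned pairs.
odds = math.prod(p[(a, b)] / (q[a] * q[b]) for a, b in zip(x, y))

print(S, math.log(odds))  # the two values agree up to rounding error
```

Working with sums of logarithms rather than products of probabilities also avoids numerical underflow for long sequences.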
The scores can be arranged in a matrix,
called a score matrix or a substitution matrix.
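As a rough illustration, the scores for every residue pair can be tabulated as follows. The alphabet and joint probabilities are hypothetical, and taking the background frequencies as the marginals of the joint distribution is an assumption of this sketch rather than something the text above prescribes.

```python
import math

alphabet = ["A", "B"]                          # hypothetical two-letter alphabet
p = {("A", "A"): 0.5, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.3}        # illustrative joint probabilities p_ab

# Assumption: background frequencies are the marginals, q_a = sum_b p_ab.
q = {a: sum(p[(a, b)] for b in alphabet) for a in alphabet}

# matrix[i][j] holds the log-odds score of pairing alphabet[i] with alphabet[j].
matrix = [[math.log(p[(a, b)] / (q[a] * q[b])) for b in alphabet]
          for a in alphabet]

# Print the matrix with row and column labels.
print("  " + " ".join(f"{b:>6}" for b in alphabet))
for a, row in zip(alphabet, matrix):
    print(a + " " + " ".join(f"{v:6.2f}" for v in row))
```

In this toy example identical pairs score positive and mismatched pairs score negative, because identical pairs are more frequent under the match model than under the random model. The matrix is symmetric because the joint probabilities are.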