Estimation on a binary space
Problem 1 (Pairwise alignment of two biological sequences)
Given a pair of biological (DNA, RNA, or protein) sequences \(x\) and \(x'\), predict their alignment as a point in \(A(x,x')\), the space of all possible alignments of \(x\) and \(x'\).
A point in \(A(x,x')\) can be represented as a binary vector of \(|x||x'|\) dimensions, one for each pair of positions across the two sequences, by denoting the aligned pairs of bases as "1" and the remaining pairs of bases as "0".
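The binary representation described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `alignment_to_binary`, the 0-based indexing, and the row-major ordering of the \(|x||x'|\) coordinates are all assumptions made for the example.

```python
def alignment_to_binary(len_x, len_xp, aligned_pairs):
    """Flatten aligned position pairs (i, k) into a binary vector.

    Assumptions (not from the text): positions are 0-based and the
    coordinate for pair (i, k) is stored at index i * len_xp + k.
    """
    vec = [0] * (len_x * len_xp)
    for i, k in aligned_pairs:
        vec[i * len_xp + k] = 1
    return vec


# Hypothetical toy input: x = "ACG", x' = "AG", aligned as
#   A C G
#   A - G
# so positions (0, 0) and (2, 1) are aligned.
y = alignment_to_binary(3, 2, [(0, 0), (2, 1)])
print(y)  # [1, 0, 0, 0, 0, 1]
```

The vector has \(3 \times 2 = 6\) coordinates, and only the two aligned pairs are set to 1.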
Problem 2 (Prediction of the secondary structure of an RNA sequence)
Given an RNA sequence \(x\), predict its secondary structure as a point in \(S(x)\), the space of all the possible secondary structures of \(x\).
A point in \(S(x)\) can also be represented as a binary vector of \(|x|(|x|-1)/2\) dimensions, one for each pair of base positions in \(x\), by denoting the base pairs in the secondary structure as "1" and the remaining pairs as "0".
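A secondary structure can be encoded in the same way. The sketch below assumes the structure is supplied in dot-bracket notation (a common convention that the text does not prescribe) and orders the \(|x|(|x|-1)/2\) coordinates lexicographically over position pairs \((i, j)\) with \(i < j\); the function name `structure_to_binary` is hypothetical.

```python
from itertools import combinations


def structure_to_binary(dot_bracket):
    """Encode a dot-bracket string as a binary vector over position pairs.

    Assumptions (not from the text): the input is a balanced dot-bracket
    string, positions are 0-based, and coordinates follow the
    lexicographic order of pairs (i, j) with i < j.
    """
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    n = len(dot_bracket)
    return [1 if p in pairs else 0 for p in combinations(range(n), 2)]


# Hypothetical toy structure of length 5: positions 0 and 4 form a base pair.
s = structure_to_binary("(...)")
print(len(s), s)  # 10 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

For a sequence of length 5 the vector has \(5 \cdot 4 / 2 = 10\) coordinates, and the single base pair \((0, 4)\) is the fourth pair in lexicographic order.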
In each problem, the predictive space (\(A(x,x')\) or \(S(x)\)) is a subset of a binary space (\(\{0,1\}^{|x||x'|}\) or \(\{0,1\}^{|x|(|x|-1)/2}\)) because the combinations of aligned bases or base pairs are restricted: not every binary vector corresponds to a valid alignment or secondary structure. Therefore, Problem 1 and Problem 2 are special cases of the following general problem:
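To make the restriction concrete, the sketch below enumerates every binary vector for two sequences of length 2 and keeps only those that encode a valid alignment. The validity rule used here is an assumption for illustration, since the text does not spell out the constraints: each position is aligned at most once, and aligned pairs preserve sequence order (no crossings). The name `is_valid_alignment` and the row-major indexing are likewise hypothetical.

```python
from itertools import product


def is_valid_alignment(vec, len_x, len_xp):
    """Check the assumed alignment constraints on a binary vector.

    Assumed constraints (not stated in the text): every position takes
    part in at most one aligned pair, and aligned pairs do not cross.
    Coordinate (i, k) is assumed to sit at index i * len_xp + k.
    """
    pairs = [(i, k)
             for i in range(len_x)
             for k in range(len_xp)
             if vec[i * len_xp + k]]
    # pairs is sorted by (i, k); validity means both indices strictly increase.
    return all(i1 < i2 and k1 < k2
               for (i1, k1), (i2, k2) in zip(pairs, pairs[1:]))


len_x, len_xp = 2, 2
valid = [v for v in product([0, 1], repeat=len_x * len_xp)
         if is_valid_alignment(v, len_x, len_xp)]
print(len(valid), "of", 2 ** (len_x * len_xp))  # 6 of 16
```

Under these assumed constraints only 6 of the 16 binary vectors are valid, so the predictive space is a strict subset of the binary space even in this tiny case.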
Problem 3 (Estimation problem on a binary space)
Given a data set \(D\) and a predictive space \(Y\) (the set of all candidate predictions), which is a subset of the \(n\)-dimensional binary space \(\{0,1\}^n\), that is, \(Y \subset \{0,1\}^n\), predict a point \(y\) in the predictive space \(Y\).