# Estimation on binary space

### Problem 1 (Pairwise alignment of two biological sequences)

Given a pair of biological (DNA, RNA, or protein) sequences $x$ and $x'$, predict their alignment as a point in $A(x,x')$, the space of all possible alignments of $x$ and $x'$.
A point in $A(x,x')$ can be represented as a binary vector of $|x||x'|$ dimensions by denoting each aligned pair of bases across the two sequences as "1" and every remaining pair of bases as "0".
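As an illustration of this encoding, the following minimal sketch maps an alignment to its binary vector and back. The function names, the 0-based `(i, j)` pair format, and the row-major layout (the pair `(i, j)` goes to index `i * |x'| + j`) are assumptions made for this example; the text only fixes the dimension $|x||x'|$, not the ordering of the coordinates.

```python
def alignment_to_binary(len_x, len_xp, aligned_pairs):
    """Encode aligned position pairs (i, j) as a 0/1 vector of length len_x * len_xp.

    Assumed layout (not specified in the text): row-major, i.e. the pair
    (i, j) is stored at index i * len_xp + j, with 0-based positions.
    """
    vec = [0] * (len_x * len_xp)
    for i, j in aligned_pairs:
        if not (0 <= i < len_x and 0 <= j < len_xp):
            raise ValueError(f"pair {(i, j)} is out of range")
        vec[i * len_xp + j] = 1
    return vec


def binary_to_alignment(len_xp, vec):
    """Recover the aligned pairs (i, j) from a row-major binary vector."""
    return [(k // len_xp, k % len_xp) for k, bit in enumerate(vec) if bit]


# Toy example: x = "ACG", x' = "AG", aligning A with A and G with G.
x, xp = "ACG", "AG"
vec = alignment_to_binary(len(x), len(xp), [(0, 0), (2, 1)])
print(vec)                                # [1, 0, 0, 0, 0, 1]
print(binary_to_alignment(len(xp), vec))  # [(0, 0), (2, 1)]
```

Any fixed ordering of the $|x||x'|$ position pairs would serve equally well; row-major order is chosen here only because it makes the inverse mapping a single integer division.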

### Problem 2 (Prediction of the secondary structure of an RNA sequence)

Given an RNA sequence $x$, predict its secondary structure as a point in $S(x)$, the space of all possible secondary structures of $x$.
A point in $S(x)$ can also be represented as a binary vector of $|x|(|x| - 1)/2$ dimensions, one for each pair of base positions in $x$, by denoting the base pairs in the secondary structure as "1" and all other pairs of positions as "0".
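A small sketch of this representation is given below. It assumes 0-based positions, a row-major enumeration of the pairs $(i, j)$ with $i < j$, and dot-bracket input without pseudoknots; none of these conventions come from the text, and all function names are hypothetical.

```python
def pair_index(n, i, j):
    """Index of the pair (i, j), i < j, in a row-major listing of all n(n-1)/2 pairs.

    Assumed ordering: (0,1), (0,2), ..., (0,n-1), (1,2), ..., (n-2,n-1).
    """
    if not (0 <= i < j < n):
        raise ValueError(f"need 0 <= i < j < n, got {(i, j)}")
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def dot_bracket_to_pairs(structure):
    """Parse a pseudoknot-free dot-bracket string into sorted base pairs (i, j)."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def structure_to_binary(n, base_pairs):
    """Encode base pairs as a 0/1 vector of length n(n-1)/2."""
    vec = [0] * (n * (n - 1) // 2)
    for i, j in base_pairs:
        vec[pair_index(n, i, j)] = 1
    return vec


# Toy hairpin of length 7: two stacked base pairs closing a 3-base loop.
pairs = dot_bracket_to_pairs("((...))")
vec = structure_to_binary(7, pairs)
print(pairs)                                    # [(0, 6), (1, 5)]
print(len(vec))                                 # 21
print([k for k, bit in enumerate(vec) if bit])  # [5, 9]
```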

In each problem, the predictive space ($A(x,x')$ or $S(x)$) is a subset of a binary space ($\{0,1\}^{|x||x'|}$ or $\{0,1\}^{|x|(|x|-1)/2}$, respectively) because the combinations of aligned bases or base pairs are restricted. Therefore, Problem 1 and Problem 2 are special cases of the following general problem:

### Problem 3 (Estimation problem on a binary space)

Given a data set $D$ and a predictive space $Y$ (the set of all candidate predictions), which is a subset of the $n$-dimensional binary space $\{0,1\}^n$, that is, $Y \subset \{0,1\}^n$, predict a point $y$ in the predictive space $Y$.
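To make the inclusion $Y \subset \{0,1\}^n$ concrete, the sketch below enumerates the predictive space of Problem 1 for two very short sequences by brute force. The validity rule is an assumption: it uses the usual pairwise-alignment constraints (each position is aligned at most once, and aligned pairs preserve sequence order), which the text summarizes only as "restricted" combinations. The empty alignment is counted as valid, and the row-major layout and function names are likewise illustrative.

```python
from itertools import product


def is_valid_alignment(len_x, len_xp, vec):
    """Check whether a row-major 0/1 vector encodes a valid alignment.

    Assumed constraints (the usual ones, not spelled out in the text):
    every position is aligned at most once, and aligned pairs do not cross.
    """
    if len(vec) != len_x * len_xp:
        return False
    # Enumeration order yields the pairs sorted by i, then by j.
    pairs = [(k // len_xp, k % len_xp) for k, bit in enumerate(vec) if bit]
    # Both coordinates must strictly increase between consecutive pairs;
    # by transitivity this covers every pair of aligned pairs.
    return all(i1 < i2 and j1 < j2
               for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]))


def predictive_space(len_x, len_xp):
    """Brute-force enumeration of Y inside {0,1}^n; feasible only for tiny n."""
    n = len_x * len_xp
    return [v for v in product((0, 1), repeat=n)
            if is_valid_alignment(len_x, len_xp, v)]


Y = predictive_space(2, 2)
print(len(Y), "of", 2 ** 4)  # 6 of 16
```

Even for two sequences of length 2, only 6 of the 16 binary vectors are valid alignments under these assumptions, so $Y$ is a proper subset of the binary space. The enumeration is exponential in $n$ and is meant purely as an illustration.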