Cato: Markov models for conditional probabilities (over 10 years ago)

Commit id: c070aae1da112a0d0c5897e8b47a6998b9c5f7dd

Note that people are often a little loose with vocabulary -- entropy is a property of {\it distributions}, not (technically) of individual sequences. We observe the sequence as a variate from some parent distribution and estimate the entropy of the parent. For this approach to succeed in complicated cases, we will need a long, long sequence (see e.g. Farach et al. 1994). When bits are written randomly and independently, the entropy of the source is given by
\begin{equation}\label{eqn:indep_entropy}
  H = -\sum_i p_i \log_2 p_i.
\end{equation}
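A minimal plug-in estimate of equation~\ref{eqn:indep_entropy} is easy to write down; the sketch below (our own function name, nothing standard) simply replaces the $p_i$ with observed symbol frequencies.
\begin{verbatim}
from collections import Counter
from math import log2

def plugin_entropy(seq):
    """Plug-in estimate of H = -sum_i p_i log2 p_i (bits per symbol),
    using observed symbol frequencies in place of the true p_i."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(plugin_entropy("0110100110010110"))  # 8 zeros, 8 ones -> exactly 1.0
\end{verbatim}
As stressed above, such a naive estimate is only trustworthy for long sequences.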

\item The bits come in chunks. Then we have to ``renormalise'' by using a different basis, e.g. DNA code should be expressed in base 4. For an alphabet of $m$ letters (binary: $m=2$; DNA: $m=4$) the corresponding informational entropy is
\begin{equation}
  H = -\sum_i p_i \log_m p_i.
\end{equation}
We can be more sophisticated than this with a dictionary-building algorithm, e.g. Lempel-Ziv ``comma counting'', which captures repetitive structures (but perhaps not the right ones!); a toy version is sketched just after this list. Entropy can become very low as the chunks get larger (do we talk about a book in the set of books, or the information content of a book?).

\item If the patterns that arise don't correspond to larger units of information as in point 1, perhaps we have to experiment with conditional probabilities, i.e. develop probability strings with {\it memory} (e.g. $p(1|0)$, $p(1|110)$). The simplest way to quantify this seems to be the so-called ``Markov model'' (a plug-in sketch follows the list):
\begin{itemize}
  \item A zeroth-order Markov model assumes each character is independent of the one before, so the entropy is given by equation~\ref{eqn:indep_entropy} (re-read Lesne 2011, Farach 1994).
  \item In a first-order Markov model, the probability of each character ($j$) depends on the character immediately preceding it ($i$). Then,
  \begin{equation}
    H = -\sum_i p(i) \sum_j p(j|i) \log_2 p(j|i).
  \end{equation}
  \item And so on for higher orders.
\end{itemize}
\end{enumerate}
%%--------------------------------------------------------------------------------------
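To make the ``comma counting'' of point 1 concrete, here is a toy LZ78-style parse (our own simplification, not the estimator of Farach et al. or Lesne): the sequence is cut into phrases, each extending a previously seen phrase by one symbol, and the phrase count $c(n)$ is turned into a rough per-symbol entropy via the usual $c(n)\log_2 c(n)/n$ normalisation.
\begin{verbatim}
import random
from math import log2

def lz78_phrase_count(seq):
    """Count LZ78 phrases ('commas'): each new phrase extends a
    previously seen phrase by exactly one symbol."""
    seen, phrase, count = set(), "", 0
    for ch in seq:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)

def lz_entropy_rate(seq):
    """Crude entropy-rate estimate in bits per symbol:
    c(n) log2 c(n) / n, which converges (slowly) for long sequences."""
    c, n = lz78_phrase_count(seq), len(seq)
    return c * log2(c) / n

random.seed(1)
coin = "".join(random.choice("01") for _ in range(20000))
print(lz_entropy_rate(coin))          # drifts towards 1 bit/symbol
print(lz_entropy_rate("01" * 10000))  # repetitive chunks -> far lower
\end{verbatim}
The repetitive example illustrates the remark above: once the dictionary starts absorbing larger chunks, the apparent entropy per symbol drops well below the naive symbol-frequency estimate.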
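The Markov estimates themselves can be sketched the same way (again a plug-in toy with our own naming): transition frequencies stand in for $p(j|i)$, and order zero reduces to equation~\ref{eqn:indep_entropy}.
\begin{verbatim}
from collections import Counter
from math import log2

def markov_entropy(seq, order=1):
    """Plug-in order-k Markov entropy in bits per symbol:
    H = -sum_c p(c) sum_j p(j|c) log2 p(j|c),
    where c runs over the observed length-k contexts;
    order=0 recovers the independent-symbol formula."""
    contexts, pairs = Counter(), Counter()
    for t in range(order, len(seq)):
        c = seq[t - order:t]            # preceding context ("" if order=0)
        contexts[c] += 1
        pairs[(c, seq[t])] += 1
    n = sum(contexts.values())
    H = 0.0
    for (c, j), n_cj in pairs.items():
        p_c = contexts[c] / n
        p_j_given_c = n_cj / contexts[c]
        H -= p_c * p_j_given_c * log2(p_j_given_c)
    return H

s = "01" * 2000                       # strictly alternating bits
print(markov_entropy(s, order=0))     # ~1 bit: marginally looks random
print(markov_entropy(s, order=1))     # 0 bits: fully determined by memory
\end{verbatim}
The number of contexts grows as $m^k$ with the order $k$, which is exactly why the long-sequence caveat (Farach et al. 1994) bites hardest for higher-order models.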