Cato edited Gibbs.tex
over 10 years ago
Commit id: 903b1d36ca639379f1e4eb4de90b5ecbd91180ba
...
H = -\sum_i p_i \log_2 p_i.
\end{equation}
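As a quick numerical check, this per-symbol entropy can be computed directly from symbol counts (a minimal Python sketch; the function name is my own):

```python
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Entropy in bits per symbol, assuming independent symbols,
    with p_i estimated from the symbol counts."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Both strings have the same single-symbol statistics, so this
# measure cannot tell them apart.
print(shannon_entropy("0101010101"))    # 1.0
print(shannon_entropy("010011000111"))  # 1.0
```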
However, there may be larger-scale patterns in the string. Then we have to make more
effort (WHY?). For example, \texttt{0101010101}\dots{} has the same entropy as \texttt{010011000111}\dots, but clearly there's something different going on! I see two
cases, which have to be treated differently:
\begin{enumerate}
\item The bits come in chunks. Then we have to ``renormalise'' by using a different basis. E.g. DNA code should be expressed in base 4. For an alphabet of $m$ letters (binary: $m=2$; DNA: $m=4$) the corresponding informational entropy is
...
\end{equation}
We can be more sophisticated than this with a dictionary-based algorithm, e.g.\ Lempel-Ziv comma-counting, which captures repetitive structures (but perhaps not the right ones!). The entropy can become very low as the chunks get larger (do we talk about a book in the set of books, or the information content of a book?).
\item If the patterns that arise don't correspond to larger units of information
that can be re-labelled (as in point
1), perhaps we have to experiment with conditional probabilities, i.e.\
assess the probabilities of strings with {\it
memory} of whole sections of the string. \\
The simplest way to do this seems to be the so-called ``Markov
model'' (see section~\ref{sec:markov} below). But what we really need is to construct the {\it correlation function} of the data.
\end{enumerate}
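The comma-counting idea in point 1 can be illustrated with a greedy LZ78-style parse (a toy Python sketch, one of several Lempel-Ziv variants, so not necessarily the exact algorithm meant above): each ``phrase'' is the shortest prefix of the remaining input not seen before, and highly repetitive strings need far fewer phrases than irregular ones of the same length.

```python
import random

def lz78_phrases(s: str) -> int:
    """Greedy LZ78-style parse: count the phrases (the 'commas'),
    each being the shortest prefix of the remaining input that has
    not appeared as a phrase before."""
    seen, phrase, count = set(), "", 0
    for ch in s:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)

# A periodic string of length 2000 versus an irregular one:
# the periodic string parses into far fewer phrases.
random.seed(0)
irregular = "".join(random.choice("01") for _ in range(2000))
print(lz78_phrases("01" * 1000))  # periodic: few phrases
print(lz78_phrases(irregular))    # irregular: many more
```

Note that on very short strings the counts are too small to separate the two cases; the distinction only emerges as the string grows.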
\subsubsection{Markov model}
\label{sec:markov}
This is actually much less useful than I originally thought, but here it is anyway.
\begin{itemize}
\item A zeroth-order Markov model assumes each character is independent of the one before, so the entropy is given by equation~\ref{eqn:indep_entropy}.
\item In a first-order Markov model, the probability of each character ($j$) depends on the character immediately preceding it ($i$). Then,
\begin{equation}
H = -\sum_i p(i) \sum_j p(j|i) \log_2 p(j|i).
\end{equation}
\item And so on for higher orders.
\end{itemize}
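The first-order formula can be checked numerically by estimating $p(i)$ and $p(j|i)$ from bigram counts (a minimal Python sketch; the function name is my own):

```python
from collections import Counter
from math import log2

def markov1_entropy(s: str) -> float:
    """First-order Markov entropy in bits per symbol:
    H = -sum_i p(i) sum_j p(j|i) log2 p(j|i),
    with probabilities estimated from bigram counts.
    Uses p(i) p(j|i) = p(i,j) to collapse the double sum."""
    pairs = Counter(zip(s, s[1:]))   # bigram counts
    firsts = Counter(s[:-1])         # counts of the leading symbol
    n = len(s) - 1
    h = 0.0
    for (i, j), c in pairs.items():
        p_ij = c / n                 # joint p(i, j)
        p_j_given_i = c / firsts[i]  # conditional p(j|i)
        h -= p_ij * log2(p_j_given_i)
    return h

# The alternating string is perfectly predictable from one symbol
# of memory, so its first-order entropy vanishes even though its
# zeroth-order entropy is 1 bit per symbol.
print(markov1_entropy("01" * 50))  # 0.0
```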
%%--------------------------------------------------------------------------------------
\subsection{Aside: notes on DNA}