Kim H. Parker edited subsection_information_theory_Information_theory__.tex  over 8 years ago

Commit id: 96e33a0845396caf3ed964291c9f5765f738ac0e

Information theory was introduced originally in the context of transmitting information through noisy channels and introduced the idea of the entropy of a signal as a measure of its uncertainty.[refs] The concept is also useful in statistical physics, where it is related to the entropy of the system originally introduced in thermodynamics. The theory is very well-developed and the reader is referred to almost any text on information theory for a thorough discussion of the concepts involved. We will use only a small fraction of the results of information theory: a measure of the 'distance' between two probability density functions, which can be related to their entropy.

Given a signal $A(x)$, its entropy $H(A)$ is defined as
\[
H(A) = -\sum_x \phi(A_x) \log \phi(A_x)
\]
where $\phi(A_x)$ is the probability density function of $A$. It is a measure of the uncertainty of $A$ and its units depend on the base of the logarithm. We will use log base 2, which means that the unit of entropy is bits.

Given two probability density functions $A(x)$ and $B(x)$ which are defined over the same variable $x$, the distance between them can be measured in several different ways. One of the earliest measures of the difference is the Kullback-Leibler divergence
\[
D(A||B) = \sum_x \phi(A_x) \log \frac{\phi(A_x)}{\phi(B_x)}
\]
This measure of distance has several disadvantages: it is not symmetric and it is not a metric. The Jensen-Shannon divergence is defined using the Kullback-Leibler divergence in a way that makes it symmetric
\[
JS(A;B) = \frac{1}{2} D(A||M) + \frac{1}{2} D(B||M)
\]
where $\phi(M_x) = \frac{1}{2}\phi(A_x) + \frac{1}{2}\phi(B_x)$. From its definition it can easily be shown that
\[
JS(A;B) = H(M) - \frac{1}{2}H(A) - \frac{1}{2}H(B)
\]
That is, the Jensen-Shannon divergence is the entropy of the average of the two distributions minus the average of the entropies of the individual distributions. Finally, it has been shown that the Jensen-Shannon distance, defined as $\Delta_{JS}(A;B) = JS(A;B)^{1/2}$, is a metric: it is symmetric and satisfies the triangle inequality. We will use this metric of the distance between probability density functions in the following.

We note that the definition of entropy involves knowledge of the probability density function, and there are well-known problems in the estimation of the underlying pdf from a single sample of variables from the distribution, e.g. through the use of binned histograms. Some of these problems can be overcome by methods based on the nearest neighbour statistics of the sample. These methods will be discussed later.

If we apply this measure to $dP_+$ and $dP_-$, quantities that depend on the measured $dP$ and $dU$ and the product $\rho c$, we can find the value of $\rho c$ that minimises the distance between the distributions of $dP_+$ and $dP_-$.
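
For reference, the entropy identity for $JS(A;B)$ quoted above follows in a few lines from the definitions of $D(A||M)$ and $H$ given in this subsection:
\[
\begin{aligned}
JS(A;B) &= \frac{1}{2}\sum_x \phi(A_x)\log\frac{\phi(A_x)}{\phi(M_x)}
          + \frac{1}{2}\sum_x \phi(B_x)\log\frac{\phi(B_x)}{\phi(M_x)} \\
        &= \frac{1}{2}\sum_x \phi(A_x)\log\phi(A_x)
          + \frac{1}{2}\sum_x \phi(B_x)\log\phi(B_x)
          - \sum_x \left[\tfrac{1}{2}\phi(A_x)+\tfrac{1}{2}\phi(B_x)\right]\log\phi(M_x) \\
        &= H(M) - \tfrac{1}{2}H(A) - \tfrac{1}{2}H(B)
\end{aligned}
\]
where the last step uses $\phi(M_x) = \frac{1}{2}\phi(A_x) + \frac{1}{2}\phi(B_x)$ and the definition of entropy.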
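
As a numerical illustration (the function names \texttt{entropy\_bits}, \texttt{kl\_bits} and \texttt{js\_bits} are ours, chosen for this sketch), the following Python fragment computes the entropy in bits, the Kullback-Leibler divergence, the Jensen-Shannon divergence and the Jensen-Shannon distance for two discrete probability mass functions on a common support, and checks the entropy identity above.

\begin{verbatim}
import numpy as np

def entropy_bits(p):
    # Shannon entropy in bits of a discrete pmf p; zero-probability bins are skipped
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_bits(p, q):
    # Kullback-Leibler divergence D(p||q) in bits; assumes q > 0 wherever p > 0
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js_bits(p, q):
    # Jensen-Shannon divergence JS(p;q) = D(p||m)/2 + D(q||m)/2 with m = (p+q)/2
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

def js_distance(p, q):
    # Jensen-Shannon distance, the square root of the divergence (a metric)
    return np.sqrt(js_bits(p, q))

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
m = 0.5 * (a + b)
print(js_bits(a, b))                                                 # direct definition
print(entropy_bits(m) - 0.5*entropy_bits(a) - 0.5*entropy_bits(b))   # entropy identity
print(js_distance(a, b))                                             # metric form
\end{verbatim}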
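
Finally, a minimal sketch of the procedure described in the last paragraph, assuming the conventional water-hammer separation $dP_\pm = \frac{1}{2}(dP \pm \rho c\, dU)$ (that relation is not stated in this subsection) and estimating the two distributions with binned histograms on a common set of bins; the function names and the grid of candidate $\rho c$ values are illustrative only.

\begin{verbatim}
import numpy as np

def separate_waves(dP, dU, rho_c):
    # assumed water-hammer separation: dP+/- = (dP +/- rho_c*dU)/2
    return 0.5 * (dP + rho_c * dU), 0.5 * (dP - rho_c * dU)

def binned_pmf(x, edges):
    # crude histogram estimate of the pmf of x on the given common bin edges
    counts, _ = np.histogram(x, bins=edges)
    return counts / counts.sum()

def js_distance_bits(p, q):
    # Jensen-Shannon distance between two pmfs defined on the same bins
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a[a > 0] * np.log2(a[a > 0] / b[a > 0]))
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def estimate_rho_c(dP, dU, candidates, n_bins=32):
    # scan candidate rho*c values and return the one that minimises the
    # Jensen-Shannon distance between the distributions of dP+ and dP-
    best_rc, best_d = None, np.inf
    for rc in candidates:
        dPp, dPm = separate_waves(dP, dU, rc)
        edges = np.linspace(min(dPp.min(), dPm.min()),
                            max(dPp.max(), dPm.max()), n_bins + 1)
        d = js_distance_bits(binned_pmf(dPp, edges), binned_pmf(dPm, edges))
        if d < best_d:
            best_rc, best_d = rc, d
    return best_rc, best_d
\end{verbatim}

In practice the candidate values would span the expected range of $\rho c$, and the nearest neighbour estimators mentioned above could replace the binned histograms.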