Kim H. Parker edited subsection_information_theory_Information_theory__.tex  about 8 years ago

Commit id: 03f83e15c86f5bba2bd63486bde6fbc0e3a7dc69


The entropy of a distribution $A$ is defined as
\[
H(A) = -\sum_{x \in A} \phi(A_x) \log \phi(A_x)
\]
where $\phi(A_x)$ is the probability density function of $A$. It is a measure of the uncertainty of $A$ and its units depend on the base of the logarithm. We will use log base 2, which means that the unit of entropy is bits. Given two probability density functions $A(x)$ and $B(x)$ defined over the same variable $x$, the distance between them can be measured in several different ways. One of the first measures of the difference is the Kullback-Leibler divergence
\[
D(A||B) = \sum_{x \in A} \phi(A_x) \log \frac{\phi(A_x)}{\phi(B_x)}
\]
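As a simple illustration (two hypothetical Bernoulli distributions, not drawn from the data), take $A$ to be a fair coin, $\phi(A_x) = (\tfrac{1}{2}, \tfrac{1}{2})$, and $B$ a biased coin, $\phi(B_x) = (\tfrac{3}{4}, \tfrac{1}{4})$. Then
\[
H(A) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}, \qquad
H(B) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.81 \text{ bits},
\]
and
\[
D(A||B) = \tfrac{1}{2}\log_2\tfrac{1/2}{3/4} + \tfrac{1}{2}\log_2\tfrac{1/2}{1/4} \approx 0.21 \text{ bits}, \qquad
D(B||A) = \tfrac{3}{4}\log_2\tfrac{3/4}{1/2} + \tfrac{1}{4}\log_2\tfrac{1/4}{1/2} \approx 0.19 \text{ bits},
\]
so the divergence depends on the order of its arguments.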

This measure of distance has several disadvantages; it is not symmetric and it is not a metric. The Jensen-Shannon divergence is defined using the Kullback-Leibler divergence in a way that makes it symmetric
\[
JS(A;B) = \frac{1}{2} \big(D(A||M) + D(B||M)\big)
\]
where $M$ is the average of the two distributions, $\phi(M_x) = \frac{1}{2}\big(\phi(A_x) + \phi(B_x)\big)$. From its definition it can easily be shown that
\[
JS(A;B) = H(M) - \frac{1}{2}\big(H(A) + H(B)\big)
\]
That is, the Jensen-Shannon divergence is equal to the entropy of the average of the two distributions minus the average of the entropies of the individual distributions.
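To see this, each Kullback-Leibler term can be expanded using the definitions above:
\[
D(A||M) = \sum_{x \in A} \phi(A_x) \log \frac{\phi(A_x)}{\phi(M_x)} = -H(A) - \sum_{x \in A} \phi(A_x) \log \phi(M_x)
\]
and similarly for $D(B||M)$. Averaging the two terms gives
\[
JS(A;B) = -\frac{1}{2}\big(H(A) + H(B)\big) - \sum_{x} \frac{1}{2}\big(\phi(A_x) + \phi(B_x)\big) \log \phi(M_x) = H(M) - \frac{1}{2}\big(H(A) + H(B)\big)
\]
since $\frac{1}{2}\big(\phi(A_x) + \phi(B_x)\big) = \phi(M_x)$ by definition.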