Kim H. Parker edited subsection_information_theory_Information_theory__.tex  about 8 years ago

Commit id: 03f83e15c86f5bba2bd63486bde6fbc0e3a7dc69


The entropy of a distribution $A$ is defined as
\[
H(A) = -\sum_{x \in A} \phi(A_x) \log \phi(A_x)
\]
where $\phi(A_x)$ is the probability density function of $A$. It is a measure of the uncertainty of $A$ and its units depend on the base of the logarithm. We will use log base 2, which means that the unit of entropy is bits. Given two probability density functions $A(x)$ and $B(x)$ defined over the same variable $x$, the distance between them can be measured in several different ways. One of the first measures of the difference is the Kullback-Leibler divergence
\[
D(A||B) = \sum_{x \in A} \phi(A_x) \log \frac{\phi(A_x)}{\phi(B_x)}
\]
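As a simple illustration (two hypothetical Bernoulli distributions, not drawn from the data), take $A$ to be a fair coin, $\phi(A_x) = (\tfrac{1}{2}, \tfrac{1}{2})$, and $B$ a biased coin, $\phi(B_x) = (\tfrac{3}{4}, \tfrac{1}{4})$. Then
\[
H(A) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}, \qquad
H(B) = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.81 \text{ bits},
\]
and
\[
D(A||B) = \tfrac{1}{2}\log_2\tfrac{1/2}{3/4} + \tfrac{1}{2}\log_2\tfrac{1/2}{1/4} \approx 0.21 \text{ bits}, \qquad
D(B||A) = \tfrac{3}{4}\log_2\tfrac{3/4}{1/2} + \tfrac{1}{4}\log_2\tfrac{1/4}{1/2} \approx 0.19 \text{ bits},
\]
so the divergence depends on the order of its arguments.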

This measure of distance has several disadvantages; it is not symmetric and it is not a metric. The Jensen-Shannon divergence is defined using the Kullback-Leibler divergence in a way that makes it symmetric
\[
JS(A;B) = \frac{1}{2} \big(D(A||M) + D(B||M)\big)
\]
where $M$ is the average of the two distributions, $\phi(M_x) = \frac{1}{2}\big(\phi(A_x) + \phi(B_x)\big)$. From its definition it can easily be shown that
\[
JS(A;B) = H(M) - \frac{1}{2}\big(H(A) + H(B)\big)
\]
That is, the Jensen-Shannon divergence is equal to the entropy of the average of the two distributions minus the average of the entropies of the individual distributions.
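To see this, each Kullback-Leibler term can be expanded using the definitions above:
\[
D(A||M) = \sum_{x \in A} \phi(A_x) \log \frac{\phi(A_x)}{\phi(M_x)} = -H(A) - \sum_{x \in A} \phi(A_x) \log \phi(M_x)
\]
and similarly for $D(B||M)$. Averaging the two terms gives
\[
JS(A;B) = -\frac{1}{2}\big(H(A) + H(B)\big) - \sum_{x} \frac{1}{2}\big(\phi(A_x) + \phi(B_x)\big) \log \phi(M_x) = H(M) - \frac{1}{2}\big(H(A) + H(B)\big)
\]
since $\frac{1}{2}\big(\phi(A_x) + \phi(B_x)\big) = \phi(M_x)$ by definition.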