Kim H. Parker added One_of_the_problems_encountered__.tex  about 8 years ago

Commit id: 8589b2447cafe584750cc859f4da4b4754650c92

One of the problems encountered when applying information theory to measured data is how to estimate the underlying distribution function from the measured sample. We have used the Maximum Likelihood Estimator (MLE), which simply estimates the underlying distribution as the normalised histogram of the measured samples. This has the advantage that it is easy to implement and can be shown to approach the theoretical distribution as the number of samples approaches infinity. However, it has the disadvantage of depending on the number of bins used to calculate the histogram: too few bins result in a loss of resolution, and too many result in a very noisy estimate of the distribution.

The effect of the number of bins on the calculation of $\Delta$ is shown in Figure 5, where plots identical to Figure 4 (calculated for 32 bins) are shown for different numbers of bins. We see that the value of $c$ corresponding to the minimum value of $\Delta$ is unacceptably dependent on the number of bins. For 8 bins, the value of $c$ at the minimum of $\Delta$ is 8 m/s, which is greater than $c_{SS}$. For 16 bins it is 6.5 m/s, which is almost the same as $c_{SS}$. For larger numbers of bins, $c$ continues to decrease as the number of bins increases, finally reaching 3 m/s for 256 bins.

There are various empirical rules for choosing the number of bins, based on the average or minimum number of samples per bin. For the data used to generate Figure 5, the total number of samples is $N = 7001$, so for the maximum of 256 bins shown there is an average of approximately 27 samples per bin. For smaller numbers of bins there are many samples per bin, and the problem becomes one of resolution rather than sample size.

We have not yet found a solution to this problem, and it may be necessary to explore other methods of estimating distributions to find a robust algorithm based on information-theoretic arguments.
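
The samples-per-bin argument can be made concrete with a rough sketch of the histogram estimate and its sampling noise. The symbols $B$ (number of bins), $n_k$ (number of samples falling in bin $k$), $\hat{p}_k$ and $\bar{n}$ are notation introduced here purely for illustration. The noise estimate further assumes independent samples and roughly equal bin occupancy, neither of which is guaranteed for the measured data.
\[
\hat{p}_k = \frac{n_k}{N}, \qquad
\sum_{k=1}^{B} \hat{p}_k = 1, \qquad
\bar{n} = \frac{N}{B}, \qquad
\frac{\sigma(\hat{p}_k)}{\hat{p}_k} \approx \frac{1}{\sqrt{\bar{n}}} .
\]
With $N = 7001$ this gives $\bar{n} \approx 875$, $438$, $219$ and $27$ for $B = 8$, $16$, $32$ and $256$ respectively. Under the stated assumptions the relative fluctuation of each bin probability is therefore roughly 3\% for 8 bins and roughly 19\% for 256 bins, which is consistent with the estimate becoming noticeably noisier as the number of bins increases.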