authorea.com/100324

MODESTA Machine Project Specifications

The simplest property that pixels in a region can share is intensity. Therefore, a natural way to segment such regions is to find a threshold that would separate the light and dark regions. This creates binary images from gray-level ones by turning all pixels below some threshold to zero (background pixels) and all pixels above that threshold to one (foreground pixels).

A simple way to find a suitable threshold is to find each of the modes (local maxima) and then find the valley (minimum) between them. One way to think of it is to consider the values in the two regions as two clusters. Our goal is to make each cluster as tight as possible, thus minimizing the overlap.

We can’t change the distribution but we can adjust where we separate them (the threshold). As we adjust the threshold one way, we increase the spread of one and decrease the spread of the other. The goal then is to select the threshold that minimizes the combined spread.

We can define the within-class variance as the weighted sum of the variances of each cluster:

\[\sigma^2_{\text{Within}}(T) = n_B(T)\sigma^2_B(T)+n_O(T)\sigma^2_O(T)\]

where

\[n_B(T) = \sum_{i=0}^{T-1}p(i)\] \[n_O(T) = \sum_{i=T}^{N-1}p(i)\] \[\sigma^2_B(T) = \text{the variance of the pixels in the background (below the threshold)}\] \[\sigma^2_O(T) = \text{the variance of the pixels in the foreground (above the threshold)}\]

and \([0,N-1]\) is the range of intensity levels.

Computing this within-class variance for each of the two classes for each possible threshold involves a lot of computation, but there is an easier way.

If you subtract the within-class variance of the combined distribution, you get something called the between-class variance:

\[\begin{aligned} \sigma_{\text{Between}}(T) &= \sigma^2 - \sigma^2_{\text{Within}}(T) \\ & = n_B(T)[\mu_B(T)-\mu]^2+n_O(T)[\mu_O(T)-\mu]^2\end{aligned}\]

where \(\sigma^2\) is the combined variance and \(\mu\) is the combined mean. Notice that the between-class variance is simply the weighted variance of the cluster means themselves around the overall mean. Substituting \(\mu = n_B(T)\mu_B(T)+n_O(T)\mu_O(T)\) and simplifying, we get \[\sigma^2_{\text{Between}}(T) = n_B(T)n_O(T)[\mu_B(T)-\mu_O(T)]^2\]

So, for each potential Threshold \(T\) we

Separate the pixels into two clusters according to the threshold.

Find the mean of each cluster.

Square the difference between the means.

Multiply by the number of pixels in one cluster times the number in the other.

This depends only on the difference between the means of the two clusters, thus avoiding having to calculate differences between individual intensities and the cluster means. The optimal threshold is the one that maximizes the between-class variance (or, conversely, minimizes the within-class variance).

This still sounds like a lot of work,

## Share on Social Media