
Sebastian Raschka 10/11/2015

Sequential Forward Selection (SFS) is a classic feature selection algorithm -- a greedy search algorithm -- that was developed as a suboptimal solution to the often computationally infeasible exhaustive search. In a nutshell, SFS adds one feature from the original feature set at a time, based on the classifier performance, until a feature subset of the desired size k is reached.
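As a rough illustration (not a reference implementation), plain SFS can be sketched in a few lines of Python. Here a cross-validated classifier score plays the role of the criterion function; the dataset, classifier, and helper names are assumptions made only for this example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def criterion(estimator, X, y, feature_subset, cv=5):
    """Criterion function J: mean cross-validated accuracy on a given feature subset."""
    return cross_val_score(estimator, X[:, sorted(feature_subset)], y, cv=cv).mean()

def sfs(estimator, X, y, k):
    """Plain Sequential Forward Selection: greedily add the best feature until k features are selected."""
    remaining = set(range(X.shape[1]))
    selected = []
    while len(selected) < k:
        # add the feature whose inclusion yields the highest criterion value
        best = max(remaining, key=lambda f: criterion(estimator, X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = load_wine(return_X_y=True)
print(sfs(KNeighborsClassifier(), X, y, k=5))
```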

The Sequential Floating Forward Selection (SFFS) algorithm can be considered an extension of the simpler Sequential Forward Selection algorithm. In contrast to SFS, the SFFS algorithm has an additional exclusion step to remove features once they have been included, so that a larger number of feature subset combinations can be sampled. It is important to emphasize that the removal of included features is conditional: the exclusion in SFFS only occurs if the resulting feature subset is assessed as "better" by the criterion function after removal of a particular feature. Furthermore, I added an optional check to skip the conditional exclusion step if the algorithm gets stuck in cycles.


Sequential Floating Forward Selection

Input: the set of all features, $Y = \{y_1, y_2, ..., y_d\}$

  • The SFFS algorithm takes the whole feature set as input; for example, our feature space might consist of 10 dimensions (d = 10).


Output: a subset of features, $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}$, where $k = (0, 1, 2, ..., d)$

  • The algorithm returns a subset of the feature space of a specified size; e.g., a subset of 5 features from a 10-dimensional feature space (k = 5, d = 10).


Initialization: $X_0 = \emptyset$, $k = 0$

  • We initialize the algorithm with an empty set ("null set") so that k = 0 (where k is the size of the subset).


Step 1 (Inclusion):


     $x^+ = \text{ arg max } J(X_k + x), \text{ where } x \in Y - X_k$
     $X_{k+1} = X_k + x^+$
     $k = k + 1$
    Go to Step 2

  • In step 1, we include the feature from the remaining feature space that leads to the best performance increase for our feature subset (assessed by the criterion function); a rough code equivalent is sketched below. Then, we go to step 2.
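The inclusion step can be written as a small stand-alone helper that takes an arbitrary criterion function J operating on sets of feature indices (a sketch; the function and argument names are hypothetical):

```python
def inclusion_step(J, Y, X_k):
    """Step 1: x+ = arg max J(X_k + x) over x in Y - X_k, then X_{k+1} = X_k + x+."""
    candidates = Y - X_k                                   # features not yet in the subset
    x_plus = max(candidates, key=lambda x: J(X_k | {x}))   # feature with the best inclusion score
    return X_k | {x_plus}                                  # the enlarged subset X_{k+1}
```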

Step 2 (Conditional Exclusion):


      $x^- = \text{ arg max } J(X_k - x), \text{ where } x \in X_k$
     $\text{if } J(X_k - x^-) > J(X_k)$:
          $X_{k-1} = X_k - x^-$
         $k = k - 1$
    Go to Step 1

  • In step 2, we only remove a feature if the resulting subset would gain an increase in performance (see the sketch after this list). We then go back to step 1.
  • Steps 1 and 2 are repeated until the termination criterion is reached.
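The conditional exclusion step can be sketched analogously, again assuming a criterion function J on sets of feature indices; note that the subset is only shrunk if removing the least useful feature actually improves the criterion:

```python
def conditional_exclusion_step(J, X_k):
    """Step 2: x- = arg max J(X_k - x) over x in X_k; exclude it only if that improves J."""
    x_minus = max(X_k, key=lambda x: J(X_k - {x}))   # feature whose removal hurts the least
    if J(X_k - {x_minus}) > J(X_k):                  # conditional: only exclude if J improves
        return X_k - {x_minus}                       # the reduced subset X_{k-1}
    return X_k                                       # otherwise keep the subset unchanged
```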


Termination: stop when k equals the number of desired features
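Tying the pieces together, here is a compact and deliberately simplified SFFS sketch. The criterion function J, the set-based bookkeeping, and the `visited`-set guard (which plays the role of the optional cycle check mentioned above) are illustrative assumptions, not a tuned implementation:

```python
def sffs(J, d, k_target):
    """Sequential Floating Forward Selection (sketch).

    J        -- criterion function mapping a frozenset of feature indices to a score
    d        -- total number of features in Y
    k_target -- desired size of the selected feature subset
    """
    Y = frozenset(range(d))
    X_k = frozenset()            # initialization: X_0 = the empty set, k = 0
    visited = {X_k}              # subsets seen so far; used to skip exclusions that would cycle
    while len(X_k) < k_target:
        # Step 1 (inclusion): add the best feature from Y - X_k
        x_plus = max(Y - X_k, key=lambda x: J(X_k | {x}))
        X_k = X_k | {x_plus}
        visited.add(X_k)
        # Step 2 (conditional exclusion): remove features while doing so improves J,
        # never removing the feature just added and never revisiting an earlier subset
        while len(X_k) > 2:
            x_minus = max(X_k - {x_plus}, key=lambda x: J(X_k - {x}))
            candidate = X_k - {x_minus}
            if J(candidate) > J(X_k) and candidate not in visited:
                X_k = candidate
                visited.add(X_k)
            else:
                break
    return X_k
```

With the cross-validated criterion sketched at the beginning, this could be called as, e.g., `sffs(lambda s: criterion(KNeighborsClassifier(), X, y, s), d=X.shape[1], k_target=5)` (again using the hypothetical helper names from that earlier sketch).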

