Sebastian Raschka 10/11/2015

Sequential Forward Selection (SFS) is a classic feature selection algorithm -- a greedy search algorithm -- that has been developed as a suboptimal solution to the often computationally infeasible exhaustive search. In a nutshell, SFS adds one feature from the original feature set at a time, based on the classifier performance, until a feature subset of the desired size *k* is reached.

The Sequential Floating Forward Selection (SFFS) algorithm can be considered an extension of the simpler Sequential Forward Selection algorithm. In contrast to SFS, the SFFS algorithm has an additional exclusion step to remove features once they have been included, so that a larger number of feature subset combinations can be sampled. It is important to emphasize that the removal of included features is conditional: the *conditional exclusion* in SFFS only occurs if the resulting feature subset is assessed as "better" by the criterion function after removal of a particular feature. Furthermore, I added an optional check to skip the conditional exclusion step if the algorithm gets stuck in cycles.

**Input:** the set of all features, $Y = \{y_1, y_2, ..., y_d\}$

- The SFFS algorithm takes the whole feature set as input; e.g., **d = 10** if our feature space consists of 10 dimensions.

**Output:** a subset of features, $X_k = \{x_j \mid j = 1, 2, ..., k; \; x_j \in Y\}$, where $k = (0, 1, 2, ..., d)$

- The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space (**k = 5, d = 10**).

**Initialization:** $X_0 = \emptyset$, $k = 0$

- We initialize the algorithm with an empty set ("null set") so that **k = 0** (where **k** is the size of the subset).

**Step 1 (Inclusion):**

$x^+ = \text{arg max } J(X_k + x), \text{ where } x \in Y - X_k$

$X_{k+1} = X_k + x^+$

$k = k + 1$

*Go to Step 2*

- In step 1, we include the feature from the **feature space** that leads to the best performance increase for our **feature subset** (assessed by the **criterion function**). Then, we go over to step 2.
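The inclusion step can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation; `inclusion_step`, the criterion function `J`, and the toy per-feature scores are all hypothetical stand-ins for a real classifier-based criterion.

```python
def inclusion_step(J, X_k, Y):
    """Step 1: add the feature x+ from Y - X_k that maximizes J(X_k + {x})."""
    candidates = [x for x in Y if x not in X_k]
    best = max(candidates, key=lambda x: J(X_k | {x}))
    return X_k | {best}

# Hypothetical toy criterion: each feature has a fixed usefulness score.
scores = {'y1': 0.2, 'y2': 0.9, 'y3': 0.5}

def J(subset):
    return sum(scores[f] for f in subset)

print(sorted(inclusion_step(J, set(), {'y1', 'y2', 'y3'})))  # → ['y2']
```

In practice, `J` would be the cross-validated performance of a classifier trained on the candidate subset rather than a fixed lookup table.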

**Step 2 (Conditional Exclusion):**

$x^- = \text{arg max } J(X_k - x), \text{ where } x \in X_k$

$\text{if } J(X_k - x^-) > J(X_k)$:

$X_{k-1} = X_k - x^-$

$k = k - 1$

*Go to Step 1*

- In step 2, we only remove a feature if the resulting subset would gain an increase in performance. Then, we go back to step 1.
- Steps 1 and 2 are repeated until the **Termination** criterion is reached.
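The conditional exclusion step can likewise be sketched in Python. Again, `conditional_exclusion` and the toy criterion `J` (which penalizes a redundant feature pair) are hypothetical illustrations, not a library API.

```python
def conditional_exclusion(J, X_k):
    """Step 2: remove the feature x- only if dropping it improves J."""
    if len(X_k) < 2:
        return X_k  # nothing sensible to exclude
    worst = max(X_k, key=lambda x: J(X_k - {x}))
    if J(X_k - {worst}) > J(X_k):  # conditional: only on improvement
        return X_k - {worst}
    return X_k

# Hypothetical toy criterion: additive scores with a penalty for the
# redundant pair {'y1', 'y2'}.
def J(subset):
    base = {'y1': 0.4, 'y2': 0.5, 'y3': 0.3}
    score = sum(base[f] for f in subset)
    if {'y1', 'y2'} <= subset:
        score -= 0.6  # redundancy penalty
    return score

print(sorted(conditional_exclusion(J, {'y1', 'y2', 'y3'})))  # → ['y2', 'y3']
```

Here `y1` is dropped because the two-feature subset `{'y2', 'y3'}` scores higher than the full redundant set, which is exactly the "floating" behavior that distinguishes SFFS from plain SFS.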

**Termination:** stop when *k* equals the number of desired features
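The steps above can be combined into a complete SFFS loop. The following is a from-scratch sketch under the definitions in this post, not the author's actual implementation; the criterion `J` is a hypothetical toy, and the `seen` set is one simple way to realize the optional cycle check mentioned in the introduction.

```python
def sffs(J, Y, k_target):
    """Sketch of SFFS: greedy inclusion plus conditional exclusion."""
    X, seen = set(), set()
    while len(X) < k_target:
        # Step 1 (Inclusion): add the candidate x+ that maximizes J.
        x_plus = max(Y - X, key=lambda x: J(X | {x}))
        X = X | {x_plus}
        seen.add(frozenset(X))
        # Step 2 (Conditional Exclusion): drop a feature only if J improves;
        # skipping already-seen subsets guards against getting stuck in cycles.
        while len(X) > 1:
            x_minus = max(X, key=lambda x: J(X - {x}))
            candidate = X - {x_minus}
            if J(candidate) > J(X) and frozenset(candidate) not in seen:
                X = candidate
                seen.add(frozenset(X))
            else:
                break
    return X

# Hypothetical toy criterion: additive scores with a penalty for the
# redundant pair {'y1', 'y2'}.
def J(subset):
    base = {'y1': 0.4, 'y2': 0.5, 'y3': 0.3}
    score = sum(base[f] for f in subset)
    if {'y1', 'y2'} <= subset:
        score -= 0.6
    return score

print(sorted(sffs(J, {'y1', 'y2', 'y3'}, k_target=2)))  # → ['y2', 'y3']
```

With a real dataset, `J` would wrap a classifier and cross-validation, which is why SFFS -- unlike exhaustive search -- only evaluates a small, greedily chosen fraction of all possible subsets.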
