Figure 1. Left: Typical (weak) tree classifier. A group of patients containing both responders and non-responders is to be separated based on the given predictive variables. The first split uses the variable that best separates the two groups. The algorithm keeps splitting the groups until the leaf nodes (tree bottom) are as pure as possible with respect to the target classes. Right: Simplified scheme of the random forest algorithm. The trees are weak classifiers that are aggregated via voting to form a strong one. Each tree is trained on a random subset of the training data (bootstrapping). The AdaBoost classifier trains its trees sequentially rather than in parallel.
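The following is a minimal sketch, assuming scikit-learn, of the two ensemble strategies the caption describes; the data is synthetic and only stands in for the patient cohort, so all names and parameters here are illustrative, not the authors' setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the patient data (responders vs. non-responders).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: trees grown independently ("in parallel"), each on a
# bootstrapped sample, aggregated by majority voting.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# AdaBoost: weak trees (decision stumps by default) trained sequentially,
# each one upweighting the samples the previous trees misclassified.
ab = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("RF accuracy:", rf.score(X_test, y_test))
print("AB accuracy:", ab.score(X_test, y_test))
```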
Figure 2. Boxplot of the classification results in terms of the MCC (Matthews correlation coefficient, x-axis). The comparison includes classification results for the four targets (FEV1, LOAC, FENO, MEF50), two classification algorithms (RF – random forest and AB – AdaBoost) and two sampling methods (oversampling – OS and cluster centroids – CC) compared to no sampling (base). The best model per target is marked by a red square surrounding the box.
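As a minimal sketch, assuming scikit-learn and imbalanced-learn, this is one way such an MCC comparison across sampling strategies could be set up. Random oversampling is an assumption for OS (the paper may use a different oversampler), and the data is synthetic.

```python
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import ClusterCentroids
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for a response target such as LOAC.
X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

samplers = {
    "base": None,                              # no resampling
    "OS": RandomOverSampler(random_state=0),   # oversample the minority class
    "CC": ClusterCentroids(random_state=0),    # undersample via k-means centroids
}

# Resample only the training data, then score each model with the MCC.
for name, sampler in samplers.items():
    if sampler is None:
        Xr, yr = X_train, y_train
    else:
        Xr, yr = sampler.fit_resample(X_train, y_train)
    model = AdaBoostClassifier(random_state=0).fit(Xr, yr)
    print(name, matthews_corrcoef(y_test, model.predict(X_test)))
```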
Figure 3. An exemplary decision tree classifier in which the treatment outcome LOAC after six months is predicted by three predictive variables (LOAC at baseline, asthma severity at baseline, total IgE). Responders are labeled R-LOAC, non-responders NR-LOAC. Asthma severity at baseline forms the first split: most patients whose asthma severity was graded 1 at baseline respond well to treatment. In an ensemble classifier, a few hundred such trees are trained on bootstrapped samples and their predictions are averaged, as explained in Figure 1. Ass_asthma_sev_basline – asthma severity grade (according to GINA) assessed at baseline; ass_asthma_ctrl_basline – asthma control assessed at baseline; biom_ige_total – total serum IgE.
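A minimal sketch, assuming scikit-learn, of fitting and printing a single shallow tree like the one in Figure 3. The feature names mirror the legend above, but the data is synthetic, so the learned splits carry no clinical meaning.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Feature names taken from the caption's legend; the values are synthetic.
feature_names = ["ass_asthma_sev_basline", "ass_asthma_ctrl_basline",
                 "biom_ige_total"]
X, y = make_classification(n_samples=150, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# A shallow tree stays readable, like the three-variable tree in Figure 3.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))
```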