Figure 1. Left: Typical (weak) tree classifier. A group of patients containing both responders and non-responders is to be separated on the basis of the given predictive variables. The first split is made on the variable that best separates the two groups. The algorithm keeps splitting the groups until the leaf nodes (tree bottom) are as pure as possible with respect to the target classes. Right: Simplified scheme of the random forest algorithm. The trees are weak classifiers that are aggregated via voting to form a strong one. Each tree is trained on a random subset of the training data (bootstrapping). The AdaBoost classifier trains the trees sequentially rather than in parallel.
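The following is a minimal sketch, not the authors' code, of the scheme in Figure 1: weak decision trees aggregated in parallel on bootstrapped samples (random forest) versus trained sequentially (AdaBoost). The synthetic data set is a hypothetical stand-in for the responder/non-responder cohort.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the responder / non-responder cohort
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: many trees trained in parallel on bootstrapped samples,
# aggregated by voting over the individual trees
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_train, y_train)

# AdaBoost: weak trees trained sequentially, each one upweighting the
# samples that the previous trees misclassified
ab = AdaBoostClassifier(n_estimators=300, random_state=0)
ab.fit(X_train, y_train)

print("Random forest accuracy:", rf.score(X_test, y_test))
print("AdaBoost accuracy:     ", ab.score(X_test, y_test))
```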
Figure 2. Boxplot of the classification results in terms of the MCC (x-axis). The comparison covers the classification results for the four targets (FEV1, LOAC, FENO, MEF50), two classification algorithms (AB – AdaBoost) and two sampling methods (oversampling – OS and cluster centroids – CC) compared to no sampling (base). The best model per target is marked by a red square surrounding the box.
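A minimal sketch, assuming scikit-learn and imbalanced-learn, of the comparison summarized in Figure 2: an AdaBoost classifier trained with no sampling (base), random oversampling (OS), and cluster-centroid undersampling (CC), each scored by the MCC. The data are synthetic placeholders, not the study's cohort.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import ClusterCentroids

# Imbalanced synthetic data mimicking a responder / non-responder split
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {
    "base": None,                           # no resampling
    "OS": RandomOverSampler(random_state=0),  # duplicate minority samples
    "CC": ClusterCentroids(random_state=0),   # replace majority by centroids
}
for name, sampler in samplers.items():
    if sampler is None:
        X_s, y_s = X_tr, y_tr
    else:
        X_s, y_s = sampler.fit_resample(X_tr, y_tr)
    clf = AdaBoostClassifier(random_state=0).fit(X_s, y_s)
    print(name, "MCC:", round(matthews_corrcoef(y_te, clf.predict(X_te)), 3))
```

Note that resampling is applied only to the training split; the MCC is always computed on the untouched test split.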
Figure 3. An exemplary decision tree classifier in which the treatment outcome LOAC after six months is predicted from three predictive variables (LOAC baseline, asthma severity baseline, IGE_total). Responders are labeled R-LOAC, while non-responders are labeled NR-LOAC. Asthma severity at baseline is the first split: most patients respond well to treatment if their baseline asthma severity was estimated at a value of 1. In an ensemble classifier a few hundred such trees are trained on bootstrapped samples and their predictions averaged, as explained in Figure 1. ass_asthma_sev_basline – asthma severity grade (according to GINA) assessed at baseline; ass_asthma_ctrl_basline – asthma control assessed at baseline; biom_ige_total – total serum IgE.
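A minimal sketch, not the study's pipeline, of a shallow decision tree like the one in Figure 3. The column names mirror the variables listed in the caption; the DataFrame contents are fabricated placeholders, and the printed rules will differ from the figure.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Fabricated baseline features and 6-month LOAC outcome (R-LOAC / NR-LOAC)
df = pd.DataFrame({
    "ass_asthma_sev_basline":  [1, 2, 3, 1, 2, 1, 3, 2],
    "ass_asthma_ctrl_basline": [0, 1, 2, 0, 1, 0, 2, 1],
    "biom_ige_total":          [90, 250, 410, 120, 300, 80, 520, 270],
    "outcome": ["R-LOAC", "NR-LOAC", "NR-LOAC", "R-LOAC",
                "NR-LOAC", "R-LOAC", "NR-LOAC", "R-LOAC"],
})
X, y = df.drop(columns="outcome"), df["outcome"]

# A depth-limited tree keeps the splits readable, as in the figure
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```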