Results and discussion

Performance in prediction of treatment outcomes

Table 5 presents the best classification results achieved for each treatment outcome. The highest prediction accuracy is achieved for LOAC, owing to a high number of correctly predicted outcomes for both responders (indicated by specificity) and non-responders (indicated by sensitivity), as well as a high MCC. Although this trait is less objective than lung function and FENO because it encompasses symptom self-assessment, it best reflects real-life treatment success. This is in concordance with the control-based management approach, which focuses on achieving adequate symptom control and minimizing the future risk of exacerbations12.
When predicting response according to FEV1, FENO and MEF50, an average accuracy between 65% and 70% was achieved. This suggests that lung function is not the preferred tool for guiding treatment in children with asthma, as also highlighted in the current GINA guidelines. Lung function is a complex trait that reflects a number of structural and functional changes to the airways due to chronic inflammation. It correlates poorly with symptom occurrence or severity, especially in children: some patients with poor lung function may not exhibit severe symptoms and, vice versa, some patients with normal lung function may experience symptom aggravation35,36. Moreover, children with mild-to-moderate asthma using controller treatment exhibit a slower decline in lung function than deterioration of symptom control37. Since most of the patients in our study had milder forms of the disease, this is probably why the model predicts these traits poorly. Compared to lung function-based treatment outcomes, predicting outcomes assessed by changes in FENO performed better, with a slightly higher accuracy and a much better sensitivity (good prediction performance for responders), see Table 5. This suggests that FENO may be a more consistent predictor of steroid responsiveness than other parameters, e.g. lung function16,38. FENO is a good biomarker of the Th2-related allergic inflammatory response, as interleukin-13 promotes nitric oxide (NO) synthase activity and NO production39. Moreover, the latest GINA guidelines12 suggest that FENO-guided treatment in children and young adults is associated with a significant reduction in exacerbation rates and may be a good complementary approach compatible with control-based asthma management.
Additionally, since FENO-based response was able to distinguish true responders quite well, it may be useful in identifying patients with ineffective or suboptimal treatment: those who require treatment adjustment12 and those with poor adherence to treatment40.
However, for treatment outcomes according to lung function and FENO, a much lower MCC (21%-26%) was achieved than for LOAC. This suggests that the model generates a considerable proportion of false responders and false non-responders for lung function- and FENO-based outcomes, which further supports control-guided asthma management as the preferred option for guiding asthma treatment in children. Additional results in the form of Receiver Operating Characteristic (ROC) curves41 and confusion matrices are presented in Supplementary Figure S1.
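To make the relationship between accuracy, sensitivity, specificity and MCC concrete, the following minimal Python sketch computes all four from a binary confusion matrix. Which outcome is coded as the positive class determines whether sensitivity or specificity describes responders, so the sketch keeps the classes generic. The counts are purely hypothetical and are not taken from Table 5; they only illustrate how an accuracy of 67% can coexist with an MCC of roughly 0.24 (the section reports MCC as a percentage, which presumably corresponds to 0.21-0.26 on the usual -1 to 1 scale).

```python
import math


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, sensitivity, specificity and MCC from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)  # recall of the negative class
    # Matthews correlation coefficient; set to 0 when any row/column total is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical counts for 100 patients (not study data): 67 classified correctly.
m = binary_metrics(tp=15, fn=15, fp=18, tn=52)
print(m)
```

Because MCC uses all four cells of the matrix, it stays low whenever errors accumulate in either class, which is why it separates LOAC from the lung function- and FENO-based outcomes more sharply than accuracy does.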

Differences in the utilized classification algorithms and sampling methods

We utilized two different classification methods and three sampling techniques. Figure 2 shows the distribution of MCC across sampling methods and classifiers. AdaBoost was the better-performing classifier for LOAC, FEV1 and FENO, the exception being MEF50, for which RF performed marginally better. This is not surprising, since boosting algorithms generally perform well on imbalanced data sets42. Overall, in our case, not sampling the data led to the best prediction results in combination with AdaBoost. Oversampling resulted in a better MCC only for MEF50, which could indicate that, when designing such experiments, one has to account for non-uniform feature spaces of rare or minority classes, such as the responders here42. Even though the differences are marginal, in medicine even the slightest improvement may be important. These results can be explained by an advantage of AdaBoost: it learns sequentially from the misclassifications of the previous weak learners in the sequence, so while over- and undersampling improve the results for RF, this is not the case for AdaBoost. Additionally, since the trees of an RF are trained independently and can be built in parallel, RF is much faster in practice; hence, when fast training on large data sets is needed, RF with oversampling could be used.
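The sequential reweighting that distinguishes AdaBoost from the independently grown trees of an RF can be sketched in a few lines. The following minimal pure-Python example implements discrete AdaBoost with single-feature threshold stumps, plus a naive random-oversampling helper, on a tiny hypothetical data set with labels coded as +1/-1. All function names, the toy data and the number of rounds are illustrative assumptions; this is a sketch of the general technique, not the implementation or hyperparameters used in this study.

```python
import math
import random


def fit_stump(X, y, w):
    """Return the (feature, threshold, polarity, error) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= thr else -pol) != yi)
                if best is None or err < best[3]:
                    best = (j, thr, pol, err)
    return best


def stump_predict(stump, row):
    j, thr, pol, _ = stump
    return pol if row[j] >= thr else -pol


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost with labels coded as +1/-1."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[3], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # the weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples (yi * prediction == -1) are up-weighted and correct
        # ones down-weighted, so the next stump concentrates on previous mistakes.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equally frequent.

    Assumes at least one minority-class sample is present.
    """
    rng = random.Random(seed)
    idx_min = [i for i, yi in enumerate(y) if yi == minority]
    n_extra = max(len(y) - 2 * len(idx_min), 0)
    extra = [rng.choice(idx_min) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


# Hypothetical toy data: two features, minority class coded as +1.
X = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3], [5.0, 0.9], [6.0, 0.8]]
y = [-1, -1, -1, -1, 1, 1]
X_bal, y_bal = random_oversample(X, y, minority=1)
model = adaboost_fit(X, y, n_rounds=5)
preds = [adaboost_predict(model, row) for row in X]
print(preds)  # [-1, -1, -1, -1, 1, 1]
```

In this weighted formulation, duplicating a row k times has the same effect on the stump errors as multiplying its initial weight by k. Random oversampling therefore mostly shifts AdaBoost's starting weights, which its own reweighting then adjusts anyway; this is one plausible way to read why resampling helped RF more than AdaBoost here.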

Model interpretation

For each target modeled in the experimental matrix, we calculated the average PI for the predictive variables (Table 6). The only predictive variable passing the 1% threshold for the classification of LOAC-based outcomes is “asthma severity”, highlighting the importance of expert assessment in asthma management from the outset, as well as the impact of asthma heterogeneity and its phenotypes on treatment success. Additionally, since both asthma control and severity encompass the patient's (caregiver's) self-assessment of symptom frequency and severity, these findings emphasize the importance of the patient's involvement in the management plan, an essential part of the “shared-care approach” in asthma management that is associated with improved outcomes12.
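The averaging and thresholding described above can be illustrated with a minimal sketch. It assumes that PI denotes permutation importance (the abbreviation is not expanded in this section) and interprets the 1% threshold as a mean score drop of at least 0.01; both are assumptions. Each predictor column is shuffled in turn, the resulting drop in accuracy is averaged over repeats, and predictors below the threshold are discarded. Accuracy is used for simplicity, and MCC could be substituted in the same way. The model, data, predictor names and helper functions are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with column j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_predictors(names, importances, threshold=0.01):
    """Keep predictors whose mean importance passes the (assumed) 1% threshold."""
    return [n for n, imp in zip(names, importances) if imp >= threshold]


def toy_predict(row):
    """Hypothetical model that uses only the first feature."""
    return row[0]


# Hypothetical data: the label equals the first feature; the second is noise.
names = ["severity", "noise"]
y = [0, 0, 1, 1, 0, 1, 0, 1]
X = [[s, n] for s, n in zip(y, [5, 3, 1, 4, 2, 8, 7, 6])]
imps = permutation_importance(toy_predict, X, y)
print(select_predictors(names, imps))
```

A predictor that the model never uses yields exactly zero importance, because shuffling it cannot change any prediction. This mirrors how a variable can be present in the data yet absent from the list of important predictors.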
LOAC also appears to be the least complex target in terms of prediction, with only one important predictor despite its high predictive performance (Table 6). For this reason, we submitted this subset and the target LOAC to a decision tree classifier (Figure 1, left) in order to follow the decisions of the complex classifiers, which consist of many such weak classifiers (see Figure 3). This exemplary decision tree (only a sub-part of an ensemble classifier) shows that children with milder forms of the disease respond well to treatment with ICS, whereas severe patients do not respond adequately even though their treatment was adjusted according to disease severity, i.e. patients who are severe at baseline remain uncontrolled after 6 months of medication use. This suggests that the model is capable of identifying severe patients quite accurately; its shortfall, however, is that it does not provide information on the potential mechanisms underlying treatment failure. It also seems that prominent markers of atopy (total IgE) are highly predictive of treatment success. The vast majority of childhood asthma patients have allergic asthma, and a number of studies have shown that it is sensitive to treatment with ICS. More specifically, T helper 2 lymphocyte-driven (T2-high) endotypes respond best to ICS. Very high total IgE values also appear to be predictive of treatment failure (Figure 3), which is consistent with previous findings that high serum IgE is observed in children with severe asthma43–45.
Overall, for none of the targets did the PI include any of the available treatment variables, meaning that the models did not use treatment variables when making decisions on treatment outcomes. Even though treatment follows guidelines, these are neither definite nor objective sensu stricto. Guidelines provide general recommendations, and the physician is left to choose between several treatment options. Although this may represent a potential bias in identifying true responders vs. non-responders, it reflects the model's predictive power and favors the current symptom control-guided asthma management approach.
Although FENO had a substantially lower accuracy and MCC than LOAC, treatment outcomes according to FENO changes were capable of identifying true responders quite well, indicating that FENO-guided treatment may be a complementary tool for guiding asthma management in children. PI revealed that predicting FENO is more complex than predicting LOAC (Table 6). This may be because FENO reflects the level and type of airway inflammation that drives the chronicity of the disease. Elevated IgE and sensitization to inhaled allergens are common markers of T2-high inflammation43, which is known to respond better to anti-inflammatory treatment.
When comparing MEF50 and FEV1 with each other, MEF50-based outcomes were predicted by FEV1 to a lesser extent than FEV1-based outcomes were predicted by MEF50. MEF50-based outcomes were additionally predicted by hsCRP (Table 6), a marker of subtly elevated systemic inflammation in asthma; evidence shows that increased hsCRP is associated with more severe asthma outcomes46. Together with the facts that the model predicting MEF50-related response performed better in almost all parameters (except specificity) than the model predicting FEV1-related response (see Table 5), and that oversampling further improved the model's power to predict true responders and non-responders for MEF50 (see Figure 2), this highlights the importance of the distal airways in children with asthma47. The peripheral airways are the predominant site of airway inflammation48 and may very well be a predominant site of airflow obstruction in asthmatic children, involved in the pathophysiology of the disease and in resistance to treatment with ICS49. Moreover, distal airway impairment may be present in pediatric patients despite rare and mild asthma symptoms and a normal FEV1.
Our results are in concordance with those of Ross et al.11, who also identified asthma control as the strongest predictive variable for LOAC. These authors focused only on one type of ICS (budesonide) and on chromones (nedocromil), whereas our study encompassed all commonly used classes of anti-inflammatory controller medication (ICS, LABA and LTRA). Moreover, Ross et al.11 evaluated response to treatment only according to symptom control, whereas our study also involved lung function- and FENO-based treatment outcomes. Although the homogeneity of the studied population (a real-life situation, with most of the children having allergic asthma and milder forms of the disease) could have been an advantage in identifying certain phenotypes and genetic traits associated with treatment outcomes, it was a disadvantage in identifying clear underlying pathophysiological mechanisms. The sample size in our study might also have been relatively small (N = 365 vs. N = 1019 in Ross et al.), possibly further hindering more detailed endotype characterization. Ross et al. identified serum eosinophils as one of the most predictive variables for asthma control, while we identified IgE, supporting previous findings that children with T2-high allergic asthma respond best to anti-inflammatory treatment50. Finally, even though the GINA guidelines suggest reviewing treatment response every 3-6 months, the assessment period in this study may have been too short to reflect biologically significant and measurable effects, especially on complex traits such as changes in lung function in response to treatment.