Results and discussion
Performance in prediction of treatment outcomes
Table 5 presents the best achieved classification results for each
particular treatment outcome. The highest prediction accuracy is
achieved for LOAC. This is due to a high number of correctly predicted
outcomes for responders (indicated by specificity) and non-responders
(indicated by sensitivity), as well as a high MCC. Although this trait
is less objective than lung function and FENO as it encompasses symptom
self-assessment, it reflects real-life treatment success best. This is
in concordance with the control-based management approach, focusing on
achieving adequate control of symptoms and minimizing future risks of
exacerbations [12].
When predicting response according to FEV1, FENO and MEF50, an average
accuracy between 65% and 70% was achieved. This suggests that lung
function is not a preferred tool to be used to guide treatment in
children with asthma, which is highlighted in current GINA guidelines.
Lung function is a complex trait and reflects a number of structural and
functional changes to the airways due to chronic inflammation. It does
not correlate well with symptom occurrence or severity, especially in
children: some patients with poor lung function do not exhibit
severe symptoms and, conversely, some patients with normal lung
function experience symptom aggravation [35,36].
Moreover, in children with mild-to-moderate asthma using controller
treatment, lung function declines more slowly than symptom control
deteriorates [37]. This is probably why the model predicts these traits
poorly, given that most of the patients in our study had milder disease
forms. Compared with lung function-based treatment outcomes, the
prediction of outcomes assessed by changes in FENO performed better,
showing a slightly higher accuracy and much better sensitivity (good
prediction performance for responders); see Table 5. This suggests that
FENO can be used as a predictor of steroid responsiveness even more
consistently than other parameters, e.g. lung function [16,38]. FENO is a good
biomarker of the Th2-related allergic inflammatory response, as
interleukin-13 promotes nitric oxide (NO) synthase activity and NO
production [39]. Moreover, the latest GINA guidelines [12] suggest that
treatment guided by FENO in
children and young adults is associated with a significant reduction in
exacerbation rates and that it may be a good complementary approach
compatible with control-based asthma management. Additionally, since
FENO-based response was able to distinguish true responders quite well,
it may be useful in identifying patients with ineffective or suboptimal
treatment: those who require treatment adjustment [12] and those with
poor adherence to treatment [40].
However, for treatment outcomes according to lung function and FENO, a
much lower MCC (21%-26%) was achieved when compared to LOAC. This
suggests that the model generates a significant proportion of false
responders and non-responders for lung function- and FENO-based
outcomes, which further supports the control-guided asthma management
approach as a preferred option in guiding asthma treatment in children.
Additional results, in the form of receiver operating characteristic (ROC)
curves [41] and confusion matrices, are presented in
Supplementary Figure S1.
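To illustrate why accuracy and MCC can diverge on imbalanced outcomes, the following minimal Python sketch computes the four metrics discussed in this section from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical and are not taken from Table 5; which class is treated as positive is likewise an assumption of the sketch.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    # Sensitivity: share of positive-class cases that were predicted correctly
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of negative-class cases that were predicted correctly
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC uses all four cells, so errors in the smaller class are not hidden
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical imbalanced example: 80 positive-class and 20 negative-class cases
metrics = binary_metrics(tp=58, fn=22, tn=10, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.68 while MCC stays below 0.2, because half of the smaller class is misclassified and MCC penalizes this even when overall accuracy looks acceptable.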
Differences in the utilized classification algorithms and sampling method
We utilized two different classification methods and three sampling
techniques. Figure 2 shows the distribution of MCC across sampling
methods and classifiers. AdaBoost was the better-performing classifier
for LOAC, FEV1 and FENO; only for MEF50 did RF perform marginally better.
This is not surprising, since boosting algorithms generally perform well
on imbalanced data sets [42]. Overall, in our case the best prediction
results were obtained with AdaBoost without resampling the data.
Oversampling resulted in a better MCC
only for MEF50, which could indicate that when designing experiments
like these one has to account for non-uniform feature spaces of the
rare or minority classes, such as responders here [42]. Even
though the differences are marginal, in medicine even the slightest
improvement may be important. These results can be explained by the way
AdaBoost learns: each weak learner is trained sequentially on the
misclassifications of the previous weak learners in the sequence. Thus,
while over- or undersampling improves the results for RF, this is not the
case for AdaBoost.
Additionally, since the trees of an RF can be trained in parallel, it is
much faster in practice; hence, RF with oversampling could be used when
fast training on large data sets is needed.
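The sequential learning on misclassifications mentioned above can be sketched as a single reweighting round of the classic discrete AdaBoost scheme. This is an illustrative sketch only: the function name `adaboost_reweight` and the toy labels are hypothetical, and the AdaBoost implementation used in this study may differ in its details.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost; labels are +1 / -1, weights sum to 1.

    Returns the weak learner's vote (alpha) and the updated sample weights.
    """
    # Weighted error of the current weak learner
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a misclassified sample, so its weight grows;
    # correctly classified samples (t * p == +1) are down-weighted
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Hypothetical toy round: the weak learner misses the single minority sample
y_true = [1, 1, 1, 1, -1]
y_pred = [1, 1, 1, 1, 1]
alpha, new_weights = adaboost_reweight([0.2] * 5, y_true, y_pred)
print(alpha, new_weights)
```

In this toy round the missed minority-class sample ends up carrying half of the total weight for the next weak learner. This built-in emphasis on hard cases is one plausible reason why additional resampling adds little for AdaBoost.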
Model interpretation
For each target modeled in the experimental matrix, we calculated the
average PI for predictive variables (Table 6). The only predictive
variable passing the 1% threshold for the classification of LOAC-based
outcomes is “asthma severity”, highlighting the importance of an
expert assessment in asthma management from the start as well as the
impact of asthma heterogeneity and its phenotypes on treatment success.
Additionally, since both asthma control and severity encompass the
patient’s (caregiver’s) self-estimation of symptom frequency and
severity, these findings emphasize the importance of the patient’s
involvement in the management plan, an essential part of the
“shared-care approach” in asthma management associated with improved
outcomes [12].
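As an illustration of how such an importance ranking can be obtained, the sketch below assumes that PI denotes permutation importance, i.e. the mean drop in a score when one variable is shuffled. The toy data, the stand-in `toy_model`, and the reading of the 1% threshold as a 0.01 drop in accuracy are all hypothetical and do not reproduce the analysis behind Table 6.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model returns the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in for a fitted classifier: it only looks at feature 0."""
    return row[0]


# Hypothetical data: feature 0 determines the label, feature 1 is noise
X = [[i % 2, (i * 7) % 5] for i in range(40)]
y = [row[0] for row in X]

pi = permutation_importance(toy_model, X, y)
selected = [j for j, value in enumerate(pi) if value > 0.01]
print(pi, selected)
```

Only the variable the model actually relies on passes the threshold here. A variable the model ignores has an importance of exactly zero, whatever its clinical relevance.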
Furthermore, LOAC appears to be associated with the least complexity with regard to
prediction, with only one important predictor, in spite of its great
power of prediction (Table 6). For this reason, we submitted this subset
and the target LOAC to a decision tree classifier (Figure 1, left) to
follow the decisions of the complex classifiers which consist of many
such weak classifiers, see Figure 3. This exemplary decision tree (only
a sub-part of an ensemble classifier) shows that children with milder
forms of the disease respond well to treatment with ICS as well as that
severe patients do not respond to treatment adequately, even though
their treatment was adjusted according to disease severity, i.e. severe
patients at baseline remain uncontrolled after 6 months of medication
use. This suggests that the model is capable of identifying severe
patients quite accurately, but the shortfall is that it does not inform
about the potential mechanisms underlying treatment failure. Also, it
seems that prominent markers of atopy (total IgE) are highly predictive
of treatment success. The vast majority of childhood asthma patients
have allergic asthma, and a number of studies have shown that it is
sensitive to treatment with ICS. More specifically, T helper 2
lymphocyte-driven (T2-high) endotypes respond best to ICS. It also seems that
very high total IgE values are predictive of treatment failure (Figure
3), which is consistent with previous findings that high serum IgE
is observed in children with severe asthma [43–45].
Overall, the PI for any of the targets did not include any available
treatment variables, meaning the models did not use treatment variables
in creating decisions on treatment outcomes. Even though treatment
follows guidelines, these are neither definite nor objective sensu
stricto. Guidelines actually provide general choice recommendations and
the physician is left to choose between several treatment options.
Although this may represent a potential bias in identifying true
responders vs. non-responders, it actually reflects the model’s power of
prediction and favors the current symptom control-guided asthma
management approach.
Although FENO had a substantially lower accuracy and MCC than LOAC,
the model for FENO-based treatment outcomes was capable of identifying
true responders quite well, indicating that FENO-guided treatment may be
a complementary tool in guiding asthma management in children. PI
revealed that predicting FENO is more complex in comparison to LOAC
(Table 6). This may be due to the fact that FENO reflects the level and
type of airway inflammation that drives the chronicity of the disease.
Elevated IgE and sensitization to inhaled allergens are common markers
of T2-high inflammation [43], which is known to respond
better to anti-inflammatory treatment.
When comparing MEF50 and FEV1 to each other, MEF50-based outcomes were
predicted by FEV1 to a lesser extent than FEV1-based outcomes by MEF50.
Additionally, MEF50-based outcomes were also predicted by hsCRP (Table
6), which is a marker of subtly elevated systemic inflammation in asthma.
Evidence shows that increased hsCRP is associated with more severe
asthma outcomes [46]. Together with the fact that
the model predicting MEF50-related response performed better in almost
all parameters (except specificity) than the model for FEV1-related response
(see Table 5), and the fact that oversampling further improved the model’s
power in predicting true responders and non-responders for MEF50 (see Figure 2),
this highlights the importance of the distal airways in children
with asthma [47]. The peripheral airways are the
predominant site of airway inflammation [48] and may very
well be a predominant site of airflow obstruction in asthmatic children,
involved in the pathophysiology and resistance to treatment with
ICS [49]. Moreover, distal airways impairment may be
present despite rare and mild asthma symptoms and normal FEV1 in
pediatric patients.
Our results are in concordance with those of Ross et al. [11], who also
identified asthma control as the
strongest predictive variable for LOAC. These authors focused only on
one type of ICS (budesonide) and chromones (nedocromil), while our study
encompassed all commonly used classes of anti-inflammatory controller
medication (ICS, LABA and LTRA). Ross et al. [11] evaluated response to
treatment only according to symptom control, while our
study also involved lung function- and FENO-based treatment outcomes.
Although the homogeneity of the population studied (a real-life
situation with most of the children having allergic asthma and milder
disease forms) could have been an advantage in identifying certain
phenotypes and genetic traits associated with treatment outcomes, this
was a disadvantage in identifying clear pathophysiological mechanisms
involved. Moreover, the sample size in our study might have been too small
(N = 365 vs. N = 1019 in Ross et al.), possibly further hindering more
detailed endotype characterization. Ross et al. identified serum
eosinophils as one of the most predictive variables for asthma control,
while we identified IgE, supporting previous findings that children with
T2-high allergic asthma respond best to anti-inflammatory
treatment [50]. Finally, even though GINA guidelines
suggest treatment response review every 3-6 months, the assessment
period in this study may have been too short to reflect biologically
significant and measurable effects, especially on complex traits such as
lung function changes in response to treatment.