Heterogeneity and endotype discovery
There is an increasing awareness that allergic diseases (asthma, eczema,
rhinitis, food allergy) are umbrella terms for subtypes characterized by
distinct disease mechanisms (endotypes). Developments in ML techniques
provide new ways to capture the heterogeneity in longitudinal patterns
of the development of distinct symptoms of allergic diseases in
individual patients. For example, childhood wheezing illness has been
extensively investigated using ML approaches to derive more homogeneous
groups for genetic, mechanistic, and therapeutic studies. Most studies
modelled repeated measurements of wheeze through the life-course to
derive classes. These different symptom patterns may indicate distinct
biological mechanisms, and their discovery may facilitate stratified
treatment, but this is not certain (i.e., the classes may not directly
translate to endotypes). However, recent studies
from the US CREW and UK UNICORN consortia demonstrated that latent class
analysis (LCA) using
binary information on wheezing might classify individuals imprecisely,
and children with identical wheezing patterns can be assigned to
different phenotypes. Recently, a novel data-driven method suggested a
potential way to improve assignment to wheeze “phenotypes”. Repeated
observations of current wheezing were transformed to derive
multidimensional indicators of wheezing spells (reflecting duration,
temporal sequencing, and the extent of persistence/recurrence).
Clustering these indicators resulted in a structure that was much more
robust to data imputation, and with a remarkably high agreement between
cluster assignment of individual children when using complete or imputed
data.
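As a rough illustration of what spell-based indicators might look like, the sketch below summarises a binary sequence of "current wheeze" observations into a few features. The function name `spell_indicators`, the four features, and the example sequences are assumptions made for illustration and are not the indicators defined in the cited study; the sketch also assumes complete data with no missing visits.

```python
from itertools import groupby


def spell_indicators(wheeze):
    """Summarise a binary wheeze sequence (1 = current wheeze at that visit)."""
    # Collect (start_index, length) for every run of consecutive 1s.
    spells = []
    position = 0
    for value, run in groupby(wheeze):
        length = len(list(run))
        if value == 1:
            spells.append((position, length))
        position += length

    lengths = [length for _, length in spells]
    return {
        "n_spells": len(spells),                          # recurrence
        "longest_spell": max(lengths, default=0),         # persistence
        "total_wheeze": sum(lengths),                     # overall duration
        "first_onset": spells[0][0] if spells else None,  # temporal sequencing
    }


# Two made-up children with the same number of wheeze reports
# but different timing.
early_transient = [1, 1, 0, 0, 0, 0]
late_onset = [0, 0, 0, 0, 1, 1]
print(spell_indicators(early_transient))
print(spell_indicators(late_onset))
```

Feature vectors of this kind, rather than the raw binary sequence, could then be passed to a clustering algorithm.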
Similarly, over the past five years, longitudinal data on eczema were
clustered using data-driven approaches. There were notable differences
in the estimated prevalence of each phenotype, and inconsistent
associations with the filaggrin (FLG) genotype.
A Bayesian machine learning approach has been used to model the
development of eczema, wheeze, and rhinitis from birth to school-age.
The developmental profiles were heterogeneous, and the progression of
the symptoms fitting the atopic march profile was rare among those with
atopic comorbidities. The findings revealed eight latent profiles of
symptom development, each with different temporal patterns of their
co-manifestation, and distinct genetic associates. Further studies
indicated that the atopic march, as initially described, occurs rarely
and that most two-disease combinations occur by chance, but that there
is a very important multimorbidity cluster, affecting ~8% of the
population, with a high disease burden.
Numerous studies have applied ML clustering to identify asthma subtypes
too. Different endotypes may have a specific response to treatment,
making this differentiation potentially clinically significant. Using
k-means clustering, researchers identified four distinct clusters of
asthma patients in the Severe Asthma Research Program with different
responses to corticosteroids (CS). One cluster comprises patients who,
despite severe baseline airflow limitation, have the lowest response to
CS with almost no improvement in lung function, suggesting that this
group would benefit from alternative treatment options. The authors also
show that the variables that characterize the clusters robustly predict
cluster assignment in an independent test set.
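The general idea of deriving clusters with k-means and then assigning unseen patients to the nearest cluster centre can be sketched as follows. This is a minimal pure-Python version of Lloyd's algorithm on made-up, standardised two-feature data with k = 2 for brevity; the data, the starting centroids, and the function names are illustrative assumptions and do not reproduce the cited analysis.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, initial_centroids, n_iter=100):
    """Plain Lloyd's algorithm; stops when assignments no longer change."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Made-up training data: two hypothetical standardised clinical features.
train = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
centroids, labels = kmeans(train, initial_centroids=[train[0], train[3]])

# Assign patients from a (made-up) independent test set to the learned clusters.
test_set = [(2.1, 2.0), (0.05, 0.1)]
test_labels = [nearest(p, centroids) for p in test_set]
print(labels, test_labels)
```

In practice the result depends on the starting centroids and on feature scaling, so several initialisations and standardised inputs are usually needed.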
A hypothesis-generating, unbiased analysis that included data on lower
airway inflammation and infection from bronchoalveolar lavage in
preschool children with severe wheeze revealed four distinct
pathophysiological clusters of approximately equal size: (1) Atopic; (2)
Non-atopic, low infection rate; (3) Non-atopic, high infection rate; and
(4) Non-atopic, low infection rate, no inhaled corticosteroids (ICS),
with marked differences in BAL microbial profiles between the clusters.
In a multicenter prospective study, the authors used clustering on
integrated clinical, virus, and serum proteome data to identify a
cluster in children with bronchiolitis with a significantly higher risk
of developing asthma by age six. Multi-omics has also been employed in
this domain, such as the novel and open-source method Merged Affinity
Network Association Clustering (MANAclust), which provides an automated
pipeline to integrate clinical and omics data. The authors identified
clinically and molecularly distinct asthma clusters that responded
differently to treatment, and substantial heterogeneity in healthy
controls. In another recent study, researchers used unsupervised
clustering on proteomics data of infants hospitalized with
bronchiolitis. They identified two distinct clusters with dysregulated
pathways and a higher risk for developing asthma. ML approaches have
also shown utility for clustering volatile organic compounds (VOCs) in
exhaled breath (breathomics), an exciting non-invasive biomarker of
airway disease that is sensitive to inflammation.
Pathways and disease mechanisms
Multi-omics and systems biology are comprehensive approaches expected to
increase insight into the complex biological mechanisms underlying
allergic and immunological diseases. The level of detail of such studies
can be increased further using single-cell methods, analyzing gene
expression profiles, chromatin accessibility, CpG methylation, or the
proteome in thousands of cells individually [195,196]. A
fully integrated reference atlas has recently been released for the
lung, with consensus annotations for 61 cell types based on data from
more than 100 healthy tissue donors. Using a trained model of this fully
integrated healthy lung cell atlas, the dataset was expanded by
projection and transfer learning using scArches to a dataset of more
than 2.4 million cells from more than 480 individuals. This illustrates
the use of deep learning in biology, to define cell types and states.
This extended Lung Cell Atlas allowed direct comparison of cell types
across datasets based on consensus labels, leading to the identification
of disease-associated cell states common to multiple lung
diseases [197,198].
Drug and therapy development and precision medicine
AI has the potential to accelerate drug discovery and development
throughout the whole pipeline and contribute to precision medicine.
Precision medicine promises to enable personalized and more effective
treatments based on an individual’s genetic variability, environmental
exposures, and lifestyle. Here we highlight promising examples of
treatment response analysis and drug repurposing.
Treatment response
In a pediatric cohort, asthma control after six months of medication
could be accurately predicted using an AdaBoost classification
algorithm, outperforming traditional logistic regression [187]. Wu et al.
(2022) developed a supervised ML
model to predict low response to dupilumab in atopic dermatitis
patients. The authors identified various indicators of nonresponse,
including a high Quan-Charlson Comorbidity Index value, a claim for
ibuprofen, or no claims for prednisone medication before dupilumab
initiation. Similar approaches have been pursued to analyze nonresponse
to Type 2-directed biologics in asthma patients.
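To give a sense of how a boosted classifier such as AdaBoost combines weak rules, the sketch below implements discrete AdaBoost with one-feature threshold rules ("stumps") in pure Python. The single feature, the labels (+1 for a hypothetical "controlled" outcome, -1 otherwise), and all function names are made up for illustration. The comparison shown is against a single stump, not against logistic regression, and no real cohort data are involved.

```python
import math


def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w (labels -1/+1)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] > t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best  # (weighted error, feature index, threshold, sign)


def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign


def adaboost(X, y, n_rounds=3):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Made-up data: no single threshold separates the two classes.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = fit_stump(X, y, [1.0 / len(X)] * len(X))
single_acc = sum(stump_predict(single, x) == yi for x, yi in zip(X, y)) / len(y)
ensemble = adaboost(X, y, n_rounds=3)
boosted_acc = sum(predict(ensemble, x) == yi for x, yi in zip(X, y)) / len(y)
print(single_acc, boosted_acc)
```

These are training-set accuracies on toy data; a real comparison between models would need held-out data or cross-validation.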
Drug repurposing
Artificial intelligence has been used extensively in drug repurposing to
overcome the immense time and investments required for new drug
development. AI has been applied for virtual drug screening, treatment
combination optimization, and drug-target interaction predictions.
Patrick et al. (2019) developed a workflow to model drug-disease
relationships using unsupervised text analysis and supervised
classification for cutaneous diseases, including atopic dermatitis. They
created word embeddings – a dimensionality reduction method that
creates a lower-dimension projection of high-dimensional text data –
from 20 million abstracts in PubMed. Some of the strongest identified
associations were not directly mentioned in any research article,
demonstrating how the analysis of large-scale textual data can unveil
novel repurposing opportunities. Despite promising results in other
medical fields, we identify a research gap in target discovery and
clinical trial optimization applied in the allergy and immunology
domain. Also, of over 10,000 clinical studies related to allergic
diseases, we could only identify five with a fundamental role for
artificial intelligence (search of ClinicalTrials.gov performed on
March 13, 2023).
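The idea that text-derived word vectors can link a drug and a disease that are never mentioned together can be illustrated with a toy example. The sketch below uses simple sentence-level co-occurrence counts as a stand-in for learned word embeddings; this is not the method of Patrick et al., and the four "abstracts", the drug names, and the function names are all made up.

```python
import math
from collections import Counter

# Made-up toy "abstracts"; the drug names are placeholders.
corpus = [
    "drug_a reduces itch in eczema",
    "drug_a reduces skin inflammation",
    "drug_b reduces itch and skin inflammation",
    "drug_c lowers blood pressure",
]


def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words sharing a sentence with it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for w in words:
            vectors.setdefault(w, Counter()).update(x for x in words if x != w)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(corpus)
for drug in ("drug_b", "drug_c"):
    print(drug, round(cosine(vectors[drug], vectors["eczema"]), 3))
```

Here `drug_b` never appears in the same sentence as "eczema", yet it scores higher than `drug_c` because it shares context words with it. Real pipelines learn dense, lower-dimensional embeddings from millions of abstracts rather than using raw counts.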