Heterogeneity and endotype discovery
There is increasing awareness that allergic diseases (asthma, eczema, rhinitis, food allergy) are umbrella terms for subtypes characterized by distinct disease mechanisms (endotypes). Developments in ML techniques provide new ways to capture the heterogeneity in the longitudinal patterns of symptom development in individual patients. For example, childhood wheezing illness has been extensively investigated using ML approaches to derive more homogeneous groups for genetic, mechanistic, and therapeutic studies. Most studies modelled repeated measurements of wheeze through the life-course to derive classes. These different symptom patterns may indicate distinct biological mechanisms, and their discovery may facilitate stratified treatment, but this is not certain (i.e., the classes may not directly translate to endotypes). Moreover, recent studies from the US CREW and UK UNICORN consortia demonstrated that latent class analysis (LCA) using binary information on wheezing might classify individuals imprecisely, and that children with identical wheezing patterns can be assigned to different phenotypes. Recently, a novel data-driven method suggested a potential way to improve assignment to wheeze “phenotypes”. Repeated observations of current wheezing were transformed to derive multidimensional indicators of wheezing spells (reflecting duration, temporal sequencing, and the extent of persistence/recurrence). Clustering these indicators resulted in a structure that was much more robust to data imputation, with remarkably high agreement between the cluster assignments of individual children obtained from complete and imputed data.
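To make the idea of spell-based indicators concrete, the following is a minimal sketch of how a binary sequence of "current wheeze" observations could be summarised before clustering. The indicator names (`n_spells`, `longest_spell`, `first_onset`, and so on) are illustrative assumptions and are not the indicators defined in the cited study; the two example sequences are synthetic.

```python
from itertools import groupby

def spell_indicators(wheeze):
    """Summarise a binary wheeze sequence (1 = current wheeze at that visit).

    The returned indicators are hypothetical examples of "spell" features
    (duration, timing, recurrence); they are not the published definitions.
    """
    spells = []  # (start_index, length) of each run of consecutive 1s
    pos = 0
    for value, run in groupby(wheeze):
        length = len(list(run))
        if value == 1:
            spells.append((pos, length))
        pos += length
    lengths = [length for _, length in spells]
    return {
        "n_spells": len(spells),                  # recurrence
        "total_wheeze_visits": sum(lengths),      # overall burden
        "longest_spell": max(lengths, default=0), # persistence
        "first_onset": spells[0][0] if spells else None,  # temporal sequencing
        "last_wheeze": spells[-1][0] + spells[-1][1] - 1 if spells else None,
    }

# Synthetic six-visit sequences, for illustration only.
early_transient = spell_indicators([1, 1, 0, 0, 0, 0])
recurrent = spell_indicators([1, 0, 1, 0, 1, 1])
never = spell_indicators([0, 0, 0, 0, 0, 0])
```

In this sketch, two children with the same number of wheeze-positive visits can still differ in the number, length, and timing of their spells. Clustering on indicators of this kind draws on exactly that extra information, which the raw binary responses carry only implicitly.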
Similarly, over the past five years, longitudinal data on eczema have been clustered using data-driven approaches. There were notable differences across studies in the estimated prevalence of each phenotype, and inconsistent associations with the filaggrin (FLG) genotype.
A Bayesian machine learning approach has been used to model the development of eczema, wheeze, and rhinitis from birth to school age. The developmental profiles were heterogeneous, and progression of symptoms fitting the atopic march profile was rare among those with atopic comorbidities. The findings revealed eight latent profiles of symptom development, each with a different temporal pattern of co-manifestation and distinct genetic associations. Further studies indicated that the atopic march, as initially described, occurs rarely, that most two-disease combinations occur by chance, but that there is a very important cluster of multimorbidity (affecting ~8% of the population that have a high disease burden).
Numerous studies have also applied ML clustering to identify asthma subtypes. Different endotypes may respond differently to treatment, making this differentiation potentially clinically significant. Using k-means clustering, researchers identified four distinct clusters of asthma patients in the Severe Asthma Research Program with different responses to corticosteroids (CS). One cluster comprised patients who, despite severe baseline airflow limitation, had the lowest response to CS, with almost no improvement in lung function, suggesting that this group would benefit from alternative treatment options. The authors also showed that the variables that characterize the clusters robustly predict cluster assignment in an independent test set.
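The general pattern described above (derive clusters on one dataset, then assign unseen patients to them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the cited study: it uses plain Lloyd's k-means with caller-supplied starting centroids, nearest-centroid assignment as a stand-in for the prediction step, two hypothetical features (baseline lung function and change after CS), and synthetic values.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, init_centroids, n_iter=50):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New centroid = per-feature mean of the cluster members.
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Synthetic training data: (baseline FEV1 % predicted, % change in FEV1 after CS).
# Both the feature choice and the values are assumptions made for illustration.
train = [(45.0, 1.0), (50.0, 2.0), (48.0, 0.5), (85.0, 12.0), (90.0, 10.0), (88.0, 11.0)]
centroids, train_labels = kmeans(train, init_centroids=[train[0], train[3]])

# Assign patients from a hypothetical independent test set to the learned clusters.
test_set = [(47.0, 1.5), (86.0, 9.0)]
test_labels = [nearest(p, centroids) for p in test_set]
```

In practice, features measured on different scales would normally be standardised before clustering, and the number of clusters and the stability of the solution would need to be assessed separately.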
A hypothesis-generating, unbiased analysis that included data on lower airway inflammation and infection from bronchoalveolar lavage (BAL) in preschool children with severe wheeze revealed four distinct pathophysiological clusters of approximately equal size: (1) Atopic; (2) Non-atopic, low infection rate; (3) Non-atopic, high infection rate; and (4) Non-atopic, low infection rate, no inhaled corticosteroids (ICS), with marked differences in BAL microbial profiles between the clusters. In a multicenter prospective study, the authors clustered integrated clinical, virus, and serum proteome data to identify a cluster of children with bronchiolitis who had a significantly higher risk of developing asthma by age six. Multi-omics has also been employed in this domain, for example through the novel, open-source method Merged Affinity Network Association Clustering (MANAclust), which provides an automated pipeline to integrate clinical and omics data. The authors identified clinically and molecularly distinct asthma clusters that responded differently to treatment, as well as substantial heterogeneity in healthy controls. In another recent study, researchers applied unsupervised clustering to proteomics data from infants hospitalized with bronchiolitis. They identified two distinct clusters with dysregulated pathways and a higher risk of developing asthma. ML approaches have also shown utility for clustering volatile organic compounds (VOCs) in exhaled breath (breathomics), an exciting non-invasive biomarker for airway disease that is sensitive to inflammation.
Pathways and disease mechanisms
Multi-omics and systems biology are comprehensive approaches expected to increase insight into the complex biological mechanisms underlying allergic and immunological diseases. The level of detail of such studies can be increased further using single-cell methods, which analyze gene expression profiles, chromatin accessibility, CpG methylation, or the proteome in thousands of cells individually [195,196]. A fully integrated reference atlas has recently been released for the lung, with consensus annotations for 61 cell types based on data from more than 100 healthy tissue donors. Using a trained model of this fully integrated healthy lung cell atlas, the dataset was expanded by projection and transfer learning with scArches to more than 2.4 million cells from more than 480 individuals. This illustrates the use of deep learning in biology to define cell types and states. The extended Lung Cell Atlas allowed direct comparison of cell types across datasets based on consensus labels, leading to the identification of disease-associated cell states common to multiple lung diseases [197,198].
Drug and therapy development and precision medicine
AI has the potential to accelerate drug discovery and development throughout the whole pipeline and to contribute to precision medicine. Precision medicine promises to enable personalized and more effective treatments based on an individual’s genetic variability, environmental exposures, and lifestyle. Here we highlight promising examples of treatment response analysis and drug repurposing.
Treatment response
In a pediatric cohort, asthma control after six months of medication could be accurately predicted using an AdaBoost classification algorithm, outperforming traditional logistic regression [187]. Wu et al. (2022) developed a supervised ML model to predict low response to dupilumab in atopic dermatitis patients. The authors identified various indicators of nonresponse, including a high Quan-Charlson Comorbidity Index value, a claim for ibuprofen, or no claims for prednisone medication before dupilumab initiation. Similar approaches have been pursued to analyze nonresponse to Type 2-directed biologics in asthma patients.
Drug repurposing
Artificial intelligence has been used extensively in drug repurposing to overcome the immense time and investment required for new drug development. AI has been applied to virtual drug screening, treatment combination optimization, and drug-target interaction prediction. Patrick et al. (2019) developed a workflow to model drug-disease relationships using unsupervised text analysis and supervised classification for cutaneous diseases, including atopic dermatitis. They created word embeddings (a dimensionality reduction method that produces a lower-dimensional projection of high-dimensional text data) from 20 million abstracts in PubMed. Some of the strongest identified associations were not directly mentioned in any research article, demonstrating how the analysis of large-scale textual data can unveil novel repurposing opportunities. Despite promising results in other medical fields, we identify a research gap in target discovery and clinical trial optimization in the allergy and immunology domain. In addition, of over 10,000 clinical studies related to allergic diseases, we could identify only five in which artificial intelligence had a fundamental role (search of ClinicalTrials.gov, performed March 13, 2023).
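As a rough illustration of how word embeddings can surface drug-disease associations that are never stated explicitly, the sketch below ranks candidate drugs by the cosine similarity between their embedding vectors and a disease vector. This is a simplified stand-in, not the workflow of Patrick et al., which combined unsupervised text analysis with supervised classification. The three-dimensional vectors and the names `drug_a`, `drug_b`, and `drug_c` are hypothetical placeholders; real embeddings typically have hundreds of dimensions and are learned from text.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(disease_vec, drug_vecs):
    """Sort drug names by similarity to the disease vector, highest first."""
    scores = {name: cosine(disease_vec, vec) for name, vec in drug_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy embeddings, for illustration only.
drug_embeddings = {
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.1, 0.8, 0.3],
    "drug_c": [0.7, 0.3, 0.1],
}
disease_embedding = [1.0, 0.2, 0.0]

ranking = rank_candidates(disease_embedding, drug_embeddings)
ranked_names = [name for name, _ in ranking]
```

A high similarity here only means that the two terms occur in similar textual contexts. Any candidate identified this way would still need mechanistic and clinical evaluation.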