F. Javier Moreno

and 4 more

BACKGROUND: As society increasingly demands alternative protein food sources, new strategies are needed for evaluating protein safety issues such as allergenic potential. Large-scale and systemic studies on allergenic proteins are hindered by the limited and non-harmonized clinical information available for these substances in dedicated databases. A key piece of missing information is the symptomatology of the allergens, especially expressed in terms of standard vocabularies, which would allow linking to other biomedical resources for studies related to human health. In this work, we have generated the first resource with a comprehensive annotation of allergens’ symptomatology, using a text-mining approach that extracts significant co-mentions between these entities from the scientific literature. METHODS: The main resource of biomedical literature (PubMed, ~36 million abstracts) was mined to automatically extract relationships between allergens and clinical symptoms. The annotations are given in terms of standard vocabularies used in widely used biomedical databases. The method identifies statistically significant co-mentions between the textual descriptions of the two types of entities in the literature and takes them as an indication of a relationship. RESULTS: A total of 1,180 clinical signs, extracted from the Human Phenotype Ontology (HPO) and the Medical Subject Headings (MeSH) terms of PubMed together with other allergen-specific symptoms, were linked to 1,036 unique allergens annotated in the two main allergen-related public databases via 14,009 relationships. CONCLUSIONS: This resource could serve as a starting point for a future manually curated compilation of allergen symptomatology. The annotations are publicly available through an interactive web interface at [https://csbg.cnb.csic.es/CoMent_allergen/](https://csbg.cnb.csic.es/CoMent_allergen/).
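
To illustrate what a "statistically significant co-mention" could mean in practice, the following is a minimal sketch, not the authors' implementation. The abstract does not name the statistical test used, so a one-sided hypergeometric test is assumed here as one common choice for this kind of analysis. The function name `comention_p_value` and all counts are hypothetical, and the toy numbers are far smaller than PubMed.

```python
from math import comb


def comention_p_value(n_total: int, n_allergen: int, n_symptom: int, n_both: int) -> float:
    """One-sided hypergeometric p-value for a co-mention (assumed test).

    n_total    -- abstracts in the corpus
    n_allergen -- abstracts mentioning the allergen
    n_symptom  -- abstracts mentioning the symptom
    n_both     -- abstracts mentioning both

    Returns the probability of seeing at least n_both co-mentions if the
    symptom abstracts were drawn at random from the corpus.
    """
    upper = min(n_allergen, n_symptom)
    # Number of ways to choose the symptom abstracts out of the whole corpus.
    denom = comb(n_total, n_symptom)
    # Sum the upper tail: k co-mentions, the rest drawn from non-allergen abstracts.
    tail = sum(
        comb(n_allergen, k) * comb(n_total - n_allergen, n_symptom - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy corpus: 10 abstracts, 4 mention the allergen, 3 mention the symptom,
    # and all 3 symptom abstracts also mention the allergen (p = 4/120).
    print(comention_p_value(10, 4, 3, 3))
```

Exact integer binomials are fine for small examples like this one. At the scale of roughly 36 million abstracts, a log-space computation or a statistics library would be the practical choice, and testing many allergen-symptom pairs would also call for a multiple-testing correction before any pair is reported as significant.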