Automatic Nutrition Extraction From Text Recipes


The science of nutrition deals with all the effects on people of any component found in food. This starts with the physiological and biochemical processes involved in nourishment: how substances in food provide energy or are converted into body tissues, and the diseases that result from insufficiency or excess of essential nutrients (malnutrition). The role of food components in the development of chronic degenerative diseases such as coronary heart disease, cancers, and dental caries is a major target of current research. There is growing interaction between nutritional science and molecular biology (especially nutrigenomics), which may help to explain the action of food components at the cellular level and the diversity of human biochemical responses. However, in our daily lives we cook recipes made of ingredients rather than thinking in terms of raw food components. Beyond dietitians’ advice and guidelines, it is difficult to continuously measure our daily nutritional intake without manually entering the weight and amount of each constituent ingredient. Moreover, effective nutritional intake also depends on the cooking process and the retention factors of the individual ingredients. To alleviate these difficulties, we propose an algorithm and an accompanying web-based tool to automatically extract nutritional information from any text-based recipe.


Recipes show a tremendous amount of diversity in cooking styles and ingredients, some of which are highly community-, culture-, or even country-specific. This diversity makes it challenging to design a system that can infer nutritional information accurately without much manual intervention. Although it is possible to manually pick each ingredient from an enormous database, doing so is often time-consuming and impractical in day-to-day life. To automatically deduce nutritional information from textual recipes, we have segmented the core procedure into the following steps:

  • Information Extraction (IE) from text recipes, using a rule-based or NLP (Natural Language Processing) parser

  • Conversion to structured data: amount, unit, ingredient name, and any modifiers (e.g. “lightly beaten”)

  • Mapping of each ingredient to an existing food ontology (the USDA Food Database is used for demonstration; the approach can be extended to other food databases such as NUTTAB)

  • Deduction of weights from various lexical clues and ingredient densities

  • Deduction of final nutritional information and
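To make the first, second, and fourth steps above more concrete, here is a minimal rule-based sketch in Python. It is not the parser described in this work. The regular expression, the `Ingredient` record, the `parse_line` and `weight_in_grams` helpers, the rounded unit table, and the density values are all assumptions chosen for illustration. In particular, the densities are placeholders and are not taken from the USDA database. Ontology mapping and nutrient lookup are not covered.

```python
import re
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

# Assumed conversion table: millilitres per US volume unit (rounded).
ML_PER_UNIT = {"cup": 236.6, "tbsp": 14.8, "tsp": 4.9}

# Placeholder densities in g/ml, for illustration only (not USDA values).
DENSITY_G_PER_ML = {"milk": 1.03, "flour": 0.53}


@dataclass
class Ingredient:
    amount: float
    unit: Optional[str]      # None for countable items such as "2 eggs"
    name: str
    modifier: Optional[str]  # e.g. "lightly beaten"


# Hypothetical pattern: "<amount> [unit] <name>[, <modifier>]"
LINE_RE = re.compile(
    r"^\s*(?P<amount>\d+(?:\s+\d+/\d+|/\d+|\.\d+)?)\s+"
    r"(?:(?P<unit>cups?|tbsp|tsp|g)\s+)?"
    r"(?P<name>[^,]+?)\s*"
    r"(?:,\s*(?P<modifier>.+?))?\s*$",
    re.IGNORECASE,
)


def parse_line(line: str) -> Optional[Ingredient]:
    """Turn one ingredient line into structured data, or None if no rule matches."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # "1 1/2" becomes 1 + 1/2; Fraction also accepts "2", "1/2" and "1.5".
    amount = float(sum(Fraction(part) for part in m.group("amount").split()))
    unit = m.group("unit")
    if unit is not None:
        unit = unit.lower()
        if unit.endswith("s"):  # normalise the plural "cups" to "cup"
            unit = unit[:-1]
    return Ingredient(amount, unit, m.group("name").lower(), m.group("modifier"))


def weight_in_grams(ing: Ingredient) -> Optional[float]:
    """Deduce a weight from the unit and a density; None when no rule applies."""
    if ing.unit == "g":
        return ing.amount
    if ing.unit in ML_PER_UNIT and ing.name in DENSITY_G_PER_ML:
        return ing.amount * ML_PER_UNIT[ing.unit] * DENSITY_G_PER_ML[ing.name]
    return None


eggs = parse_line("2 eggs, lightly beaten")
milk = parse_line("1 1/2 cups milk")
milk_grams = weight_in_grams(milk)
```

In this sketch, countable ingredients such as eggs have no unit, so `weight_in_grams` returns `None` for them. A fuller system would need per-item weights or other lexical clues to resolve such cases.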