ROUGH DRAFT authorea.com/7114
Automatic Nutrition Extraction From Text Recipes

    Abstract

    The science of nutrition deals with all the effects on people of any component found in food. This starts with the physiological and biochemical processes involved in nourishment: how substances in food provide energy or are converted into body tissues, and which diseases result from insufficiency or excess of essential nutrients (malnutrition). The role of food components in the development of chronic degenerative diseases such as coronary heart disease, cancers and dental caries is a major target of current research. There is growing interaction between nutritional science and molecular biology (especially nutrigenomics), which may help to explain the action of food components at the cellular level and the diversity of human biochemical responses. However, in our daily lives we cook recipes made of ingredients rather than thinking in terms of raw food components. Beyond dietitians’ advice and guidelines, it is difficult to continuously measure our daily nutritional intake without manually entering the weight and amount of each constituent ingredient. Beyond this manual effort, effective nutritional intake also depends on the cooking process and on the retention factors of the individual ingredients. To alleviate these difficulties we propose an algorithm and an accompanying web-based tool to automatically extract nutritional information from any text-based recipe.

    Introduction

    Recipes show a tremendous amount of diversity in cooking styles and ingredients, some of which are highly specific to a community, culture or even country. This diversity makes it challenging to design a system that can infer nutritional information accurately and with little manual intervention. Although it is possible to manually select each ingredient from an enormous database, this is time-consuming and impractical in day-to-day life. To automatically deduce nutritional information from textual recipes, we’ve segmented the core procedure into the following steps:

    • Information Extraction (IE) from text recipes, using a rule-based or NLP (Natural Language Processing) parser

    • Conversion to structured data: amount, unit, ingredient name and any modifiers (e.g. “lightly beaten”)

    • Mapping of each ingredient to an existing food ontology (the USDA Food Database is used for demonstration; the approach can be extended to other food databases such as NUTTAB)

    • Deduction of weights from various lexical clues and ingredient densities

    • Deduction of final nutritional information
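
    To make the first, second and fourth steps more concrete, the following is a minimal sketch in plain Ruby, not the tool's actual implementation. It uses an ordinary regular expression as a stand-in for a full grammar, and every name in it (Ingredient, parse_ingredient, estimate_grams), the set of recognised units, and the small conversion and density tables are assumptions made for illustration. A real system would take densities from the mapped food database entry.

```ruby
# Hypothetical sketch: parse one ingredient line into structured data, then
# estimate its weight from a volume unit and a density. All names and table
# values below are illustrative assumptions, not part of the described tool.

Ingredient = Struct.new(:amount, :unit, :name, :modifiers)

# Approximate millilitres per US customary volume unit.
ML_PER_UNIT = { "cup" => 236.6, "tbsp" => 14.8, "tsp" => 4.9 }.freeze

# Placeholder densities in grams per millilitre.
DENSITY_G_PER_ML = { "milk" => 1.03, "water" => 1.0 }.freeze

# Amount: mixed number ("1 1/2"), fraction ("3/4"), or integer/decimal ("2", "1.5").
AMOUNT = /\d+\s+\d+\/\d+|\d+\/\d+|\d+(?:\.\d+)?/
UNIT   = /cups?|tbsp|tsp/

# amount, optional unit, ingredient name, optional modifiers after a comma.
LINE = /\A\s*(?<amount>#{AMOUNT})\s+(?:(?<unit>#{UNIT})\s+)?(?<name>[^,]+?)(?:\s*,\s*(?<modifiers>.+?))?\s*\z/

# Convert "1 1/2", "3/4" or "1.5" into a Float by summing the parts.
def parse_amount(text)
  text.split.sum { |part| Rational(part) }.to_f
end

# Return an Ingredient, or nil when the line does not match the rule.
def parse_ingredient(line)
  m = LINE.match(line)
  return nil unless m

  Ingredient.new(
    parse_amount(m[:amount]),
    m[:unit]&.sub(/s\z/, ""),   # normalise "cups" to "cup"
    m[:name].strip,
    m[:modifiers]
  )
end

# Estimate weight in grams; nil when the unit or the density is unknown.
def estimate_grams(ingredient)
  ml = ML_PER_UNIT[ingredient.unit]
  density = DENSITY_G_PER_ML[ingredient.name]
  return nil unless ml && density

  ingredient.amount * ml * density
end

eggs = parse_ingredient("2 eggs, lightly beaten")
milk = parse_ingredient("1 1/2 cups milk")
puts eggs.to_a.inspect
puts milk.to_a.inspect
puts estimate_grams(milk).inspect
```

    Returning nil for unmatched lines and for unknown units or densities keeps the sketch honest about what a single rule cannot infer; countable items such as eggs would need a separate per-item weight lookup, which is omitted here.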

    Core Procedure \label{fig:core_procedure}

    Procedure

    Information Extraction

    Instead of adopting a full-ML approach, we’ve tried to capture linguistic rules by representing them as regular expressions. The general idea behind this technique is to specify regular expressions that each capture a certain type of information. For example, the expression (watched|seen) (NP), where (NP) denotes a noun phrase, might capture the names of movies (represented by the noun phrase) in a set of documents. By specifying a set of rules like this, it is possible to extract a significant amount of information. Such sets of regular expressions are often implemented as finite-state transducers, which consist of a series of finite-state automata. We’ve used Citrus to declaratively define the parsing expression grammar (PEG) in the Ruby language. For example