Ingredients Extraction and Analysis

From Open Food Facts wiki

Revision as of 13:39, 10 September 2019

This page describes how ingredients list extraction and ingredients analysis are done on Open Food Facts, and points to resources that could be used to improve them.

Objectives

Ingredients list extraction

The goal of ingredients list extraction is to get the text of the ingredients list of each product in exactly the same form as it appears on the product package and label.

Ingredients analysis

Once the ingredients list is available, we need to analyze it to recognize the actual ingredients, along with indications of quantity, labels, processing, etc. Ingredients are listed on products in many different ways, with many synonyms, several ways to indicate sub-ingredients, etc.

The analysis needs to work for ingredients lists written in many different languages.

The output is structured data that links to our multilingual ingredients taxonomy.
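
As an illustration only, the analysis step could be sketched as follows in Python. This is not the parser Open Food Facts actually runs. The `TAXONOMY` table, the `Ingredient` class and all function names are hypothetical, and the sketch assumes comma-separated lists with sub-ingredients in parentheses and quantities written as percentages.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical synonym table (assumption): maps names in several languages
# to one canonical identifier. A real taxonomy would be far larger.
TAXONOMY = {"sugar": "en:sugar", "sucre": "en:sugar", "salt": "en:salt", "sel": "en:salt"}

PERCENT_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*%")


@dataclass
class Ingredient:
    text: str
    percent: Optional[float] = None
    taxonomy_id: Optional[str] = None
    sub_ingredients: List["Ingredient"] = field(default_factory=list)


def split_top_level(text: str) -> List[str]:
    """Split on commas that are outside parentheses and are not decimal commas."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        is_decimal_comma = (
            ch == "," and 0 < i < len(text) - 1
            and text[i - 1].isdigit() and text[i + 1].isdigit()
        )
        if ch == "," and depth == 0 and not is_decimal_comma:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def parse_ingredient(chunk: str) -> Ingredient:
    """Parse one ingredient, recursing into a parenthesized sub-ingredient list."""
    outer, subs, percent = chunk, [], None
    open_idx, close_idx = chunk.find("("), chunk.rfind(")")
    if -1 < open_idx < close_idx:
        inner = chunk[open_idx + 1:close_idx]
        outer = chunk[:open_idx] + " " + chunk[close_idx + 1:]
        subs = [parse_ingredient(p) for p in split_top_level(inner)]
        # "tomatoes (60%)": the parentheses hold a quantity, not sub-ingredients.
        if len(subs) == 1 and not subs[0].text:
            percent, subs = subs[0].percent, []
    match = PERCENT_RE.search(outer)
    if match:
        percent = float(match.group(1).replace(",", "."))
        outer = outer[:match.start()] + outer[match.end():]
    name = " ".join(outer.split()).strip(" .:;").lower()
    return Ingredient(name, percent, TAXONOMY.get(name), subs)


def parse_ingredients_list(text: str) -> List[Ingredient]:
    return [parse_ingredient(p) for p in split_top_level(text)]
```

Calling `parse_ingredients_list("Sugar, milk chocolate 20% (cocoa butter, sugar), tomatoes (60%), salt.")` yields four top-level `Ingredient` objects, with the chocolate carrying two sub-ingredients. Ingredients missing from the table keep `taxonomy_id` set to `None`, so unrecognized text stays visible instead of being guessed.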

Why it's important

Ingredients list extraction and analysis are necessary for many tasks:

  • Detecting food additives and allergens
  • Determining the degree of processing of food products (NOVA classification)
  • Identifying food products that can or cannot be eaten by people following specific diets:
    • Vegetarian, vegan
    • Kosher, halal
    • Palm oil free
  • Estimating the carbon impact of ingredients
  • Translating ingredients lists
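
To make the diet use case concrete, here is a minimal Python sketch of how a product-level status could be derived once each ingredient is linked to a taxonomy entry. The `PROPERTIES` table, its identifiers and the `diet_status` function are assumptions made for illustration, not the actual Open Food Facts data or code.

```python
from typing import Dict, Iterable

# Hypothetical per-ingredient properties (assumption). "maybe" marks
# ingredients whose origin cannot be known from the name alone.
PROPERTIES: Dict[str, Dict[str, str]] = {
    "en:sugar": {"vegan": "yes", "vegetarian": "yes", "palm_oil_free": "yes"},
    "en:milk": {"vegan": "no", "vegetarian": "yes", "palm_oil_free": "yes"},
    "en:gelatin": {"vegan": "no", "vegetarian": "no", "palm_oil_free": "yes"},
    "en:palm-oil": {"vegan": "yes", "vegetarian": "yes", "palm_oil_free": "no"},
    "en:e471": {"vegan": "maybe", "vegetarian": "maybe", "palm_oil_free": "maybe"},
}


def diet_status(ingredient_ids: Iterable[str], prop: str) -> str:
    """Combine per-ingredient values into one status for the product.

    A single "no" settles the answer. Otherwise an unrecognized ingredient
    gives "unknown", an uncertain one gives "maybe", and only a list where
    every ingredient is "yes" gives "yes".
    """
    statuses = [PROPERTIES.get(i, {}).get(prop, "unknown") for i in ingredient_ids]
    if not statuses:
        return "unknown"
    for verdict in ("no", "unknown", "maybe"):
        if verdict in statuses:
            return verdict
    return "yes"
```

For example, `diet_status(["en:sugar", "en:milk"], "vegan")` returns `"no"`, while the same list checked against `"vegetarian"` returns `"yes"`. Keeping `"unknown"` and `"maybe"` apart separates gaps in ingredient recognition from ingredients that are genuinely ambiguous.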

Ingredients list extraction

Data sources for ingredients lists

The possible input sources for the ingredients lists are:

  • Ingredients lists typed in by users
    • Time-consuming and unpleasant task, especially on mobile
    • Can contain typos, but usually typed ingredients lists are very close to what is written on the product
  • Ingredients lists given by manufacturers in data files
    • Usually of very good quality but, depending on the manufacturer, can contain typos and sometimes formatting errors
  • Photos of product labels
    • Photo quality varies a lot
      • Some products are hard to photograph (round cans and bottles, foil bags etc.)
      • Sometimes very poor lighting, orientation, camera, focus etc.
  • High resolution images or PDFs of the printable package
    • Available for a few producers
    • Perfect quality
    • Needs cropping and/or rotation to select the ingredients list
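
The cropping and rotation step mentioned for package images can be sketched in plain Python on a small pixel grid. This only illustrates the two operations: a real pipeline would presumably use an image library, and the function names and the coordinate convention (right and bottom exclusive) are assumptions.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    # Reversing the rows and then transposing turns the first column,
    # read bottom to top, into the first row.
    return [list(row) for row in zip(*img[::-1])]


def crop(img: Image, left: int, top: int, right: int, bottom: int) -> Image:
    """Keep the rectangle [left, right) x [top, bottom)."""
    return [row[left:right] for row in img[top:bottom]]


# A 2-row, 3-column image whose ingredients panel is printed sideways.
package = [
    [1, 2, 3],
    [4, 5, 6],
]
upright = rotate_90_clockwise(package)   # [[4, 1], [5, 2], [6, 3]]
ingredients_area = crop(upright, 0, 1, 2, 3)  # [[5, 2], [6, 3]]
```

Rotating before cropping lets the crop rectangle be expressed in the upright orientation, which is how the ingredients list is read.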