How to improve ingredients analysis: Difference between revisions

From Open Food Facts wiki
This page describes the concrete actions that can be taken to improve ingredients analysis in a specific language.


== Read this first ==


* [[Ingredients Extraction and Analysis]]: high-level view and detailed description of all the steps
* [[Ingredients Analysis Quality]]: metrics for ingredients analysis


== Determining the actions that will have the biggest impact on ingredients analysis in a specific language ==


=== Finding out which ingredients we were not able to recognize ===

==== All ingredients ====
 
On the Open Food Facts website, you can see the result of ingredients analysis for a subset of products by adding /ingredients to the URL. The same view is available from the "Explore products" drilldown button by selecting "ingredients".
 
e.g. on https://de.openfoodfacts.org you can add /ingredients to see all ingredients for products in Germany: https://de.openfoodfacts.org/ingredients (warning: this is a very long page that can make your browser hang)
 
Here is a more manageable result on a smaller subset (products sold in Germany that have a fair trade label): https://de.openfoodfacts.org/label/fairer-handel/ingredients
 
<pre>
998 Inhaltsstoffe:
 
Inhaltsstoff Produkte *
Zucker 185
Kakao 167
Emulgator 125
E322 122
Kakaobutter 114
Kakaomasse 109
Speiseöle und -fette 107
Milcherzeugnisse 101
Pflanzliche Öle und Fette 92
Speisesalz 90
Kakaopulver 78
Aroma 74
</pre>
 
If you go down the list, you will see ingredients in italics, with a * in the right column:
 
<pre>
fr:carrageen-aus-biologischem-anbau 1 *
kirschlikör-füllung 1 *
natūrliche-aromen 1 *
produkt-kann-spuren-von-erdnüssen 1 *
teilweise-hydriertes-sojaöl 1 *
milchpulvererzeugnis 1 *
en:zucker 1 *
fettarmes-kakaopuler 1 *
</pre>
 
These are the ingredients that we did not recognize.
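
The listing above can also be processed offline once it has been copied as text. The following is a minimal sketch, assuming each row has the layout shown on this page (name, product count, optional trailing *); the function names are hypothetical and this is not a documented export format.

```python
import re

# Rows copied from the listings above. The layout (name, count, optional "*")
# is an assumption based on this page, not a documented export format.
LISTING = """\
Inhaltsstoff Produkte *
Zucker 185
Speiseöle und -fette 107
fr:carrageen-aus-biologischem-anbau 1 *
en:zucker 1 *
fettarmes-kakaopuler 1 *
"""

# Name (may contain spaces), then the product count, then an optional "*".
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<count>\d+)(?:\s+(?P<star>\*))?\s*$")


def parse_listing(text):
    """Return (name, product_count, is_unknown) for each data row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header rows and blank lines carry no count
        rows.append((match["name"], int(match["count"]), match["star"] is not None))
    return rows


def unknown_ingredients(text):
    """Names of the rows marked with "*", most frequent first."""
    unknown = [(name, count) for name, count, starred in parse_listing(text) if starred]
    unknown.sort(key=lambda item: -item[1])
    return [name for name, _ in unknown]
```

Calling unknown_ingredients(LISTING) returns only the starred rows, which is the part of the list worth reviewing.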
 
==== Ingredients stats ====
 
Adding ?stats=1 to the URL shows a summary count of known and unknown ingredients.
 
e.g. https://de.openfoodfacts.org/ingredients?stats=1
 
==== Unknown ingredients ====
 
You can then see only unknown ingredients by clicking on "unknown":
 
e.g. https://de.openfoodfacts.org/ingredients?status=unknown
 
Ingredients are ordered by the number of products they appear in, which makes it easier to focus on the biggest problems first.
 
==== Filtering ingredients list ====
 
You can also add ?filter=something (or &filter=something if you already have another parameter in the URL).
 
e.g. https://de.openfoodfacts.org/ingredients?status=unknown&filter=aroma
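
The URL patterns described in this section can be assembled programmatically. This is a small sketch, assuming only the path and parameters mentioned on this page (/ingredients, stats, status, filter); the helper name and its arguments are hypothetical, and whether every combination of parameters is supported by the site is not confirmed here.

```python
from urllib.parse import urlencode


def ingredients_url(country="de", facet_path="", status=None, stats=False, text_filter=None):
    """Build an ingredients listing URL for a country subdomain.

    facet_path is an optional subset such as "label/fairer-handel".
    All argument names are illustrative, not part of any official API.
    """
    base = f"https://{country}.openfoodfacts.org"
    if facet_path:
        path = f"/{facet_path.strip('/')}/ingredients"
    else:
        path = "/ingredients"

    # Only add the parameters that were requested, in a fixed order.
    params = {}
    if stats:
        params["stats"] = 1
    if status:
        params["status"] = status
    if text_filter:
        params["filter"] = text_filter

    query = urlencode(params)
    return base + path + ("?" + query if query else "")
```

For example, ingredients_url(status="unknown", text_filter="aroma") reproduces the filtered URL shown above.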
 
=== Reasons why we didn't recognize some ingredients ===
 
* The ingredient is not in the language of the ingredients list
** The ingredients list in one language is put in the field for the ingredients list in another language
*** This is often the case when the "main language" of the product was set to a wrong language, e.g. French for a German product.
*** e.g. fr:carrageen-aus-biologischem-anbau (German ingredients in French ingredients list)
** The ingredients list was not split correctly, so we have ingredients in multiple languages in one list instead of separate lists
*** This is often the case when apps send us the result of the OCR of a big blob of text on the product label
* The input ingredients list is incorrect (misspellings etc.)
* The ingredients list contains things other than ingredients and we have not been able to split it at the right place.
* We have not been able to parse a particular sentence structure or formatting (e.g. "a drop of delicious honey with a shower of powder sugar")
* The ingredient is not yet present in our taxonomy (or not with the right synonym or in the right language)
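
The first reason in the list above (an ingredient entered in the wrong language field) can be spotted from the language prefix on entries such as fr:carrageen-aus-biologischem-anbau. The sketch below assumes that a two-letter prefix marks the language of the list the entry came from, and that unprefixed entries use the language of the site being browsed; both are assumptions drawn from the examples on this page, and the function names are hypothetical.

```python
def split_language_prefix(tag):
    """Split "fr:some-name" into ("fr", "some-name"); no prefix gives (None, tag)."""
    prefix, sep, rest = tag.partition(":")
    if sep and len(prefix) == 2 and prefix.isalpha():
        return prefix.lower(), rest
    return None, tag


def foreign_language_entries(tags, expected_lang="de"):
    """Return (lang, name) for entries whose prefix differs from expected_lang."""
    flagged = []
    for tag in tags:
        lang, name = split_language_prefix(tag)
        # Unprefixed entries are assumed to already be in expected_lang.
        if lang is not None and lang != expected_lang:
            flagged.append((lang, name))
    return flagged
```

Entries flagged this way are candidates for checking whether the product's main language was set incorrectly.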
 
=== Concrete steps to improve ingredients analysis ===
 
* Analyze the results of the ingredients analysis: https://de.openfoodfacts.org/ingredients?status=unknown
** Classify the biggest problems you see
*** Are many ingredients correct but missing in the taxonomy?
**** Complete the ingredients taxonomy
***** Verify if we have an existing entry in English, so that you can add a translation
***** Else add a new entry in English + your target language (and possibly other languages too if you know the translation)
*** Is there a particular structure that is not parsed correctly?
**** e.g. multiple ingredients compounded together "Ingredient A and Ingredient B"
**** e.g. an ingredient plus something that specifies how it was processed, or a label: "Organic something", "cooked sliced something"
*** Open bugs with examples for the most common structures that we should better support
**** e.g. https://github.com/openfoodfacts/openfoodfacts-server/issues/3020 for "raw something".
**** Bugs should be filed at https://github.com/openfoodfacts/openfoodfacts-server/issues and tagged with the "ingredients-analysis" label.
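
The classification step above can be roughed out with a script before going through the list by hand. This is only an illustrative sketch: the marker words, bucket names, and function names are hypothetical, and it assumes unknown entries are lowercase and hyphen-separated as in the listings on this page.

```python
from collections import Counter

# Hypothetical marker words, chosen only for illustration; a real list
# would depend on the target language.
CONJUNCTIONS = {"und", "and", "et"}
MODIFIERS = {"organic", "cooked", "sliced", "raw", "bio"}


def bucket(tag):
    """Assign an unknown ingredient tag to a rough problem category."""
    words = tag.split(":")[-1].split("-")
    # A conjunction between two other words suggests "Ingredient A and Ingredient B".
    if any(word in CONJUNCTIONS for word in words[1:-1]):
        return "compound"
    # A processing or label word suggests "Organic something", "cooked sliced something".
    if any(word in MODIFIERS for word in words):
        return "modifier"
    # Everything else may be a taxonomy gap or a misspelling; check by hand.
    return "other"


def summarize(rows):
    """Sum product counts per bucket for (tag, count) rows, largest first."""
    totals = Counter()
    for tag, count in rows:
        totals[bucket(tag)] += count
    return totals.most_common()
```

Weighting each bucket by product count follows the same idea as the ordered list of unknown ingredients: the largest bucket is where a fix is likely to help the most products.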

Revision as of 10:36, 13 March 2020
