How to improve ingredients analysis
Revision as of 07:49, 23 October 2023
This page describes the concrete actions that can be taken to improve ingredients analysis in a specific language.
Read this first
- Ingredients Extraction and Analysis: high level view and detailed description of all the steps
- Ingredients Analysis Quality: Metrics for ingredients analysis.
Determining the actions that will have the biggest impact on ingredients analysis in a specific language
Finding out which ingredients we were not able to recognize
All ingredients
On the Open Food Facts website, you can see the result of ingredient analysis for a subset of products by adding /ingredients to the URL. It is also available through the "Explore products" drilldown button by selecting "ingredients".
e.g. on https://de.openfoodfacts.org you can add /ingredients to see all ingredients for products in Germany: https://de.openfoodfacts.org/ingredients (warning, it's a very long page that can make your browser hang)
Here is a more manageable result on a smaller subset (products sold in Germany that have a fair trade label): https://de.openfoodfacts.org/label/fairer-handel/ingredients
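998 Inhaltsstoffe:

| Inhaltsstoff | Produkte | * |
| --- | --- | --- |
| Zucker | 185 | |
| Kakao | 167 | |
| Emulgator | 125 | |
| E322 | 122 | |
| Kakaobutter | 114 | |
| Kakaomasse | 109 | |
| Speiseöle und -fette | 107 | |
| Milcherzeugnisse | 101 | |
| Pflanzliche Öle und Fette | 92 | |
| Speisesalz | 90 | |
| Kakaopulver | 78 | |
| Aroma | 74 | |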
If you go down the list, you will see ingredients in italics, with a * in the right column:
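| Inhaltsstoff | Produkte | * |
| --- | --- | --- |
| fr:carrageen-aus-biologischem-anbau | 1 | * |
| kirschlikör-füllung | 1 | * |
| natūrliche-aromen | 1 | * |
| produkt-kann-spuren-von-erdnüssen | 1 | * |
| teilweise-hydriertes-sojaöl | 1 | * |
| milchpulvererzeugnis | 1 | * |
| en:zucker | 1 | * |
| fettarmes-kakaopuler | 1 | * |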
Those are ingredients that we did not recognize.
Ingredients stats
Adding ?stats=1 to the URL shows a summary count of known and unknown ingredients.
e.g. https://de.openfoodfacts.org/ingredients?stats=1
Unknown ingredients
You can then see only unknown ingredients by clicking on "unknown":
e.g. https://de.openfoodfacts.org/ingredients?status=unknown
Ingredients are ordered by the number of products they appear in, which makes it easier to focus on the biggest problems first.
Filtering ingredients list
You can also add ?filter=something (or &filter=something if the URL already has another parameter).
e.g. https://de.openfoodfacts.org/ingredients?status=unknown&filter=aroma
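The URL variants described above can also be built programmatically. The following is a minimal Python sketch: the function name `ingredients_url` and its arguments are hypothetical helpers, not part of Open Food Facts. Only the `/ingredients` path and the `stats`, `status` and `filter` parameters come from this page, and whether the site accepts every combination of them is an assumption.

```python
from urllib.parse import urlencode


def ingredients_url(base, status=None, filter_text=None, stats=False):
    """Build an /ingredients URL for a site or facet page (hypothetical helper)."""
    params = {}
    if stats:
        params["stats"] = 1          # summary count of known/unknown ingredients
    if status:
        params["status"] = status    # e.g. "unknown"
    if filter_text:
        params["filter"] = filter_text
    url = base.rstrip("/") + "/ingredients"
    if params:
        url += "?" + urlencode(params)
    return url


print(ingredients_url("https://de.openfoodfacts.org",
                      status="unknown", filter_text="aroma"))
# prints https://de.openfoodfacts.org/ingredients?status=unknown&filter=aroma
```

The same helper works for a facet page, e.g. with `https://de.openfoodfacts.org/label/fairer-handel` as the base.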
Reasons why we didn't recognize some ingredients
- The ingredient is not in the language of the ingredients list
  - The ingredients list in one language was entered in the field for the ingredients list in another language
    - This often happens when the "main language" of the product was set to the wrong language, e.g. French for a German product.
    - e.g. fr:carrageen-aus-biologischem-anbau (German ingredients in the French ingredients list)
  - The ingredients list was not cut correctly, so we have ingredients in multiple languages in one list instead of separate lists
    - This often happens when apps send us the OCR result for a big blob of text on the product label
- The input ingredients list is incorrect (misspellings etc.)
- The ingredients list contains things other than ingredients and we were not able to split it at the right place.
- We were not able to parse a particular sentence structure or formatting (e.g. "a drop of delicious honey with a shower of powder sugar")
- The ingredient is not yet present in our taxonomy (or not with the right synonym or in the right language)
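The first reason in the list can often be spotted from the tag itself. The sketch below is a rough heuristic in Python, not how the server works: it assumes that a two-letter `xx:` prefix names the language of the ingredients field the entry came from (as in the fr:carrageen-aus-biologischem-anbau example) and that unprefixed tags belong to the language of the site being browsed. The function names are hypothetical.

```python
def split_tag(tag, site_language):
    """Return (language, name) for an ingredient tag.

    Assumption: a tag without a two-letter "xx:" prefix is in the
    language of the site being browsed.
    """
    prefix, sep, rest = tag.partition(":")
    if sep and len(prefix) == 2 and prefix.isalpha():
        return prefix, rest
    return site_language, tag


def other_language_tags(tags, site_language):
    """Keep only the tags whose language differs from the site language."""
    return [t for t in tags if split_tag(t, site_language)[0] != site_language]


unknown = [
    "fr:carrageen-aus-biologischem-anbau",
    "kirschlikör-füllung",
    "en:zucker",
    "milchpulvererzeugnis",
]
print(other_language_tags(unknown, "de"))
# prints ['fr:carrageen-aus-biologischem-anbau', 'en:zucker']
```

Entries flagged this way are candidates for a wrong "main language" or a badly split multilingual list. The remaining entries still need to be checked against the other reasons.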
Concrete steps to improve ingredients analysis
- Analyze the results of the ingredients analysis: https://de.openfoodfacts.org/ingredients?status=unknown
- Classify the biggest problems you see
  - Are many ingredients correct but missing in the taxonomy?
    - Complete the ingredients taxonomy
      - Check whether there is an existing entry in English, so that you can add a translation
      - Otherwise add a new entry in English + your target language (and possibly other languages too if you know the translation)
  - Is there a particular structure that is not parsed correctly?
    - e.g. multiple ingredients compounded together "Ingredient A and Ingredient B"
    - e.g. an ingredient plus something that specifies how it was processed, or a label: "Organic something", "cooked sliced something"
    - Open bugs with examples for the most common structures that we should support better
      - e.g. https://github.com/openfoodfacts/openfoodfacts-server/issues/3020 for "raw something".
      - Bugs should be filed in https://github.com/openfoodfacts/openfoodfacts-server/issues ; tag them with the "ingredients-analysis" label.
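The classification step can be roughed out with a small script once the unknown ingredients have been copied from the page. This Python sketch is only an illustration: the bucket names and matching rules (`kann-spuren`, `-und-`, a two-letter language prefix) are hypothetical and tuned to the German examples on this page, and the sample data is the list of unrecognized entries shown earlier.

```python
from collections import Counter


def classify(tag):
    """Assign an unknown ingredient tag to a rough problem bucket (hypothetical rules)."""
    if tag[2:3] == ":":
        return "wrong language field"      # e.g. fr:..., en:...
    if "kann-spuren" in tag:
        return "not an ingredient"         # "may contain traces of" statement
    if "-und-" in tag:
        return "compound structure"        # "Ingredient A and Ingredient B"
    return "check taxonomy or spelling"


def tally(unknown):
    """Sum product counts per bucket, biggest bucket first."""
    totals = Counter()
    for tag, products in unknown:
        totals[classify(tag)] += products
    return totals.most_common()


unknown = [
    ("fr:carrageen-aus-biologischem-anbau", 1),
    ("kirschlikör-füllung", 1),
    ("natūrliche-aromen", 1),
    ("produkt-kann-spuren-von-erdnüssen", 1),
    ("teilweise-hydriertes-sojaöl", 1),
    ("milchpulvererzeugnis", 1),
    ("en:zucker", 1),
    ("fettarmes-kakaopuler", 1),
]
for bucket, products in tally(unknown):
    print(bucket, products)
```

Weighting each bucket by product count rather than by number of distinct entries keeps the focus on the problems that affect the most products.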