
Ingredients analysis and search features extraction



=== Ingredients taxonomy improvements ===
The main goal of ingredients analysis is to detect each individual ingredient in an ingredients list and map it to our multilingual ingredients taxonomy. Building a comprehensive list of ingredients, with all their possible synonyms in as many languages as possible, is therefore very important.
We are continually improving the ingredients taxonomy in several ways:
==== Translations of existing entries ====
Volunteers can translate existing entries into their own language through an interface on the Open Food Facts website (e.g. https://pl.openfoodfacts.org/ingredients?translate=1 to translate ingredients into Polish).
==== Incorporation of the most frequent unknown ingredients ====
Contributors who are familiar with the taxonomy and have experience using Git can directly edit the [https://github.com/openfoodfacts/openfoodfacts-server/blob/master/taxonomies/ingredients.txt ingredients taxonomy definition file] to incorporate the most frequent unrecognized ingredients in a language (e.g. https://nl.openfoodfacts.org/ingredients?status=unknown&limit=1000 ).
Unrecognized entries need to be added as a translation or as a synonym of an existing entry, or added as a new entry if we don't already have a corresponding entry in English or another language.
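The decision described above (translation, synonym, or new entry) can be sketched roughly as follows. This is a minimal illustration, not the actual Open Food Facts implementation: the in-memory <code>TAXONOMY</code> dictionary, the functions <code>find_entry</code> and <code>classify_unknown</code>, and the sample entries are all hypothetical, and the real taxonomy definition file uses a richer format.

```python
# Hypothetical, simplified in-memory taxonomy:
# canonical id -> {language code: [names and synonyms]}.
# The real ingredients.txt file is richer; this only illustrates the decision.
TAXONOMY = {
    "en:sugar": {"en": ["sugar"], "fr": ["sucre"]},
    "en:tomato": {"en": ["tomato", "tomatoes"]},
}


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups are less brittle."""
    return " ".join(name.lower().split())


def find_entry(name: str, lang: str):
    """Return the canonical id whose names in `lang` include `name`, else None."""
    wanted = normalize(name)
    for entry_id, names_by_lang in TAXONOMY.items():
        if wanted in (normalize(n) for n in names_by_lang.get(lang, [])):
            return entry_id
    return None


def classify_unknown(name: str, lang: str, same_as=None) -> str:
    """Decide how an unrecognized ingredient should be added.

    `same_as` is the canonical id that a contributor identified as
    equivalent to `name`, or None if no existing entry corresponds.
    """
    if find_entry(name, lang) is not None:
        return "already known"
    if same_as is None or same_as not in TAXONOMY:
        return "new entry"
    if lang in TAXONOMY[same_as]:
        # The entry already has a name in this language: add a synonym.
        return "synonym"
    # The entry exists but has no name in this language yet.
    return "translation"
```

For example, under these assumptions <code>classify_unknown("suiker", "nl", same_as="en:sugar")</code> yields a translation, while <code>classify_unknown("sugars", "en", same_as="en:sugar")</code> yields a synonym. Deciding which existing entry (if any) an unknown ingredient corresponds to remains a human judgment in this sketch.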


=== Ingredients processing taxonomy ===
Many ingredients lists also contain information on how the ingredients have been processed (e.g. "cooked pork meat", "sliced tomatoes", "powdered garlic").
Instead of listing all possible combinations of processing for each ingredient in the ingredients taxonomy, we have created a [https://github.com/openfoodfacts/openfoodfacts-server/blob/master/taxonomies/ingredients_processing.txt taxonomy of processing methods] that we use during ingredients parsing.
At this point, we are adding new processing methods and their translations very carefully, because each addition carries a risk of false positives. Before adding a new processing method, we first look at the list of ingredients that contain it (using URLs like https://us.openfoodfacts.org/ingredients?filter=cooked ).
The ingredients processing taxonomy is a major improvement: because processing methods are recognized separately during parsing, it should drastically reduce the number of entries needed in the ingredients taxonomy.
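The idea of separating processing methods from ingredients can be sketched as follows. This is only an illustration under simplifying assumptions, not the parser used by Open Food Facts: it handles English only, treats processing terms as single leading words, and the <code>PROCESSING</code> and <code>INGREDIENTS</code> tables, their identifiers, and the <code>parse_ingredient</code> function are all hypothetical.

```python
# Hypothetical English-only subsets of the two taxonomies, for illustration.
PROCESSING = {
    "cooked": "en:cooked",
    "sliced": "en:sliced",
    "powdered": "en:powdered",
}
INGREDIENTS = {
    "pork meat": "en:pork-meat",
    "tomatoes": "en:tomato",
    "garlic": "en:garlic",
}


def parse_ingredient(text: str):
    """Split leading processing words off an ingredient and look up the rest.

    Returns a tuple (ingredient id or None, list of processing ids).
    """
    words = text.lower().split()
    processing = []
    # Peel off leading processing terms, but always keep at least
    # one word for the ingredient itself.
    while len(words) > 1 and words[0] in PROCESSING:
        processing.append(PROCESSING[words.pop(0)])
    return INGREDIENTS.get(" ".join(words)), processing


print(parse_ingredient("cooked pork meat"))
print(parse_ingredient("sliced tomatoes"))
print(parse_ingredient("powdered garlic"))
```

In this toy setup, three ingredient entries and three processing entries are enough to recognize nine ingredient and processing combinations, instead of requiring nine separate ingredient entries. The sketch also hints at why new processing terms need care: any word added to the processing table is stripped from every ingredient name that starts with it, whether or not it actually denotes processing there.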


=== Ingredients parsing features ===