** Split some "A of B, C and D" (but not all...)
*** e.g. "Huile de palme, colza et tournesol" -> "Huile de palme, huile de colza, huile de tournesol"
** Handle * and other signs that indicate some ingredients are organic, fair trade etc.
*** e.g. "Pomme*, ..., *: ingrédient issu de l'agriculture biologique" -> "Pomme bio"
* Current solution
** Perl code and regular expressions
*** lib/ProductOpener/Ingredients.pm - preparse_ingredients_text()
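The two preprocessing rules above can be illustrated with a minimal Python sketch. It is not the Perl code in preparse_ingredients_text(); the function names and the French-only patterns are assumptions made for illustration, and the sketch makes no attempt to decide which enumerations should be left unsplit.

```python
import re


def expand_enumeration(text: str) -> str:
    """Expand "<head> de A, B et C" into one full ingredient name per item.

    Hypothetical French-only rule; real ingredient lists need many more patterns.
    """
    match = re.fullmatch(r"(?P<head>\w+) de (?P<items>.+)", text.strip())
    if not match:
        return text
    head = match.group("head")
    # Items are separated by commas or by the conjunction "et".
    items = re.split(r"\s*,\s*|\s+et\s+", match.group("items"))
    if len(items) < 2:
        return text
    full = [f"{head} de {item}" for item in items]
    # Keep the first name as written, lowercase the repeated head afterwards.
    return ", ".join([full[0]] + [name[0].lower() + name[1:] for name in full[1:]])


def apply_organic_marker(text: str) -> str:
    """Replace "*" markers with " bio" when a trailing legend defines "*" as organic."""
    legend = re.search(
        r",?\s*\*\s*:?\s*ingrédients? issus? de l'agriculture biologique\.?\s*$",
        text,
        flags=re.IGNORECASE,
    )
    if not legend:
        return text
    body = text[: legend.start()]  # drop the legend itself
    return re.sub(r"\s*\*", " bio", body)
```

For example, expand_enumeration("Huile de palme, colza et tournesol") gives "Huile de palme, huile de colza, huile de tournesol", matching the example above.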
==== Ingredients parsing ====
* Separate individual ingredients and match them to the ingredients taxonomy
** Extract properties of ingredients
*** Labels like organic, fair trade etc.
*** Quantity (%)
*** Processing (e.g. "cooked")
*** Origin (e.g. "France")
** Multi-level ingredients / sub-ingredients
*** e.g. "Fromage (Lait, présure, sel)"
** Recognize whether "A and B" is a single ingredient or two separate ingredients
*** The taxonomy is used to make the determination
* Current solution
** Perl code and regular expressions + multilingual ingredients taxonomy
*** lib/ProductOpener/Ingredients.pm - extract_ingredients_from_text()
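A minimal Python sketch of the parsing steps listed above: sub-ingredients in parentheses, a percentage, and the "A and B" decision. It is not the Perl code in extract_ingredients_from_text(). The TAXONOMY set is a toy stand-in for the multilingual taxonomy, and all function names and the output structure are assumptions. Labels, processing and origin are not handled.

```python
import re

# Toy stand-in for the ingredients taxonomy (hypothetical entries, French only).
TAXONOMY = {
    "fromage", "lait", "présure", "sel", "sucre",
    "mono- et diglycérides d'acides gras",
}


def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def split_conjunction(name):
    """Split "A et B" only if the whole string is unknown and both halves are known."""
    if name.lower() in TAXONOMY:
        return [name]
    halves = re.split(r"\s+et\s+", name)
    if len(halves) == 2 and all(half.lower() in TAXONOMY for half in halves):
        return halves
    return [name]


def parse_ingredients(text):
    """Return a nested list of dicts: text, known, optional percent and ingredients."""
    result = []
    for part in split_top_level(text):
        sub = None
        nested = re.fullmatch(r"(?P<name>[^()]+?)\s*\((?P<inner>.*)\)", part)
        if nested:  # e.g. "Fromage (Lait, présure, sel)"
            part, sub = nested.group("name"), parse_ingredients(nested.group("inner"))
        percent = None
        quantity = re.search(r"\s*(\d+(?:\.\d+)?)\s*%", part)  # integer or dot decimal only
        if quantity:
            percent = float(quantity.group(1))
            part = (part[: quantity.start()] + part[quantity.end():]).strip()
        if not part:
            continue
        names = split_conjunction(part)
        for name in names:
            entry = {"text": name, "known": name.lower() in TAXONOMY}
            if len(names) == 1:  # properties are not shared between split ingredients
                if percent is not None:
                    entry["percent"] = percent
                if sub:
                    entry["ingredients"] = sub
            result.append(entry)
    return result
```

With this toy taxonomy, "sucre et sel" is split into two ingredients, while "mono- et diglycérides d'acides gras" stays a single ingredient because the whole string is a taxonomy entry.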
== End-to-end metrics ==
* For each product, we have the number of known and unknown ingredients
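One possible way to turn the per-product counts into a single end-to-end figure is sketched below in Python. The input structure (dicts with a "known" flag and an optional nested "ingredients" list) and the function names are assumptions for illustration, not the format stored by Product Opener.

```python
def count_known_unknown(ingredients):
    """Count known and unknown ingredients, including nested sub-ingredients."""
    known = unknown = 0
    for ingredient in ingredients:
        if ingredient.get("known"):
            known += 1
        else:
            unknown += 1
        sub_known, sub_unknown = count_known_unknown(ingredient.get("ingredients", []))
        known += sub_known
        unknown += sub_unknown
    return known, unknown


def known_share(products):
    """Share of known ingredients over a list of parsed ingredient lists."""
    total_known = total_unknown = 0
    for ingredients in products:
        known, unknown = count_known_unknown(ingredients)
        total_known += known
        total_unknown += unknown
    total = total_known + total_unknown
    # None signals that there was nothing to measure.
    return total_known / total if total else None
```

Tracking this share before and after a change to the parser or the taxonomy gives a rough signal of whether the change helped.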
== Resources ==
=== Data ===
* Ingredients text and result of ingredient parsing in MongoDB JSON / JSONL exports: https://world.openfoodfacts.org/data
=== Ingredients taxonomy ===
* Definition:
* JSON result:
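A minimal Python sketch for reading ingredient data from a JSONL export, one JSON object per line. The field names "code", "ingredients_text" and "ingredients" are assumptions about the export and should be checked against the actual data. The function name is hypothetical.

```python
import json


def iter_ingredient_records(lines):
    """Yield (code, ingredients_text, parsed_ingredients) for products that have a text.

    `lines` is any iterable of JSONL lines, e.g. an open text file handle.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        product = json.loads(line)
        text = product.get("ingredients_text")  # assumed field name
        if text:
            yield product.get("code"), text, product.get("ingredients", [])
```

Because the function only needs an iterable of lines, a compressed export can be streamed by passing a handle opened with gzip.open(path, "rt", encoding="utf-8"), without loading the whole file into memory.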