Ingredients Extraction and Analysis: Difference between revisions



== End to end metrics ==
=== Known and unknown ingredients ===


* For each product, we have the number of known and unknown ingredients
* https://fr.openfoodfacts.org/ingredients?stats=1 (takes a while to load and to render in a browser)
** Results on September 10, 2019:
*** 467564 ingredients:
*** known: 3647 unique tags (0.78%), 3458004 occurrences (81.99%)
*** unknown: 463917 unique tags (99.22%), 759598 occurrences (18.01%)
*** all: 467565 unique tags (100.00%), 4217602 occurrences (100.00%)
** Can be given for a subset of products
*** https://fr.openfoodfacts.org/editor/scamark/ingredients?stats=1 (results from product data imported from Scamark / Leclerc)
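
Each percentage in the results above is a count divided by the total for its column. A minimal Python sketch that recomputes the known and unknown shares from the September 10, 2019 figures; the helper name <code>share</code> is illustrative and not part of the Open Food Facts code:

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Figures copied from the results listed above.
unique = {"known": 3647, "unknown": 463917}
occurrences = {"known": 3458004, "unknown": 759598}

unique_total = sum(unique.values())
occurrences_total = sum(occurrences.values())

for status in ("known", "unknown"):
    print(
        status,
        share(unique[status], unique_total),
        share(occurrences[status], occurrences_total),
    )
```

Known ingredients make up under 1% of the unique tags but about 82% of the occurrences, so most unknown tags appear only once or twice.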
=== Results of further ingredient analysis ===
* Number of products for which we are able to make a vegan / non-vegan or vegetarian / non-vegetarian determination
** https://fr.openfoodfacts.org/ingredients-analysis
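
As a rough illustration of this metric, the sketch below counts the products in a list that carry a definite vegan or vegetarian determination. It assumes each product is a dict with an <code>ingredients_analysis_tags</code> list and that definite determinations use tags such as <code>en:vegan</code> and <code>en:non-vegan</code>; both the field name and the tag values are assumptions to check against real product data, and <code>count_determinations</code> is a hypothetical helper.

```python
from collections import Counter

# Assumed tag values for a definite determination (verify against real data).
DETERMINED = {
    "vegan": {"en:vegan", "en:non-vegan"},
    "vegetarian": {"en:vegetarian", "en:non-vegetarian"},
}


def count_determinations(products):
    """Count, per diet, the products that carry a definite analysis tag."""
    counts = Counter()
    for product in products:
        # Assumed field name; products without it count as undetermined.
        tags = set(product.get("ingredients_analysis_tags", []))
        for diet, definite in DETERMINED.items():
            if tags & definite:
                counts[diet] += 1
    return counts


# Hand-written sample data, not real products.
sample = [
    {"ingredients_analysis_tags": ["en:vegan", "en:vegetarian"]},
    {"ingredients_analysis_tags": ["en:non-vegan", "en:vegetarian"]},
    {"ingredients_analysis_tags": ["en:vegan-status-unknown"]},
    {},
]
print(count_determinations(sample))
```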


== Resources ==


=== Data ===


* Ingredients text and result of ingredient parsing
** MongoDB JSON / JSONL exports: https://world.openfoodfacts.org/data
** API
*** Product URL on Open Food Facts: https://fr.openfoodfacts.org/produit/3560070223145/pistaches-grillees-carrefour
*** Insert "/api/v0" right after the host name to get the JSON result: https://fr.openfoodfacts.org/api/v0/produit/3560070223145/pistaches-grillees-carrefour
* Sorted lists with counts of individual ingredients for a subset of products
** https://fr.openfoodfacts.org/ingredients?stats=1
*** Ingredients that are known (they exist in our taxonomy): https://fr.openfoodfacts.org/editor/scamark/ingredients?status=known
*** Unknown ingredients: https://fr.openfoodfacts.org/editor/scamark/ingredients?status=unknown
**** OCR and spelling errors
**** Parsing errors
**** Things that are not ingredients and that should not be in the ingredient list
**** Ingredients or synonyms that should be added to the taxonomy
** Can be given for a subset of products
*** https://fr.openfoodfacts.org/editor/scamark/ingredients?status=unknown (for imported Scamark / Leclerc products)
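
The rule for turning a product page URL into its API URL can be sketched in Python as follows. The function name <code>to_api_url</code> is illustrative; the code only rewrites the URL and makes no network request.

```python
from urllib.parse import urlsplit, urlunsplit


def to_api_url(product_url: str) -> str:
    """Prefix the path of a product page URL with /api/v0."""
    parts = urlsplit(product_url)
    return urlunsplit(parts._replace(path="/api/v0" + parts.path))


page = "https://fr.openfoodfacts.org/produit/3560070223145/pistaches-grillees-carrefour"
print(to_api_url(page))
```

Fetching the resulting URL with any HTTP client should return the product as JSON, including the ingredients text and the parsing result.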


=== Ingredients taxonomy ===


* Definition: https://github.com/openfoodfacts/openfoodfacts-server/blob/master/taxonomies/ingredients.txt
* JSON result: http://static.openfoodfacts.org/data/taxonomies/ingredients.json
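
An ingredient counts as known when it exists in the taxonomy. The sketch below splits a list of ingredient tags into known and unknown, assuming the taxonomy JSON is a single object keyed by tag (for example <code>en:sugar</code>). That layout is an assumption to verify against the real file, the sample data is hand-written, and <code>split_known</code> is a hypothetical helper.

```python
import json


def load_taxonomy(text: str) -> dict:
    """Parse taxonomy JSON text into a dict keyed by ingredient tag."""
    return json.loads(text)


def split_known(tags, taxonomy):
    """Split tags into (known, unknown) by membership in the taxonomy."""
    known = [tag for tag in tags if tag in taxonomy]
    unknown = [tag for tag in tags if tag not in taxonomy]
    return known, unknown


# Tiny hand-written stand-in for ingredients.json (assumed layout).
sample_json = '{"en:sugar": {"name": {"en": "Sugar"}}, "en:salt": {"name": {"en": "Salt"}}}'
taxonomy = load_taxonomy(sample_json)

known, unknown = split_known(["en:sugar", "fr:sel-de-guerande-igp", "en:salt"], taxonomy)
print(known, unknown)
```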