Ingredients Analysis Quality Evaluation - June 2022
Revision as of 16:14, 24 June 2022
Evaluation of the quality of Ingredients Extraction and Analysis so that we can measure the improvements.
Ingredients parsing and recognition
For each product, the ingredients list is parsed to separate each ingredient. Ingredients that we can match to our multilingual ingredients taxonomy are marked as "known", the others as "unknown".
There are different reasons an ingredient can be marked as unknown:
- The input ingredients list is incorrect (misspellings, etc.).
- The ingredients list contains text other than ingredients, and we were not able to split it at the right place.
- We were not able to parse a particular sentence structure or formatting (e.g. "a drop of delicious honey with a shower of powder sugar").
- The ingredient is not yet present in our taxonomy (or not with the right synonym, or not in the right language).
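The matching step can be illustrated with a minimal Python sketch. This is not the actual parser: the `TAXONOMY` dictionary, its entries, and the function names `split_ingredients` and `match_ingredient` are hypothetical, and the comma split is a deliberate oversimplification. It only shows how an unparsed phrase ends up as "unknown".

```python
# Hypothetical mini-taxonomy: canonical id -> accepted synonyms (illustrative only).
TAXONOMY = {
    "en:wheat-flour": {"wheat flour"},
    "en:sugar": {"sugar", "cane sugar"},
    "en:honey": {"honey"},
}


def split_ingredients(text):
    """Naively split an ingredients list on commas (assumption: real lists need much more)."""
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def match_ingredient(name):
    """Return (canonical_id, True) when a synonym matches, else (name, False)."""
    for canonical_id, synonyms in TAXONOMY.items():
        if name in synonyms:
            return canonical_id, True
    return name, False


parsed = [
    match_ingredient(name)
    for name in split_ingredients("Wheat flour, sugar, a drop of delicious honey")
]
for ingredient, is_known in parsed:
    print(ingredient, "known" if is_known else "unknown")
```

In this toy example "a drop of delicious honey" stays unknown even though "honey" is in the taxonomy, which corresponds to the sentence-structure case in the list above.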
In the tables below:
- The first column of numbers is the number of unique ingredients across all products.
- The second column of numbers is the number of occurrences of those ingredients, e.g. if a specific ingredient appears in 5 products, it is counted 5 times.
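The difference between the two columns can be sketched in Python as follows. The function name `ingredient_stats` and the input shape (one list of `(ingredient, is_known)` pairs per product) are assumptions made for illustration, not the format used to produce the figures on this page. The sketch also assumes that every listing of an ingredient counts as one occurrence.

```python
from collections import Counter


def ingredient_stats(products):
    """Return {"known"|"unknown"|"all": (unique_count, occurrence_count)}.

    products: one list per product, each holding (ingredient, is_known) pairs.
    """
    occurrences = Counter()
    for ingredients in products:
        for ingredient, is_known in ingredients:
            occurrences[(ingredient, is_known)] += 1

    stats = {}
    for label, status in (("known", True), ("unknown", False)):
        unique = sum(1 for (_, known) in occurrences if known is status)
        total = sum(n for (_, known), n in occurrences.items() if known is status)
        stats[label] = (unique, total)
    stats["all"] = (len(occurrences), sum(occurrences.values()))
    return stats


# Two made-up products: "sugar" appears in both, so it is 1 unique ingredient
# but 2 occurrences.
sample = [
    [("sugar", True), ("salt", True)],
    [("sugar", True), ("mystery spice", False)],
]
stats = ingredient_stats(sample)
for label, (unique, total) in stats.items():
    print(label, unique, total)
```

This is why the "known" share is much higher in the occurrences column than in the unique column: a small number of recognized ingredients account for most listings, while many unknown strings appear only once or twice.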
EN - English - India
| 2022-06-23 | Unique ingredients | Occurrences |
|---|---|---|
| All products: | | |
| known | 776 (23.07%) | 8444 (69.62%) |
| unknown | 2588 (76.93%) | 3685 (30.38%) |
| all | 3364 (100.00%) | 12129 (100.00%) |
| Top 10k most scanned products: | | |
| known | 407 (35.83%) | 2149 (69.23%) |
| unknown | 729 (64.17%) | 955 (30.77%) |
| all | 1136 (100.00%) | 3104 (100.00%) |
EN - English - UK
| 2022-06-23 | Unique ingredients | Occurrences |
|---|---|---|
| All products: | | |
| known | 2591 (6.22%) | 406797 (88.32%) |
| unknown | 39034 (93.78%) | 53807 (11.68%) |
| all | 41625 (100.00%) | 460604 (100.00%) |
| Top 10k most scanned products: | | |
| known | 1664 (21.69%) | 77617 (91.14%) |
| unknown | 6008 (78.31%) | 7546 (8.86%) |
| all | 7672 (100.00%) | 85163 (100.00%) |
EN - English - US
| 2022-06-23 | Unique ingredients | Occurrences |
|---|---|---|
| All products: | | |
| known | 3027 (1.77%) | 8121096 (91.22%) |
| unknown | 167711 (98.23%) | 781824 (8.78%) |
| all | 170738 (100.00%) | 8902920 (100.00%) |
| Top 10k most scanned products: | | |
| known | 1619 (18.25%) | 97759 (88.84%) |
| unknown | 7252 (81.75%) | 12280 (11.16%) |
| all | 8871 (100.00%) | 110039 (100.00%) |
Ingredients analysis
For each product, once we have parsed the ingredients list to separate each ingredient and matched the ingredients to our taxonomy, we can analyze whether the product is vegan, vegetarian, palm oil free, etc.
To mark a product as non-vegan (or non-vegetarian, etc.), we only need to find one non-vegan ingredient. But to mark a product as vegan, we need to be sure that we have correctly recognized all of its ingredients, so that we can verify that every one of them is vegan.
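The asymmetry described above can be sketched as a small Python function. The function name `vegan_status`, the per-ingredient values (`"yes"`, `"no"`, `"maybe"`, and `None` for an unrecognized ingredient), and the returned labels are assumptions chosen to mirror the categories in the tables on this page. The precedence of "unknown" over "maybe" is also an assumption, not a description of the actual implementation.

```python
def vegan_status(ingredients):
    """Derive a product-level status from per-ingredient vegan values.

    ingredients: list of "yes", "no", "maybe", or None (ingredient not recognized).
    """
    # One non-vegan ingredient is enough, even if other ingredients are unrecognized.
    if "no" in ingredients:
        return "non-vegan"
    # Otherwise every ingredient must be recognized before anything can be claimed.
    if not ingredients or None in ingredients:
        return "vegan-status-unknown"
    if "maybe" in ingredients:
        return "maybe-vegan"
    return "vegan"


print(vegan_status(["yes", "no", None]))
print(vegan_status(["yes", None]))
print(vegan_status(["yes", "maybe"]))
print(vegan_status(["yes", "yes"]))
```

Under these assumptions, a single unrecognized ingredient is enough to move a product from "vegan" to "vegan status unknown", which is why improving ingredient recognition directly reduces the "unknown" counts below.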
EN - English - India
| 2022-06-23 | Count |
|---|---|
| All products: | |
| Vegetarian status unknown | 478 |
| Palm oil content unknown | 436 |
| Vegan status unknown | 390 |
| Non-vegan | 184 |
| Vegetarian | 159 |
| Palm oil free | 96 |
| Vegan | 79 |
| Palm oil | 70 |
| May contain palm oil | 49 |
| Non-vegetarian | 20 |
| Maybe vegan | 8 |
| Maybe vegetarian | 7 |
| Top 10k most scanned products: | |
| Vegetarian status unknown | 128 |
| Palm oil content unknown | 114 |
| Vegan status unknown | 101 |
| Non-vegan | 43 |
| Vegetarian | 26 |
| Palm oil | 22 |
| Palm oil free | 16 |
| Vegan | 12 |
| May contain palm oil | 5 |
| Non-vegetarian | 3 |
| Maybe vegan | 2 |
| Maybe vegetarian | 2 |
EN - English - UK
| 2022-06-23 | Count |
|---|---|
| All products: | |
| Vegetarian status unknown | 8623 |
| Palm oil content unknown | 8378 |
| Non-vegan | 7693 |
| Vegetarian | 7293 |
| Vegan status unknown | 6188 |
| Palm oil free | 5725 |
| Vegan | 5021 |
| Non-vegetarian | 2754 |
| Palm oil | 2660 |
| May contain palm oil | 1648 |
| Maybe vegetarian | 911 |
| Maybe vegan | 559 |
| Top 10k most scanned products: | |
| Vegetarian | 1741 |
| Palm oil content unknown | 1472 |
| Vegetarian status unknown | 1412 |
| Palm oil free | 1407 |
| Non-vegan | 1359 |
| Vegan | 1138 |
| Vegan status unknown | 1120 |
| Palm oil | 511 |
| Non-vegetarian | 434 |
| May contain palm oil | 270 |
| Maybe vegetarian | 159 |
| Maybe vegan | 112 |
EN - English - US
| 2022-06-23 | Count |
|---|---|
| All products: | |
| Vegetarian status unknown | 212861 |
| Palm oil content unknown | 176245 |
| Non-vegan | 146466 |
| Vegan status unknown | 126220 |
| Palm oil free | 85478 |
| Vegetarian | 51116 |
| Non-vegetarian | 45193 |
| Vegan | 44942 |
| Palm oil | 37921 |
| May contain palm oil | 30916 |
| Maybe vegetarian | 21742 |
| Maybe vegan | 13279 |
| Top 10k most scanned products: | |
| Vegetarian status unknown | 2914 |
| Palm oil content unknown | 2488 |
| Vegan status unknown | 1954 |
| Non-vegan | 1552 |
| Palm oil free | 1223 |
| Vegetarian | 944 |
| Vegan | 847 |
| Palm oil | 405 |
| Non-vegetarian | 391 |
| May contain palm oil | 361 |
| Maybe vegetarian | 248 |
| Maybe vegan | 143 |