Ingredients Analysis Quality
Revision as of 07:50, 23 October 2023
This page explains how we measure and monitor the quality of Ingredients Extraction and Analysis.
Metrics for ingredients analysis
Ingredients parsing and recognition metrics
For each product, the ingredients list is parsed to separate each ingredient. Ingredients that we can match to our multilingual ingredients taxonomy are marked as "known", the others as "unknown".
There are different reasons an ingredient can be marked as unknown:
- The input ingredients list is incorrect (misspellings etc.)
- The ingredients list contains text other than ingredients and we have not been able to split it at the right place.
- We have not been able to parse a particular sentence structure or formatting (e.g. "a drop of delicious honey with a shower of powder sugar")
- The ingredient is not yet present in our taxonomy (or not with the right synonym or in the right language)
Percentage of known ingredients
The metric we use to assess the quality of ingredients parsing and recognition is the percentage of known ingredients over all ingredients.
The percentage of known ingredients can be evaluated:
- For one specific product.
- For all products in the database or in a subset of the database (e.g. all products sold in one country)
- For unique ingredients across all products
- Weighted by the number of occurrences of each ingredient, e.g. if a specific ingredient appears in 5 products, it is counted 5 times.
  - This is the most representative metric for ingredients parsing and recognition
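The difference between the unique and the weighted variants can be illustrated with a minimal sketch. The taxonomy, the product data and the function names below are hypothetical and only serve the example; they are not the code used by Open Food Facts.

```python
from collections import Counter

# Hypothetical set of ingredient identifiers present in the taxonomy.
KNOWN = {"en:sugar", "en:salt", "en:water"}

# Hypothetical parsed ingredients lists, one per product.
PRODUCTS = {
    "p1": ["en:sugar", "en:water", "delicious honey"],
    "p2": ["en:sugar", "en:salt"],
    "p3": ["en:sugar", "powder sugarr"],
}


def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 for an empty set."""
    return 100.0 * part / whole if whole else 0.0


def known_stats(products):
    """Percentage of known ingredients, by unique tag and by occurrence."""
    # Count how many times each ingredient appears across all products.
    occurrences = Counter(i for ings in products.values() for i in ings)
    unique_known = sum(1 for i in occurrences if i in KNOWN)
    weighted_known = sum(n for i, n in occurrences.items() if i in KNOWN)
    return {
        "unique": percent(unique_known, len(occurrences)),
        "weighted": percent(weighted_known, sum(occurrences.values())),
    }


stats = known_stats(PRODUCTS)
print(f"unique: {stats['unique']:.2f}%  weighted: {stats['weighted']:.2f}%")
```

In this sample, a frequently used known ingredient raises the weighted percentage above the unique one, because each occurrence counts separately.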
How to see the percentage of known ingredients
For one specific product, on the product page on the Open Food Facts web site, you can click on "Details of the ingredients analysis" to see how many ingredients have been recognized.
For a given set of products returned on the Open Food Facts web site (e.g. https://us.openfoodfacts.org/ for products sold in the United States), you can add /ingredients?stats=1 to get the percentage of known ingredients for this set of products (e.g. https://us.openfoodfacts.org/ingredients?stats=1 or https://us.openfoodfacts.org/popularity/top-10000-us-scans-2019/ingredients?stats=1 ).
Type | Unique tags | Occurrences |
---|---|---|
known | 2217 (1.78%) | 6451232 (84.62%) |
unknown | 122064 (98.22%) | 1172697 (15.38%) |
all | 124281 (100.00%) | 7623929 (100.00%) |
In the table above, the first column of numbers corresponds to the number of unique ingredients across all products, and the second column of numbers corresponds to the number of occurrences of those ingredients.
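As a small convenience, the statistics URL for a set of products can be built by appending the suffix described above to the URL of that set. The helper name `stats_url` is hypothetical; the sketch only constructs the string and does not fetch the page.

```python
def stats_url(base_url: str) -> str:
    """Append the /ingredients?stats=1 suffix to a product-set URL."""
    # Strip a trailing slash so the result never contains "//ingredients".
    return base_url.rstrip("/") + "/ingredients?stats=1"


print(stats_url("https://us.openfoodfacts.org/"))
print(stats_url("https://us.openfoodfacts.org/popularity/top-10000-us-scans-2019/"))
```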
Ingredients analysis at the product level
For each product, once we have parsed the ingredients list to separate each ingredient and match them to our taxonomy, we can analyze if the products are vegan, vegetarian, without palm oil etc.
To mark a product as non-vegan etc. we only need to find one non-vegan ingredient. But to mark a product as vegan, we need to be sure that we have been able to correctly recognize all ingredients so that we can verify that all of them are vegan.
Percentage of known values for each property
For a property (e.g. is the product vegetarian, vegan or palm oil free), we can have different values, for instance for the vegetarian property:
- Unknown value: we have not been able to make a determination
- Known values
  - Yes: the product is vegetarian (we have recognized all ingredients, and we know all of them are vegetarian)
  - No: the product is not vegetarian (at least one ingredient that we recognized is not vegetarian)
  - Maybe: the product may be vegetarian (we have recognized all ingredients, but at least one of them can be vegetarian or not)
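The rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the production implementation: the mini-taxonomy, the identifiers and the name `vegetarian_status` are hypothetical, and an empty ingredients list is assumed to give an unknown value.

```python
from typing import Dict, List, Optional

# Hypothetical mini-taxonomy: ingredient id -> vegetarian value.
VEGETARIAN: Dict[str, str] = {
    "en:sugar": "yes",
    "en:wheat-flour": "yes",
    "en:gelatin": "no",
    "en:natural-flavouring": "maybe",
}


def vegetarian_status(ingredients: List[str]) -> str:
    """Return "yes", "no", "maybe" or "unknown" for a parsed ingredients list."""
    # None marks an ingredient that was not recognized in the taxonomy.
    values: List[Optional[str]] = [VEGETARIAN.get(i) for i in ingredients]
    # One recognized non-vegetarian ingredient is enough to answer "no".
    if "no" in values:
        return "no"
    # Otherwise every ingredient must be recognized to say anything.
    if not values or None in values:
        return "unknown"
    # All recognized, but at least one can be vegetarian or not.
    if "maybe" in values:
        return "maybe"
    return "yes"


print(vegetarian_status(["en:sugar", "en:gelatin", "mystery powder"]))
print(vegetarian_status(["en:sugar", "mystery powder"]))
```

Checking for "no" first reflects the asymmetry described earlier: a single non-vegetarian ingredient settles the answer even when other ingredients are unknown.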
The metric we use is the percentage of known values over all values.
How to see the percentage of known values
For one specific product, the product page shows the results of the ingredients analysis with icons for vegetarian products, vegan products and palm oil free products (red, orange or green depending on the value). If the property has an unknown value, the icon is not displayed.
For a given set of products returned on the Open Food Facts web site, you can add /ingredients-analysis to the URL (e.g. https://us.openfoodfacts.org/ingredients-analysis).
Ingredients analysis | Products |
---|---|
Vegetarian status unknown | 230236 |
Palm oil content unknown | 208375 |
Vegan status unknown | 142640 |
Non-vegan | 134162 |
Palm oil free | 52268 |
Non-vegetarian | 39834 |
Vegetarian | 36098 |
Palm oil | 34461 |
Vegan | 32785 |
May contain palm oil | 21314 |
Maybe vegetarian | 10442 |
Maybe vegan | 7020 |
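The percentage of known values for each property can be derived from counts like those listed above. The sketch below assumes that the four values listed for a property together cover all products in the set, and the grouping of labels into "yes", "no", "maybe" and "unknown" (for example "Palm oil free" as "yes") is this example's own mapping; `percent_known` is a hypothetical helper.

```python
# Counts copied from the listing above, grouped by property.
COUNTS = {
    "vegetarian": {"unknown": 230236, "no": 39834, "yes": 36098, "maybe": 10442},
    "vegan": {"unknown": 142640, "no": 134162, "yes": 32785, "maybe": 7020},
    "palm_oil_free": {"unknown": 208375, "yes": 52268, "no": 34461, "maybe": 21314},
}


def percent_known(values):
    """Percentage of products whose value for a property is not unknown."""
    total = sum(values.values())
    known = total - values.get("unknown", 0)
    return 100.0 * known / total if total else 0.0


for prop, values in COUNTS.items():
    print(f"{prop}: {percent_known(values):.2f}% known")
```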
Ingredients Analysis Quality Evaluation - March 2020
An initial evaluation of the quality of ingredients analysis for major European languages was done in March 2020 as part of the first task, "Ingredients analysis and search features extraction", of Project:Personalized_Search.