Ingredients Analysis Quality

From Open Food Facts wiki

Revision as of 07:50, 23 October 2023

This page explains how we measure and monitor the quality of Ingredients Extraction and Analysis.

Metrics for ingredients analysis

Ingredients parsing and recognition metrics

For each product, the ingredients list is parsed to separate each ingredient. Ingredients that we can match to our multilingual ingredients taxonomy are marked as "known"; the others are marked as "unknown".

There are different reasons an ingredient can be marked as unknown:

  • The input ingredients list is incorrect (misspellings etc.)
  • The ingredients list contains text other than ingredients, and we have not been able to split it at the right place.
  • We have not been able to parse a particular sentence structure or formatting (e.g. "a drop of delicious honey with a shower of powder sugar")
  • The ingredient is not yet present in our taxonomy (or not with the right synonym or in the right language)

Percentage of known ingredients

The metric we use to assess the quality of ingredients parsing and recognition is the percentage of known ingredients among all ingredients.

The percentage of known ingredients can be evaluated:

  • For one specific product.
  • For all products in the database or in a subset of the database (e.g. all products sold in one country)
    • For unique ingredients across all products
    • Weighted by the number of occurrences of each ingredient, e.g. if a specific ingredient appears in 5 products, it is counted 5 times.
      • This is the most representative metric for ingredients parsing and recognition
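As a rough illustration of the two ways of counting, the sketch below computes both percentages from toy data. The product lists, the KNOWN set and the function name are hypothetical, and this is not the code used by Open Food Facts.

```python
from collections import Counter


def known_ingredient_stats(products, known):
    """Return (pct_unique_known, pct_occurrences_known).

    products: list of ingredient lists, one list per product.
    known: set of ingredients that match the taxonomy.
    """
    # Count how many times each ingredient appears across all products.
    occurrences = Counter(ing for ingredients in products for ing in ingredients)
    if not occurrences:
        return 0.0, 0.0
    # Unique view: each distinct ingredient counts once.
    unique_known = sum(1 for ing in occurrences if ing in known)
    # Weighted view: each ingredient counts once per occurrence.
    weighted_known = sum(n for ing, n in occurrences.items() if ing in known)
    pct_unique = 100.0 * unique_known / len(occurrences)
    pct_weighted = 100.0 * weighted_known / sum(occurrences.values())
    return pct_unique, pct_weighted


# Hypothetical toy data: three products, one unparsed phrase.
KNOWN = {"sugar", "water", "salt"}
PRODUCTS = [
    ["sugar", "water"],
    ["sugar", "salt", "a drop of delicious honey"],
    ["water", "sugar"],
]
pct_unique, pct_weighted = known_ingredient_stats(PRODUCTS, KNOWN)
```

In this toy data, 3 of the 4 unique ingredients are known, but 6 of the 7 occurrences are known. Frequent ingredients dominate the weighted figure, so it reflects what users actually encounter on products.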

How to see the percentage of known ingredients

For one specific product, on the product page on the Open Food Facts web site, you can click on "Details of the ingredients analysis" to see how many ingredients have been recognized.

For a given set of products returned on the Open Food Facts web site (e.g. https://us.openfoodfacts.org/ for products sold in the United States), you can add /ingredients?stats=1 to get the percentage of known ingredients for this set of products (e.g. https://us.openfoodfacts.org/ingredients?stats=1 or https://us.openfoodfacts.org/popularity/top-10000-us-scans-2019/ingredients?stats=1 ).

Type Unique tags Occurrences
known 2217 (1.78%) 6451232 (84.62%)
unknown 122064 (98.22%) 1172697 (15.38%)
all 124281 (100.00%) 7623929 (100.00%)

In the table above, the first column of numbers gives the number of unique ingredients across all products, and the second column gives the number of occurrences of those ingredients.
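The percentages in the table can be recomputed from the raw counts. The following sketch does this with the numbers copied from the table; the variable and function names are illustrative only.

```python
# Counts copied from the table above: (unique tags, occurrences).
COUNTS = {
    "known": (2217, 6451232),
    "unknown": (122064, 1172697),
}


def percentages(counts):
    """Return {type: (pct_unique, pct_occurrences)} rounded to 2 decimals."""
    total_unique = sum(unique for unique, _ in counts.values())
    total_occurrences = sum(occ for _, occ in counts.values())
    return {
        kind: (
            round(100 * unique / total_unique, 2),
            round(100 * occ / total_occurrences, 2),
        )
        for kind, (unique, occ) in counts.items()
    }


RESULT = percentages(COUNTS)
```

RESULT["known"] is (1.78, 84.62), matching the table: fewer than 2% of the unique ingredient strings are recognized, yet they account for almost 85% of all occurrences.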


Ingredients analysis at the product level

For each product, once we have parsed the ingredients list to separate each ingredient and matched the ingredients to our taxonomy, we can analyze whether the product is vegan, vegetarian, without palm oil etc.

To mark a product as non-vegan etc. we only need to find one non-vegan ingredient. But to mark a product as vegan, we need to be sure that we have been able to correctly recognize all ingredients so that we can verify that all of them are vegan.

Percentage of known values for each property

For a property (e.g. is the product vegetarian, vegan or palm oil free), we can have different values, for instance for the vegetarian property:

  • Unknown value: we have not been able to make a determination
  • Known values
    • Yes: the product is vegetarian (we have recognized all ingredients, and we know all of them are vegetarian)
    • No: the product is not vegetarian (at least one ingredient that we recognized is not vegetarian)
    • Maybe: the product may be vegetarian (we have recognized all ingredients, but at least one of them can be vegetarian or not)
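The rules above can be sketched as a small function. This is a minimal illustration, not the actual Open Food Facts implementation: the function name and the input format are hypothetical, and it assumes that every recognized ingredient carries a "yes", "no" or "maybe" value for the property.

```python
def property_value(ingredient_values):
    """Combine per-ingredient values into a product-level value.

    ingredient_values: one entry per ingredient, each "yes", "no" or
    "maybe", or None when the ingredient was not recognized.
    """
    # One recognized "no" ingredient is enough, even if others are unknown.
    if "no" in ingredient_values:
        return "no"
    # Without a "no", any unrecognized ingredient prevents a determination.
    # Treating an empty ingredients list as unknown is an assumption.
    if not ingredient_values or None in ingredient_values:
        return "unknown"
    # All ingredients recognized, but at least one is uncertain.
    if "maybe" in ingredient_values:
        return "maybe"
    # All ingredients recognized and all of them satisfy the property.
    return "yes"
```

For example, property_value(["yes", "no", None]) returns "no", while property_value(["yes", None]) returns "unknown". This asymmetry is why improving ingredient recognition directly increases the share of known values.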

The metric we use is the percentage of known values over all values.

How to see the percentage of known values

For one specific product, the product page shows the results of the ingredients analysis with icons for vegetarian products, vegan products and palm oil free products (red, orange or green depending on the value). If the property has an unknown value, the icon is not displayed.

For a given set of products returned on the Open Food Facts web site, you can add /ingredients-analysis to the URL (e.g. https://us.openfoodfacts.org/ingredients-analysis) to get the number of products for each value of each property.

Ingredients analysis	Products	*
Vegetarian status unknown	230236	
Palm oil content unknown	208375	
Vegan status unknown	142640	
Non-vegan	134162	
Palm oil free	52268	
Non-vegetarian	39834	
Vegetarian	36098	
Palm oil	34461	
Vegan	32785	
May contain palm oil	21314	
Maybe vegetarian	10442	
Maybe vegan	7020		
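The percentage of known values for each property can be derived from these counts. The sketch below groups the rows of the table by property (the grouping, key names and function name are illustrative choices) and computes the share of products with a known value.

```python
# Product counts copied from the table above, grouped by property.
ANALYSIS_COUNTS = {
    "vegetarian": {"unknown": 230236, "no": 39834, "yes": 36098, "maybe": 10442},
    "vegan": {"unknown": 142640, "no": 134162, "yes": 32785, "maybe": 7020},
    "palm_oil_free": {"unknown": 208375, "yes": 52268, "no": 34461, "maybe": 21314},
}


def pct_known(values):
    """Percentage of products whose value is yes, no or maybe."""
    total = sum(values.values())
    known = total - values["unknown"]
    return 100 * known / total


PCT_KNOWN = {prop: round(pct_known(v), 2) for prop, v in ANALYSIS_COUNTS.items()}
```

With these counts, roughly 27% of products have a known vegetarian status, against roughly 55% for vegan status. The gap is consistent with the rule that a single recognized non-vegan ingredient is enough to give a known "no" value.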

Ingredients Analysis Quality Evaluation - March 2020

An initial evaluation of the quality of ingredients analysis for major European languages was done in March 2020 as part of the first task, "Ingredients analysis and search features extraction", of Project:Personalized_Search.
