Ingredients Analysis Quality Evaluation - August 2023

From Open Food Facts wiki

Revision as of 15:59, 16 August 2023

This page evaluates the quality of ingredients extraction and analysis so that we can measure improvements over time.

Ingredients parsing and recognition

For each product, the ingredients list is parsed to separate each ingredient. Ingredients that we can match to our multilingual ingredients taxonomy are marked as "known", the others as "unknown".

There are different reasons an ingredient can be marked as unknown:

  • The input ingredients list is incorrect (misspellings etc.)
  • The ingredients list contains things other than ingredients, and we have not been able to split it at the right place.
  • We have not been able to parse a particular sentence structure or formatting (e.g. "a drop of delicious honey with a shower of powder sugar")
  • The ingredient is not yet present in our taxonomy (or not with the right synonym, or not in the right language)
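
The known/unknown distinction described above can be illustrated with a minimal sketch. It is not the actual Open Food Facts parser: the tiny `TAXONOMY` dictionary, the tag names, and the helper functions `split_ingredients` and `classify` are all hypothetical, and splitting only on commas and semicolons is a deliberate simplification.

```python
import re

# Hypothetical miniature taxonomy: canonical tag -> known synonyms (lowercase).
# The real multilingual taxonomy is far larger; these entries are illustrative.
TAXONOMY = {
    "en:sugar": {"sugar", "powdered sugar"},
    "en:honey": {"honey"},
    "en:wheat-flour": {"wheat flour"},
}

# Reverse index from each synonym to its canonical tag.
SYNONYM_TO_TAG = {syn: tag for tag, syns in TAXONOMY.items() for syn in syns}


def split_ingredients(text):
    """Split an ingredients list on commas and semicolons (a simplification)."""
    parts = re.split(r"[,;]", text)
    return [p.strip().lower() for p in parts if p.strip()]


def classify(text):
    """Return (ingredient, tag) pairs; a tag of None means 'unknown'."""
    return [(ing, SYNONYM_TO_TAG.get(ing)) for ing in split_ingredients(text)]


if __name__ == "__main__":
    for ingredient, tag in classify("Wheat flour, sugar, hony, a drop of delicious honey"):
        print(ingredient, "->", tag if tag else "unknown")
```

In this sketch, "hony" is unknown because of a misspelling, and "a drop of delicious honey" is unknown because the sentence structure is not parsed, even though "honey" itself is in the toy taxonomy.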

In the tables below:

  • The first column of numbers is the number of unique ingredients across all products.
  • The second column of numbers is the number of occurrences of those ingredients, e.g. if a specific ingredient appears in 5 products, it is counted 5 times.
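
To make the difference between the two columns concrete, here is a small sketch of how such a summary could be computed. It is an illustration under assumptions, not the script that produced the tables: the function names `summarize` and `format_row` and the input shape (one list of `(tag, is_known)` pairs per product) are hypothetical, and each listed entry is counted once.

```python
from collections import Counter


def summarize(products):
    """Count unique tags and occurrences, split by known/unknown.

    products: list of products, each a list of (tag, is_known) pairs.
    Returns {"known": (unique, occurrences), "unknown": (...), "all": (...)}.
    """
    occurrences = Counter()
    known_flag = {}
    for ingredients in products:
        for tag, is_known in ingredients:
            occurrences[tag] += 1
            known_flag[tag] = is_known

    rows = {}
    for label, want in (("known", True), ("unknown", False)):
        tags = [t for t in occurrences if known_flag[t] is want]
        rows[label] = (len(tags), sum(occurrences[t] for t in tags))
    rows["all"] = (len(occurrences), sum(occurrences.values()))
    return rows


def format_row(label, unique, occ, total_unique, total_occ):
    """Format one row as: label, unique count (%), occurrence count (%)."""
    return (f"{label}\t{unique} ({100 * unique / total_unique:.2f}%)"
            f"\t{occ} ({100 * occ / total_occ:.2f}%)")


if __name__ == "__main__":
    sample = [
        [("sugar", True), ("hony", False)],
        [("sugar", True), ("salt", True)],
        [("sugar", True)],
    ]
    rows = summarize(sample)
    total_unique, total_occ = rows["all"]
    for label in ("known", "unknown", "all"):
        print(format_row(label, *rows[label], total_unique, total_occ))
```

A frequently used known ingredient such as "sugar" adds only one unique tag but many occurrences, which is why the known share is typically much higher in the occurrences column than in the unique tags column.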


JP - Japanese

2023-07-30 2022-07-31
All products:
2822 原材料:

Type	Unique tags	Occurrences
known	726 (25.73%)	7191 (69.84%)
unknown	2096 (74.27%)	3105 (30.16%)
all	2822 (100.00%)	10296 (100.00%)
2593 原材料:

Type	Unique tags	Occurrences
known	766 (29.54%)	8466 (77.51%)
unknown	1827 (70.46%)	2456 (22.49%)
all	2593 (100.00%)	10922 (100.00%)
Top 10k most scanned products:


HR - Croatian

2023-07-30 2022-07-31
All products:
5700 sastojci:

Type	Unique tags	Occurrences
known	1304 (22.88%)	25105 (83.02%)
unknown	4396 (77.12%)	5134 (16.98%)
all	5700 (100.00%)	30239 (100.00%)
5254 sastojci:

Type	Unique tags	Occurrences
known	1197 (22.78%)	26194 (84.69%)
unknown	4057 (77.22%)	4735 (15.31%)
all	5254 (100.00%)	30929 (100.00%)
Top 10k most scanned products:

PL - Polish

2023-08-15 2022-08-16
All products:
16160 składniki:

Type	Unique tags	Occurrences
known	2048 (12.67%)	116070 (84.73%)
unknown	14112 (87.33%)	20910 (15.27%)
all	16160 (100.00%)	136980 (100.00%)
14121 składniki:

Type	Unique tags	Occurrences
known	1918 (13.58%)	127042 (89.13%)
unknown	12203 (86.42%)	15488 (10.87%)
all	14121 (100.00%)	142530 (100.00%)
Top 10k most scanned products:
4406 składniki:

Type	Unique tags	Occurrences
known	1296 (29.41%)	31605 (88.30%)
unknown	3110 (70.59%)	4189 (11.70%)
all	4406 (100.00%)	35794 (100.00%)

3627 składniki:

Type	Unique tags	Occurrences
known	1290 (35.57%)	34703 (92.74%)
unknown	2337 (64.43%)	2718 (7.26%)
all	3627 (100.00%)	37421 (100.00%)

Observations