Ingredients Analysis Quality Evaluation - August 2023
Latest revision as of 10:43, 7 August 2024
This page evaluates the quality of ingredients extraction and analysis so that we can measure improvements over time.
Ingredients parsing and recognition
For each product, the ingredients list is parsed to separate each ingredient. Ingredients that we can match to our multilingual ingredients taxonomy are marked as "known"; all others are marked as "unknown".
There are different reasons an ingredient can be marked as unknown:
- The input ingredient list is incorrect (misspellings etc.)
- The ingredients list contains text other than ingredients, and we have not been able to split it at the right place.
- We have not been able to parse a particular sentence structure or formatting (e.g., "a drop of delicious honey with a shower of powder sugar")
- The ingredient is not yet present in our taxonomy (or not with the right synonym or in the right language)
In the tables below:
- The "Unique tags" column gives the number of unique ingredients across all products.
- The "Occurrences" column gives the number of occurrences of those ingredients. For example, if a specific ingredient appears in 5 products, it is counted 5 times.
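The difference between unique tags and occurrences can be illustrated with a minimal sketch. This is not the code that produces the statistics on this page; the function names, the `en:`-prefixed tags, and the rule that an ingredient counts once per product that contains it are all assumptions made for illustration.

```python
from collections import Counter


def ingredient_stats(products, known_tags):
    """Return {type: (unique_tags, occurrences)} for known, unknown and all.

    products   -- iterable of ingredient-tag lists, one list per product
    known_tags -- set of tags present in the taxonomy (hypothetical input)
    """
    # Assumption: an ingredient counts once per product that contains it.
    occurrences = Counter(tag for tags in products for tag in set(tags))

    stats = {"known": [0, 0], "unknown": [0, 0], "all": [0, 0]}
    for tag, count in occurrences.items():
        kind = "known" if tag in known_tags else "unknown"
        for key in (kind, "all"):
            stats[key][0] += 1       # one more unique tag
            stats[key][1] += count   # its occurrences across products
    return {key: tuple(value) for key, value in stats.items()}


def format_stats(stats):
    """Render the stats in the three-row layout used by the tables."""
    # max(..., 1) avoids dividing by zero when there are no products.
    total_unique = max(stats["all"][0], 1)
    total_occurrences = max(stats["all"][1], 1)
    lines = ["Type\tUnique tags\tOccurrences"]
    for kind in ("known", "unknown", "all"):
        unique, occ = stats[kind]
        lines.append(
            f"{kind}\t{unique} ({100 * unique / total_unique:.2f}%)"
            f"\t{occ} ({100 * occ / total_occurrences:.2f}%)"
        )
    return "\n".join(lines)


# Hypothetical data: three products, one of them with a misspelled ingredient.
products = [
    ["en:sugar", "en:water"],
    ["en:sugar", "en:suger"],
    ["en:sugar"],
]
known_tags = {"en:sugar", "en:water"}

stats = ingredient_stats(products, known_tags)
print(format_stats(stats))
```

In this toy data the misspelled tag makes up a third of the unique tags but only a fifth of the occurrences. The tables show the same pattern: unknown ingredients dominate the unique tags while accounting for a much smaller share of occurrences.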
JP - Japanese
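Totals are labelled "原材料" (ingredients) in the source output. The "Top 10k most scanned products" cells are empty.

Scope | Date | Type | Unique tags | Occurrences
---|---|---|---|---
All products | 2023-07-30 | known | 726 (25.73%) | 7191 (69.84%)
All products | 2023-07-30 | unknown | 2096 (74.27%) | 3105 (30.16%)
All products | 2023-07-30 | all | 2822 (100.00%) | 10296 (100.00%)
All products | 2022-07-31 | known | 766 (29.54%) | 8466 (77.51%)
All products | 2022-07-31 | unknown | 1827 (70.46%) | 2456 (22.49%)
All products | 2022-07-31 | all | 2593 (100.00%) | 10922 (100.00%)
Top 10k most scanned products | | | |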
HR - Croatian
Totals are labelled "sastojci" (ingredients) in the source output. The "Top 10k most scanned products" cells are empty.

Scope | Date | Type | Unique tags | Occurrences
---|---|---|---|---
All products | 2023-07-30 | known | 1304 (22.88%) | 25105 (83.02%)
All products | 2023-07-30 | unknown | 4396 (77.12%) | 5134 (16.98%)
All products | 2023-07-30 | all | 5700 (100.00%) | 30239 (100.00%)
All products | 2023-07-31 | known | 1197 (22.78%) | 26194 (84.69%)
All products | 2023-07-31 | unknown | 4057 (77.22%) | 4735 (15.31%)
All products | 2023-07-31 | all | 5254 (100.00%) | 30929 (100.00%)
All products | 2023-09-20 | known | 1413 (41.32%) | 29483 (92.97%)
All products | 2023-09-20 | unknown | 2007 (58.68%) | 2229 (7.03%)
All products | 2023-09-20 | all | 3420 (100.00%) | 31712 (100.00%)
Top 10k most scanned products | | | |
PL - Polish
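Totals are labelled "składniki" (ingredients) in the source output.

All products: https://pl.openfoodfacts.org/ingredients?stats=1&no_cache=1
Top 10k most scanned products: https://pl.openfoodfacts.org/popularity/top-10000-pl-scans-2022/ingredients?stats=1&no_cache=1

Scope | Date | Type | Unique tags | Occurrences
---|---|---|---|---
All products | 2022-08-16 | known | 1918 (13.58%) | 127042 (89.13%)
All products | 2022-08-16 | unknown | 12203 (86.42%) | 15488 (10.87%)
All products | 2022-08-16 | all | 14121 (100.00%) | 142530 (100.00%)
All products | 2023-08-15 | known | 2048 (12.67%) | 116070 (84.73%)
All products | 2023-08-15 | unknown | 14112 (87.33%) | 20910 (15.27%)
All products | 2023-08-15 | all | 16160 (100.00%) | 136980 (100.00%)
All products | 2023-12-01 | known | 1957 (11.88%) | 135344 (87.87%)
All products | 2023-12-01 | unknown | 14521 (88.12%) | 18689 (12.13%)
All products | 2023-12-01 | all | 16478 (100.00%) | 154033 (100.00%)
Top 10k most scanned products | 2022-08-16 | known | 1290 (35.57%) | 34703 (92.74%)
Top 10k most scanned products | 2022-08-16 | unknown | 2337 (64.43%) | 2718 (7.26%)
Top 10k most scanned products | 2022-08-16 | all | 3627 (100.00%) | 37421 (100.00%)
Top 10k most scanned products | 2023-08-15 | known | 1296 (29.41%) | 31605 (88.30%)
Top 10k most scanned products | 2023-08-15 | unknown | 3110 (70.59%) | 4189 (11.70%)
Top 10k most scanned products | 2023-08-15 | all | 4406 (100.00%) | 35794 (100.00%)
Top 10k most scanned products | 2023-12-01 | known | 1309 (34.37%) | 35540 (92.39%)
Top 10k most scanned products | 2023-12-01 | unknown | 2500 (65.63%) | 2926 (7.61%)
Top 10k most scanned products | 2023-12-01 | all | 3809 (100.00%) | 38466 (100.00%)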
UK + English
Scope | Date | Type | Unique tags | Occurrences
---|---|---|---|---
All products | 2023-09-14 | known | 2840 (4.71%) | 649282 (88.62%)
All products | 2023-09-14 | unknown | 57490 (95.29%) | 83403 (11.38%)
All products | 2023-09-14 | all | 60330 (100.00%) | 732685 (100.00%)
All products | 2023-12-01 | known | 2916 (4.57%) | 723156 (89.04%)
All products | 2023-12-01 | unknown | 60842 (95.43%) | 89015 (10.96%)
All products | 2023-12-01 | all | 63758 (100.00%) | 812171 (100.00%)
Top 10k most scanned products | 2023-09-14 | known | 1728 (23.86%) | 79991 (92.23%)
Top 10k most scanned products | 2023-09-14 | unknown | 5515 (76.14%) | 6740 (7.77%)
Top 10k most scanned products | 2023-09-14 | all | 7243 (100.00%) | 86731 (100.00%)
Top 10k most scanned products | 2023-12-01 | known | 1760 (24.37%) | 83819 (92.64%)
Top 10k most scanned products | 2023-12-01 | unknown | 5463 (75.63%) | 6661 (7.36%)
Top 10k most scanned products | 2023-12-01 | all | 7223 (100.00%) | 90480 (100.00%)