Olive oil - en
Introduction
In November 2020 @stephane asked me to have a look at the products in the olive oil category. The category needed cleaning up: an olive oil should have a Nutriscore of C or D, but other values were showing up. So I had a look at the products and cleaned up a bit. This post is a log of my observations.
Background
Olive oils are a special case in the Nutriscore calculation. Olive oil is seen as one of the better oils and may therefore get a better score than other oils. This exception was introduced recently (a reference here?).
However, all oils score poorly under Nutriscore, so olive oil will never get a better score than C. This has provoked a lot of resistance in Italy and Spain, where olive oil is seen as a major export product. Those countries might exempt olive oil from the Nutriscore obligation.
Definition
Olive oil (https://en.wikipedia.org/wiki/Olive_oil) is a simple product with only one ingredient: olives. The extraction process and the origin might influence the actual composition of the oil. This might be visible in the nutritional values and the labelling of a product.
Completeness
On 13 Dec 2020 the olive oils category (https://world.openfoodfacts.org/category/olive-oils) comprised 6118 products. There might be more olive oil products that are not labelled as such.
Robotoff comes to the rescue here, as it might have found some eligible products. On 13 Dec 2020 there were no remaining questions on Robotoff.
Interlopers
Are all the products in the olive oils category indeed olive oils? We need to find the products that are not olive oils (the interlopers).
Number of ingredients
A first check is to look at the number of ingredients. It turns out that there are some products that seem wrongly classified.
Note that of the 6118 products only 1967 seem to have ingredients defined.
Not sure how I can list the products with more than 1 ingredient.
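One possible way to list them, as a minimal sketch: it assumes the category's products are already available as a list of dictionaries (for example loaded from a data export) and that the ingredient count sits in a field called `ingredients_n`. The function name and the sample records are hypothetical.

```python
def products_with_multiple_ingredients(products):
    """Return the products that list more than one ingredient."""
    suspects = []
    for product in products:
        # Assumption: 'ingredients_n' holds the number of ingredients.
        # Products without ingredients defined are skipped, not flagged.
        count = product.get("ingredients_n")
        if count is not None and int(count) > 1:
            suspects.append(product)
    return suspects


# Made-up sample records, for illustration only.
sample = [
    {"code": "0001", "ingredients_n": 1},
    {"code": "0002", "ingredients_n": 4},
    {"code": "0003"},  # no ingredients defined
]
multi = products_with_multiple_ingredients(sample)
print([p["code"] for p in multi])
```

A product flagged this way is only a candidate interloper. It still needs a manual look, as the ingredient list itself may have been entered wrongly.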
Another way to find interlopers is by looking for strange nutritional values, but those can also be due to wrongly entered data.
Data quality
Fat percentage
The next step is to correct wrong nutritional data, starting with the fat percentage.
There are 6 products with more than 100% fat, so let's correct those first. It is easier to list those with a query and then correct them.
For these products the wrong data had been entered. One product had a fat content of 101 g, though.
Now we can graph the fat content again. This shows an issue with products whose fat percentage is much too low, so on to the next edit round.
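Both checks (too high and too low) can be sketched as one small filter. This is an illustration, not the query that was actually used. It assumes each product is a dictionary with a `code` and a nested `nutriments` dictionary holding `fat_100g`. The lower bound of 80 g is an arbitrary assumption chosen for the example.

```python
def fat_outliers(products, low=80.0, high=100.0):
    """Split product codes into 'fat too high' and 'fat too low' lists."""
    too_high, too_low = [], []
    for product in products:
        # Assumption: fat per 100 g is stored under nutriments['fat_100g'].
        fat = product.get("nutriments", {}).get("fat_100g")
        if fat is None:
            continue  # no fat value entered, nothing to check
        fat = float(fat)
        if fat > high:
            too_high.append(product["code"])
        elif fat < low:
            too_low.append(product["code"])
    return too_high, too_low


# Made-up sample records, for illustration only.
sample = [
    {"code": "A", "nutriments": {"fat_100g": 100}},
    {"code": "B", "nutriments": {"fat_100g": 140}},
    {"code": "C", "nutriments": {"fat_100g": 14}},
    {"code": "D", "nutriments": {}},
]
high_codes, low_codes = fat_outliers(sample)
print(high_codes, low_codes)
```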
I repaired the following errors:
- wrong classifications
- mixup between per serving and per 100g data
- all nutritional values set to 0
- per serving option not checked
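The mix-up between per serving and per 100g data can often be spotted with a simple heuristic: if a low fat value lands in the expected olive oil range once it is rescaled from the serving size to 100 g, it was probably entered in the wrong column. The sketch below assumes a serving size in grams is known. The function name and the 80 to 100 g range are assumptions made for illustration.

```python
def looks_like_per_serving(fat_100g, serving_g, low=80.0, high=100.0):
    """Guess whether a 'per 100 g' fat value is really a per-serving value."""
    if serving_g <= 0 or fat_100g >= low:
        # Unknown serving size, or the value is already plausible for an oil.
        return False
    # Rescale as if the value had been given per serving.
    rescaled = fat_100g * 100.0 / serving_g
    return low <= rescaled <= high


print(looks_like_per_serving(14, 15))   # 14 g fat in a 15 g serving
print(looks_like_per_serving(92, 15))   # already a plausible per 100 g value
```

A product with all values set to 0 is not caught by this check and needs its own test.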
Conclusions
New categories
By looking more closely at all the products we can identify other categories. It is only worthwhile to create these new categories if each would contain a lot of products or if the nutritional values deviate too much:
- pure olive oils: the current olive oils category should be renamed to pure olive oils to indicate that these products contain only one ingredient. This also helps to distinguish them from the other olive oils.
- olive oil sprays: these contain other ingredients to make the oil sprayable. As the serving is only 0.25g, the nutritional values per serving are all zero (thanks to rounding).
- enhanced olive oils: these olive oils have added vitamins for children(?)
- flavoured olive oils: these olive oils have added flavours (garlic, etc.)
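The rounding effect mentioned for the sprays can be illustrated with a small calculation. It assumes a fat content of 100 g per 100 g and a label that rounds fat to whole grams. Both are assumptions for the example, not values taken from a specific product.

```python
def per_serving(value_per_100g, serving_g):
    """Convert a per 100 g value to a per-serving value."""
    return value_per_100g * serving_g / 100.0


# Assumed values: 100 g fat per 100 g, serving of 0.25 g.
fat = per_serving(100.0, 0.25)
print(fat)         # fat per serving in grams, before rounding
print(round(fat))  # the same value rounded to whole grams
```

The per-serving value is well below half a gram, so it disappears once rounded, even though the product is almost pure fat.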
Canonical fat percentages
Many products use what seem to be standard canonical values for the nutritional values. Judging by the other products, these do not seem to reflect the real product.
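One way to make such canonical values visible is to count how often each fat value occurs in the category. The sketch below assumes the same hypothetical product dictionaries as before, with fat stored under `nutriments['fat_100g']`. The sample values are made up.

```python
from collections import Counter


def common_fat_values(products, top=3):
    """Return the most frequent fat values as (value, count) pairs."""
    values = [
        float(p["nutriments"]["fat_100g"])
        for p in products
        # Skip products that have no fat value entered.
        if p.get("nutriments", {}).get("fat_100g") is not None
    ]
    return Counter(values).most_common(top)


# Made-up sample records, for illustration only.
sample = [
    {"nutriments": {"fat_100g": 100}},
    {"nutriments": {"fat_100g": 100}},
    {"nutriments": {"fat_100g": 91.6}},
    {"nutriments": {"fat_100g": 100}},
    {"nutriments": {}},
]
print(common_fat_values(sample, top=1))
```

A value that occurs far more often than its neighbours is a candidate canonical value. Whether it is wrong for a given product still has to be checked against the label.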