Olive oil - en: Difference between revisions
No edit summary |
|||
(150 intermediate revisions by one other user not shown) | |||
Line 1: | Line 1: | ||
[[Category:Category analysis]] | |||
== Introduction == | == Introduction == | ||
In November 2020 @stephane asked to have a look at the products in the [https://world.openfoodfacts.org/category/olive-oils olive oil category]. The category was in need of cleaning up. An olive oil should have a Nutriscore | In November 2020 @stephane asked to have a look at the products in the [https://world.openfoodfacts.org/category/olive-oils olive oil category]. The category was in need of cleaning up. An olive oil should have a Nutriscore C, but other values were seen. So I had a look at the products and cleaned up a bit. This post is a log of my observations. | ||
== Background == | == Background == | ||
Olive oils are a special category for Nutriscore, as they are a special case for the | Olive oils are a special category for Nutriscore, as they are a special case for the Nutriscore calculation. Olive oil is seen as one of the better oils, and thus may get a better score. This exception was recently introduced (I cannot find the change history). | ||
However all oils are bad, so olive oil will never get a better score than C. This has provoked a lot of resistance in Italy and Spain, as olive oil is seen as a major export product. Those countries might leave out olive oil from the Nutriscore obligation. | However all oils are bad, so olive oil will never get a better score than C. This has provoked a lot of resistance in Italy and Spain, as olive oil is seen as a major export product ([https://www.oliveoiltimes.com/business/nutri-score-will-damage-olive-oil-trade/87617 see]). Those countries might leave out olive oil from the Nutriscore obligation. | ||
This is also a good reason to clean up the category. | |||
== Definition == | == Definition == | ||
[https://en.wikipedia.org/wiki/Olive_oil Olive oil] is a simple product: it has only one ingredient: olives. The extraction process and origin might influence the actual composition of oils. This might be visible in the nutritional values and labelling on a product. | [https://en.wikipedia.org/wiki/Olive_oil Olive oil] is a simple product: it has only one ingredient: olives. The extraction process and origin might influence the actual composition of oils. This might be visible in the nutritional values and labelling on a product. | ||
We can investigate whether we see any difference between olive oils. | |||
== Completeness == | == Completeness == | ||
On 13 dec 2020 the [https://world.openfoodfacts.org/category/olive-oils olive oil category] comprised 6118 products. There might be olive oil products, which are not | On 13 dec 2020 the [https://world.openfoodfacts.org/category/olive-oils olive oil category] comprised 6118 products. There might be other olive oil products that are not yet categorised. | ||
Robotoff | Robotoff might help find them, as it may already have flagged some eligible products. On 13 dec 2020 there were no remaining questions on [https://hunger.openfoodfacts.org/questions?type=category&value_tag=olive-oils Robotoff]. | ||
There are many products for which we have no [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=olive%20oils&tagtype_1=states&tag_contains_1=contains&tag_1=Nutrition%20facts%20to%20be%20completed&sort_by=unique_scans_n&page_size=20 nutritional data]. It is impossible to go through all these by hand. A quick look at these products reveals that some 10% have images of the nutritional tables from which the data can be extracted. Clearly there is a role for Robotoff to play here in automatically extracting the data, or just signalling us that there might be such data. | |||
There are many products for which we have no [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=olive%20oils&tagtype_1=states&tag_contains_1=contains&tag_1=Nutrition%20facts%20to%20be%20completed&sort_by=unique_scans_n&page_size=20 nutritional data]. Clearly there is a role for Robotoff to play here | |||
== | == Data quality == | ||
We like to have a consistent set of products in the olive oils category: no interlopers, data correct, etc. | |||
=== Number of | === Number of Ingredients === | ||
[[File:BD5ADFBE-A639-47CE-87D9-CC1F4273A694.jpg|thumb]] | [[File:BD5ADFBE-A639-47CE-87D9-CC1F4273A694.jpg|thumb]] | ||
Are all the products in the [https://world.openfoodfacts.org/category/olive-oils Olive oils category] indeed olive oils? We need to find the products that are not olive oils (the interlopers). | |||
One check is to look at the [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=olive%20oils&sort_by=unique_scans_n&page_size=20&axis_x=ingredients_n&graph_title=Olive%20oils%20(number%20of%20ingredients)&graph=1 number of ingredients]. It turns out that there are some products that seem wrongly classified. | |||
Note that of the 6118 products in the olive oils category only 1967 seem to have ingredients defined. I do not know how to list the products with more than one ingredient. So we have no other choice than to go through all products if we want to find them. | |||
Another way to | Another way to find interlopers might be by looking at the nutritional values. If a nutritional value is too abnormal, then it might be another type of product. But then we have to find out what is normal first. | ||
=== Nutritional values check=== | === Nutritional values check=== | ||
[[File:OliveOilsFatPercentageDistribution.jpg|thumb]] | [[File:OliveOilsFatPercentageDistribution.jpg|thumb]] | ||
As a next step, to find and correct wrong nutritional data, I started by looking at the [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=olive%20oils&sort_by=unique_scans_n&page_size=20&axis_x=fat&graph_title=Olive%20oils%20(fat%20percentage)&graph=1 fat percentage]. | |||
There were 6 products with more than 100% fat, which is clearly impossible. These can be listed with this search [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=olive%20oils&nutriment_0=fat&nutriment_compare_0=gt&nutriment_value_0=100&sort_by=unique_scans_n&page_size=20&axis_x=fat query]. Most of these products simply had the wrong data entered, although one product actually listed a fat content of 101g on its package. | |||
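The search link above can also be assembled programmatically. The following is a minimal sketch that only builds the URL from the query parameters visible in that link; the function name <code>fat_above_query</code> is my own invention and nothing is fetched from the server.

```python
from urllib.parse import quote, urlencode

# Search endpoint as it appears in the links on this page.
SEARCH_ENDPOINT = "https://world.openfoodfacts.org/cgi/search.pl"


def fat_above_query(category: str, threshold: float) -> str:
    """Build a search URL for products in `category` with fat above `threshold`.

    The parameter names are copied from the search link in the text;
    the function itself is a hypothetical helper.
    """
    params = {
        "action": "process",
        "tagtype_0": "categories",
        "tag_contains_0": "contains",
        "tag_0": category,
        "nutriment_0": "fat",
        "nutriment_compare_0": "gt",
        "nutriment_value_0": threshold,
        "sort_by": "unique_scans_n",
        "page_size": 20,
    }
    # quote_via=quote encodes spaces as %20, like the links on this page.
    return f"{SEARCH_ENDPOINT}?{urlencode(params, quote_via=quote)}"


url = fat_above_query("olive oils", 100)
print(url)
```

Changing the threshold or the comparison operator gives the searches for other suspicious ranges, for example fat percentages that are too low.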
The next step is to look at fat percentages that were too low. | |||
The | The graph shows the final distribution of fat percentages. Note that the distribution has two peaks, one around 100% and one around 90%. The reason for this will be investigated further on. | ||
As the fat percentage shows, we need to check all nutritional values and see if anything is out of the ordinary. For this we need to define an envelope of accepted values. The products that have all values inside this envelope will not be checked any further. Products with one or more values outside this envelope need to be checked by hand. | |||
The envelope check also made it possible to remove wrongly assigned products (very few) and do other repairs, like adding ingredients images, doing the recognition, adding subcategories, adding brands, adding package sizes, etc. | |||
The | |||
=== Typical errors === | === Typical errors === | ||
The repaired errors were mainly of the following type: | |||
* wrong classifications | * wrong classifications | ||
* mixup between per serving and per 100g data | * mixup between per serving and per 100g data | ||
Line 59: | Line 56: | ||
== Olive oil conclusions and thoughts == | == Olive oil conclusions and thoughts == | ||
After having repaired most of the olive oil products, we can have a closer look at the data in detail. | |||
=== States overview === | === States overview === | ||
First an overview of the completion [https://world.openfoodfacts.org/category/olive-oils/states states]. | |||
Some highlights (status on 16 dec 2020): | Some highlights (status on 16 dec 2020): | ||
Line 69: | Line 67: | ||
=== Ingredients === | === Ingredients === | ||
The ingredients mentioned now are something like ''olive oil''. But does that convey any useful information? Preferably we would like to know the olive variety (or varieties), the origin of the olives and the processes applied to extract the oil. Words like ''virgin'', ''extra virgin'', ''cold-pressed'' and ''AOP'', or the origin of the olives, provide this kind of extra information. | |||
Now and then the package does provide more information, like "huile d'olive de catégorie supérieure obtenue directement des olives et uniquement par des procédés mécaniques" (superior category olive oil obtained directly from olives and solely by mechanical means). So we know it is made from cold pressed olives. | |||
Sometimes one can find the olive variety elsewhere on the packaging. | |||
=== Serving size === | === Serving size === | ||
The products sold in the US note a serving size of 15 ml on their packaging, which is equivalent to 1 tablespoon. Instead of the 15 ml, a weight of 14 g can be given. This weight corresponds to the specific gravity of 0.911 of olive oils ([https://en.wikipedia.org/wiki/Olive_oil wikipedia]). On products sold in Europe a serving size of 10 ml can sometimes be found. The usefulness of having a serving size for a cooking aid can be debated. In fact 15 ml seems a lot. | The products sold in the US note a serving size of 15 ml on their packaging, which is equivalent to 1 tablespoon. Instead of the 15 ml, a weight of 14 g can be given. This weight corresponds to the specific gravity of 0.911 of olive oils ([https://en.wikipedia.org/wiki/Olive_oil wikipedia]). On products sold in Europe a serving size of 10 ml can sometimes be found. The usefulness of having a serving size for a cooking aid can be debated. In fact 15 ml seems a lot. | ||
=== | === Packaging size === | ||
Many products use | The packaging sizes seem to vary between 100ml and 5L. | ||
=== Nutritional values === | |||
The nutritional values can be presented in four ways: | |||
* per weight serving (usually 14g); | |||
* per volumetric serving (usually 1 tablespoon, i.e. 15ml); | |||
* per 100 ml | |||
* per 100 g | |||
This implies that we cannot easily compare the values. | |||
OFF applies a conversion from serving to 100ml/100g by simple scaling. Thus to go from 15ml to 100ml, just multiply by 100/15, i.e. 6.67. | |||
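The simple scaling can be written down in a few lines. This is only an illustration of the arithmetic, not the actual OFF code; the function name is hypothetical. The inputs are the canonical US-style label values mentioned on this page: 120 Calories and 14 g of fat per 15 ml serving.

```python
def per_serving_to_per_100ml(value_per_serving: float, serving_ml: float) -> float:
    """Scale a per-serving value linearly to a per-100ml value."""
    return value_per_serving * 100.0 / serving_ml


# Canonical US-style label values for a 15 ml (1 tablespoon) serving.
energy_kcal_100ml = per_serving_to_per_100ml(120, 15)
fat_100ml = per_serving_to_per_100ml(14, 15)

print(f"energy: {energy_kcal_100ml:.0f} kcal per 100ml")
print(f"fat: {fat_100ml:.2f} g per 100ml")
```

The scaled fat value of about 93.33 is the kind of number that shows up in the database for products whose data originally came from a per-serving table.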
==== Energy ==== | |||
Many olive oil products use canonical nutritional values. The canonical value for energy (wikipedia) is 3700 kJ or 880 kcal. The products use 900 kcal instead. On US style nutritional tables, the canonical value is 120 Calories per serving. Simple scaling then implies an energy of 800 kcal. | |||
In total there are [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=Olive%20oils&nutriment_0=energy-kcal&nutriment_compare_0=gt&nutriment_value_0=890&nutriment_1=energy-kcal&nutriment_compare_1=lt&nutriment_value_1=910&sort_by=unique_scans_n&page_size=20 1957 products] with this canonical value. | |||
[[File:OliveOilsKcalDistribution20201216.jpg|thumb|center|Energy (kcal) distribution for olive oils (16 dec 2020)]] | [[File:OliveOilsKcalDistribution20201216.jpg|thumb|center|Energy (kcal) distribution for olive oils (16 dec 2020)]] | ||
The energy-kcal distribution shows this effect even more clearly: there are at least two peaks visible, due to the different origin of the data. The data is even more complex on closer inspection. We discuss that later in the section on normalisation. | |||
==== Fat ==== | |||
The distribution of the fat percentage also shows a double peak: | |||
[[File:OliveOilsFatDistribution20201216.jpg|thumb|center|Fat distribution of olive oils (16 dec 2020)]] | |||
This distribution seems stranger. You would expect that an oil consists of 100% fat, or maybe a bit less if there are impurities. The other peak lies around 91%, very much like the specific gravity of 0.911. | |||
The conclusion is that some producers report their nutritional values per 100g, some per 100ml and some use the canonical value of 100% (2193 products). A few products have values just below 100%, which seems the most honest. | |||
If we have a look at the 100ml sample: | |||
[[File:OliveOils100mlFatDistribution20201216.jpg|thumb|center|Distribution of fat percentage smaller than 97% (16-dec-2020)]] | |||
Interestingly we have another distribution with two peaks. So where does the upper peak come from? Looking at the origin of the data, it looks like these products have nutritional values listed per serving. The canonical fat content per serving is 14g for 15ml, which implies a "specific gravity" of 0.933, corresponding to this second peak. | |||
Also in the case of fat the data is more complex. | |||
==== Saturated Fat ==== | |||
The distribution of the saturated fat percentages does not present obvious multiple peaks (although the distribution does not seem symmetrical): | |||
[[File:OliveOilsSaturatedFat20201216.jpg|thumb|center|Distribution of saturated fat percentages (16-dec-2020)]] | |||
==== Unsaturated fats ==== | |||
There are quite a few products (1252) that report on their unsaturated fat content (mono- and poly-unsaturated fat). The ratio between these two unsaturated fats differs between products. | |||
[[File:OliveOilsUnsaturatedFatsCorrelation20201216.jpg|thumb|center|Correlation of the two unsaturated fats]] | |||
You can see that if there are more mono-unsaturated fats, there are fewer poly-unsaturated fats. There seem to be two correlation lines; I wonder whether these are due to a difference between the aforementioned groups. | |||
==== The omegas ==== | |||
A few products report the Omega-3, -6 and -9 values. As this is always a standard(?) fraction of the unsaturated fats, it does not add much value at the moment. | |||
==== Vitamins ==== | |||
Quite often Vitamin E and Vitamin A are indicated. Quite strange, as if you are going to take olive oil for your vitamins. I guess the producers want to improve the health standing of their products. | |||
==== Conclusion ==== | |||
In conclusion we have three different groups of products, whose data we cannot compare. So we need to normalise before we can analyse any further. | |||
We can define the standard nutritional value ranges for olive oils: | |||
{| class="wikitable" | |||
|- | |||
! Nutritional<br>Element !! 10%<br>percentile !! Mean !! 90%<br>percentile | |||
|- | |||
| Energy (kJ) || 3090 || 3550 || 4010 | |||
|- | |||
| Energy (kcal) || 750 || 862 || 970 | |||
|- | |||
| Fat (g) || 91 || 95.5 || 100 | |||
|- | |||
| Saturated Fat (g) || 13 || 13.9 || 16 | |||
|- | |||
| Mono-unsaturated fat (g) || 67 || 69.7 || 78 | |||
|- | |||
| Poly-unsaturated fat (g) || 6.6 || 9.7 || 13 | |||
|- | |||
| Vitamin E (mg) || 20 || 25 || 40 | |||
|- | |||
| Vitamin A (µg) || 200 || 223 || 330 | |||
|- | |||
| Carbohydrate || 0 || - || 0.5 | |||
|- | |||
| Sugars || 0 || - || 0.5 | |||
|- | |||
| Proteins || 0 || - || 0.5 | |||
|- | |||
| Fiber || 0 || - || 0.5 | |||
|- | |||
| Sodium || 0 || - || 0.5 | |||
|} | |||
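The table above can serve directly as the envelope of accepted values for the hand check. Below is a minimal sketch of such a check, using the 10% and 90% percentiles of a few rows as lower and upper bounds. The dictionary keys and the function name are my own choices for illustration and are not guaranteed to match the OFF field names.

```python
# Envelope (lower, upper) taken from the 10% and 90% percentile columns
# of the table. The keys are hypothetical field names.
ENVELOPE = {
    "energy-kcal": (750, 970),
    "fat": (91, 100),
    "saturated-fat": (13, 16),
    "monounsaturated-fat": (67, 78),
    "polyunsaturated-fat": (6.6, 13),
    "carbohydrates": (0, 0.5),
    "sugars": (0, 0.5),
    "proteins": (0, 0.5),
}


def out_of_envelope(nutriments: dict) -> list:
    """Return the names of the listed values that fall outside the envelope.

    Values that are missing from `nutriments` are skipped, not flagged.
    """
    flagged = []
    for name, (low, high) in ENVELOPE.items():
        value = nutriments.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged


suspect = out_of_envelope({"energy-kcal": 900, "fat": 101, "sugars": 0})
print(suspect)
```

A product with an empty result needs no further attention; any product with one or more flagged values goes on the list for checking by hand.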
=== Subcategories === | === Subcategories === | ||
OFF has defined multiple | OFF has defined multiple subcategories based on origin (country, pdo) or pressing process (virgin, extra virgin). | ||
The virgin olive oils | The virgin olive oils are created using a first pressing of olives. The second extraction is based on what is left over and is called pomace olive oil ([https://en.wikipedia.org/wiki/Olive_oil wikipedia]). OFF also has a category [https://world.openfoodfacts.org/category/refined-olive-oils refined olive oils]. I am not sure whether this refined category is the same as the pomace category; we should check the labels of the products. | ||
Possible origins are now France, Greece and Italy. We should add Tunisia, Spain. | Possible origins are now [https://world.openfoodfacts.org/category/olive-oils-from-france France] (30 products), [https://world.openfoodfacts.org/category/olive-oils-from-greece Greece] (43 products) and [https://world.openfoodfacts.org/category/olive-oils-from-italy Italy] (49 products). We should add at least Tunisia, Morocco, Algeria, Spain, Argentina and South Africa to these. | ||
In addition the countries can be subdivided into regions, which correspond to official PDOs. Only 72 olive oils have a PDO label. Many olive oil PDOs seem to be defined as categories, but are not yet well integrated in the olive oils category. | |||
It is unclear whether these subcategories exhibit also differences in nutritional values. Before we can determine that we need more products in each PDO category. | |||
=== New categories === | === New categories === | ||
By looking more closely at all the products we can identify other categories. Only if there are a lot of products in each category or if the nutritional values deviate too much, it is worthwhile to create these new | By looking more closely at all the products we can identify other categories. It is only worthwhile to create these new categories if there are a lot of products in each category, or if the nutritional values of these products deviate too much: | ||
* pure olive oils: the current olive oils category should be renamed to pure olive oils to indicate that these products contain only one ingredient. This helps also to distinguish from the other olive oils. | * pure olive oils: the current olive oils category should be renamed to pure olive oils to indicate that these products contain only one ingredient. This helps also to distinguish from the other olive oils. | ||
* [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=olive%20oil%20sprays&sort_by=unique_scans_n&page_size=20 olive oil sprays] | * [https://world.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=olive%20oil%20sprays&sort_by=unique_scans_n&page_size=20 olive oil sprays] contain other ingredients to make it sprayable. As the serving is only 0.25g, the nutritional values per serving are all zero (thanks to rounding). | ||
* enhanced olive oils: these olive oils have added vitamins for children(?) | * enhanced olive oils: these olive oils have added vitamins for children(?) | ||
* [https://world.openfoodfacts.org/cgi/search.pl?action=process&search_terms=flavoured%20olive%20oils&tagtype_0=categories&tag_contains_0=contains&tag_0=flavoured%20olive%20oils&sort_by=unique_scans_n&page_size=20 flavoured olive oils]: these olive oils have added flavours (garlic, etc.) | * [https://world.openfoodfacts.org/cgi/search.pl?action=process&search_terms=flavoured%20olive%20oils&tagtype_0=categories&tag_contains_0=contains&tag_0=flavoured%20olive%20oils&sort_by=unique_scans_n&page_size=20 flavoured olive oils]: these olive oils have added flavours (garlic, etc.) | ||
* | * Olive oil blends: some oils are a blend of refined and virgin olive oils. Or a blend of virgin or extra virgin oils. | ||
* | * Unfiltered olive oils: some oils indicate that they are unfiltered. Does this imply a lower fat percentage? | ||
=== NOVA === | === NOVA === | ||
Line 112: | Line 185: | ||
=== Nutriscore === | === Nutriscore === | ||
The Nutri-score for olive oils should be all the same: 10 points for energy, 1 point for saturated fat to fat ratio and -5 points for being olive oils. This will calculate to 6 points and thus NutriScore C. | |||
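The point count above is easy to verify. The sketch below adds up the three contributions named in the text and maps the total to a grade. The grade boundaries used here are an assumption on my side (the general-food thresholds of the 2017 Nutri-Score scheme); they are not taken from this page.

```python
def nutriscore_grade(points: int) -> str:
    """Map a Nutri-Score point total to a grade.

    Thresholds are assumed (general foods, 2017 scheme):
    A: <= -1, B: 0..2, C: 3..10, D: 11..18, E: >= 19.
    """
    if points <= -1:
        return "A"
    if points <= 2:
        return "B"
    if points <= 10:
        return "C"
    if points <= 18:
        return "D"
    return "E"


# Contributions as listed in the text.
energy_points = 10
saturated_fat_ratio_points = 1
olive_oil_bonus = -5

total = energy_points + saturated_fat_ratio_points + olive_oil_bonus
print(total, nutriscore_grade(total))
```

Under these assumed thresholds the 5 bonus points are what separates C from D: without them the total would be 11.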
=== Eco-score === | === Eco-score === | ||
The [https://www.agribalyse.fr/app/aliments/17270#Huile_d'olive_vierge_extra environmental score] for extra virgin olive oil is 0.6, which is comparable to the other oils. No additional subdivisions are available. This implies that the Eco-score grade will be C. | |||
== Normalisation == | |||
As shown in the previous section the Olive Oils category can be split into 3 (or more) subgroups, based on how the nutritional values are reported on the package: | |||
* per 100g | |||
* per 100ml | |||
* per volumetric serving | |||
And maybe there is even a fourth group: per weight serving. | |||
=== Formulas === | |||
If we know how the nutritional data for a product is reported, we can normalise that data. With the normalised data we have a consistent dataset, which can be used to get the real nutritional values. We will normalise on the per 100g values. As 100 ml of oil weighs only sg × 100 g, the per 100ml value has to be divided by the specific gravity: | |||
N<sub>100g</sub> = N<sub>100ml</sub> / sg | |||
In which: | |||
* N<sub>100g</sub> is the nutritional value per 100g | |||
* N<sub>100ml</sub> is the nutritional value per 100ml | |||
* sg the specific gravity of olive oils, for which I use 0.911 | |||
The conversion from a volumetric serving to 100ml is: | |||
N<sub>100ml</sub> = N<sub>serving</sub> * 100 / size<sub>serving</sub> | |||
in which: | |||
* N<sub>100ml</sub> is the nutritional value per 100 ml | |||
* N<sub>serving</sub> is the nutritional value per serving | |||
* size<sub>serving</sub> the serving size in ml | |||
From the histograms shown earlier, it is clear that the data does not correspond to these formulas. We need to add a correction factor, which accounts for the rounding of the listed serving size with respect to the real serving size. | |||
Thus: | |||
N<sub>100g</sub> = N<sub>serving</sub> * 100 * C / (sg * size<sub>serving</sub>) | |||
or | |||
N<sub>100g</sub> = N<sub>serving</sub> * 100 / (sg * size<sub>real serving</sub>) | |||
with C the added correction factor, which corrects the serving size: | |||
size<sub>real serving </sub> = size<sub>serving</sub> / C | |||
We assume that C has the same value for all nutritional values. | |||
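The normalisation can be sketched as two small functions. This is my own illustration, not the spreadsheet that was actually used; the function names are hypothetical. The sketch divides by the specific gravity, on the reasoning that 100 ml of oil weighs about 91.1 g, so a value listed per 100 ml covers less than 100 g of product. The correction factor C defaults to 1, i.e. no correction.

```python
SPECIFIC_GRAVITY = 0.911  # value used in the text for olive oils


def per_100ml_to_per_100g(value_per_100ml: float, sg: float = SPECIFIC_GRAVITY) -> float:
    """Convert a value listed per 100 ml into a value per 100 g."""
    return value_per_100ml / sg


def per_serving_to_per_100g(
    value_per_serving: float,
    serving_ml: float,
    sg: float = SPECIFIC_GRAVITY,
    correction: float = 1.0,
) -> float:
    """Convert a per-serving value into a value per 100 g.

    `correction` is the factor C: real serving size = listed size / C.
    """
    real_serving_ml = serving_ml / correction
    return value_per_serving * 100.0 / (sg * real_serving_ml)


fat_from_100ml = per_100ml_to_per_100g(91.1)
fat_from_serving = per_serving_to_per_100g(14, 15)

print(f"{fat_from_100ml:.1f} g fat per 100g")
print(f"{fat_from_serving:.2f} g fat per 100g")
```

With the canonical 14 g per 15 ml serving the uncorrected result ends up above 100 g of fat per 100 g, which illustrates why a correction for the rounded serving values is needed.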
=== Groups === | |||
The first step is to categorise each product into one of the groups. The group that has nutritional values per 100g will be left as it is, so we need to concentrate only on the two other groups. | |||
==== Volumetric serving group ==== | |||
The ''per volumetric serving'' group is the easiest. Any product that has a US style nutritional table falls in this group. Unfortunately this is not registered, so we need to find a proxy for this table. | |||
We could use the serving size field for this: if it contains 15 ml, it is probably taken from a per serving nutritional table. I have seen exceptions however. Unfortunately the web-interface does not allow me to search on that. So this is not an option. | |||
A better proxy is the existence of the transfat field. If there is data in that field, it is most likely from a US or Canadian style nutritional table. Again this is not a guarantee, as other countries mark transfat as well. This results in 724 products. | |||
However some 60 products of this transfat sample have fat percentages around 100%, so we have false positives. We can add a fat limit of 94%, i.e. any product that has fat percentage less than 94% belongs to this group. With this we exclude the products that probably have the nutritional values listed per 100g. This second filter results in 629 products. | |||
==== Per 100 ml ==== | |||
Defining this group of products is more difficult. We only have the fat percentage itself to go on. Assuming a fat percentage of 100% and a specific gravity of 0.911, the listed fat percentage should be lower than 91.1 gram (per 100g) for products that have a nutritional table per 100 ml. | |||
Looking at some products, the canonical rounded value is 91g, but higher values are seen as well. So we could take a limit of 92%. This will result in 1475 products. | |||
There will be an overlap with the products with a US-style nutritional table, so it is better to subtract these US-products first. | |||
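The grouping rules described in the last two subsections can be summarised in one small function. This is a sketch of the decision order only (transfat proxy with the 94% limit first, then the 92% limit), with hypothetical names; the real grouping was done by hand on the exported data.

```python
def reporting_group(fat_percentage: float, has_transfat: bool) -> str:
    """Guess how the nutritional values of a product were reported.

    Rules from the text:
    - transfat filled in and fat below 94%  -> per volumetric serving
    - otherwise fat below 92%               -> per 100ml
    - everything else                       -> per 100g
    The US-style products are separated first, as they overlap with
    the per 100ml group.
    """
    if has_transfat and fat_percentage < 94:
        return "per volumetric serving"
    if fat_percentage < 92:
        return "per 100ml"
    return "per 100g"


for fat, transfat in [(93.3, True), (91.0, False), (100.0, True), (91.0, True)]:
    print(fat, transfat, "->", reporting_group(fat, transfat))
```

Note that a product with a fat percentage around 93% but without transfat data lands in the per 100g group with these rules, so the proxy is not watertight.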
==== Summary ==== | |||
This grouping procedure results in group sizes of: | |||
* per 100g : 94 products | |||
* per 100ml: 2235 products | |||
* per vol. serving: 636 products | |||
=== Calculations === | |||
In order to normalise the data, it was exported in XLSX format, imported into Numbers, pruned (all non-relevant data removed), scaled by the normalisation factors, merged into one table, and finally plotted and averaged. | |||
=== Results === | |||
==== Distributions ==== | |||
We need to plot the same distributions as presented earlier, so that we can see if the corrections had any effect. | |||
===== Fat ===== | |||
[[File:OliveOilsFatNormalisedDistribution20201224.png|thumb|center|Normalised fat percentage distribution (24-dec-2020)]] | |||
If we use the standard formulas, we get quite some olive oils with a fat percentage larger than 100%. These olive oils belong to the ''per 100ml''-group. A correction of 1% of the specific gravity, i.e. 0.920 instead of 0.911, is enough to have a maximum at 100%. | |||
There are quite a few products below the 100% line. Lowering the specific gravity by a few per mille is sufficient to increase the fat percentage. So are these really different specific gravities? | |||
There is also a group of products that have a fat percentage of 101.4%. What is up with these? | |||
A closer look reveals that many of these products come from the USDA import or have a 1 Tbsp serving size. So they belong to the per-serving group, but they lack the nutritional values that such products should have, like transfat, cholesterol, etc. So how can I recognise these? The original per serving data is not available, only the 100 ml data. This can be seen from the values that are a result of conversion, like 93.333333. This is the same fat percentage that we see in the per-serving group. Thus if a product has a fat percentage between 92.9 and 93.6, it should belong to the per serving group. | |||
[[File:OliveOilsNormalisedFatDistributionThree20201226.png|thumb|center|Olive oils corrected normalised fat distribution (26-dec-2020)]] | |||
The corrected normalised fat distribution is shown in the graph above. Now some 1400 products have a fat percentage of 100%. Every product that falls below 100% either indeed has less than 100% fat, or is the victim of rounding errors. | |||
There still lie a few products under 98% in the per 100ml group. These products have a fat percentage of 90g per 100ml. Is this a rounding issue? | |||
===== Saturated fat ===== | |||
The distribution spans some 4%. The corrected normalisation seems to have worked well. | |||
[[File:OliveOilsSaturatedFatNormalisedCorrected20201226.png|thumb|center|Corrected normalised saturated fat distribution (26-dec-2020)]] | |||
===== Mono-unsaturated fat ===== | |||
The range of mono-unsaturated fat spans some 10% with a peak at 71.5% and at 78.5%. The two peaks are present in all four groups. No explanation occurs to me. | |||
[[File:OliveOilsMonoUnsaturatedFatDistributionNormalisedCorrected20201227.png|thumb|center|Corrected normalised mono-unsaturated fat distribution (27-dec-2020)]] | |||
===== Energy (kcal) ===== | |||
[[File:OliveOilsEnergyKcalNormalisedCorrected20201227.png|thumb|center|Corrected normalised kcal-energy distribution (27-dec-2020)]] | |||
The distribution shows three peaks: | |||
* 856 kcal combines the serving, 100g and the serving in 100ml group | |||
* 870 kcal | |||
* 895 kcal | |||
The corrected normalisation has worked out well for the serving group, but not at all for the 100g group. We expect that the peak at 856 is the correct one. So what should the correction factor be if we want the peaks to match? | |||
[[File:OliveOilsKcalEnergyDistributionFudged20201227.png|thumb|center|Fudged kcal-energy distribution (27-dec-2020)]] | |||
The peak is roughly correct with a factor 1.055 instead of the 1.01 used earlier. No idea where this comes from. | |||
==== Correlations ==== | |||
* Energy versus fat | |||
The US/Canadian nutritional tables often have an entry ''Energy from Fat''. So we would expect a correlation between energy and fat percentage. This is shown in the graph below. Both the fat percentage and the Energy (kcal) have been normalised. | |||
[[File:OliveOilsEnergyKcalNormalised.png|thumb|center|Distribution of Fat percentage per Energy (kcal) (17-dec-2020)]] | |||
The graph shows the 100g group as blue dots, the per serving group as green dots and the 100ml group as red dots. | |||
The distribution of red dots seems to indicate a trend. This is, however, an optical illusion. The corresponding trend lines are shown as lines. There is hardly any correlation between fat percentage and energy. The trendlines lie below the 100% fat percentage, due to a number of products having values below 100%. | |||
The difference in fat percentage can be explained by differences in specific gravity. | |||
* Energy versus saturated fat | |||
[[File:OliveOilsSaturatedFatEnergyNormalised20201217.png|thumb|center|Correlation between energy and saturated fat percentage (17-dec-2020)]] | |||
There is also no correlation between the saturated fat content and the energy. | |||
* Mono-unsaturated versus poly-unsaturated fat | |||
[[File:OliveOilsMonoPolyUnsaturatedFatCorrelation20201218.png|thumb|center|Correlation between normalised poly- and mono-unsaturated fat percentage (18-dec-2020)]] | |||
As expected there is a good correlation between the mono- and poly-unsaturated fat percentages. The normalisation worked out well. | |||
* Fat versus fats combined | |||
[[File:OliveOilsNormalisedFatsCombinedCorrelation20201218.png |thumb|center|Correlation between normalised fat percentages and saturated- /unsaturated fats combined (18-dec-2020)]] | |||
You would expect that the combined saturated- and unsaturated-fats add up to the same as the fat percentage. This is tested in the correlation above. There seems to be an upward correlation. The added values can be some 10 above or below the total fat percentage. Is this an indication of the accuracy of the listed fat percentages? | |||
==== Statistics ==== | |||
We can redefine the nutritional values based on the normalised data: | |||
{| class="wikitable" | |||
|- | |||
! Nutritional<br>Element !! 10%<br>percentile !! Mean !! 90%<br>percentile | |||
|- | |||
| Energy (kcal) || 857 || 888 || 900 | |||
|- | |||
| Fat (g) || 98.9 || 99.6 || 100 | |||
|- | |||
| Saturated Fat (g) || 13.7 || 14.5 || 16.3 | |||
|- | |||
| Mono-unsaturated fat (g) || 71.4 || 72.9 || 78.5 | |||
|- | |||
| Poly-unsaturated fat (g) || 7.1 || 11.2 || 14.3 | |||
|} | |||
==== Accuracy and Variation ==== | |||
How accurate are the listed nutritional values? And has the apparent variation any meaning? | |||
Many products use the canonical values as indication for their nutritional values, like 3700kJ or 900 kcal. These are probably average values. Note that the listed accuracy is one or two significant digits. This means that it is not worthwhile to add more significant digits. Normally this implies that the variation is ± 50 kcal or 50 kJ (200 kJ?). This corresponds to the spread of values seen in the graphs above. | |||
One can wonder about the usefulness of products that display the nutritional values with four or five significant digits. Are those values just a reflection of statistical errors, or of actual differences in olive oils (year, olive, region)? | |||
== OFF Conclusions and thoughts == | == OFF Conclusions and thoughts == | ||
* OFF quality | * OFF quality | ||
During the cleanup I edited a lot of products. This gives an indication of the quality of the OFF data. The counter stands at 318 products | During the cleanup I edited a lot of products. This gives an indication of the quality of the OFF data. The counter stands at 318 products of the 6115 products in the olive oils category, i.e. 7%. | ||
* Quality indicators | * Quality indicators | ||
Line 134: | Line 360: | ||
* Wrong nutritional values | * Wrong nutritional values | ||
Several products present nutritional values that are clearly wrong. For now I edited these out. I would rather leave them and raise a flag, so that they are not used in the calculations. | Several products present nutritional values that are clearly wrong. For now I edited these out. I would rather leave them and raise a flag, so that they are not used in the calculations. | ||
* Nutritional data per volume or weight
It should be possible to indicate whether the nutritional data on the package is given per volume or per weight. This would allow for normalisation afterwards based on specific gravity.
* Category names
There are some category names that are singular, which must be changed to plural (mostly oil versus oils).
Should the origin be written as country or adjective, i.e. olive oils from France versus French olive oils?
Should a PDO be part of the category name? It would make them easier to recognise.
Latest revision as of 08:47, 6 August 2024
Introduction
In November 2020 @stephane asked me to have a look at the products in the olive oil category. The category was in need of cleaning up. An olive oil should have a Nutriscore C, but other values were seen. So I had a look at the products and cleaned up a bit. This post is a log of my observations.
Background
Olive oils are a special category for Nutriscore, as they are a special case in the Nutriscore calculation. Olive oil is seen as one of the better oils, and thus may get a better score. This exception was recently introduced (I cannot find the change history).
However all oils are bad, so olive oil will never get a better score than C. This has provoked a lot of resistance in Italy and Spain, as olive oil is seen as a major export product (see https://www.oliveoiltimes.com/business/nutri-score-will-damage-olive-oil-trade/87617). Those countries might leave out olive oil from the Nutriscore obligation.
That is also a good reason to clean up the category.
Definition
Olive oil (https://en.wikipedia.org/wiki/Olive_oil) is a simple product: it has only one ingredient: olives. The extraction process and origin might influence the actual composition of oils. This might be visible in the nutritional values and labelling on a product.
We can investigate whether we see any difference between olive oils.
Completeness
On 13 Dec 2020 the olive oils category (https://world.openfoodfacts.org/category/olive-oils) comprised 6118 products. There might be other olive oil products, which are not yet categorised.
Robotoff might help to find these, as it might have found some eligible products. On 13 Dec 2020 there were no remaining questions on Robotoff.
There are many products for which we have no nutritional data. It is impossible to go through all these by hand. A quick look at these products reveals that some 10% have images of the nutritional tables from which the data can be extracted. Clearly there is a role for Robotoff to play here in automatically extracting the data, or just signalling us that there might be such data.
Data quality
We like to have a consistent set of products in the olive oils category: no interlopers, data correct, etc.
Number of Ingredients
Are all the products in the Olive oils category indeed olive oils? We need to find the products that are not olive oils (the interlopers).
One check is to look at the number of ingredients. And it turns out that there are some products that seem wrongly classified. Note that of the 6118 products in the olive oils category only 1967 seem to have ingredients defined. I do not know how to list the products with more than one ingredient. So we have no other choice than to go through all products, if we want to find these products.
Another way to find interlopers might be by looking at the nutritional values. If a nutritional value is too abnormal, then it might be another type of product. But then we have to find out what is normal first.
Nutritional values check
As a next step, to find and correct wrong nutritional data, I started by looking at the fat percentage. There were 6 products with more than 100% fat, which is clearly impossible. These can be listed with this search query. Most of these products just had the wrong data entered, although one product actually listed a fat content of 101g. The next step is to look at fat percentages that were too low.
The graph shows the final distribution of fat percentages. Note that the distribution has two peaks, one around 100% and one around 90%. The reason for this will be investigated further on.
As the fat percentage shows, we need to check all nutritional values for anything out of the ordinary. For this we need to define an envelope of accepted values. The products that have all values inside this envelope will not be checked any further. Products with one or more values outside this envelope need to be checked by hand.
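The envelope check described above can be sketched in a few lines of Python. This is only an illustration: the field names and the bounds in `ENVELOPE` are hypothetical placeholders, not the limits that were actually used during the cleanup.

```python
# Hypothetical envelope: (lower, upper) bounds per 100 g.
# The numbers are illustrative assumptions, not the limits used in the cleanup.
ENVELOPE = {
    "fat": (85.0, 100.0),
    "saturated_fat": (10.0, 20.0),
    "energy_kcal": (750.0, 930.0),
}


def out_of_envelope(nutriments):
    """Return the names of the listed values that fall outside the envelope."""
    flagged = []
    for name, (low, high) in ENVELOPE.items():
        value = nutriments.get(name)
        if value is None:
            continue  # missing values are not judged here
        if not low <= value <= high:
            flagged.append(name)
    return flagged


# A product with 101 g of fat is flagged for a check by hand.
print(out_of_envelope({"fat": 101.0, "saturated_fat": 14.0}))  # ['fat']
```

A product for which the function returns an empty list would need no further checking; anything else goes on the list to be inspected by hand.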
The envelope check also allowed me to remove wrongly assigned products (very few) and to do other repairs, like adding ingredients images, doing the recognition, adding subcategories, adding brands, adding package sizes, etc.
Typical errors
The repaired errors were mainly of the following types:
- wrong classifications
- mixup between per serving and per 100g data
- all nutritional values set to 0
- the per serving option not checked
- no total fat percentage on label
- some Yuka users converted the per serving to per 100g values
Olive oil conclusions and thoughts
After having repaired most of the olive oil products, we can have a closer look at the data in detail.
States overview
First an overview of the completion states.
Some highlights (status on 16 dec 2020):
- Total in category: 6043
- Has nutritional values: 5380 (89%)
- Has ingredients: 1971 (33%)
Ingredients
The ingredients mentioned now are something like olive oil. But does that convey any useful information? Preferably we would like to know the olive variety (or varieties), the origin of the olives and the processes applied to extract the oil. Words like virgin, extra virgin, cold-pressed, AOP, the origin of the olives, etc. provide more information, and that is good information.
Now and then the package does provide more information, like "huile d'olive de catégorie supérieure obtenue directement des olives et uniquement par des procédés mécaniques" (superior category olive oil obtained directly from olives and solely by mechanical means). So we know it is made from cold pressed olives.
Sometimes one can find the olive variety elsewhere on the packaging.
Serving size
The products sold in the USA note a serving size of 15 ml on their packaging, which is equivalent to 1 tablespoon. Instead of the 15 ml, a weight of 14 g can be given. This weight corresponds to the specific gravity of 0.911 of olive oils (wikipedia): 15 ml * 0.911 = 13.7 g, rounded to 14 g. On products sold in Europe a serving size of 10 ml can sometimes be found. The usefulness of a serving size for a cooking aid can be debated. In fact 15 ml seems a lot.
Packaging size
The packaging sizes seem to vary between 100ml and 5L.
Nutritional values
The nutritional values can be presented in four ways:
- per weight serving (usually 14g);
- per volumetric serving (usually 1 tablespoon, i.e. 15ml);
- per 100 ml
- per 100 g
This implies that we cannot easily compare the values.
OFF applies a conversion from serving to 100ml/100g by simple scaling. Thus to go from 15ml to 100ml, just multiply by 6.67.
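The simple scaling can be written out as a small Python sketch. The function name `scale_to_100` is made up for this illustration, and the 14 g of fat per 15 ml serving is just a sample input.

```python
def scale_to_100(value_per_serving, serving_size):
    """Scale a per-serving value to a per-100 value by simple proportion."""
    return value_per_serving * 100 / serving_size


# The scaling factor for a 15 ml serving.
print(round(100 / 15, 2))  # 6.67

# Sample input: 14 g of fat listed for a 15 ml serving.
print(round(scale_to_100(14, 15), 1))  # 93.3
```

Note that the scaling does not distinguish between volume and weight: a value listed per 15 ml ends up as a value per 100 ml, not per 100 g.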
Energy
Many olive oil products use canonical nutritional values. The canonical value for energy (wikipedia) is 3700 kJ or 880 kcal. The products use 900 kcal instead. On US style nutritional tables, the canonical value is 120 Calories per serving. Simple scaling then implies an energy of 800 kcal.
In total there are 1957 products with this canonical value.
The energy-kcal distribution shows this effect even more clearly: there are at least two peaks visible, due to the different origin of the data. The data is even more complex on closer inspection. We discuss that later in the section on normalisation.
Fat
The distribution of the fat percentage also shows a double peak:
This distribution seems stranger. You would expect that an oil consists of 100% fat, or maybe a bit less if there are impurities. The other peak lies around 91%, very much like the specific gravity of 0.911.
The conclusion is that some producers report their nutritional values per 100g, some per 100ml and some use the canonical value of 100% (2193 products). A few products have values just below 100%, which seems the most honest.
If we have a look at the 100ml sample:
Interestingly we have another distribution with two peaks. So where does the upper peak come from? Looking at the origin of the data, it looks like these products have nutritional values listed per serving. The canonical fat value per serving is 14g for 15ml, which implies a "specific gravity" of 0.933, corresponding to this second peak.
Also in the case of fat the data is more complex.
Saturated Fat
The distribution of the saturated fat percentages does not present obvious multiple peaks (although the distribution does not seem symmetrical):
Unsaturated fats
There are quite a few products (1252) that report on their unsaturated fat content (mono- and poly-unsaturated fat). The ratio between these two unsaturated fats differs between products.
You see that if there are more mono-unsaturated fats, there are fewer poly-unsaturated fats. There seem to be two correlation lines; I wonder whether these are due to a difference between the aforementioned groups.
The omegas
A few products report the Omega-3, -6 and -9 values. As this is always a standard(?) fraction of the unsaturated fats, it does not add much value at the moment.
Vitamins
Quite often the Vitamin E and Vitamin A are indicated. Quite strange. As if you are going to take olive oil for your vitamins. I guess the producers want to improve the health standing of their products.
Conclusion
In conclusion we have three different groups of products, whose data we cannot compare directly. So we need to normalise before we can analyse any further.
We can define the standard nutritional value ranges for olive oils:
Nutritional Element | 10% percentile | Mean | 90% percentile |
---|---|---|---|
Energy (kJ) | 3090 | 3550 | 4010 |
Energy (kcal) | 750 | 862 | 970 |
Fat (g) | 91 | 95.5 | 100 |
Saturated Fat (g) | 13 | 13.9 | 16 |
Mono-unsaturated fat (g) | 67 | 69.7 | 78 |
Poly-unsaturated fat (g) | 6.6 | 9.7 | 13 |
Vitamin E (mg) | 20 | 25 | 40 |
Vitamin A (µg) | 200 | 223 | 330 |
Carbohydrate | 0 | - | 0.5 |
Sugars | 0 | - | 0.5 |
Proteins | 0 | - | 0.5 |
Fiber | 0 | - | 0.5 |
Sodium | 0 | - | 0.5 |
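How a row of such a table could be derived is sketched below in Python. This is an assumption about the method, not a record of how the table was actually produced; the list `fat_values` is made up purely to show the call.

```python
import statistics


def summarise(values):
    """Return (10th percentile, mean, 90th percentile) of a list of numbers."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    return deciles[0], statistics.mean(values), deciles[-1]


# Made-up fat values (g per 100 g), only to show the call.
fat_values = [91, 91, 92, 93.3, 99, 100, 100, 100, 100, 100]
p10, mean, p90 = summarise(fat_values)
print(p10, mean, p90)
```

With a double-peaked distribution like the one for fat, the mean falls between the peaks, which is one reason to normalise the data before relying on these ranges.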
Subcategories
OFF has defined multiple subcategories based on origin (country, pdo) or pressing process (virgin, extra virgin).
The virgin olive oils are created using a first pressing of olives. The second extraction is based on what is left over, and the result is called pomace olive oil (wikipedia). OFF also has a category refined olive oils. I am not sure whether this refined category is the same as the pomace category; we should check the labels of the products.
Possible origins are now France (30 products), Greece (43 products) and Italy (49 products). We should add at least Tunisia, Morocco, Algeria, Spain, Argentina and South Africa to these.
In addition the countries can be subdivided into regions, which correspond to official PDOs. Only 72 olive oils have a PDO label. Many olive oil PDOs seem to be defined as categories, but are not yet well integrated in the olive oils category.
It is unclear whether these subcategories also exhibit differences in nutritional values. Before we can determine that we need more products in each PDO category.
New categories
By looking more closely at all the products we can identify other categories. It is only worthwhile to create these new categories if there are a lot of products in each category (or if the nutritional values of these products deviate too much):
- pure olive oils: the current olive oils category should be renamed to pure olive oils to indicate that these products contain only one ingredient. This helps also to distinguish from the other olive oils.
- olive oil sprays contain other ingredients to make it sprayable. As the serving is only 0.25g, the nutritional values per serving are all zero (thanks to rounding).
- enhanced olive oils: these olive oils have added vitamins for children(?)
- flavoured olive oils: these olive oils have added flavours (garlic, etc.)
- Olive oil blends: some oils are a blend of refined and virgin olive oils. Or a blend of virgin or extra virgin oils.
- Unfiltered olive oils: some oils indicate that they are unfiltered. Does this imply a lower fat percentage?
NOVA
What should the NOVA score for olive oils be? At the moment it seems to be mainly NOVA 2. But shouldn't it be NOVA 3 for non-virgin olive oils? The pomace oils are created through chemical processes, so a NOVA downgrade seems appropriate. This would also mean that the standard value for the category Olive oils would be NOVA 3; only if the product is assigned to Virgin olive oils would it turn NOVA 2.
Nutriscore
The Nutri-score for olive oils should be all the same: 10 points for energy, 1 point for saturated fat to fat ratio and -5 points for being olive oils. This will calculate to 6 points and thus NutriScore C.
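The point arithmetic can be checked with a short Python sketch. The three point values are the ones quoted above; the grade thresholds in `nutriscore_grade` are an assumption (the commonly published limits for general foods) and should be verified against the official Nutri-Score documentation.

```python
def nutriscore_grade(points):
    """Map a final score to a grade; the thresholds are assumed (general foods)."""
    if points <= -1:
        return "A"
    if points <= 2:
        return "B"
    if points <= 10:
        return "C"
    if points <= 18:
        return "D"
    return "E"


energy_points = 10          # points for energy, as quoted in the text
saturated_ratio_points = 1  # points for the saturated fat to fat ratio
olive_oil_points = -5       # points for being an olive oil

total = energy_points + saturated_ratio_points + olive_oil_points
print(total, nutriscore_grade(total))  # 6 C
```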
Eco-score
The environmental score for extra virgin olive oil is 0.6, which is comparable to the other oils. No additional subdivisions are available. This implies that the Eco-score grade will be C.
Normalisation
As shown in the previous section, the Olive Oils category can be split into 3 (or more) subgroups, based on how the nutritional values are reported on the package:
- per 100g
- per 100ml
- per volumetric serving
And maybe there is even a fourth group: per weight serving.
Formulas
If we know how the nutritional data for a product is reported, we can normalise that data. And with the normalised data we have a consistent dataset, which can be used to get the real nutritional values. We will normalise to the per 100g values. As 100 ml of oil weighs 100 * sg grams, a value per 100 ml has to be divided by the specific gravity:
N_100g = N_100ml / sg
In which:
- N_100g is the nutritional value per 100g
- N_100ml is the nutritional value per 100ml
- sg is the specific gravity of olive oils, for which I use 0.911
The conversion from a volumetric serving to 100ml is:
N_100ml = N_serving * 100 / size_serving
in which:
- N_100ml is the nutritional value per 100 ml
- N_serving is the nutritional value per serving
- size_serving is the serving size in ml
From the histograms shown earlier, it is clear that the data does not correspond to these formulas. We need to add a correction factor, which accounts for the rounding errors in the listed serving size and so yields the real serving size.
Thus:
N_100g = N_serving * 100 * C / (sg * size_serving)
or
N_100g = N_serving * 100 / (sg * size_real_serving)
with C the added correction factor, which corrects the serving size:
size_real_serving = size_serving / C
We assume that C has the same value for all nutritional values.
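The normalisation can be sketched in Python as follows. The sketch assumes that going from a per 100 ml value to a per 100 g value means dividing by the specific gravity (100 ml of oil weighs 100 * sg grams); this is the reading under which a listed 92 g per 100 ml comes out at 100% with sg = 0.920, as reported in the results. The function names are made up for this illustration.

```python
SG = 0.911  # specific gravity of olive oil used in the text


def per_100ml_to_per_100g(value_per_100ml, sg=SG):
    """Convert a value per 100 ml to a value per 100 g."""
    return value_per_100ml / sg


def serving_to_per_100g(value_per_serving, serving_ml, sg=SG, c=1.0):
    """Convert a per-serving value to per 100 g, with correction factor c."""
    real_serving_ml = serving_ml / c  # corrected serving size
    return value_per_serving * 100 / (sg * real_serving_ml)


# 91.1 g of fat per 100 ml corresponds to pure fat per 100 g.
print(round(per_100ml_to_per_100g(91.1), 1))  # 100.0

# With sg = 0.920 a listed 92 g per 100 ml also ends up at 100 g.
print(round(per_100ml_to_per_100g(92, sg=0.920), 1))  # 100.0

# 14 g of fat per 15 ml serving, uncorrected (c = 1), overshoots 100 g.
print(serving_to_per_100g(14, 15))
```

The last call shows why a correction factor is needed for the per serving group: without it the normalised fat value lies above 100 g per 100 g.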
Groups
The first step is to assign each product to one of the groups. The group that has nutritional values per 100g will be left as it is, so we only need to concentrate on the two other groups.
Volumetric serving group
The per volumetric serving group is the easiest. Any product that has a US style nutritional table falls in this group. Unfortunately this is not registered, so we need to find a proxy for this table.
We could use the serving size field for this: if it contains 15 ml, it is probably taken from a per serving nutritional table. I have seen exceptions however. Unfortunately the web-interface does not allow me to search on that. So this is not an option.
A better proxy is the existence of the transfat field. If there is data in that field, it is most likely from a US or Canadian style nutritional table. Again this is not a guarantee, as other countries mark transfat as well. This results in 724 products.
However some 60 products of this transfat sample have fat percentages around 100%, so we have false positives. We can add a fat limit of 94%, i.e. any product that has a fat percentage of less than 94% belongs to this group. With this we exclude the products that probably have the nutritional values listed per 100g. This second filter results in 629 products.
Per 100 ml
Defining this group of products is more difficult. We only have the fat percentage itself to go on. Assuming a fat percentage of 100% and a specific gravity of 0.911, the listed fat percentage should be lower than 91.1 gram (per 100g) for products that have a nutritional table per 100 ml.
Looking at some products, the canonical rounded value is 91g, but higher values are seen as well. So we could take a limit of 92%. This will result in 1475 products.
There will be an overlap with the products with a US-style nutritional table, so it is better to subtract these US products first.
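The grouping rules of this section can be summarised in a small Python sketch. It is only an illustration of the rules as described (transfat present and fat below 94% first, then fat below 92%); the function name and the group labels are made up, and `fat` stands for the listed fat value as stored in the per 100 field.

```python
def reporting_group(fat, has_transfat):
    """Guess how the nutritional values of a product were reported."""
    # US/Canadian style tables are recognised first (transfat proxy),
    # so that they are subtracted before the per 100 ml test.
    if has_transfat and fat < 94:
        return "per volumetric serving"
    if fat < 92:
        return "per 100ml"
    return "per 100g"


print(reporting_group(93.3, True))    # per volumetric serving
print(reporting_group(91.0, False))   # per 100ml
print(reporting_group(100.0, True))   # per 100g
```

The order of the tests matters: a product with a transfat value and 91% fat is counted in the per serving group, not in the per 100ml group.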
Summary
This grouping procedure results in group sizes of:
- per 100g : 94 products
- per 100ml: 2235 products
- per vol. serving: 636 products
Calculations
In order to normalise the data, it has been exported in XLSX format. It was then imported into Numbers, pruned (all non-relevant data removed), scaled by the normalisation factors, merged into one table, and finally plotted and averaged.
Results
Distributions
We need to plot the same distributions as presented earlier, so that we can see if the corrections had any effect.
Fat
If we use the standard formulas, quite a few olive oils get a fat percentage larger than 100%. These olive oils belong to the per 100ml group. A correction of 1% of the specific gravity, i.e. 0.920 instead of 0.911, is enough to have a maximum at 100%.
There are quite a few products below the 100% line. Lowering the specific gravity by a few per mille is sufficient to increase the fat percentage. So are these really different specific gravities?
There is also a group of products that have a fat percentage of 101.4%. What is up with these?
A closer look reveals that many of these products come from the USDA import or have a 1 Tbsp serving size. So they belong to the per serving group, but they lack the nutritional values that such products should have, like transfat, cholesterol, etc. So how can I recognise these? The original per serving data is not available, only the 100 ml data. This can be seen from values that are the result of a conversion, like 93.333333. This is the same fat percentage that we see in the per serving group. Thus if a product has a fat percentage between 92.9 and 93.6, it should belong to the per serving group.
The corrected normalised fat distribution is shown in the graph above. Now some 1400 products have a fat percentage of 100%. Every product that falls below 100% either has indeed less than 100% fat, or is the victim of rounding errors.
A few products in the per 100ml group still lie under 98%. These products have a fat value of 90g per 100ml. Is this a rounding issue?
Saturated fat
The distribution spans some 4%. The corrected normalisation seems to have worked well.
Mono-unsaturated fat
The range of mono-unsaturated fat spans some 10% with a peak at 71.5% and at 78.5%. The two peaks are present in all four groups. No explanation occurs to me.
Energy (kcal)
The distribution shows three peaks:
- 856 kcal combines the serving, 100g and the serving in 100ml group
- 870 kcal
- 895 kcal
The corrected normalisation has worked out well for the serving group, but not at all for the 100g group. We expect that the peak at 856 is the correct one. So what should the correction factor be if we want the peaks to match?
The peaks roughly match with a factor of 1.055 instead of the 1.01 used earlier. No idea where this comes from.
Correlations
- Energy versus fat
The US/Canadian nutritional tables often have an entry Energy from Fat. So we would expect a correlation between energy and fat percentage. This is shown in the graph below. Both the fat percentage and the Energy (kcal) have been normalised.
The graph shows the 100g group as blue dots, the per serving group as green dots and the 100ml group as red dots.
The distribution of red dots seems to indicate a trend. This is however only an optical effect. The corresponding trend lines are shown as lines. There is hardly any correlation between fat percentage and energy. The trend lines lie below the 100% fat percentage, due to a number of products having values below 100%.
The difference in fat percentage can be explained by differences in specific gravity.
- Energy versus saturated fat
There is also no correlation between the saturated fat content and the energy.
- Mono-unsaturated versus poly-unsaturated fat
As expected there is a good correlation between the mono- and poly-unsaturated fat percentages. The normalisation worked out well.
- Fat versus fats combined
You would expect that the combined saturated- and unsaturated-fats add up to the same as the fat percentage. This is tested in the correlation above. There seems to be an upward correlation. The added values can be some 10 above or below the total fat percentage. Is this an indication of the accuracy of the listed fat percentages?
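The comparison between the listed total fat and the sum of its components can be sketched in Python. The function name `fat_gap` is made up; the sample input uses the mean values from the statistics table of the normalised data in this document.

```python
def fat_gap(total_fat, saturated, mono, poly):
    """Difference between the summed fat components and the listed total fat."""
    return (saturated + mono + poly) - total_fat


# Mean values of the normalised data: fat 99.6, saturated 14.5,
# mono-unsaturated 72.9, poly-unsaturated 11.2 (all in g per 100 g).
print(round(fat_gap(99.6, 14.5, 72.9, 11.2), 1))  # -1.0
```

A gap that is far from zero for an individual product could point to a data entry error or to rounding on the label.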
Statistics
We can redefine the nutritional values based on the normalised data:
Nutritional Element | 10% percentile | Mean | 90% percentile |
---|---|---|---|
Energy (kcal) | 857 | 888 | 900 |
Fat (g) | 98.9 | 99.6 | 100 |
Saturated Fat (g) | 13.7 | 14.5 | 16.3 |
Mono-unsaturated fat (g) | 71.4 | 72.9 | 78.5 |
Poly-unsaturated fat (g) | 7.1 | 11.2 | 14.3 |
Accuracy and Variation
How accurate are the listed nutritional values? And does the apparent variation have any meaning?
Many products use the canonical values as an indication of their nutritional values, like 3700 kJ or 900 kcal. These are probably average values. Note that the listed accuracy is one or two significant digits, so it is not worthwhile to add more significant digits. Normally this implies that the variation is ± 50 kcal or 50 kJ (200 kJ?). This corresponds to the spread of values seen in the graphs above.
One can wonder about the usefulness of products that display the nutritional values with four or five significant digits. Are those values just a reflection of statistical errors, or of actual differences between olive oils (year, olive, region)?
OFF Conclusions and thoughts
- OFF quality
During the cleanup I edited a lot of products. This gives an indication of the quality of the OFF data. The counter stands at 318 of the 6115 products in the olive oils category, i.e. about 5%.
- Quality indicators
Many olive oils sold in the USA show extra quality indicators (eg this one). We could add these parameters to the nutritional values. These are mainly expressed as limits (less than or more than).
- US import issues?
Many US products are not marked as per serving. In the total fat field, the monounsaturated fats are shown. Did we have an import issue?
- Quality check
Can the verified average values be used as a quality check on new products? A flag could be raised if one of the nutritional values is outside the allowed range.
- Missing Ingredients
Many products have no Nova calculated as their ingredient list is empty. Could we default to olive oil as ingredient? We could define this category as a base food with a single ingredient. This in turn could be used as quality check.
- Wrong nutritional values
Several products present nutritional values that are clearly wrong. For now I edited these out. I would rather leave them and raise a flag, so that they are not used in the calculations.
- Nutritional data per volume or weight
It should be possible to indicate whether the nutritional data on the package is given per volume or per weight. This would allow for normalisation afterwards based on specific gravity.
- Category names
There are some category names that are singular, which must be changed to plural (mostly oil versus oils).
Should the origin be written as country or adjective, i.e. olive oils from France versus French olive oils?
Should a PDO be part of the category name? It would make them easier to recognise.