Data quality: Difference between revisions

From Open Food Facts wiki
(11 intermediate revisions by 3 users not shown)
Some important things to know:
Data quality at Open Food Facts is guided by '''3 main principles''':
* Quality does not make sense in itself: it depends on usage.
* No database can claim to be zero-defect.
* We always try to favor ease of use and ease of collection. Data quality control, data verification and data fixes should never be an obstacle to data gathering.
* With more than 2,600,000 products, there are quality concerns: our goal is to lower the impact of these issues.
* No database can claim to be zero-defect. With more than 2,900,000 products, there are quality concerns. Data quality work is done on a "best effort" basis. This effort includes measuring quality, setting objectives, publishing them and implementing them.
These principles and our high-level objectives are described in the following short document, which you should read first if you want to go further:


== Data quality: how to help? ==
https://link.openfoodfacts.org/data-quality


=== Nutrition values issues ===
== Measures ==
Open Food Facts identifies some issues related to nutrition values. Some of them are '''very easy to solve''':
We have started an initiative to continuously measure and publish data quality stats. A specific page is dedicated to [[data quality stats]].
* [https://world.openfoodfacts.org/data-quality/energy-value-in-kcal-greater-than-in-kj Energy value in kcal greater than in kJ]
* [https://world.openfoodfacts.org/data-quality/nutrition-value-over-105-salt Nutrition Salt is higher than 100g per 100g]
== How to help? ==
* [https://world.openfoodfacts.org/data-quality/nutrition-value-over-105-carbohydrates Carbohydrate is higher than 100g per 100g]
Anyone can help to improve the data quality.
* [https://world.openfoodfacts.org/data-quality/nutrition-value-over-105-fat Fat is higher than 100g per 100g]
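The four checks linked above can be sketched in a few lines of Python. This is only an illustration, not the code Open Food Facts runs: the dictionary keys (<code>energy_kcal</code>, <code>energy_kj</code>, <code>salt</code>, <code>carbohydrates</code>, <code>fat</code>) are hypothetical names chosen for this example, and the 105 g threshold is an assumption read from the "over-105" part of the link addresses.

```python
from typing import Dict, List

# Hypothetical sketch: key names and the 105 g threshold are assumptions,
# not the actual Open Food Facts implementation.
def find_nutrition_issues(per_100g: Dict[str, float],
                          max_grams: float = 105.0) -> List[str]:
    """Return the issue slugs that apply to a dict of values per 100 g."""
    issues = []
    kcal = per_100g.get("energy_kcal")
    kj = per_100g.get("energy_kj")
    # 1 kcal is about 4.184 kJ, so the kcal figure should be the smaller one.
    if kcal is not None and kj is not None and kcal > kj:
        issues.append("energy-value-in-kcal-greater-than-in-kj")
    for nutrient in ("salt", "carbohydrates", "fat"):
        grams = per_100g.get(nutrient)
        # A nutrient cannot weigh more than the 100 g portion it is part of.
        if grams is not None and grams > max_grams:
            issues.append(f"nutrition-value-over-105-{nutrient}")
    return issues
```

A product that triggers one of these checks most often has its kcal and kJ values swapped, or a value entered in the wrong unit.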
=== 1. Adding photos ===
Photos allow other contributors to verify and, if necessary, fix the issues. For a contributor, adding photos is the first step toward improving data quality.
=== 2. Fixing a product issue ===
As in Wikipedia, anyone can edit at Open Food Facts. If you see an error, don't hesitate to fix it! If you are unsure, you can ask questions on [https://forum.openfoodfacts.org the forum] or in our [https://slack.openfoodfacts.org/ Slack space].


=== Nutri-Score quality ===
=== 3. Report an issue related to many products or related to data quality ===
Some products now have a Nutri-Score printed on the front of the pack. Some of these printed scores differ from the Nutri-Score we calculate, and these cases need to be checked:
Sometimes you can discover issues related to many products or related to data quality.
* [https://world.openfoodfacts.org/nutrition-grade/e/label/nutriscore-grade-a Nutri-Score printed A but calculated E]
* [https://world.openfoodfacts.org/nutrition-grade/d/label/nutriscore-grade-a Nutri-Score printed A but calculated D]
* [https://world.openfoodfacts.org/nutrition-grade/c/label/nutriscore-grade-a Nutri-Score printed A but calculated C]
* [https://world.openfoodfacts.org/nutrition-grade/a/label/nutriscore-grade-e Nutri-Score printed E but calculated A]
* [https://world.openfoodfacts.org/nutrition-grade/b/label/nutriscore-grade-e Nutri-Score printed E but calculated B]
* [https://world.openfoodfacts.org/nutrition-grade/c/label/nutriscore-grade-e Nutri-Score printed E but calculated C]
* [https://world.openfoodfacts.org/nutrition-grade/a/label/nutriscore-grade-d Nutri-Score printed D but calculated A]
* [https://world.openfoodfacts.org/nutrition-grade/b/label/nutriscore-grade-d Nutri-Score printed D but calculated B]
* [https://world.openfoodfacts.org/nutrition-grade/c/label/nutriscore-grade-d Nutri-Score printed D but calculated C]
* [https://world.openfoodfacts.org/nutrition-grade/e/label/nutriscore-grade-b Nutri-Score printed B but calculated E]
* [https://world.openfoodfacts.org/nutrition-grade/d/label/nutriscore-grade-b Nutri-Score printed B but calculated D]
* [https://world.openfoodfacts.org/nutrition-grade/c/label/nutriscore-grade-b Nutri-Score printed B but calculated C]
There are many reasons why it can differ:
* the label in Open Food Facts does not represent the label printed on the package (easy to solve)
* the label is correct, but our calculation doesn't provide the same result:
** check the category,
** then check the nutrition facts: the issue is sometimes the lack of "fibers" information or the lack of "Fruits, vegetables, nuts and rapeseed, walnut and olive oils" percentage.
* it can be a software issue (quite rare but possible).
{| class="wikitable"
|+
!Issue
!Rationale
!How to fix
|-
|The Nutri-Score displayed by the producer is different from the Nutri-Score Open Food Facts computes
|The label in Open Food Facts does not represent the label printed on the package
|Change the label
|-
|Same as above
|The category is wrong
|Change the category and check whether the Nutri-Score changes
|-
|Same as above
|The nutrition facts are wrong
|
* Update them
* If the product's nutrition facts have changed since a previous version, you can add it to the [[Verification/Noteworthy Products#Products with a Nutri-Score that has improved over time|products whose recipe has changed]] or, even better if it applies, to the [[Verification/Noteworthy Products#Products with a Nutri-Score that has improved over time|products with a Nutri-Score that has improved over time]].
|}
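The links in the list above all follow the same address pattern: the calculated grade comes first, then the printed grade as a label. As a rough sketch, the addresses can be generated rather than typed by hand. The function names <code>mismatch_url</code> and <code>mismatch_urls</code> and the <code>min_gap</code> parameter are hypothetical helpers invented for this example; only the URL pattern comes from the links above.

```python
from typing import Dict, Tuple

GRADES = ("a", "b", "c", "d", "e")
BASE_URL = "https://world.openfoodfacts.org"

def mismatch_url(printed: str, calculated: str) -> str:
    """Build the search address for one printed/calculated grade pair."""
    printed, calculated = printed.lower(), calculated.lower()
    for grade in (printed, calculated):
        if grade not in GRADES:
            raise ValueError(f"unknown Nutri-Score grade: {grade!r}")
    if printed == calculated:
        raise ValueError("printed and calculated grades are identical")
    # Pattern copied from the links above: calculated grade, then printed label.
    return f"{BASE_URL}/nutrition-grade/{calculated}/label/nutriscore-grade-{printed}"

def mismatch_urls(min_gap: int = 2) -> Dict[Tuple[str, str], str]:
    """Map (printed, calculated) pairs at least min_gap grades apart to URLs."""
    urls = {}
    for i, printed in enumerate(GRADES):
        for j, calculated in enumerate(GRADES):
            if i != j and abs(i - j) >= min_gap:
                urls[(printed, calculated)] = mismatch_url(printed, calculated)
    return urls
```

For example, <code>mismatch_url("A", "E")</code> returns the address of the first link in the list. The <code>min_gap</code> filter is a convenience of this sketch for looking at the largest discrepancies first; the list above also includes some pairs that are only one grade apart.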


=== Cleaning up the consequences of an old Android bug ===
* You can report the issue on [https://forum.openfoodfacts.org the forum] or in our [https://slack.openfoodfacts.org/ Slack space].
The word "Loading…" replaced the correct product name. 99% of phones have been updated with the fix, but we still have some unfixed products.
* You can also [https://github.com/openfoodfacts/openfoodfacts-server/issues/new/choose report the issue directly on our bug reporting tool].
* [https://world.openfoodfacts.org/cgi/search.pl?action=process&search_terms=Caricamento%E2%80%A6&sort_by=unique_scans_n&page_size=24&page=1&sort_by=unique_scans_n Italian]
* [https://world.openfoodfacts.org/cgi/search.pl?action=process&search_terms=Loading%E2%80%A6&sort_by=unique_scans_n&page_size=24&page=1&sort_by=unique_scans_n English]
* [https://world.openfoodfacts.org/cgi/search.pl?action=process&search_terms=chargement%E2%80%A6&sort_by=unique_scans_n&page_size=24&page=1&sort_by=unique_scans_n French]
* [https://world.openfoodfacts.org/cgi/search.pl?search_terms=laden%E2%80%A6&search_simple=1&action=process German]


=== Non-Food Products ===
=== 4. Help improve data quality with specific missions ===
Some people add products that are not food: beauty products, books, pet food, etc. These products have to be moved to the Open Food Facts side projects. Our AI (artificial intelligence) already identifies many cases, which are published in the [https://app.slack.com/client/T02KVRT1Q/CT2N423PA/thread/GCUD53J5R-1586349162.333800?cdn_fallback=2 #bot-image-alerts] channel on our Slack space.
If you want to go further, you can check [[Data quality missions|specific missions related to data quality]]. Some missions are fast and easy to achieve.


==== How to move these products? ====
=== 5. Joining the effort to improve data quality ===
* identify a product in the [https://app.slack.com/client/T02KVRT1Q/CT2N423PA/thread/GCUD53J5R-1586349162.333800?cdn_fallback=2 #bot-image-alerts] channel
We organize a public monthly meeting dedicated to data quality. It takes place on the first Tuesday of every month at 6pm CET. See [[Events|Open Food Facts' events]] to find the next meetings in our community calendar.
* click on the link after "edit:"
* if you have the rights to do so, you will see "If the barcode is not correct, please correct it here"
** enter "obf" to move beauty products to Open Beauty Facts
** enter "opff" to move products to Open Pet Food Facts
** enter "opf" to move products to Open Product Facts
* save (if the message "A product already exists with the new code" appears, move it manually, and delete it)
* in [https://app.slack.com/client/T02KVRT1Q/CT2N423PA/thread/GCUD53J5R-1586349162.333800?cdn_fallback=2 #bot-image-alerts] channel, annotate the product with a "checked" icon to tell others that the product has been moved


== Data quality measurement ==
If you have '''technical skills''', you can also do your part for data quality. Head over to [https://github.com/openfoodfacts/openfoodfacts-server/issues/5538 our tracking issue on GitHub].
We have started an initiative to continuously measure and publish data quality stats. A specific page is dedicated to [[data quality stats]].
== Tools ==
[to be completed]
=== Prevention ===
==== Prevention - Encouraging good API implementation ====
* Documentation and API tutorial for 3rd party quality controls
* Private channels with 3rd party apps for moderation
==== Prevention - Quality warnings ====
==== Website ====
==== Official mobile app ====
==== Prevention - Edit rules ====
* We silently discard or rewrite some edits under specific conditions, when we have no other solution
=== Data quality measurement ===
See [[data quality stats]].
==== Quality facets ====
[to be described]
==== Quality knowledge panel ====
* A knowledge panel showing all the potential data quality issues related to the product.
==== Nutri-Patrol ====
* [[Nutri-Patrol]]
=== Data quality daily ===
[https://mirabelle.openfoodfacts.org/-/data-quality-daily/subscribe Data quality daily] is a daily email that suggests 3 Open Food Facts products for you to fix.
* this is ''your'' mission, these 3 products are not sent to other users
* these products should be fixable (photos)
* products' popularity is taken into account (number of scans last year): your fixes have a higher impact
* you also get nice daily stats about data quality, including a contributors' board.
Don't hesitate to [https://mirabelle.openfoodfacts.org/-/data-quality-daily/subscribe register]; you can unsubscribe at any time.
=== Power User Script ===
[https://github.com/openfoodfacts/power-user-script Power User Script] is a user script for your browser that supports contributing to Open Food Facts. It offers many enhancements for contributors and many features dedicated to data quality.
=== Where do the errors come from? ===
Thanks to the Mirabelle tool, we have a list of error counts per contributor: https://mirabelle.openfoodfacts.org/_memory/errors_from

We can monitor this list from time to time, to prevent bad data coming from apps or users.
== Reference / documentation ==
=== Quality facets ===
There are more than 180 data quality facets. You can consult the [[List of data quality errors (generated)|list of data quality errors]].
=== Data quality issues which can't be fixed ===
Some data quality issues can't be fixed, for various reasons. See the dedicated page: [[Data quality issues which can't be fixed]].
[[Category:Quality]]
[[Category:Data quality]]


== Helping technically with data quality ==
== Get in touch ==
* If you have technical skills, you can also do your part for data quality. Head over to [https://github.com/openfoodfacts/openfoodfacts-server/issues/5538 our tracking issue on GitHub]
{{Box
| 1 = Slack channel
| 2 = [https://openfoodfacts.slack.com/messages/C03H290LF/ #quality-data]
}}

Latest revision as of 13:51, 19 August 2024

Data quality at Open Food Facts is guided by 3 main principles:

  • Quality does not make sense in itself: it depends on usage.
  • We always try to favor ease of use and ease of collection. Data quality control, data verification and data fixes should never be an obstacle to data gathering.
  • No database can claim to be zero-defect. With more than 2,900,000 products, there are quality concerns. Data quality work is done on a "best effort" basis. This effort includes measuring quality, setting objectives, publishing them and implementing them.

These principles and our high-level objectives are described in the following short document, which you should read first if you want to go further:

https://link.openfoodfacts.org/data-quality

Measures

We have started an initiative to continuously measure and publish data quality stats. A specific page is dedicated to data quality stats.

How to help?

Anyone can help to improve the data quality.

1. Adding photos

Photos allow other contributors to verify and, if necessary, fix the issues. For a contributor, adding photos is the first step toward improving data quality.

2. Fixing a product issue

As in Wikipedia, anyone can edit at Open Food Facts. If you see an error, don't hesitate to fix it! If you are unsure, you can ask questions on the forum or in our Slack space.

3. Report an issue related to many products or related to data quality

Sometimes you can discover issues related to many products or related to data quality.

4. Help improve data quality with specific missions

If you want to go further, you can check specific missions related to data quality. Some missions are fast and easy to achieve.

5. Joining the effort to improve data quality

We organize a public monthly meeting dedicated to data quality. It takes place on the first Tuesday of every month at 6pm CET. See Open Food Facts' events to find the next meetings in our community calendar.

If you have technical skills, you can also do your part for data quality. Head over to our tracking issue on GitHub.

Tools

[to be completed]

Prevention

Prevention - Encouraging good API implementation

  • Documentation and API tutorial for 3rd party quality controls
  • Private channels with 3rd party apps for moderation

Prevention - Quality warnings

Website

Official mobile app

Prevention - Edit rules

  • We silently discard or rewrite some edits under specific conditions, when we have no other solution

Data quality measurement

See data quality stats.

Quality facets

[to be described]

Quality knowledge panel

  • A knowledge panel showing all the potential data quality issues related to the product.

Nutri-Patrol

Data quality daily

Data quality daily is a daily email that suggests 3 Open Food Facts products for you to fix.

  • this is your mission, these 3 products are not sent to other users
  • these products should be fixable (photos)
  • products' popularity is taken into account (number of scans last year): your fixes have a higher impact
  • you also get nice daily stats about data quality, including a contributors' board.

Don't hesitate to register; you can unsubscribe at any time.

Power User Script

Power User Script is a user script for your browser that supports contributing to Open Food Facts. It offers many enhancements for contributors and many features dedicated to data quality.


Where do the errors come from?

Thanks to the Mirabelle tool, we have a list of error counts per contributor: https://mirabelle.openfoodfacts.org/_memory/errors_from

We can monitor this list from time to time, to prevent bad data coming from apps or users.

Reference / documentation

Quality facets

There are more than 180 data quality facets. You can consult the list of data quality errors.

Data quality issues which can't be fixed

Some data quality issues can't be fixed, for various reasons. See the dedicated page: Data quality issues which can't be fixed.

Get in touch

Slack channel