Data quality

From Open Food Facts wiki
Revision as of 16:09, 2 July 2024 by Teolemon (talk | contribs) (Teolemon moved page Quality to Data quality)

Data quality at Open Food Facts is guided by three main principles:

  • Quality has no meaning in itself: it depends on how the data is used.
  • We always try to favor ease of use and ease of collection. Data quality control, data verification and data fixes should never be an obstacle to data gathering.
  • No database can claim to be free of defects. With more than 2,900,000 products, quality concerns are inevitable. Data quality work is carried out on a "best effort" basis: it includes measuring quality, setting objectives, publishing them and working toward them.

These principles and our high-level objectives are described in the following short document, which you should read first if you want to go further:

https://link.openfoodfacts.org/data-quality

Measures

We have started an initiative to continuously measure and publish data quality stats, and we have created a page dedicated to them.

How to help?

Anyone can help to improve the data quality.

1. Adding photos

Photos allow other contributors to verify the data and, if necessary, fix issues. As a contributor, this is the first step toward improving data quality.

2. Fixing a product issue

As on Wikipedia, anyone can edit Open Food Facts. If you see an error, don't hesitate to fix it! If you are unsure, you can ask your questions on the forum or in our Slack space.

3. Reporting an issue related to many products or to data quality

Sometimes you may discover issues that affect many products or that concern data quality in general.

4. Helping to improve data quality with specific missions

If you want to go further, you can check out specific missions related to data quality. Some missions are quick and easy to complete.

5. Joining the effort to improve data quality

We hold a public monthly meeting dedicated to data quality, every first Tuesday of the month at 6pm CET. See Open Food Facts' events to find the next meetings in our community calendar.

If you have technical skills, you can also contribute to data quality: head over to our tracking issue on GitHub.

Tools

[to be completed]

Data quality measurement

See data quality stats.

Quality facets

[to be described]

Data quality daily

Data quality daily is a daily email that suggests three Open Food Facts products for you to fix.

  • this is your own mission: these three products are not sent to other users
  • these products should be fixable (they have photos)
  • product popularity (number of scans over the last year) is taken into account, so your fixes have a higher impact
  • you also get daily stats about data quality, including a contributors' board.

Don't hesitate to register: you can unsubscribe at any time.
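
The selection criteria listed above can be illustrated with a minimal Python sketch. It is not the actual Data quality daily implementation: the `Product` record, its field names and the `pick_daily_products` function are all hypothetical, and the real service may weigh popularity differently.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record; these field names are assumptions for illustration.
    code: str              # barcode
    scans_last_year: int   # popularity proxy
    has_photos: bool       # without photos the product cannot be verified
    has_errors: bool       # at least one data quality issue detected


def pick_daily_products(products, already_assigned, count=3):
    """Pick the most scanned fixable products not yet sent to anyone else."""
    candidates = [
        p for p in products
        if p.has_errors and p.has_photos and p.code not in already_assigned
    ]
    # Most popular first, so that each fix has the highest possible impact.
    candidates.sort(key=lambda p: p.scans_last_year, reverse=True)
    chosen = candidates[:count]
    # Remember the assignment so other users get different products.
    already_assigned.update(p.code for p in chosen)
    return chosen
```

Calling the function twice with the same `already_assigned` set returns different products each time, which mirrors the "not sent to other users" rule.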

Power User Script

Power User Script is a user script for your browser that makes contributing to Open Food Facts easier. It offers many enhancements for contributors, including many features dedicated to data quality.

Where do the errors come from?

Thanks to the Mirabelle tool, we have created a list of error counts per contributor: https://mirabelle.openfoodfacts.org/_memory/errors_from

We can check this list from time to time to catch bad data coming from specific apps or users.
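
As an illustration of what this kind of monitoring could look like, here is a minimal Python sketch that compares two snapshots of error counts per contributor and reports who added the most errors in between. The snapshot format (a plain dictionary of contributor name to error count) and the `error_growth` function are assumptions made for this example; they do not describe the Mirabelle data model.

```python
def error_growth(previous, current):
    """Return (contributor, new_errors) pairs, largest increase first.

    Both arguments are hypothetical snapshots: dictionaries mapping a
    contributor (user or app) to its total error count at a given date.
    """
    growth = {
        name: count - previous.get(name, 0)
        for name, count in current.items()
    }
    # Keep only contributors whose error count went up since the last check.
    increased = [(name, delta) for name, delta in growth.items() if delta > 0]
    # Sort by increase (descending), then by name for a stable order.
    return sorted(increased, key=lambda item: (-item[1], item[0]))
```

A contributor that suddenly appears at the top of this list is a good candidate for a closer look.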

Reference / documentation

Quality facets

There are more than 180 data quality facets. You can consult the list of data quality errors.
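
To see which facets occur most often in a set of product records, a small Python sketch such as the following could be used. It assumes each product is a dictionary carrying its error facets as a list of tags under a `data_quality_errors_tags` key; treat that field name, and the tag values in the usage example, as assumptions to check against the actual data.

```python
from collections import Counter


def count_error_facets(products, field="data_quality_errors_tags"):
    """Count how many products carry each data quality facet.

    `products` is an iterable of dictionaries; `field` is the (assumed)
    key holding the list of facet tags for one product.
    """
    counts = Counter()
    for product in products:
        # A product without the field, or with an empty value, adds nothing.
        counts.update(product.get(field) or [])
    return counts.most_common()
```

For example, with made-up tags:

```python
sample = [
    {"code": "1", "data_quality_errors_tags": ["en:example-error-a", "en:example-error-b"]},
    {"code": "2", "data_quality_errors_tags": ["en:example-error-a"]},
    {"code": "3"},
]
print(count_error_facets(sample))
```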

Data quality issues which can't be fixed

Some data quality issues can't be fixed, for various reasons. See the dedicated page: Data quality issues which can't be fixed.