Project:Data Quality

From Open Food Facts wiki

Introduction

As more and more people add data to Open Food Facts (OFF), the quality of that data comes under pressure. People might enter data that is wrong or put it in the wrong place, either by accident or on purpose. If OFF wants to be a reliable source of information for consumers, it should set up procedures to guard and improve data quality. This project is an attempt to give an overview of the possible errors, how they can be found and what can be done about them. (This project extends Project:Error detection.)

Data errors

This section lists the possible data errors. Each error has a description of the issue, a detection method and a solution.

Field issues

Issues related to how the fields are filled in.

Wrong language in language-dependent field

  • Issue: text in the wrong language has been entered in a language-dependent field;
  • Detection: check whether the text in a language-dependent field is actually written in another language;
  • Solution:;
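The detection step for this issue could be automated with a language-identification check. The sketch below is a minimal illustration, not an existing OFF feature: it assumes each language-dependent field is available as a dict mapping a language code to its text, and it guesses the language by counting a few hypothetical stopwords. A real check would use a proper language-identification library and much larger word lists.

```python
from typing import Dict, List, Optional

# Hypothetical, deliberately tiny stopword lists, for illustration only.
STOPWORDS = {
    "en": {"and", "the", "of", "with", "contains"},
    "fr": {"et", "de", "le", "la", "avec", "contient"},
    "de": {"und", "der", "die", "mit", "enthält"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language whose stopwords occur most often in text, or None."""
    words = text.lower().replace(",", " ").split()
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def wrong_language_fields(fields: Dict[str, str]) -> List[str]:
    """Return the language codes whose text looks like another language."""
    flagged = []
    for lang, text in fields.items():
        guessed = guess_language(text)
        # Only flag when a language was recognised and it differs from the label.
        if guessed is not None and guessed != lang:
            flagged.append(lang)
    return flagged


# Usage: the "en" field actually contains French text.
print(wrong_language_fields({
    "en": "sucre, farine de blé, sel",
    "fr": "sucre, farine de blé, sel",
}))  # prints ['en']
```

Fields in which no stopword is recognised are not flagged. This keeps false positives down for short ingredient lists, at the cost of missing some errors.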

Unavailable language entered

  • Issue: a product lists a language that does not appear on the package;
  • Detection: a language-dependent field is filled in for a language that has no corresponding image;
  • Solution:;
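The detection rule for this issue can be sketched as a set difference between the languages that have text and the languages that have an image. The data layout here is an assumption for illustration only: ingredient text in hypothetical "ingredients_text_<lang>" keys, raw uploads under numeric keys in an "images" dict, and assigned images under hypothetical "<section>_<lang>" keys. It would need to be checked against the real product data.

```python
from typing import Dict, List

# Hypothetical key prefix for language-dependent ingredient text.
TEXT_PREFIX = "ingredients_text_"


def languages_without_images(product: Dict) -> List[str]:
    """Return language codes that have ingredient text but no assigned image."""
    text_langs = set()
    for key, value in product.items():
        lang = key[len(TEXT_PREFIX):]
        # Keep only non-empty "ingredients_text_xx" fields with a two-letter code.
        if key.startswith(TEXT_PREFIX) and len(lang) == 2 and value:
            text_langs.add(lang)

    image_langs = set()
    for key in product.get("images", {}):
        _, _, lang = key.partition("_")
        if lang:  # numeric upload keys ("1", "2", ...) have no "_" and are skipped
            image_langs.add(lang)

    return sorted(text_langs - image_langs)


product = {
    "ingredients_text_fr": "sucre, sel",
    "ingredients_text_de": "Zucker, Salz",
    "images": {"1": {}, "ingredients_fr": {}},
}
print(languages_without_images(product))  # prints ['de']
```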

Image issues

No image available

  • Issue: a product has no images that support its data, so there is no way to verify whether the data is correct;
  • Detection: search for products without images;
  • Solution: these products can be deleted automatically on a regular basis;
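A regular job for this issue could start with a simple filter. This is a minimal sketch, assuming each product is a dict with a "code" (barcode) key and an optional "images" dict. Both keys are assumptions for illustration, and the barcodes below are made up.

```python
from typing import Dict, List


def products_without_images(products: List[Dict]) -> List[str]:
    """Return the barcodes of products that have no images at all.

    A missing or empty "images" entry counts as "no image".
    """
    return [p["code"] for p in products if not p.get("images")]


products = [
    {"code": "0000000000001", "images": {"1": {}}},
    {"code": "0000000000002", "images": {}},
    {"code": "0000000000003"},
]
print(products_without_images(products))  # prints ['0000000000002', '0000000000003']
```

Returning a list of candidates, rather than deleting inside the same function, leaves room to review them before the scheduled deletion runs.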

Relevant images not available

  • Issue: a product might not contain all (or any) relevant images;
  • Detection:
  • Solution: either new images must be acquired, or another existing image can be assigned by hand;

Image not assigned

  • Issue: a product might have all the relevant images, but the images are not assigned to the relevant sections (main, ingredients or nutrition). Products whose packaging uses multiple languages are prone to this issue;
  • Detection: products that have multiple images but unassigned sections could be flagged;
  • Solution: has to be solved by hand?
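The detection idea for this issue (multiple images, but some sections unassigned) can be sketched as follows. The key format is hypothetical: raw uploads are assumed to use numeric keys and assigned images "<section>_<lang>" keys, with the section names taken from this page.

```python
from typing import Dict, List

# Section names from this page; the "<section>_<lang>" key format is hypothetical.
SECTIONS = ("main", "ingredients", "nutrition")


def unassigned_sections(product: Dict, lang: str) -> List[str]:
    """Return the sections without an assigned image for the given language.

    Only products with more than one uploaded image are checked.
    """
    images = product.get("images", {})
    uploaded = [key for key in images if key.isdigit()]  # raw uploads: "1", "2", ...
    if len(uploaded) < 2:
        return []
    return [s for s in SECTIONS if f"{s}_{lang}" not in images]


product = {"images": {"1": {}, "2": {}, "3": {}, "main_fr": {}}}
print(unassigned_sections(product, "fr"))  # prints ['ingredients', 'nutrition']
```

For a product whose packaging uses several languages, the check would be run once per language code. The flagged products would still need to be fixed by hand.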

Wrong images assigned

  • Issue: a product might have irrelevant images assigned, or an image might be assigned to the wrong language;
  • Detection: could this be detected by OCR?
  • Solution: either new images must be acquired, or another existing image can be assigned by hand;
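One way the OCR question above might be explored is to run OCR on the assigned image and compare the result with the field the image is supposed to support. The sketch below uses Python's standard difflib.SequenceMatcher as a rough similarity measure. The OCR step itself is not shown and is assumed to exist, and the 0.5 threshold is an arbitrary assumption that would need tuning on real products.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off; it would have to be tuned on real data.
MISMATCH_THRESHOLD = 0.5


def image_text_mismatch(ocr_text: str, field_text: str,
                        threshold: float = MISMATCH_THRESHOLD) -> bool:
    """Return True when OCR text from an assigned image barely matches the field."""
    ratio = SequenceMatcher(None, ocr_text.lower(), field_text.lower()).ratio()
    return ratio < threshold


print(image_text_mismatch("Sugar, wheat flour, salt", "sugar, wheat flour, salt"))  # prints False
print(image_text_mismatch("xyz", "sugar, wheat flour, salt"))  # prints True
```

A low ratio only signals that the image and the field disagree. Either one could be wrong, so a flagged product still needs a manual look.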

Bad images

  • Issue: a product might have an assigned image that is unusable (unreadable) or shows only part of the product;
  • Detection: could this be detected by OCR?
  • Solution: either new images must be acquired, or another existing image can be assigned.

Interloper images

  • Issue: a product might have one or more images that are not related to it, for instance an image of another product or of something totally unrelated.
  • Detection: each product has to be inspected individually by hand.
  • Solution: has to be solved by hand.