Verification/Roadmap: Difference between revisions
Latest revision as of 12:19, 30 July 2024
Objectives
- Detect potential errors so that they can be corrected and prevented
- Exclude products whose information is questionable from the calculation of averages
Ideas for automatically detecting products that contain data-entry errors
Rules
- Sum of the components of the nutrition table well above 100 g
- Sum of the ingredients well over 100 g (detection is complicated for complex products)
- Confusion between "l" (lowercase letter L) and "1" (one) in the list of ingredients caused by OCR, which gives, for example: nom_de_l'ingrédient l5 %
- Sum of the "of which sugars / starch / etc." values greater than carbohydrates (and the same for fat, etc.)
- Energy over 4000 kJ
- Serving size greater than the size of the package
- "CO2 Carbon Footprint" greater than 3,000 g
- More than 12 additives
- Quantity of "sugars" greater than the amount of "carbohydrate"
- Sum of fatty acids greater than the amount of "lipids"
- Use the EU Organic codes to check whether the certification country is the same as the production country
- A year printed on a label is 2 or more years later than the best-before date
- etc.
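Several of the numeric rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`fat`, `sugars`, `energy_kj`, and so on), the flag labels, the reading of all values as "per 100 g", and the 105 g tolerance standing in for "well above 100 g" are hypothetical and not taken from any existing implementation.

```python
from typing import List, Mapping, Sequence

# Hypothetical field names and thresholds (assumptions, not an existing schema).
MACRO_FIELDS = ("fat", "carbohydrates", "proteins", "fiber", "salt")
FATTY_ACID_FIELDS = ("saturated_fat", "monounsaturated_fat", "polyunsaturated_fat")
SUM_LIMIT_G = 105.0      # assumed tolerance for "well above 100 g"
MAX_ENERGY_KJ = 4000.0   # "Energy over 4000 kJ", assumed to be per 100 g
MAX_ADDITIVES = 12       # "More than 12 additives"


def check_product(nutrients: Mapping[str, float],
                  additives: Sequence[str] = ()) -> List[str]:
    """Return a list of flags for rules that the product violates."""
    flags: List[str] = []

    def total(fields):
        # Missing fields are skipped rather than counted as zero.
        return sum(nutrients[f] for f in fields if f in nutrients)

    if total(MACRO_FIELDS) > SUM_LIMIT_G:
        flags.append("nutrient-sum-over-100g")
    # Only compare sugars and carbohydrates when both values are present.
    if "sugars" in nutrients and "carbohydrates" in nutrients:
        if nutrients["sugars"] > nutrients["carbohydrates"]:
            flags.append("sugars-over-carbohydrates")
    if "fat" in nutrients and total(FATTY_ACID_FIELDS) > nutrients["fat"]:
        flags.append("fatty-acids-over-fat")
    if nutrients.get("energy_kj", 0.0) > MAX_ENERGY_KJ:
        flags.append("energy-over-4000kj")
    if len(additives) > MAX_ADDITIVES:
        flags.append("too-many-additives")
    return flags


suspect = {"fat": 10.0, "saturated_fat": 4.0, "carbohydrates": 20.0,
           "sugars": 35.0, "energy_kj": 4500.0}
print(check_product(suspect))
```

A flagged product is only a candidate for review: rounding on the label can push a sum slightly over its parent value, so a real check would probably need a small tolerance on each comparison.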
Methods
- Consistency checks within a category
- Consistency checks with logical or biological rules
- Consistency checks with laws
- Consistency checks with outside corpora
- Consistency checks of contributed data with OCR results
- Bots: several bots already exist.
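As an illustration of a consistency check within a category, the sketch below flags products whose value for one nutrient lies far from the category median, measured in units of the median absolute deviation (MAD). The choice of the MAD, the threshold of 5, and the function and parameter names are assumptions made for this example; the page does not prescribe a particular statistic.

```python
from statistics import median
from typing import Dict, List


def category_outliers(values: Dict[str, float],
                      threshold: float = 5.0) -> List[str]:
    """Return the ids of products whose value is far from the category median.

    `values` maps a product id to one nutrient value (for example sugars
    per 100 g) for all products of a single category.
    """
    if len(values) < 3:
        return []  # too few products to say anything about the category
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread to measure against; stay conservative
    return [pid for pid, v in values.items()
            if abs(v - med) / mad > threshold]


# Hypothetical sugar values (g per 100 g) for five products of one category.
sodas = {"a": 10.0, "b": 10.5, "c": 11.0, "d": 9.5, "e": 95.0}
print(category_outliers(sodas))  # ['e']
```

Products flagged this way could also be left out when computing category averages, in line with the second objective.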
Note
Each idea (each rule) must be clearly associated with a unique number. Once an accurate method of error detection has been translated into code (implemented), the method becomes a rule. The number automatically assigned to the rule allows contributors to refer to it unambiguously.
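One possible way to assign rule numbers automatically is a small registry that numbers each check as it is registered. The sketch below is an assumption about how this could look, not a description of an existing system; the `rule` decorator, the `RULES` registry, `run_rules`, and the product field names are all hypothetical.

```python
from itertools import count
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

_next_number = count(1)      # source of automatically assigned rule numbers
RULES: Dict[int, Rule] = {}  # rule number -> implemented check


def rule(func: Rule) -> Rule:
    """Register an implemented check and give it the next free number."""
    number = next(_next_number)
    func.rule_number = number
    RULES[number] = func
    return func


@rule
def sugars_over_carbohydrates(product: dict) -> bool:
    # A missing carbohydrate value never triggers the rule.
    return product.get("sugars", 0) > product.get("carbohydrates", float("inf"))


@rule
def too_many_additives(product: dict) -> bool:
    return len(product.get("additives", [])) > 12


def run_rules(product: dict) -> List[int]:
    """Return the numbers of all rules that the product violates."""
    return [n for n, check in sorted(RULES.items()) if check(product)]


print(run_rules({"sugars": 30, "carbohydrates": 20, "additives": []}))  # [1]
```

Numbers assigned in registration order change if rules are reordered or removed, so a real system would need to store each number persistently once it has been assigned.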