Verification/Roadmap
Revision as of 16:28, 24 January 2016
Objectives
- Detect potential errors so that they can be corrected and prevented
- Exclude products with questionable information from the calculation of averages
Ideas for automatically detecting products that contain data-entry errors
Rules
- Sum of the components of the nutrition table well above 100 g
- Sum of the ingredients well over 100 g (detection is complicated, especially for complex products)
- Confusion between "l" (lowercase letter L) and "1" (one) in the list of ingredients because of the OCR, which gives for example: ingredient_name l5 % (instead of 15 %)
- Sum of the "of which sugars / starch / etc." sub-items greater than the amount of carbohydrates (and likewise for fat, etc.)
- Energy over 4000 kJ
- Serving size greater than the package size
- "CO2 Carbon Footprint" greater than 3,000 g
- More than 12 additives
- Quantity of "sugars" greater than the amount of "carbohydrates"
- Sum of fatty acids greater than the amount of "lipids"
- Use the EU Organic codes to check whether the certification country is the same as the production country
- etc.
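Several of the rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (fat, carbohydrates, energy_kj, saturated_fat, ...), the 5 g tolerance standing in for "well above", the rule labels and both function names are hypothetical, and values are assumed to be per 100 g. It is not the code of any existing bot.

```python
import re
from typing import Dict, List, Optional

# Assumption: values are per 100 g; all key names below are hypothetical.
MACROS = ("fat", "carbohydrates", "proteins", "fiber", "salt")
FATTY_ACIDS = ("saturated_fat", "monounsaturated_fat", "polyunsaturated_fat")


def _exceeds(part: Optional[float], whole: Optional[float]) -> bool:
    """True only when both values are known and the part is larger than the whole."""
    return part is not None and whole is not None and part > whole


def check_nutrition(n: Dict[str, float], tolerance: float = 5.0) -> List[str]:
    """Return labels of the rules violated by a nutrition table."""
    problems = []
    # "Well above 100 g" is modelled as 100 g plus an assumed tolerance.
    if sum(n.get(k, 0.0) for k in MACROS) > 100.0 + tolerance:
        problems.append("sum-over-100g")
    if n.get("energy_kj", 0.0) > 4000:
        problems.append("energy-over-4000kj")
    if _exceeds(n.get("sugars"), n.get("carbohydrates")):
        problems.append("sugars-over-carbohydrates")
    known = [n[k] for k in FATTY_ACIDS if k in n]
    if known and _exceeds(sum(known), n.get("fat")):
        problems.append("fatty-acids-over-fat")
    return problems


# A percentage made of digits and at least one lowercase "l", e.g. "l5 %".
OCR_L_FOR_ONE = re.compile(r"\b(?=[0-9l]*[0-9])[0-9l]*l[0-9l]*\s*%")


def find_l_for_one(ingredients: str) -> List[str]:
    """Return percentages where the OCR probably read "1" as "l"."""
    return OCR_L_FOR_ONE.findall(ingredients)
```

Missing values are never flagged here: a rule only fires when both sides of the comparison are present, so incomplete products do not produce false alarms.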
Methods
- Consistency checks within a category
- Consistency checks with logical or biological rules
- Consistency checks with laws
- Consistency checks with outside corpora
- Consistency checks of contributed data with OCR results
- Bots: several bots already exist.
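As an illustration of a consistency check within a category, the sketch below flags products whose value for one nutrient lies far from the category median. The choice of the median absolute deviation (MAD), the threshold of 5 and the function name category_outliers are assumptions made for this example; the page does not prescribe a statistical method.

```python
from statistics import median
from typing import Dict, List


def category_outliers(values: Dict[str, float], threshold: float = 5.0) -> List[str]:
    """Return ids of products whose value is far from the category median.

    `values` maps a product id to one nutrient value (for example sugars
    per 100 g) for all products of a single category.
    """
    if len(values) < 3:
        return []  # too few products to say what is typical
    med = median(values.values())
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # all typical values are identical; no scale to compare against
    return sorted(pid for pid, v in values.items() if abs(v - med) / mad > threshold)
```

For example, with sugars of 10, 11, 9.5 and 10.5 g for four products of a category and 95 g for a fifth, only the fifth product is returned.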
Note
Each idea (each rule) must be clearly associated with a unique number. Once a precise error-detection method has been implemented in code, it becomes a rule. The number automatically assigned to the rule allows contributors to refer to it unambiguously.
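One possible way to assign such numbers automatically is a small registry that numbers each rule when it is implemented. This is a minimal sketch: the decorator name rule, the rule_number attribute, the product fields and the numbering scheme (registration order, starting at 1) are all assumptions, not an existing convention of the project.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

# Registry of implemented rules, keyed by their automatically assigned number.
_rules: Dict[int, Rule] = {}


def rule(func: Rule) -> Rule:
    """Register a detection method as a rule and give it the next free number."""
    number = len(_rules) + 1
    _rules[number] = func
    func.rule_number = number  # lets contributors cite "rule 2" unambiguously
    return func


@rule
def energy_over_4000_kj(product: dict) -> bool:
    return product.get("energy_kj", 0) > 4000


@rule
def more_than_12_additives(product: dict) -> bool:
    return len(product.get("additives", [])) > 12


def violated_rules(product: dict) -> List[int]:
    """Return the numbers of all rules that flag this product."""
    return [number for number, check in _rules.items() if check(product)]
```

Numbers assigned this way depend on registration order, so a real implementation would probably need to store them persistently to keep them stable once contributors start citing them.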