Student projects/GSOC/Proposals: Difference between revisions

521 bytes added , 6 February 2019

@@ Line 116: / Line 116: @@
 Background: Over the past year we have ramped up our effort and processed 1.5 million images with OCR and general entity, barcode, and QR-code recognition. The result is 1.5 million matching JSON files with bounding boxes.
-* '''Slack channels: #ai-machinelearning'''
+* '''Slack channels: #ai-machinelearning #spellcheck'''
 * '''GitHub AI / machine learning: openfoodfacts-ai'''
-=== Automatically classify products ===
+=== Ingredients spellcheck ===
+* Ingredient lists extracted by OCR very often contain errors that could easily be corrected if we build dedicated models for ingredient lists
+* We already have a large number of correct ingredient lists in many languages that we could use to build dictionaries, compute frequencies, n-grams, etc.
+* The solution needs to be easily retrained for new languages and for new training data so that it can continue to improve
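The dictionary-and-frequency idea above can be sketched as follows. This is only a minimal illustration, not the project's method: it assumes a plain word-frequency table built from known-good ingredient lists and corrects each unknown token to the most frequent known word within one edit. All function names (`build_frequencies`, `correct_word`, `correct_text`) are hypothetical, and the output is lowercased for simplicity.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\w+")  # letters and digits, so OCR confusions like "0" for "o" stay inside a token


def build_frequencies(ingredient_lists):
    """Count word frequencies over a corpus of correct ingredient lists."""
    counts = Counter()
    for text in ingredient_lists:
        counts.update(TOKEN.findall(text.lower()))
    return counts


def edits1(word, alphabet):
    """All strings one deletion, transposition, replacement or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word, counts, alphabet):
    """Return word if known (or purely numeric), else the most frequent known neighbour."""
    if word in counts or word.isdigit():
        return word
    candidates = sorted(w for w in edits1(word, alphabet) if w in counts)
    if not candidates:
        return word  # nothing close enough: leave the token untouched
    return max(candidates, key=lambda w: counts[w])


def correct_text(text, counts):
    """Correct every token of an OCR'd ingredient list, keeping punctuation in place."""
    alphabet = sorted(set("".join(counts)))  # characters seen in the training corpus
    return TOKEN.sub(lambda m: correct_word(m.group(0).lower(), counts, alphabet), text)


# Retraining for a new language or new data only means rebuilding the counts.
counts = build_frequencies(["sugar, wheat flour, salt", "wheat flour, water, salt"])
print(correct_text("sugor, wheat fl0ur, sa1t", counts))
```

Because the alphabet and the dictionary both come from the training corpus, the same code can be pointed at ingredient lists in another language without changes. A real solution would likely also use n-gram context, which this sketch leaves out.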
+=== Data extraction from OCR and other field values ===
 * Detect field values from other field values or bag of words from the OCR
@@ Line 125: / Line 131: @@
 ** Brands (in some cases, a strong feature can be the barcode prefix)
 ** Labels
-* When certain, detected values can be applied immediately
+* When precision is very high (99%), we can apply the results directly
-* When less certain, we can ask users to confirm suggestions
+* For slightly lower precision, we can offer suggestions to users and ask them to confirm
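The two-tier rule above could be sketched like this. It is a hedged illustration only: it uses a per-prediction confidence score as a stand-in for precision, the 0.99 cut-off comes from the text, and the lower `SUGGEST_THRESHOLD` of 0.90 is an assumed value that would need to be tuned on validation data. The `Prediction` type and `route` function are hypothetical names.

```python
from dataclasses import dataclass

AUTO_APPLY_THRESHOLD = 0.99  # from the text: "very high (99%)"
SUGGEST_THRESHOLD = 0.90     # assumption: the text only says "slightly lower"


@dataclass
class Prediction:
    field: str         # e.g. "brands" or "labels"
    value: str         # the detected field value
    confidence: float  # model score in [0, 1], used here as a proxy for precision


def route(pred, auto=AUTO_APPLY_THRESHOLD, suggest=SUGGEST_THRESHOLD):
    """Decide what to do with a detected field value."""
    if pred.confidence >= auto:
        return "apply"    # write the value to the product directly
    if pred.confidence >= suggest:
        return "suggest"  # show it to a user and ask for confirmation
    return "discard"      # too uncertain to be worth a user's time


for p in [
    Prediction("brands", "Example Brand", 0.995),
    Prediction("labels", "Organic", 0.93),
    Prediction("labels", "Fair trade", 0.40),
]:
    print(p.field, p.value, route(p))
```

Keeping the thresholds as parameters makes it possible to set them per field, since a cut-off that yields 99% precision for brands may not do so for labels.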
 === Automatically detect errors ===