Ingredients Extraction and Analysis: Difference between revisions

Revision as of 14:07, 10 September 2019

2,338 bytes added , 10 September 2019

no edit summary

Stephane

@@ Line 47: / Line 47: @@
 ** Perfect quality
 ** Needs cropping and/or rotation to select the ingredients list
+=== Steps for ingredients lists ===
+==== Picture taking ====
+* Taken with mobile app, uploaded to OFF server
+==== Ingredients list cropping ====
+* Done on mobile app just after picture taking
+** Cropping may be very inaccurate
+* Or done on web site at a later time, possibly by another user
+** Cropping slightly easier than on mobile
+==== OCR ====
+* Launched after cropping, done by the server which calls Google Cloud Vision
+* Cloud Vision returns a JSON object which is stored on the server
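A minimal sketch of reading the OCR text back out of the stored JSON. It assumes the server keeps the raw reply of the Cloud Vision `images:annotate` REST call (a `responses` array whose entries carry `fullTextAnnotation.text` and `textAnnotations[0].description`); the sample payload and the `extract_ocr_text` helper are hypothetical and only illustrate the shape.

```python
import json

# Hypothetical stored reply, shaped like a Cloud Vision images:annotate response.
stored_json = """
{
  "responses": [
    {
      "textAnnotations": [
        {"locale": "de", "description": "Zutaten: Zucker, Kakao\\nIngredients: sugar, cocoa"}
      ],
      "fullTextAnnotation": {
        "text": "Zutaten: Zucker, Kakao\\nIngredients: sugar, cocoa"
      }
    }
  ]
}
"""

def extract_ocr_text(raw: str) -> str:
    """Return the full OCR text from a stored reply, or '' if none is present."""
    data = json.loads(raw)
    responses = data.get("responses", [])
    if not responses:
        return ""
    first = responses[0]
    # Prefer the structured full-text field when it is present.
    full_text = first.get("fullTextAnnotation", {}).get("text")
    if full_text:
        return full_text
    # Fall back to the first text annotation, which holds the whole detected text.
    annotations = first.get("textAnnotations", [])
    return annotations[0].get("description", "") if annotations else ""

print(extract_ocr_text(stored_json))
```

The sample text deliberately contains two languages, which is the situation the cutting step has to deal with.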
+==== Ingredients list cutting ====
+* The image sent to the OCR can also contain other text content
+** Things that are not ingredients
+** Ingredients in other languages
+** The word "Ingredients:"
+* Current solution
+** Hardcoded regular expressions
+* Other possible solutions
+** Language identification to remove other languages
+* Metrics
+** False negatives (words before or after the ingredients list that should have been removed but were kept)
+** False positives (words that were removed but are part of the ingredients and should have been kept)
+*** It is very important to have as few false positives as possible, because they destroy data
+* Test and training sets
+** Only a few ad hoc tests run during builds
+** Test sets need to be created
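The cutting step and its two metrics can be sketched as follows. The two regular expressions are hypothetical stand-ins, not the expressions actually hardcoded on the server, and the word-level counting is one possible way to apply the false negative and false positive definitions given in this section.

```python
import re
from collections import Counter

# Hypothetical pattern: strips a leading "Ingredients:" label.
HEADER_RE = re.compile(r"^\s*ingredients?\s*:\s*", re.IGNORECASE)
# Hypothetical pattern: cuts everything from a trailing non-ingredient phrase onward.
TRAILER_RE = re.compile(
    r"\s*\b(?:store in|best before|nutrition facts)\b.*$",
    re.IGNORECASE | re.DOTALL,
)

def cut_ingredients(ocr_text):
    """Remove text before and after the ingredients list."""
    text = HEADER_RE.sub("", ocr_text)
    text = TRAILER_RE.sub("", text)
    return text.strip()

def words(text):
    """Lowercased word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())

def cutting_errors(predicted, expected):
    """Return (false_negatives, false_positives) as word counts.

    False negatives: words kept that should have been removed.
    False positives: ingredient words that were removed.
    """
    kept, wanted = Counter(words(predicted)), Counter(words(expected))
    false_negatives = sum((kept - wanted).values())
    false_positives = sum((wanted - kept).values())
    return false_negatives, false_positives

ocr = "Ingredients: sugar, milk. Made in France"
cut = cut_ingredients(ocr)
print(cut)
print(cutting_errors(cut, "sugar, milk"))
```

Here "Made in France" is not covered by the trailer pattern, so it survives the cut and is counted as three false negatives. A test set of (OCR text, expected list) pairs would let these counts be tracked across changes to the patterns.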
+==== Validation and/or correction by users ====
+* Current solution:
+** Users on the app or the web site are shown the OCR result
+** The OCR result is applied only if the user validates it
+** But users tend to validate lists without changes even if there are errors, especially on mobile
+* Other possible solutions
+** Use the result of ingredient analysis to show users ingredients that were not recognized
+** Show spelling suggestions
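As an illustration of the first alternative, the sketch below lists the items of an ingredients text that are not found in a set of known ingredients, so they could be highlighted to the user. `KNOWN_INGREDIENTS` is a tiny hypothetical stand-in for the output of ingredient analysis, and the comma-based splitting is a simplification.

```python
import re

# Hypothetical stand-in for the set of ingredients the analysis step recognizes.
KNOWN_INGREDIENTS = {"sugar", "cocoa butter", "milk powder", "salt"}

def unrecognized_ingredients(ingredients_text, known=KNOWN_INGREDIENTS):
    """Return the list items that are not in the known-ingredients set."""
    parts = re.split(r"[,;.]", ingredients_text)
    cleaned = [part.strip().lower() for part in parts]
    return [part for part in cleaned if part and part not in known]

print(unrecognized_ingredients("Sugar, cocoa buter, milk powder, slat"))
```

Pointing users at the specific unrecognized items gives them something concrete to check, rather than asking them to proofread the whole list.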
+==== Spell correction ====
+* Current solution:
+** Currently only done during ingredients analysis, not during ingredients extraction
+** Very simple (and slow) implementation of Peter Norvig's spelling correction algorithm
+* Other possible solutions
+** Spell checkers trained on ingredients
+*** Elasticsearch spellchecker
+*** SymSpell
+* Metrics
+** Recall and precision
+* Test and training sets
+** Language models can be built with lists of ingredients from OFF
+*** e.g. including only ingredients lists from producers, or lists for which we have a very high ingredients recognition rate
+** Test sets need to be created
+*** Run spellcheckers on actual ingredients lists from OFF, review corrections
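A minimal sketch of a Norvig-style corrector whose word counts come from ingredients lists, with a precision and recall check against a small test set. It is simplified to edit distance 1 and is not the implementation used on the server; `TRAINING_LISTS` and `SAMPLES` are made-up data standing in for real lists from OFF and a reviewed test set.

```python
import re
from collections import Counter

# Hypothetical training data; real lists would come from OFF.
TRAINING_LISTS = [
    "sugar, cocoa butter, whole milk powder, cocoa mass",
    "wheat flour, sugar, palm oil, salt",
    "water, sugar, lemon juice, salt",
]
WORD_COUNTS = Counter(re.findall(r"[a-z]+", " ".join(TRAINING_LISTS).lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (WORD_COUNTS[w], w))

def precision_recall(samples):
    """samples: (ocr_word, expected_word) pairs."""
    changed = [(o, e) for o, e in samples if correct(o) != o]
    right = sum(1 for o, e in changed if correct(o) == e)
    needed = sum(1 for o, e in samples if o != e)
    precision = right / len(changed) if changed else 1.0
    recall = right / needed if needed else 1.0
    return precision, recall

# Hypothetical test set: "malt" is a valid word missing from the training data.
SAMPLES = [("suger", "sugar"), ("slat", "salt"), ("cocao", "cocoa"),
           ("jiuec", "juice"), ("milk", "milk"), ("malt", "malt")]
print(precision_recall(SAMPLES))
```

In this toy set, "malt" is wrongly changed to "salt" because the training lists never mention malt, which lowers precision, and "jiuec" is two edits away from "juice" and is left uncorrected, which lowers recall. This is the kind of review the test sets are meant to support.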