OCR: Difference between revisions

863 bytes added ,  21 March 2022
More up to date informations
No edit summary
(More up to date informations)
Line 1: Line 1:
{{Box
OCR (Optical Character Recognition) is an important building block for Open Food Facts. Because most users submit images (convenient from a mobile device), we have to extract information from those images, and OCR is one of the best ways to do this.{{Box
  | 1    =  Slack channel
  | 1    =  Slack channel
  | 2    =  [https://openfoodfacts.slack.com/messages/ocr/ #ocr]
  | 2    =  [https://openfoodfacts.slack.com/messages/ocr/ #ocr]
}}
}}
== Current state ==  
=== Current state ===  
* On-demand OCR extraction of ingredients using Tesseract 2 (production), Tesseract 3 (.net), and Google Cloud Vision
* On-demand OCR extraction of ingredients using [https://cloud.google.com/vision/overview/docs/ Google Cloud Vision], as Google kindly provides us with free credits.
* Each photo, when uploaded, is sent to the OCR<ref group="Code">See [https://github.com/openfoodfacts/openfoodfacts-server/blob/main/scripts/process_new_image_off.sh process_new_image_off.sh], called through [https://github.com/openfoodfacts/openfoodfacts-server/blob/main/conf/incron.conf incron]</ref>
* You can retrieve the OCR file associated with a photo by replacing the <code>.jpg</code> extension in the original image URL with <code>.json</code>
* Some old photos might have [[OCR#Tesseract|Tesseract OCR]] output
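
The extension swap described in the list above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the OCR file sits at the same path as the image; the function name <code>ocr_json_url</code> and the example URL are hypothetical and are not taken from the Open Food Facts code base.

```python
from urllib.parse import urlsplit, urlunsplit


def ocr_json_url(image_url: str) -> str:
    """Return the assumed URL of the OCR JSON file for a .jpg image URL."""
    parts = urlsplit(image_url)
    if not parts.path.lower().endswith(".jpg"):
        raise ValueError(f"expected a .jpg image URL, got: {image_url}")
    # Swap only the trailing extension; the rest of the path and any
    # query string are left untouched.
    new_path = parts.path[: -len(".jpg")] + ".json"
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    # Hypothetical example URL, for illustration only.
    print(ocr_json_url("https://images.example.org/products/123/1.jpg"))
```

Working on the parsed path rather than on the raw string avoids touching a <code>.jpg</code> that might appear elsewhere in the URL, for instance in a query string.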
 
== Archive ==
 
=== Tesseract ===
Previously we were using Tesseract:
* Uses the French dictionary for all languages
* Uses the French dictionary for all languages
<pre>
<pre>
Line 14: Line 22:
**https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary
**https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary


== Roadmap ==
=== Old Roadmap ===
[[OCR/Roadmap]]
[[OCR/Roadmap]]
== Exploiting OCR results ==
=== Old OCR results ===
[[OCR/Results]]
[[OCR/Results]]
[[Category:OCR]]
[[Category:OCR]]
[[Category:Artificial Intelligence]]
[[Category:Artificial Intelligence]]