
OCR: Difference between revisions

126 bytes removed, 13 December 2022
OCR (Optical Character Recognition) is an important building block for Open Food Facts. Since most users contribute images (which are convenient to take with a mobile device), we have to extract information from those images, and OCR is one of the best ways to do so.
{{Box
| 1    =  Slack channel
| 2    =  [https://openfoodfacts.slack.com/messages/ocr/ #ocr]
}}
=== Current state ===  
* On-demand OCR extraction of ingredients uses [https://cloud.google.com/vision/overview/docs/ Google Cloud Vision], as Google kindly provides us with free credits.
* Each photo is sent to the OCR when it is uploaded.<ref group="Code">See [https://github.com/openfoodfacts/openfoodfacts-server/blob/main/scripts/process_new_image_off.sh process_new_image_off.sh], called through [https://github.com/openfoodfacts/openfoodfacts-server/blob/main/conf/incron.conf incron].</ref>
* You can retrieve the OCR file associated with a photo by replacing the <code>.jpg</code> extension of the original image with <code>.json</code>.
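The extension swap described above can be sketched in Python. This is a minimal illustration, assuming the original image URL path ends in <code>.jpg</code>; the helper name <code>ocr_json_url</code> and the example URLs in the usage note are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def ocr_json_url(image_url: str) -> str:
    """Return the URL of the OCR JSON file for an original photo URL.

    Hypothetical helper: assumes the OCR file sits next to the original
    image and differs only by its extension (.jpg -> .json).
    """
    parts = urlsplit(image_url)
    # Only the path is rewritten, so any query string is preserved as-is.
    if not parts.path.endswith(".jpg"):
        raise ValueError(f"expected a .jpg image URL, got {image_url!r}")
    new_path = parts.path[: -len(".jpg")] + ".json"
    return urlunsplit(parts._replace(path=new_path))
```

For instance, an illustrative URL ending in <code>/1.jpg</code> would map to the same URL ending in <code>/1.json</code>. Working on the parsed path rather than doing a plain string replace avoids touching a <code>.jpg</code> that might appear elsewhere in the URL.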




* <code>content</code>: the OCR response returned by the Google Cloud Vision API.
* <code>created_at</code>: timestamp of last modification.
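A minimal Python sketch for reading the recognised text out of one dump record. It assumes the record has already been parsed into a dict with the <code>content</code> and <code>created_at</code> fields listed above, and that <code>content</code> follows the Google Cloud Vision <code>images:annotate</code> response layout: a <code>responses</code> list whose items may carry <code>fullTextAnnotation.text</code>. That layout and the helper name <code>extract_full_text</code> are assumptions, not taken from the dump documentation.

```python
from typing import Any


def extract_full_text(record: dict[str, Any]) -> str:
    """Join the full OCR text found in one OCR dump record.

    Assumption: record["content"] looks like a Google Cloud Vision
    images:annotate response, i.e.
    {"responses": [{"fullTextAnnotation": {"text": "..."}}, ...]}.
    Responses without any detected text are skipped.
    """
    content = record.get("content") or {}
    texts = []
    for response in content.get("responses", []):
        annotation = response.get("fullTextAnnotation")
        # Images with no detected text may lack fullTextAnnotation entirely.
        if annotation and annotation.get("text"):
            texts.append(annotation["text"])
    return "\n".join(texts)
```

The function deliberately ignores <code>created_at</code>; a caller that wants only recent OCR results could filter records on that timestamp before extracting the text.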




[[Category:OCR]]
[[Category:Artificial Intelligence]]
<references group="Code" />