OCR: Difference between revisions

From Open Food Facts wiki
(Created page with "== Status == == Roadmap == OCR/Roadmap")
 
== Status ==
== Current state ==  
* Ingredients are extracted by OCR with Tesseract 2 (production) and Tesseract 3 (.net)
* The French language data ('fra') is used for every language; the argument is hard-coded in the call shown below
<pre>
-- /home/off-fr/cgi# grep get_ocr *
Ingredients.pm:use Image::OCR::Tesseract 'get_ocr';
Ingredients.pm: $text =  decode utf8=>get_ocr($image,undef,'fra');
</pre>
* A small custom word list for French is installed at /usr/share/tesseract-ocr/tessdata/fra.user-words
** See https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary
 
== Roadmap ==
[[OCR/Roadmap]]

Revision as of 12:03, 24 January 2016

Current state

  • Ingredients are extracted by OCR with Tesseract 2 (production) and Tesseract 3 (.net)
  • The French language data ('fra') is used for every language; the argument is hard-coded in the call shown below
-- /home/off-fr/cgi# grep get_ocr *
Ingredients.pm:use Image::OCR::Tesseract 'get_ocr';
Ingredients.pm: $text =  decode utf8=>get_ocr($image,undef,'fra');
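
The hard-coded 'fra' argument above is what makes every language go through the French data. As a rough sketch only, and not the code in Ingredients.pm, the language could instead be looked up from the product's language code. The helper name tesseract_lang_for, the %TESSERACT_LANG table and the set of languages in it are all assumptions made for illustration; only languages whose Tesseract data files are actually installed should be listed.

```perl
use strict;
use warnings;

# Hypothetical table: two-letter product language code (ISO 639-1)
# => Tesseract language data name. The entries are an assumption;
# list only languages that are installed in tessdata.
my %TESSERACT_LANG = (
    fr => 'fra',
    en => 'eng',
    de => 'deu',
    es => 'spa',
    it => 'ita',
);

# Hypothetical helper: return the Tesseract language for a product
# language, falling back to 'fra' (the current behaviour) when the
# code is missing or not in the table.
sub tesseract_lang_for {
    my ($lc) = @_;
    return 'fra' unless defined $lc;
    return $TESSERACT_LANG{ lc($lc) } // 'fra';
}

print tesseract_lang_for('en'), "\n";   # prints "eng"
print tesseract_lang_for('xx'), "\n";   # prints "fra" (fallback)
```

With such a helper, the existing call might become get_ocr($image, undef, tesseract_lang_for($lc)), where $lc stands for whatever variable holds the product language at that point (also an assumption). Falling back to 'fra' keeps today's behaviour for any language not in the table.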

Roadmap

OCR/Roadmap