Product Opener/Installation/Issues

From Open Food Facts wiki
Revision as of 10:27, 20 January 2015 by Stephane (tesseract issue: prod is using v2, Debian 7 installs v3)

Dotted fields in MongoDB

Problem:

MongoDB 2.6 rejects fields whose names contain dots. Before 2015, Product Opener stored backup versions of some fields under a key made of the field name, a dot and a suffix (for example countries.20131226):

"The dotted field 'countries.20131226' in 'countries.20131226' is not valid for storage."

Resolution:

Those fields can safely be removed; update_all_products_from_dir_in_mongodb.pl does this.
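A minimal sketch of the cleanup, assuming the product is a plain Perl hash reference and that only top-level keys carry the dotted suffix. The helper name remove_dotted_fields is hypothetical; this is not the code of update_all_products_from_dir_in_mongodb.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: removes every top-level key that contains a dot
# (such as 'countries.20131226') from a product hash reference, since
# MongoDB 2.6 refuses to store such field names. Nested documents are not
# inspected. Returns the removed keys, sorted.
sub remove_dotted_fields {
    my ($product_ref) = @_;
    my @dotted = sort grep { index($_, '.') >= 0 } keys %{$product_ref};
    delete @{$product_ref}{@dotted};
    return @dotted;
}

# Example: the backup copy of 'countries' is dropped, the current value kept.
my %product = (
    countries            => 'France',
    'countries.20131226' => 'France',
);
my @removed = remove_dotted_fields(\%product);
print "removed: @removed\n";    # prints "removed: countries.20131226"
```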

List of tag values (e.g. list of categories)

Problem:

With MongoDB 2.6, the aggregate call used to list all values of a tag type (such as categories) does not return the expected structure, and display_list_of_tags dies with "Not an ARRAY reference".

In Display.pm:

        eval {
                $results = $products_collection->aggregate( $aggregate_parameters );
        };
 -e: Not an ARRAY reference at /home/off/cgi//Blogs/Display.pm line 789.
display.pl : query_string: /lieux-de-fabrication
analyze_request : query_string 1 : /lieux-de-fabrication
analyze_request : query_string 2 : /lieux-de-fabrication
analyze_request : query_string 3 : lieux-de-fabrication
Display::analyze_request - last component - lieux-de-fabrication - plural? manufacturing_places
Display::analyze_request - list of tags - groupby: manufacturing_places
Display::analyze_request - lc: fr lang: fr text:  - product:  - tagtype/tagid: / - tagtype2/tagid2: / - groupby: manufacturing_places
display.pl blogid:  tagid:  urlsdate:  urlid:  user:  query:
Display.pm - display_list_of_tags - query:
$VAR1 = {
          'countries_tags' => 'en:france',
          '_tags' => undef
        };

Display.pm - display_list_of_tags - aggregate_parameters:
$VAR1 = [
          {
            '$match' => {
                          'countries_tags' => 'en:france',
                          '_tags' => undef
                        }
          },
          {
            '$unwind' => '$manufacturing_places_tags'
          },
          {
            '$group' => {
                          'count' => {
                                       '$sum' => 1
                                     },
                          '_id' => '$manufacturing_places_tags'
                        }
          },
          {
            '$sort' => {
                         'count' => -1
                       }
          }
        ];

Display.pm - display_list_of_tags - aggregate query done
Display.pm - display_list_of_tags - results:
$VAR1 = bless( {
                 '_database' => bless( {
                                         '_connection' => bless( {
                                                                   'w' => 1,
                                                                   'query_timeout' => 30000,
                                                                   'find_master' => 0,
                                                                   'db_name' => 'admin',
                                                                   'auto_reconnect' => 1,
                                                                   '_servers' => {},
                                                                   'ts' => 0,
                                                                   'right_port' => 27017,
                                                                   'wtimeout' => 1000,
                                                                   'port' => 27017,
                                                                   'left_port' => 27017,
                                                                   'host' => 'localhost:27017',
                                                                   'max_bson_size' => 16777216,
                                                                   'timeout' => 20000,
                                                                   'auto_connect' => 1
                                                                 }, 'MongoDB::Connection' ),
                                         'name' => 'off'
                                       }, 'MongoDB::Database' ),
                 'name' => 'products.aggregate'
               }, 'MongoDB::Collection' );

[Tue Jan 20 10:23:20 2015] [error] [Tue Jan 20 10:23:20 2015] -e: Not an ARRAY reference at /home/off/cgi//Blogs/Display.pm line 789.\n

Expected:

Display.pm - display_list_of_tags - aggregate query done
Display.pm - display_list_of_tags - results:
$VAR1 = [
          {
            'count' => 3055,
            '_id' => 'france'
          },
          {
            'count' => 542,
            '_id' => 'bretagne'
          },
..
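Until the module is upgraded, a defensive check right after the eval block would replace the cryptic "Not an ARRAY reference" with a clearer error. A sketch, with a hypothetical helper name check_aggregate_results:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical guard for display_list_of_tags: returns the aggregate() result
# if it is an array reference of { _id => ..., count => ... } documents, and
# dies with an explicit message otherwise (for example when an old driver
# hands back a MongoDB::Collection object).
sub check_aggregate_results {
    my ($results) = @_;
    return $results if ref($results) eq 'ARRAY';
    my $got = blessed($results) // (ref($results) || 'a non-reference value');
    die "aggregate() returned $got instead of an ARRAY reference; "
      . "the MongoDB Perl module may be too old\n";
}
```

In display_list_of_tags it would be called as $results = check_aggregate_results($results); before looping over @$results.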


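Cause: old version of the MongoDB Perl module

Instead of an array reference, the aggregate call returns a MongoDB::Collection object named 'products.aggregate', as the results dump above shows. Debian 7.6 ships a very old version of the module: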

root@ns431999:/home/off/logs# cat /etc/debian_version
7.6
root@ns431999:/home/off/logs# apt-get install libmongodb-perl
Reading package lists... Done
Building dependency tree
Reading state information... Done
libmongodb-perl is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 75 not upgraded.
root@ns431999:/home/off/logs# perl -MMongoDB -e 'print $MongoDB::VERSION . "\n"'
0.45

Solution: install a newer version from CPAN

cpan install MongoDB

root@ns431999:/home/off/logs# perl -MMongoDB -e 'print $MongoDB::VERSION . "\n"'
v0.707.2.0
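To catch the problem at startup rather than at the first tag listing, the installed module version can be compared with a minimum. The sketch below uses the core version module; the minimum is assumed to be the v0.707.2.0 shown above, and the real lower bound may be older. The helper name mongodb_module_ok is hypothetical.

```perl
use strict;
use warnings;
use version;

# Assumed minimum for illustration: the version reported above after the
# CPAN upgrade. Product Opener may well work with an older release.
my $MIN_MONGODB_VERSION = 'v0.707.2.0';

# Returns true if the given module version is at least the minimum.
sub mongodb_module_ok {
    my ($installed) = @_;
    return version->parse($installed) >= version->parse($MIN_MONGODB_VERSION);
}

# In Product Opener this would be checked after "use MongoDB;", e.g.
#   die "MongoDB Perl module $MongoDB::VERSION is too old\n"
#       unless mongodb_module_ok($MongoDB::VERSION);
```

Perl's built-in MongoDB->VERSION($minimum), which dies when the loaded module is older than $minimum, is a shorter alternative.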


Tesseract OCR / segmentation fault

sh: line 1: 18042 Segmentation fault  /usr/bin/tesseract /home/off/html/images/products/324/541/299/9925/ingredients.8.full.jpg.tmp.14217494077210.tif /home/off/html/images/products/324/541/299/9925/ingredients.8.full.jpg.tmp.14217494077210.tif -l fra 2> /dev/null

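Running the command manually shows that tesseract 3.02 cannot load the French language data ('fra') and then crashes: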

 tesseract --version
tesseract 3.02

off@ns431999:~/cgi$ /usr/bin/tesseract /home/off/html/images/products/324/541/299/9925/ingredients.8.full.jpg.tmp.14217494077210.tif /home/off/html/images/products/324/541/299/9925/ingredients.8.full.jpg.tmp.14217494077210.tif -l fra
Error opening data file /usr/share/tesseract-ocr/tessdata/fra.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'fra'
Tesseract couldn't load any languages!
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
Segmentation fault
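Since tesseract 3.02 crashes after failing to load the language data, a caller could check that the .traineddata file exists before running it. A sketch, assuming the 3.x layout from the error message (PREFIX/tessdata/LANG.traineddata), with TESSDATA_PREFIX or the Debian path /usr/share/tesseract-ocr as the prefix; the helper names are hypothetical. On Debian, the French data normally comes from the tesseract-ocr-fra package.

```perl
use strict;
use warnings;
use File::Spec;

# Builds the path tesseract 3.x looks for: PREFIX/tessdata/LANG.traineddata.
# PREFIX defaults to TESSDATA_PREFIX, then to the Debian location seen in the
# error message above.
sub tessdata_file {
    my ($lang, $prefix) = @_;
    $prefix //= $ENV{TESSDATA_PREFIX} // '/usr/share/tesseract-ocr';
    return File::Spec->catfile($prefix, 'tessdata', "$lang.traineddata");
}

# True if the language data file exists, i.e. it should be safe to run
# tesseract with "-l $lang".
sub has_tessdata {
    my ($lang, $prefix) = @_;
    return -f tessdata_file($lang, $prefix);
}
```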

As of January 2015, the OFF production server runs an older version of tesseract, 2.04, whereas Debian 7 installs 3.02:

 apt-cache show tesseract-ocr | grep -i version
Version: 2.04-2+squeeze1
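Because production runs 2.x while Debian 7 installs 3.x, and the .traineddata file layout above applies to 3.x, it can help to detect the installed major version. A sketch of a hypothetical helper that parses either output shown on this page; note that tesseract 3.02 may print its version on stderr, so capture it with 2>&1.

```perl
use strict;
use warnings;

# Hypothetical helper: extracts the major version from either
# "tesseract --version" output ("tesseract 3.02") or
# "apt-cache show tesseract-ocr" output ("Version: 2.04-2+squeeze1").
# Returns undef if no version line is found.
sub tesseract_major_version {
    my ($text) = @_;
    return $1 if $text =~ /^(?:tesseract\s+|Version:\s*)(\d+)\./mi;
    return;
}
```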