How to build and deploy taxonomies
Revision as of 12:06, 12 June 2020
Introduction
Open Food Facts uses global taxonomies (multilingual hierarchies) for categories, labels, ingredients, additives and many other product facets.
Taxonomy files
Source
The taxonomies are defined in text files (e.g. labels.txt), which are kept in the /taxonomy directory on GitHub.
Build
When the source text file of a taxonomy is updated, it needs to be built into a structured representation, stored in Perl binary .sto files (e.g. labels.result.sto).
Built taxonomies are also stored on GitHub, but they may not be up-to-date if the source file has changed and the taxonomy has not been rebuilt.
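As a rough way to spot a stale build, the modification times of the source and built files can be compared. The sketch below is only an illustration: needs_rebuild is a hypothetical helper name, both paths are supplied by the caller, and file timestamps are assumed to be meaningful (which may not hold right after a fresh git checkout).

```shell
# Hypothetical helper: succeed (exit status 0) when the built file is missing
# or older than its source file.
# $1 = source file (e.g. labels.txt), $2 = built file (e.g. labels.result.sto)
needs_rebuild() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}
```

For example, `needs_rebuild labels.txt labels.result.sto && echo "labels needs a rebuild"` prints a reminder only when the built file looks out of date.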
JSON export
The built taxonomy is also exported to a JSON structure (e.g. labels.json).
Building taxonomies
The build_tags_taxonomy.pl script compiles a taxonomy source file into a built taxonomy.
cd scripts
export PERL5LIB=.
./build_tags_taxonomy.pl labels publish
The built taxonomy files will be stored in the taxonomy directory, along with the source files.
Deploying taxonomies
Stop and start the web site backend to reload taxonomies
The Open Food Facts web site backend (Apache + mod_perl) needs to be stopped and started for the new taxonomies to be loaded.
Update products with the new taxonomies
Recompute facets
Taxonomies correspond to product facets that are stored in MongoDB.
For example, the labels field of a product is parsed with the labels taxonomy to populate the labels_tags field, which is an array of canonical entries like "en:organic".
To recompute the facets corresponding to the taxonomy, we need to update all products.
This script updates all products in MongoDB and on the file server. It must be run as the off user, or products won't be editable.
sudo su off
cd scripts
export PERL5LIB=.
nice ./update_all_products.pl --fields labels --key labels-20200611
The key is used to tag updated products, so that if the script is killed, it can be run again with the same key without going through every product again.
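The example key above looks like a taxonomy name followed by a date. Assuming that convention, a small helper can build the key so that the same value is easy to reuse when resuming an interrupted run. update_key is a hypothetical name, not part of the Open Food Facts scripts.

```shell
# Hypothetical helper: print a key such as "labels-20200611".
# $1 = taxonomy name, $2 = optional date stamp (defaults to today, YYYYMMDD).
# Pass the original date explicitly to resume a run started on an earlier day.
update_key() {
    printf '%s-%s\n' "$1" "${2:-$(date +%Y%m%d)}"
}
```

A run could then be started with `nice ./update_all_products.pl --fields labels --key "$(update_key labels)"`, and resumed later with `"$(update_key labels 20200611)"`.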
Ingredients analysis reprocessing
Some taxonomies are used for ingredients processing: ingredients.txt, ingredients_processing.txt, additives.txt, labels.txt, vitamins.txt, minerals.txt.
To re-process the ingredients analysis:
This script updates all products in MongoDB and on the file server. It must be run as the off user, or products won't be editable.
sudo su off
cd scripts
export PERL5LIB=.
nice ./update_all_products.pl --process-ingredients --key labels-20200611
The key is used to tag updated products, so that if the script is killed, it can be run again with the same key without going through every product again.
Note: in production, with 1.5 million products, it can take multiple days to re-process all products.
Incorporating translations made from the web
Translations are stored in files on the server
off1:/srv/off/translate# ls -lrt
total 164
-rw-r--r-- 1 off  off   4100 Mar 26 19:42 ingredients.nl.txt
drwxr-xr-x 2 root root  4096 Mar 29 10:20 applied.20190329
-rw-r--r-- 1 off  off    619 Mar 29 16:17 ingredients.de.txt
-rw-r--r-- 1 off  off   9472 Apr  1 18:21 ingredients.fr.txt
drwxr-xr-x 2 root root  4096 Apr  9 19:16 applied.20190409
-rw-r--r-- 1 off  off    659 Apr  9 22:12 labels.hu.txt
-rw-r--r-- 1 off  off    924 Apr 11 08:54 nova_groups.ca.txt
-rw-r--r-- 1 off  off   6699 Apr 19 00:27 categories.hu.txt
-rw-r--r-- 1 off  off    176 Apr 19 17:10 categories.zh.txt
-rw-r--r-- 1 off  off    710 Apr 19 17:25 labels.zh.txt
-rw-r--r-- 1 off  off   2392 Apr 21 23:22 labels.pl.txt
-rw-r--r-- 1 off  off   9864 Apr 24 11:52 categories.ca.txt
-rw-r--r-- 1 off  off  13479 Apr 24 13:40 labels.ca.txt
-rw-r--r-- 1 off  off   2141 Apr 25 10:22 labels.he.txt
-rw-r--r-- 1 off  off   1008 May  3 13:06 categories.pl.txt
-rw-r--r-- 1 off  off    616 May  5 22:35 categories.it.txt
-rw-r--r-- 1 off  off   8493 May  8 21:34 categories.de.txt
-rw-r--r-- 1 off  off   2488 May  9 16:09 categories.fr.txt
-rw-r--r-- 1 off  off   2213 May  9 16:13 labels.de.txt
-rw-r--r-- 1 off  off    955 May  9 16:15 labels.fr.txt
-rw-r--r-- 1 off  off  34333 May  9 18:52 categories.nl.txt
Steps
Try this on the test server first.
Add the translations
- /srv/off/scripts# ./add_users_translations_to_taxonomy.pl categories > /home/off/openfoodfacts-server/taxonomies/categories.txt
- Review the diffs: git diff (there should be mostly additions)
- Commit and push
- Move the applied translations to a new folder
- /srv/off/translate# mkdir applied.20190513
- /srv/off/translate# mv categories.* applied.20190513/
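The mkdir and mv steps above can be wrapped in one function so that the folder name and the moved files always match. This is a minimal sketch: archive_translations is a hypothetical name, and unlike the manual `mv categories.*` it only moves files ending in .txt, which is an assumption based on the file names in the listing.

```shell
# Hypothetical helper: move every <taxonomy>.<lang>.txt file found in a
# translate directory into an applied.<date> subdirectory.
# $1 = translate directory, $2 = taxonomy name, $3 = optional date stamp
#      (defaults to today, YYYYMMDD)
archive_translations() {
    dest="$1/applied.${3:-$(date +%Y%m%d)}"
    mkdir -p "$dest" || return 1
    for f in "$1/$2".*.txt; do
        # When nothing matches, the pattern is left unexpanded: skip it.
        [ -e "$f" ] || continue
        mv "$f" "$dest/" || return 1
    done
}
```

For instance, `archive_translations /srv/off/translate categories 20190513` would reproduce the two manual commands above. Trying it on a copy of the directory first is a reasonable precaution.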
Build the taxonomy
- as root (sudo su)
- cp -a /home/off/openfoodfacts-server/taxonomies/categories.txt /srv/off/taxonomies/
- export PERL5LIB=.
- /srv/off/scripts# ./build_tags_taxonomy.pl categories publish
- Wait. Some taxonomies like categories can take 30 minutes to build.
Stop and start Apache (not a restart)
- systemctl stop apache2@off
- systemctl start apache2@off
Apply the new taxonomy to existing products
- load https://world.openfoodfacts.org/categories in a browser tab, so that you can check big differences at the top of the list
- as user off (VERY IMPORTANT: otherwise products won't be editable)
- export PERL5LIB=.
- /srv/off/scripts$ ./update_all_products.pl --key update_categories_taxonomy --fields categories
- load https://world.openfoodfacts.org/categories in a new tab, look to see if there are big differences