How to build and deploy taxonomies

From Open Food Facts wiki
Revision as of 18:52, 11 June 2020 by Stephane (talk | contribs) (Instructions to build and deploy taxonomies)

Introduction

Open Food Facts uses global taxonomies (multilingual hierarchies) for categories, labels, ingredients, additives and many other product facets.

Taxonomy files

Source

The taxonomies are defined in text files (e.g. labels.txt), which are kept in the /taxonomy directory on GitHub.

Build

When the source text file of a taxonomy is updated, it needs to be built into a structured representation, which is stored in Perl binary .sto files (e.g. labels.result.sto).

Built taxonomies are also stored on GitHub, but they may not be up-to-date if the source file has changed and the taxonomy has not been rebuilt.

JSON export

The built taxonomy is also exported to a JSON structure (e.g. labels.json).

Building taxonomies

The build_tags_taxonomy.pl script compiles a taxonomy source file into a built taxonomy.

cd scripts
export PERL5LIB=.
./build_tags_taxonomy.pl labels publish

The built taxonomy files will be stored in the taxonomy directory, along with the source files.
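Because a built taxonomy can lag behind its source file, it can help to check for staleness before deploying. The following is a minimal sketch, not part of the Open Food Facts scripts: the needs_rebuild function is hypothetical, and it assumes that labels.txt and labels.result.sto sit side by side in the taxonomy directory. It compares file modification times, which is only a heuristic, since a fresh git checkout sets modification times to the checkout time.

```shell
#!/bin/sh
# Sketch: decide whether a taxonomy looks stale and should be rebuilt.
# Assumed layout: <dir>/<name>.txt (source) next to <dir>/<name>.result.sto (built).
# needs_rebuild is a hypothetical helper, not an Open Food Facts script.

needs_rebuild() {
    nr_src="$1/$2.txt"
    nr_built="$1/$2.result.sto"
    # Stale if the built file is missing, or the source is newer than it.
    [ ! -e "$nr_built" ] || [ "$nr_src" -nt "$nr_built" ]
}

# Example, run from the scripts directory (the ../taxonomy path is an
# assumption about the checkout layout):
#   if needs_rebuild ../taxonomy labels; then
#       ./build_tags_taxonomy.pl labels publish
#   fi
```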

Deploying taxonomies

Stop and start the web site backend to reload taxonomies

The Open Food Facts web site backend (Apache + mod_perl) needs to be stopped and started for the new taxonomies to be loaded.
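As an illustration, the stop and start could be wrapped in a small helper. This is a sketch under assumptions: the service name apache2 and the use of systemctl are guesses about the host setup, not something this page specifies, and restart_backend and the RUN variable are hypothetical names.

```shell
#!/bin/sh
# Sketch: stop, then start, the backend service.
# Assumptions: the service is called "apache2" and is managed with systemctl.
# RUN can be overridden (e.g. RUN=echo) to preview the commands without running them.

restart_backend() {
    rb_service=${1:-apache2}
    rb_run=${RUN:-sudo systemctl}
    # A full stop followed by a start, as described above; start only if stop succeeded.
    $rb_run stop "$rb_service" && $rb_run start "$rb_service"
}

# Preview only:
#   RUN=echo restart_backend apache2
```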

Update products with the new taxonomies

Recompute facets

Taxonomies correspond to product facets that are stored in MongoDB.

For example, the labels tag of a product is parsed with the labels taxonomy, and the result populates the labels_tags field, which is an array of canonical entries like "en:organic".

To recompute the facets corresponding to the taxonomy, we need to update all products.

The update_all_products.pl script updates all products in MongoDB and on the file server. It must be run as the off user; otherwise products won't be editable.

sudo su off
cd scripts
export PERL5LIB=.
nice ./update_all_products.pl --fields labels --key labels-20200611

The --key value is used to tag updated products, so that we don't have to go through every product again if the script is killed and restarted.
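The key in the example above looks like the taxonomy name followed by the run date. Assuming that naming pattern is just a convention and not a requirement of the script, a key can be generated as in the sketch below; make_key is a hypothetical helper. The key should be computed once and reused when resuming a killed run, since a new date would produce a different key.

```shell
#!/bin/sh
# Sketch: build a key of the form <name>-YYYYMMDD, e.g. labels-20200611.
# The naming pattern is inferred from the example command, not a documented rule.

make_key() {
    printf '%s-%s\n' "$1" "$(date +%Y%m%d)"
}

# Example: compute the key once, then pass it to the update script.
#   KEY=$(make_key labels)
#   nice ./update_all_products.pl --fields labels --key "$KEY"
```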

Ingredients analysis reprocessing

Some taxonomies are used for ingredients processing: ingredients.txt, ingredients_processing.txt, additives.txt, labels.txt, vitamins.txt, minerals.txt.
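The list above can be turned into a quick check of whether a changed taxonomy calls for ingredients reprocessing. This is a sketch only: affects_ingredients is a hypothetical helper, and its list of names is copied from the paragraph above, so it would need updating if that list changes.

```shell
#!/bin/sh
# Sketch: succeed if the named taxonomy is one of those used for
# ingredients processing (names taken from the list above, without ".txt").

affects_ingredients() {
    case "$1" in
        ingredients|ingredients_processing|additives|labels|vitamins|minerals)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example:
#   if affects_ingredients labels; then
#       echo "labels changed: ingredients analysis should be re-processed"
#   fi
```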

To re-process the ingredients analysis, run update_all_products.pl with the --process-ingredients option.

The script updates all products in MongoDB and on the file server. It must be run as the off user; otherwise products won't be editable.

sudo su off
cd scripts
export PERL5LIB=.
nice ./update_all_products.pl --process-ingredients --key labels-20200611

The --key value is used to tag updated products, so that we don't have to go through every product again if the script is killed and restarted.

Note: in production, with 1.5 million products, it can take multiple days to re-process all products.