Taxonomy hierarchies

From Open Food Facts wiki
Revision as of 18:05, 18 December 2023 by Aleene (talk | contribs) (starting rewrite)

A taxonomy can be just a list of entries with corresponding translations and other attributes. This is for instance the case for the languages or countries taxonomy. Often, however, it is necessary to look at a group of entries together, based on some kind of commonality among the members of the group. This grouping is only done on an as-needed basis. We do not need to know which languages have a common origin or which countries belong to South America.

This page describes some of the commonalities that have been applied.

General principles

There are some guiding principles that are relevant to all taxonomies.

Taxonomy specifics

BELOW IS OLD

The taxonomies

Ingredients taxonomy

The ingredients taxonomy encodes the ingredients found on the product. This should be a one-to-one mapping between the ingredients list and the taxonomy. With this it is possible to filter out products containing a specific ingredient.

Some products have no ingredient list, or no cleanly parsable one, and the ingredients are hidden (even partially) in other text. In that case the ingredients can be gathered from that text and entered as ingredients.

By exploiting the hierarchies defined in the ingredients taxonomy, it is possible to filter on a group of ingredients.
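A minimal sketch of how such a hierarchy-based filter could work, assuming a simplified in-memory taxonomy in which each entry lists its direct parents. The entry names, the `PARENTS` mapping and the `products` data are illustrative assumptions and do not reflect the actual Open Food Facts taxonomy files or API.

```python
# Hypothetical, simplified taxonomy: each ingredient maps to its direct parents.
PARENTS = {
    "en:cane-sugar": ["en:sugar"],
    "en:sugar": ["en:added-sugar"],
    "en:honey": ["en:added-sugar"],
    "en:added-sugar": [],
    "en:wheat-flour": ["en:flour"],
    "en:flour": [],
}


def ancestors(entry, parents=PARENTS):
    """Return the entry itself plus all of its ancestors in the hierarchy."""
    seen = set()
    stack = [entry]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def contains_group(ingredients, group):
    """True if any ingredient is the group itself or one of its descendants."""
    return any(group in ancestors(i) for i in ingredients)


# Illustrative product data (not real products).
products = {
    "biscuit": ["en:wheat-flour", "en:cane-sugar"],
    "bread": ["en:wheat-flour"],
    "granola": ["en:honey"],
}

# Filter on the whole group "en:added-sugar" rather than on a single ingredient.
sweetened = sorted(
    name for name, ings in products.items()
    if contains_group(ings, "en:added-sugar")
)
print(sweetened)  # ['biscuit', 'granola']
```

Walking up from each ingredient keeps the sketch short; for large taxonomies it would probably be cheaper to precompute the descendants of the group once.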

Label taxonomy

Labels are claims, logos, and other statements found on products. Sometimes these labels are statements which can be checked against the ingredients. For instance, the label no added sugar should imply that the ingredient sugar is not on the ingredient list. The labels can also refer to processes in the value chain, like fair-trade or organic. These labels can be encoded in the ingredients list as well, but that is a bit more obscure.
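As a rough illustration of checking a label against the ingredients, the sketch below reports labels that contradict the ingredient list. The `LABEL_FORBIDS` mapping and the identifiers in it are assumptions made for this example, not an existing rule set.

```python
# Hypothetical rules: a label forbids certain ingredients on the list.
LABEL_FORBIDS = {
    "en:no-added-sugar": {"en:sugar"},
}


def label_conflicts(labels, ingredients):
    """Return (label, ingredient) pairs where a label contradicts the ingredients."""
    conflicts = []
    for label in labels:
        for forbidden in sorted(LABEL_FORBIDS.get(label, set())):
            if forbidden in ingredients:
                conflicts.append((label, forbidden))
    return conflicts


print(label_conflicts(["en:no-added-sugar"], ["en:apple", "en:sugar"]))
# [('en:no-added-sugar', 'en:sugar')]
print(label_conflicts(["en:no-added-sugar"], ["en:apple"]))
# []
```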

The label can also encode other claimed characteristics, which refer to processes, for example filtered, artisanal, etc.

It is also possible to add storage instructions as labels (lacking better solutions), such as frozen or refrigerated.

The hierarchy is not (yet) well developed, but it is possible to get any organic or any fair-trade label.


Category taxonomy

The categories taxonomy is more complicated, as there is no clear relationship with what is found on the product. The name of a product is not always well matched to its ingredients. Assigning the correct category is a combination of interpreting marketing names, ingredient lists and labels found on the packaging. Knowledge of the current taxonomy and its hierarchy is often required as well. There is also a relationship with legislation, as the naming of a product is not always free.

There are some basic principles behind the hierarchy found in the taxonomy:

  • understandable - no abstract categories that a user does not understand or does not encounter in the supermarket aisles;
  • marketing driven - some categories correspond to a category in the user's mind, as it has often been used in the marketing message. For instance, Bolognaise lasagnas is clear to the user;
  • ingredients driven - the ingredients determine to a large extent the category. This is most obvious for unprocessed single ingredient products;
  • recipe complexity - the products that OFF covers range from basic food, like potatoes that have just come out of the ground, to fancy ready-to-eat meals, and everything in between (not very clear how to cut things up):
    • raw/unprocessed products - mainly of natural/vegetable origin. Maybe some processes are allowed (cleaning/cutting)? There is a very large overlap with NOVA 1 products;
    • extended shelf life products (?) - some processes are allowed in order to extend the shelf life of unprocessed products, like pasteurisation, pre-cooking (for cans), ..?
    • baking involved - prefried products;
    • mixing involved - adding sauces;
    • cooking involved - vegetable mixes, ingredients packages;
    • meals - only heat them in the oven or microwave;
  • geographic origin - the location where a product is produced. This is often used for PGO-products. Example: Olive oils from Italy. Any hierarchy implies a sub- or super-region. It would be possible to define the category Foods from Italy or Foods from the EU and combine it with the origins taxonomy;
  • legislation - legislation might require a particular approach to the taxonomy. For example, the use of the names juices and nectars is regulated in the EU;
  • assortments / variety packs - used for products that contain two (or more) sub-products that should belong to multiple categories. These should have a different ingredient and nutritional list for each sub-product. Examples are fruit yoghurt packs with multiple flavours;
  • no doubling - there should be no doubling, i.e. the same concept appearing in multiple taxonomies. For instance, why add frozen pizzas as a new category when frozen is already available as a label and pizzas as a category?

Convenience categories

There are some categories that can be called convenience categories, as they limit the work a user needs to do in order to classify products. These categories combine two characteristics into one. For example, the category potatoes and the label frozen are merged into one category: frozen potatoes.

Implied categories

It is possible to automatically apply a category to a product, based on two or more characteristics, without any intervention by the user. For instance, if a herbal tea has the label organic, it can be assigned the category organic herbal teas. This in turn can be used in any facet. It is possible to set up the rules for this in the taxonomies.

  • alcoholic
  • french -
  • organic
  • pasteurised
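The idea of implied categories can be sketched as a small rule table. This is only an illustration: the `IMPLIED_RULES` structure, the function name and the identifiers are hypothetical and are not the syntax actually used in the taxonomy files.

```python
# Hypothetical rules: (required category, required label, implied category).
IMPLIED_RULES = [
    ("en:herbal-teas", "en:organic", "en:organic-herbal-teas"),
]


def apply_implied_categories(categories, labels, rules=IMPLIED_RULES):
    """Return the categories extended with every category implied by the rules."""
    result = list(categories)
    for category, label, implied in rules:
        if category in categories and label in labels and implied not in result:
            result.append(implied)
    return result


print(apply_implied_categories(["en:herbal-teas"], ["en:organic"]))
# ['en:herbal-teas', 'en:organic-herbal-teas']
print(apply_implied_categories(["en:herbal-teas"], []))
# ['en:herbal-teas']
```

The input lists are left unchanged and a new list is returned, so the implied categories can be recomputed whenever the labels of a product change.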

Oppositional categories

This is a pair of categories, where one is the opposite of the other. For instance, the category sweetened beverages is the opposite of the category unsweetened beverages. Both cannot be true at the same time. However, there will always be an unknown variant as well, due to lack of information.

  • sweetened <> unsweetened
  • alcoholic <> non-alcoholic
  • vegan <> non-vegan
  • pasteurised <> non-pasteurised
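A small sketch of how such pairs could be evaluated, including the unknown case. The `OPPOSITES` mapping, the category identifiers and the status names are assumptions made for this example.

```python
# Hypothetical pairs of mutually exclusive categories.
OPPOSITES = {
    "en:sweetened-beverages": "en:unsweetened-beverages",
    "en:alcoholic-beverages": "en:non-alcoholic-beverages",
}


def pair_status(categories, positive, opposites=OPPOSITES):
    """Classify a product for one oppositional pair.

    Returns "positive", "negative", "unknown" (neither category is set)
    or "conflict" (both are set, which should not happen).
    """
    negative = opposites[positive]
    has_positive = positive in categories
    has_negative = negative in categories
    if has_positive and has_negative:
        return "conflict"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "unknown"


print(pair_status(["en:sweetened-beverages"], "en:sweetened-beverages"))    # positive
print(pair_status(["en:unsweetened-beverages"], "en:sweetened-beverages"))  # negative
print(pair_status(["en:beverages"], "en:sweetened-beverages"))              # unknown
```

Treating "unknown" as a separate outcome, rather than defaulting to one side of the pair, keeps missing information visible.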