Global taxonomies
Revision as of 11:07, 1 September 2021
Open Food Facts uses global taxonomies for fields such as categories, brands, labels and countries. This page explains how taxonomies work in Open Food Facts and how they can be updated and enhanced.
Quick presentation
Quick presentation of the taxonomies
Features
- A global hierarchy / taxonomy for each type of data field (categories, brands, labels, countries etc.)
- Translations for every language of each field value
- Multiple synonyms for each field value in each language
- Stopwords for each language/field type
Generalities
Taxonomy term
A taxonomy term is the main object of a taxonomy. In its simplest form, it is the term itself prefixed by its language code. Example: en:Authentic Trappist Product is a term in the labels taxonomy.
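As an illustration, the sketch below splits a term into its language prefix and its name, and derives a URL-style identifier similar to the ones visible in Open Food Facts URLs (for example "fair-trade" in /label/fair-trade). The helper names and the normalization rules (lowercase, accents removed, runs of other characters replaced by a hyphen) are simplifying assumptions, not the exact rules used by Open Food Facts.

```python
import re
import unicodedata
from typing import Tuple


def parse_term(term: str) -> Tuple[str, str]:
    """Split a term such as 'en:Authentic Trappist Product' into ('en', 'Authentic Trappist Product')."""
    lang, sep, name = term.partition(":")
    # Assumption: the prefix is a 2-letter lowercase language code.
    if not sep or not re.fullmatch(r"[a-z]{2}", lang):
        raise ValueError(f"not a language-prefixed term: {term!r}")
    return lang, name.strip()


def to_tag_id(term: str) -> str:
    """Hypothetical, simplified normalization of a term into a URL-style identifier."""
    lang, name = parse_term(term)
    # Decompose accented letters (e.g. 'à' -> 'a' + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace every run of characters other than a-z / 0-9 with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", unaccented.lower()).strip("-")
    return f"{lang}:{slug}"


print(parse_term("en:Authentic Trappist Product"))  # ('en', 'Authentic Trappist Product')
print(to_tag_id("fr:Soupes à l'oignon"))            # fr:soupes-a-l-oignon
```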
Languages
Each language has a 2-letter prefix, e.g. "en" for English and "fr" for French.
Whenever possible, the canonical language for each field value should be English. For example, en:soups is the canonical value for the Soups category.
A value can be defined in another language (which then becomes its canonical language). For example, fr:soupes-a-l-oignon could be the canonical value for "Onion Soups" if we don't have an English translation yet.
New values (e.g. categories that do not exist yet) should have an English canonical value.
Each field value can be translated to any language.
When a field value needs to be translated to a target language, if the translation does not exist yet, English is shown (or the canonical language if the English translation does not exist either).
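A minimal sketch of this fallback order, assuming each value's translations are held in a plain dictionary keyed by language code (the `display_name` helper and this data layout are hypothetical):

```python
from typing import Dict


def display_name(translations: Dict[str, str], canonical_lang: str, target_lang: str) -> str:
    """Pick the name to show for a value in target_lang."""
    # Fallback order: the requested language, then English, then the value's canonical language.
    for lang in (target_lang, "en", canonical_lang):
        if lang in translations:
            return translations[lang]
    raise KeyError("the value has no name in its canonical language")


soups = {"en": "Soups", "fr": "Soupes"}
onion_soups = {"fr": "Soupes à l'oignon"}  # no English translation yet
print(display_name(soups, "en", "fr"))        # Soupes
print(display_name(soups, "en", "de"))        # Soups
print(display_name(onion_soups, "fr", "de"))  # Soupes à l'oignon
```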
Remarks
- Which standard is used for the codes? It can be the ISO 639-1 standard; eventually this could be extended to 3-letter codes.
Singular or plural?
Generally, we use the plural for categories, but some categories are in the singular. We don't use the plural form when it has a different meaning, for example "Beef" and "Beefs": we are talking about the meat and not the animal, so there is no "s". Conversely, "Rillettes" in French (and other languages) has no singular form.
Sometimes plural or singular depends on the language. The situation for Dutch is described in Dutch translation issues.
If the category is in plural form, the translations should be in the plural form.
New proposal from @stephane (not yet adopted, to be discussed): in the categories taxonomy, always use the plural (en:beers, fr:bières), then add the singular in a property (especially if it's not a simple rule like removing the final s):
singular:en:beer
singular:fr:bière
For the ingredients taxonomy, do the reverse: always use the singular for the main entry, then add the plural:
en:tomato
plural:en:tomatoes
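Since the proposal is not adopted, the following is only a sketch of how a tool could read it, assuming an entry's properties are stored in a dictionary keyed like "singular:fr" (a hypothetical layout), with the simple "remove the final s" rule as a fallback. The "berries" example shows why an explicit property is needed when the simple rule fails.

```python
from typing import Dict


def singular_form(properties: Dict[str, str], lang: str, plural: str) -> str:
    """Return the singular of a category name under the proposed convention (hypothetical helper)."""
    # An explicit singular:<lang>: property wins, e.g. singular:fr:bière for fr:bières.
    explicit = properties.get(f"singular:{lang}")
    if explicit:
        return explicit
    # Otherwise fall back to the simple rule of removing a final "s".
    if plural.endswith("s"):
        return plural[:-1]
    return plural


beers = {"singular:en": "beer", "singular:fr": "bière"}
print(singular_form(beers, "fr", "bières"))                     # bière
print(singular_form({}, "en", "soups"))                         # soup
print(singular_form({"singular:en": "berry"}, "en", "berries"))  # berry (the simple rule would give "berrie")
```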
Synonyms
In each language, each value can have a number of synonyms.
Simple synonyms (such as simple singular forms) are generated automatically when possible.
Synonyms are recursive: if en:yoghurt is a synonym of en:yogurt, then en:banana_yoghurt will automatically be added as a synonym of en:banana_yogurt.
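A minimal sketch of this recursion, assuming single-word synonyms and terms whose words are joined with underscores as in the example. The function name is hypothetical, and this sketch ignores multi-word synonyms:

```python
from itertools import product
from typing import List, Set


def expand_synonyms(term: str, synonym_groups: List[Set[str]]) -> Set[str]:
    """Return all variants of an underscore-separated term obtained by swapping words for synonyms."""
    def alternatives(word: str) -> List[str]:
        for group in synonym_groups:
            if word in group:
                return sorted(group)
        return [word]

    words = term.split("_")
    # Cartesian product: one alternative per word, e.g. ["banana"] x ["yoghurt", "yogurt"].
    return {"_".join(combo) for combo in product(*(alternatives(w) for w in words))}


print(sorted(expand_synonyms("banana_yoghurt", [{"yoghurt", "yogurt"}])))
# ['banana_yoghurt', 'banana_yogurt']
```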
Remarks
- What are the simple synonym rules? How do they translate to other languages?
- Note that recursion does not work for languages where adjectives change based on gender.
Stopwords
Stopwords can be used to further extend synonyms. e.g. if "à" and "la" are stopwords for French, then "Yaourts fraise" will automatically be mapped to "Yaourts à la fraise".
In the ingredients taxonomy, stopwords are also words that can be ignored. For instance, "contains" is not an ingredient.
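One way to picture this is to compare names after dropping stopwords, so that both spellings reduce to the same key. This is a simplified sketch (whitespace tokenization, lowercase comparison and the `match_key` name are assumptions), not the matching code Open Food Facts uses:

```python
from typing import Set, Tuple


def match_key(name: str, stopwords: Set[str]) -> Tuple[str, ...]:
    """Reduce a name to the words that matter for matching, dropping stopwords."""
    return tuple(word for word in name.lower().split() if word not in stopwords)


french_stopwords = {"à", "la"}
print(match_key("Yaourts fraise", french_stopwords))       # ('yaourts', 'fraise')
print(match_key("Yaourts à la fraise", french_stopwords))  # ('yaourts', 'fraise')
```

The same idea covers ingredient stopwords: with "contains" as a stopword, "contains milk" reduces to ('milk',).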
Description
The description: prefix is used to describe the term. Example:
en:Fair trade
description:en:Fair trade is an arrangement designed to help producers in developing countries achieve sustainable and equitable trade relationships. Members of the fair trade movement add the payment of higher prices to exporters, as well as improved social and environmental standards.
The description is reused on the Open Food Facts website. Example: https://world.openfoodfacts.org/label/fair-trade
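Property lines such as description: follow a property:language:value pattern. A minimal parsing sketch, assuming property names are at least three lowercase letters or underscores (so they cannot be confused with 2-letter language codes); `parse_property` is a hypothetical helper. Because only the first two colons act as separators, values containing colons, such as URLs, are kept intact:

```python
import re
from typing import Optional, Tuple

# Assumption: property names are 3+ lowercase letters/underscores; language codes are 2 letters.
PROPERTY_RE = re.compile(r"^([a-z_]{3,}):([a-z]{2}):(.*)$")


def parse_property(line: str) -> Optional[Tuple[str, str, str]]:
    """Split 'description:en:Some text' into ('description', 'en', 'Some text'); None otherwise."""
    match = PROPERTY_RE.match(line.strip())
    if match is None:
        return None
    name, lang, value = match.groups()
    return name, lang, value.strip()


print(parse_property("description:en:Fair trade is an arrangement"))
print(parse_property("en:Fair trade"))  # None: a name line, not a property
```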
Image (logo)
The image: prefix is used to add an image/logo related to the term. Example:
<en:Organic
en:Bio Austria
de:Bio Austria
country:en:Austria
image:en:bio-austria.67x90.svg
The image is reused on the Open Food Facts website. Example: https://world.openfoodfacts.org/label/bio-austria
Wikidata
The wikidata: prefix links the term to its equivalent in the Wikidata database. Example:
fr:Label Rouge
xx:Label Rouge
wikidata:en:Q3214309
The Wikidata link is reused on the Open Food Facts website. Example: https://world.openfoodfacts.org/label/label-rouge
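The value of the wikidata: property is a Wikidata item identifier. As a small illustration, a tool reading the taxonomy could turn it into a link to the item's page; `wikidata_url` is a hypothetical helper, and https://www.wikidata.org/wiki/<identifier> is Wikidata's standard item URL:

```python
import re


def wikidata_url(line: str) -> str:
    """Turn a 'wikidata:en:Q3214309' property line into the matching Wikidata page URL."""
    name, _lang, qid = line.strip().split(":", 2)
    qid = qid.strip()
    # Wikidata item identifiers are a "Q" followed by digits.
    if name != "wikidata" or not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a wikidata property line: {line!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


print(wikidata_url("wikidata:en:Q3214309"))
# https://www.wikidata.org/wiki/Q3214309
```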
Opposite
We use "opposite" for imports (e.g. when there is a column "Organic" with values like "No"). Example:
en:Non-fair trade, Not fair trade
fr:Non issu du commerce équitable
opposite:en: en:fair-trade
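A sketch of how an import could use this property, assuming the opposite: values have been collected into a dictionary and that yes/no answers are spelled as listed in the code. The tag identifiers, the function name and the accepted spellings are all hypothetical:

```python
from typing import Dict, Optional

# Hypothetical mapping built from "opposite:" properties: entry -> the value it is the opposite of.
OPPOSITES: Dict[str, str] = {"en:non-fair-trade": "en:fair-trade"}

YES = {"yes", "y", "true", "1"}
NO = {"no", "n", "false", "0"}


def label_from_import(column_tag: str, cell: str, opposites: Dict[str, str]) -> Optional[str]:
    """Map a yes/no cell of an imported column (e.g. a "Fair trade" column) to a taxonomy value."""
    answer = cell.strip().lower()
    if answer in YES:
        return column_tag
    if answer in NO:
        # Look for the entry declared as the opposite of the column's value.
        for entry, opposite in opposites.items():
            if opposite == column_tag:
                return entry
    return None  # unknown answer, or no opposite defined


print(label_from_import("en:fair-trade", "No", OPPOSITES))   # en:non-fair-trade
print(label_from_import("en:fair-trade", "Yes", OPPOSITES))  # en:fair-trade
```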
Wikipedia
The wikipedia: prefix links the term to its equivalent on Wikipedia. Example:
<en:Fair trade
en:Fairtrade USA
xx:Fairtrade USA
wikipedia:en:https://en.wikipedia.org/wiki/Fair_Trade_USA
The wikipedia: property is not reused on the Open Food Facts website.
Unused properties
Some people have added more properties to the different taxonomies. They are not used for the moment, but they could lead to interesting uses:
- give more information for the contributor who is editing the taxonomy
- prepare new usages or features
Here is a list of properties already used in the taxonomies:
- wikipedia:
- country:
- label_categories:
- eu_groups:
- auth_name:
- auth_address:
- auth_url:
- exceptions:
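Since contributors keep adding properties, this list can drift; it can be regenerated by scanning the taxonomy files. A sketch, assuming property lines look like name:language:value with names of three or more lowercase letters or underscores (`property_names` is a hypothetical helper, and stopwords:/synonyms: lines are excluded because they are not properties):

```python
import re
from collections import Counter

# Assumption: a property line starts with a 3+ character name, then a 2-letter language code.
PROPERTY_NAME_RE = re.compile(r"^([a-z_]{3,}):[a-z]{2}:", re.MULTILINE)
NOT_PROPERTIES = {"stopwords", "synonyms"}


def property_names(taxonomy_text: str) -> Counter:
    """Count how often each property name appears in the text of a taxonomy file."""
    names = PROPERTY_NAME_RE.findall(taxonomy_text)
    return Counter(name for name in names if name not in NOT_PROPERTIES)


sample = """<en:Organic
en:Bio Austria
de:Bio Austria
country:en:Austria
image:en:bio-austria.67x90.svg
"""
print(property_names(sample))
```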
Taxonomy architecture
The taxonomy is not a strict hierarchy: values can have multiple parents, but cycles are not allowed.
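Because values can have several parents, the taxonomy forms a directed acyclic graph rather than a tree. The sketch below uses a hypothetical child-to-parents mapping with illustrative category names; it collects all ancestors of a term and runs a depth-first search to detect the cycles the taxonomy forbids. It is an illustration, not the check run by Open Food Facts:

```python
from typing import Dict, List, Optional, Set

# Hypothetical example: each term maps to its list of parents (a term may have several).
PARENTS: Dict[str, List[str]] = {
    "en:banana-yogurts": ["en:fruit-yogurts", "en:yogurts"],
    "en:fruit-yogurts": ["en:yogurts"],
    "en:yogurts": ["en:dairies"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All direct and indirect parents of a term."""
    seen: Set[str] = set()
    todo = list(parents.get(term, []))
    while todo:
        parent = todo.pop()
        if parent not in seen:
            seen.add(parent)
            todo.extend(parents.get(parent, []))
    return seen


def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search; returns one cycle as a list of terms, or None if there is none."""
    state: Dict[str, str] = {}  # missing = unvisited, "active" = on current path, "done" = finished
    path: List[str] = []

    def visit(term: str) -> Optional[List[str]]:
        state[term] = "active"
        path.append(term)
        for parent in parents.get(term, []):
            if state.get(parent) == "active":
                # Reaching a term that is already on the current path closes a cycle.
                return path[path.index(parent):] + [parent]
            if parent not in state:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[term] = "done"
        return None

    for term in list(parents):
        if term not in state:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


print(sorted(ancestors("en:banana-yogurts", PARENTS)))
print(find_cycle(PARENTS))  # None
```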
Finding new opportunities for taxonomization
- A good trick to find candidates for taxonomization is to look at these pages:
  - https://world.openfoodfacts.org/categories?filter=-
  - https://world.openfoodfacts.org/labels?filter=-
  - https://world.openfoodfacts.org/origins?filter=-
  - https://world.openfoodfacts.org/ingredients?filter=-
- Everything in italics is up for grabs.
Format
# stopwords
stopwords:en: some,stopwords
stopwords:fr: word,that,are,removed,when,matching

# synonyms that are not field values but that are contained in field values
synonyms:en: global,international

en: value, a synonym value, another synonym value
fr: valeur, une valeur synonyme, une autre valeur synonyme

<en: value
en: a child value, a synonym for a child value
fr: une valeur enfant, un synonyme d'une valeur enfant

<en: value
en: another child value

<en: a child value
<en: another child value
en: a grand-child value

# properties
en: value
fr: valeur
description:en: a property of value
description:fr: french version of the property
country_code:en: a property that is the same for all languages -> use English suffix

en:
wikidata:en:Q89
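To make the format concrete, here is a minimal parser sketch. It assumes that entries are separated by blank lines, that "#" starts a comment, that "<" marks a parent, that the first name line of an entry holds the canonical value, and that property names are at least three lowercase letters or underscores. The `Entry` class and `parse_taxonomy` function are hypothetical; this is not the parser Open Food Facts uses:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumptions: language codes are 2 lowercase letters; property names are 3+ lowercase letters/underscores.
PROPERTY_RE = re.compile(r"^([a-z_]{3,}):([a-z]{2}):(.*)$")
NAMES_RE = re.compile(r"^([a-z]{2}):(.*)$")


@dataclass
class Entry:
    parents: List[str] = field(default_factory=list)
    names: Dict[str, List[str]] = field(default_factory=dict)  # language -> name, then synonyms
    properties: Dict[str, str] = field(default_factory=dict)   # "description:en" -> value

    @property
    def canonical(self) -> str:
        # Assumption: the first name line of an entry holds the canonical value.
        lang, synonyms = next(iter(self.names.items()))
        return f"{lang}:{synonyms[0]}"


def _split_list(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_taxonomy(text: str) -> Tuple[List[Entry], Dict[str, List[List[str]]]]:
    """Parse entries plus global stopwords:/synonyms: lines (keyed like "stopwords:fr")."""
    entries: List[Entry] = []
    global_lists: Dict[str, List[List[str]]] = {}
    current = Entry()

    def flush() -> None:
        nonlocal current
        if current.names:  # blocks without names (e.g. stopwords only) are not entries
            entries.append(current)
        current = Entry()

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            flush()  # a blank line ends the current entry
            continue
        if line.startswith("#"):
            continue  # comment line
        if line.startswith("<"):
            # Parent line, e.g. "<en: value", normalized to "en:value".
            parent = line[1:].strip()
            m = NAMES_RE.match(parent)
            current.parents.append(f"{m.group(1)}:{m.group(2).strip()}" if m else parent)
            continue
        m = PROPERTY_RE.match(line)
        if m:
            name, lang, value = m.groups()
            if name in ("stopwords", "synonyms"):
                global_lists.setdefault(f"{name}:{lang}", []).append(_split_list(value))
            else:
                current.properties[f"{name}:{lang}"] = value.strip()
            continue
        m = NAMES_RE.match(line)
        if m:
            current.names[m.group(1)] = _split_list(m.group(2))
    flush()
    return entries, global_lists


sample = """<en: value
en: a child value, a synonym for a child value
description:en: a property of value
"""
entries, _ = parse_taxonomy(sample)
print(entries[0].parents, entries[0].canonical)  # ['en:value'] en:a child value
```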
List of taxonomies
The definitions can be edited on GitHub; they are periodically synchronized to the Open Food Facts database and website.
Taxonomies
(on GitHub; a GitHub account and knowledge of version control are needed)
- Test taxonomy showing the basic taxonomy definition features
- Ingredients taxonomy
- Global categories taxonomy
- Global brands and companies taxonomy
- Global labels taxonomy
- Global labels taxonomy logos
- Global languages taxonomy
- Global countries taxonomy
- Global origins taxonomy
- Global additives taxonomy
- Global additives classes taxonomy
- Global vitamins taxonomy
- Global minerals taxonomy
- Global amino acids taxonomy
- Global nucleotides taxonomy
- Global other nutritional substances taxonomy
- Global allergens taxonomy
- Global states taxonomy
- Global NOVA groups taxonomy
Draft Taxonomies
- Global packaging taxonomy
- Global stores taxonomy
- Global Religious Certification taxonomy
- Global Food Preparation taxonomy (related to Project:Microwave)
- Global IGP taxonomy
- Global EC marks taxonomy
Building and deploying taxonomies
Changes to taxonomies on GitHub are not deployed instantly: they need to be built and deployed, and products need to be reprocessed with the new taxonomy.
More info
For detailed information specific to the ingredients taxonomy, see Ingredients Ontology.