Internationalization/Multilingual products: Difference between revisions

From Open Food Facts wiki
(more details about nutrition facts tables (EU vs US) and the "main" language)
Revision as of 15:21, 21 September 2015

Multilingual products

This page discusses how to handle products that have text in multiple languages on their label.

Problems addressed

  • Single products with more than one language on the label
    • Bilingual products
      • Products where all the texts on the label are in two (or more) languages
      • Common in bilingual countries (Belgium, Canada...)
      • e.g. most Cora products have all the text in French and in Dutch: http://world.openfoodfacts.org/brand/cora
  • Products with some information in multiple languages
    • e.g. ingredients and nutrition facts in many languages
    • (image: ingredients.8.400.jpg)

Problems not addressed

  • Products that have different labels in different languages, with the same barcode
  • Products that have different labels in different languages, with a different barcode
    • we simply store them as different products
    • we might consider linking the products in some way in a future project
  • Translating ingredients
    • will be addressed by ingredients taxonomy

Current status

Data entry

  • We currently have a field to indicate the "main" language of a product
    • This field is intended to hold the language that is most prominent on the label.
      • There are a few products where the split is 50% / 50% but those are not very common.
    • Values entered in the product edit form are considered to be in the "main" language of the product
    • Values entered in fields for which we have a taxonomy (categories, countries, labels, traces) are mapped according to the "main" language of the product
      • e.g. entering "jus de fruits" in the categories field when the main language is set to French results in en:fruit-juices being assigned.
  • Values for nutrition facts are global and assigned to a canonical field
    • Except when the nutrient's name is unknown (i.e. not in our current nutrient taxonomy)
  • The default nutrients shown depend on the country (EU nutrition table vs US/CA nutrition table)
  • Images are selected/cropped for the "main" language of the product (product front, ingredients and nutrition facts)
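
The mapping of taxonomized fields described above can be sketched as follows. This is only an illustration, not the actual Open Food Facts code: the `CATEGORIES` excerpt, the `canonicalize_tags` function and the way unknown entries are kept (prefixed with the main language) are all assumptions made for the example.

```python
# Hypothetical, tiny excerpt of a categories taxonomy:
# canonical tag id -> {language code: localized name}
CATEGORIES = {
    "en:fruit-juices": {"en": "Fruit juices", "fr": "Jus de fruits"},
    "en:chocolates": {"en": "Chocolates", "fr": "Chocolats"},
}


def canonicalize_tags(raw_value, main_lang, taxonomy):
    """Map a comma-separated field value to canonical tag ids.

    Names are looked up in the product's "main" language only.
    """
    # Reverse index for the main language: lowercased name -> canonical id
    index = {
        names[main_lang].strip().lower(): tag_id
        for tag_id, names in taxonomy.items()
        if main_lang in names
    }
    tags = []
    for entry in raw_value.split(","):
        name = entry.strip()
        if not name:
            continue
        # Assumption: entries missing from the taxonomy are kept as-is,
        # prefixed with the language they were entered in.
        fallback = f"{main_lang}:{name.lower().replace(' ', '-')}"
        tags.append(index.get(name.lower(), fallback))
    return tags
```

With the main language set to French, `canonicalize_tags("jus de fruits", "fr", CATEGORIES)` yields `["en:fruit-juices"]`, matching the example in the list above.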

Data display

  • Fields for which we have a taxonomy (categories, countries, labels, traces) are displayed in the target language. The target language is set by the subdomain: with http://world.openfoodfacts.org, categories are shown in English even for French products, while http://es.openfoodfacts.org shows them in Spanish.
    • Only if the category exists in the taxonomy
    • Only if the target language exists for this category in the taxonomy
      • If the target language doesn't exist, English is used to display the field value
  • Nutrition facts are displayed in the target language
    • The nutrition facts are displayed according to the country (different order and presentation for EU nutrition tables vs US/CA nutrition tables)
  • Fields without a taxonomy are displayed in the language they were entered in
    • common name, quantity, packaging, brands, origin of ingredients, manufacturing or processing places, city/state/country, stores, link to the product page, best before date
    • ingredients
  • Images for product front, ingredients and nutrition facts are for the main language of the product (and not the target language)
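
As a rough sketch of the display rules above: the target language is derived from the subdomain, and a taxonomized value falls back to English when the target language is missing. The function names, the sample taxonomy and the rule "a two-letter subdomain is a language code" are simplifying assumptions for illustration; what is shown for a tag that is not in the taxonomy at all is also an assumption.

```python
from urllib.parse import urlparse

# Hypothetical taxonomy excerpt: canonical tag id -> {language: name}
SAMPLE_TAXONOMY = {
    "en:fruit-juices": {"en": "Fruit juices", "fr": "Jus de fruits"},
}


def target_language(url, default="en"):
    """Derive the display language from the subdomain.

    Assumption: a two-letter subdomain is a language code; anything
    else (e.g. "world") means English.
    """
    host = urlparse(url).hostname or ""
    sub = host.split(".")[0]
    return sub if len(sub) == 2 and sub.isalpha() else default


def display_name(tag_id, lang, taxonomy):
    """Name of a taxonomized value in `lang`, falling back to English."""
    names = taxonomy.get(tag_id)
    if names is None:
        # Not in the taxonomy: show the stored value unchanged (assumption)
        return tag_id
    return names.get(lang) or names.get("en") or tag_id
```

For example, `display_name("en:fruit-juices", target_language("http://es.openfoodfacts.org"), SAMPLE_TAXONOMY)` returns the English name, because the sample taxonomy has no Spanish entry.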


Solution

Data entry

  • Taxonomize as many fields as possible
    • Taxonomized fields need to be completed in only one language
    • Packaging, origins of ingredients, purchase places
      • Note: packaging and purchase places do not correspond to text on the label
  • Enable users to enter data for more than one language:
    • Select different images or different parts of the images for product front, ingredients and nutrition facts
    • Enter data (text) for more than one language
  • We keep the "main language" field
    • Indicates the most prominent language on the label
    • When the split is 50% / 50% (e.g. some products sold in Belgium with one side in Dutch and another side in French), picking either is fine.

Data display

  • Taxonomized fields will be displayed in the target language
  • Display other fields and pictures in the target language if the product has data in that language
  • Indicate in which languages the product data is available, and provide a way to see it
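
One possible reading of these display rules, using the `_lang` and `languages` hashes proposed under "Technical design" further down this page. The fallback to the main-language value when the target language has no data is an assumption, as are the function names and the `lang` key used here for the "main language" field.

```python
def localized_field(product, field, target_lang):
    """Return (value, language) for a text field shown in target_lang."""
    per_lang = product.get(f"{field}_lang", {})
    if per_lang.get(target_lang):
        return per_lang[target_lang], target_lang
    # No data in the target language: fall back to the existing field,
    # which holds the value in the product's main language (assumption).
    return product.get(field, ""), product.get("lang")


def available_languages(product):
    """Languages for which the product has data, for a language switcher."""
    return sorted(product.get("languages", {}))


product = {
    "lang": "en",  # hypothetical key for the "main language" field
    "ingredients_text": "Chocolate, milk",
    "ingredients_text_lang": {"en": "Chocolate, milk", "fr": "Chocolat, lait"},
    "languages": {"en": "1", "fr": "1"},
}
```

Here `localized_field(product, "ingredients_text", "fr")` gives the French text, while a request for Spanish falls back to the main-language value; `available_languages(product)` lists the languages that could be offered to the reader.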


Interface design

  • How to make data entry not too overwhelming?
  • Solution 1: tabs to switch between languages
    • + button to add a language tab
    • tab displays only the fields that need to be completed in other languages:
      • images
      • fields like generic name, ingredients
  • Solution 2: repeat each field once per language, and display all languages for a field together
    • + button to dynamically add a new language

Technical design

Taxonomize all or most multilingual tag fields

  • Packaging, origins of ingredients and purchase places are tag fields (they contain comma-separated values) that are not yet taxonomized
  • The existing taxonomy system will take care of the mapping between different languages
  • The taxonomies need to be created or completed
  • To be determined: what to do with brands?
    • Typically not translated
      • But exceptions exist
    • Could benefit from being taxonomized in order to have a hierarchy


Selection/crop of images in more than one language

  • For product, ingredients and nutrition facts

Entry of fields in more than one language

  • Keep all existing fields as-is
    • including the "main language" field
    • e.g. ingredients, generic_name etc.
    • "ingredients_text":"Chocolate, milk"
    • For compatibility and to enable incremental implementation and deployment
  • Create new fields suffixed by _lang, with a hash
    • "ingredients_text_lang":{"en":"Chocolate, milk", "fr":"Chocolat, lait"}
  • Create new languages field
    • Hash that contains languages for which we have values for at least one field
    • "languages":{"en":"1", "fr":"1"}
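
A minimal sketch of how an existing product record could gain the new fields while the existing ones stay untouched. The function name, the `TEXT_FIELDS` list and the idea of seeding the `_lang` hash from the main-language value are assumptions for illustration, not a description of the deployed code.

```python
# Text fields that get a per-language hash (illustrative list)
TEXT_FIELDS = ("ingredients_text", "generic_name")


def add_language_fields(product, main_lang):
    """Return a copy of `product` with *_lang hashes and a languages hash.

    Existing fields are kept as-is for compatibility; the current value
    of each field is recorded under the product's main language.
    """
    updated = dict(product)
    languages = dict(product.get("languages", {}))
    for field in TEXT_FIELDS:
        per_lang = dict(product.get(f"{field}_lang", {}))
        value = product.get(field)
        if value:
            # Do not overwrite a value already stored for the main language
            per_lang.setdefault(main_lang, value)
        if per_lang:
            updated[f"{field}_lang"] = per_lang
        # Every language with a value for at least one field is listed
        for lang in per_lang:
            languages[lang] = "1"
    updated["languages"] = languages
    return updated
```

Applied to `{"ingredients_text": "Chocolate, milk"}` with main language `"en"`, this produces `"ingredients_text_lang": {"en": "Chocolate, milk"}` and `"languages": {"en": "1"}`, leaving `ingredients_text` unchanged so that older clients keep working.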