Internationalization/Multilingual products

From Open Food Facts wiki
Revision as of 15:21, 21 September 2015 by Stephane (talk | contribs) (more details about nutrition facts tables (EU vs US) and the "main" language)

Multilingual products

This page is to discuss how to handle products that have text in multiple languages on their label.

Problems addressed

  • Single products with more than one language on the label
    • Bilingual products
      • Products where all the texts on the label are in two (or more) languages
      • Common in bilingual countries (Belgium, Canada...)
      • e.g. most Cora products have all the text in French and in Dutch: http://world.openfoodfacts.org/brand/cora
  • Products with some information in multiple languages
    • e.g. ingredients and nutrition facts in many languages
    • [image: ingredients.8.400.jpg]

Problems not addressed

  • Products that have different labels in different languages, with the same barcode
  • Products that have different labels in different languages, with a different barcode
    • we simply store them as different products
    • we might consider linking the products in some way in a future project
  • Translating ingredients
    • will be addressed by ingredients taxonomy

Current status

Data entry

  • We currently have a field to indicate the "main" language of a product
    • This field is intended to hold the language that is the most prominent on the label.
      • There are a few products where the split is 50% / 50% but those are not very common.
    • Values entered in the product edit form are considered to be in the "main" language of the product
    • Values entered in fields for which we have a taxonomy (categories, countries, labels, traces) are mapped according to the "main" language of the product
      • e.g. entering "jus de fruits" in the categories field when the main language is set to French results in en:fruit-juices being assigned.
  • Values for nutrition facts are global and assigned to a canonical field
    • Except when the nutrient's name is unknown (i.e. not in our current nutrient taxonomy)
  • The default nutrients shown depend on the country (EU nutrition table vs US/CA nutrition table)
  • Images are selected/cropped for the "main" language of the product (product front, ingredients and nutrition facts)
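
The mapping of taxonomized values at entry time can be sketched as follows. This is a minimal Python illustration, not the actual Open Food Facts code: the CATEGORY_SYNONYMS table and both helper functions are hypothetical, and keeping unknown values as language-prefixed tags is an assumption.

```python
# Hypothetical synonym table: (language, normalized name) -> canonical tag id.
CATEGORY_SYNONYMS = {
    ("fr", "jus de fruits"): "en:fruit-juices",
    ("en", "fruit juices"): "en:fruit-juices",
}


def canonicalize_tag(value, main_language, synonyms=CATEGORY_SYNONYMS):
    """Map a value typed in the product's main language to a canonical tag."""
    # Lowercase and collapse whitespace so "Jus  de fruits" matches.
    normalized = " ".join(value.strip().lower().split())
    canonical = synonyms.get((main_language, normalized))
    if canonical is not None:
        return canonical
    # Assumption: unknown values are kept, prefixed with the entry language.
    return f"{main_language}:{normalized.replace(' ', '-')}"


def canonicalize_field(field_value, main_language):
    """Split a comma-separated tag field and canonicalize each entry."""
    return [
        canonicalize_tag(part, main_language)
        for part in field_value.split(",")
        if part.strip()
    ]
```

With this sketch, canonicalize_field("jus de fruits, boissons", "fr") maps the first value to en:fruit-juices and keeps the second, which is absent from the example table, as a French-prefixed tag.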

Data display

  • Fields for which we have a taxonomy (categories, countries, labels, traces) are displayed in the target language.
    • The target language is set by the subdomain: on http://world.openfoodfacts.org categories are shown in English even for French products, while on http://es.openfoodfacts.org they are shown in Spanish.
    • Only if the category exists in the taxonomy
    • Only if the target language exists for this category in the taxonomy
      • If the target language doesn't exist, English is used to display the field value
  • Nutrition facts are displayed in the target language
    • The nutrition facts are displayed according to the country (different order and presentation for EU nutrition tables vs US/CA nutrition tables)
  • Fields without a taxonomy are displayed in the language they were entered in
    • common name, quantity, packaging, brands, origin of ingredients, manufacturing or processing places, city/state/country, stores, link to the product page, best before date
    • ingredients
  • Images for product front, ingredients and nutrition facts are for the main language of the product (and not the target language)
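
The display rules for taxonomized fields amount to a lookup with an English fallback. The sketch below illustrates that logic only: CATEGORY_NAMES and display_tag are hypothetical names, and showing a tag unchanged when it is not in the taxonomy is an assumption.

```python
# Hypothetical taxonomy excerpt: canonical tag id -> names per language.
CATEGORY_NAMES = {
    "en:fruit-juices": {"en": "Fruit juices", "fr": "Jus de fruits"},
}


def display_tag(tag, target_language, names=CATEGORY_NAMES):
    """Return the name to show for a tag in the target language."""
    translations = names.get(tag)
    if translations is None:
        # Assumption: a tag that is not in the taxonomy is shown as stored.
        return tag
    if target_language in translations:
        return translations[target_language]
    # The target language is missing for this entry: fall back to English.
    return translations.get("en", tag)
```

Here a Spanish request for en:fruit-juices falls back to the English name because the example entry has no Spanish translation.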


Solution

Data entry

  • Taxonomize as many fields as possible
    • A taxonomized field needs to be completed in only one language
    • Candidates: packaging, origins of ingredients, purchase places
      • Note: packaging and purchase places do not correspond to text on the label
  • Enable users to enter data for more than one language:
    • Select different images or different parts of the images for product front, ingredients and nutrition facts
    • Enter data (text) for more than one language
  • We keep the "main language" field
    • Indicates the most prominent language on the label
    • When the split is 50% / 50% (e.g. some products sold in Belgium with one side in Dutch and another side in French), picking either is fine.

Data display

  • Taxonomized fields will be displayed in the target language
  • Display other fields and pictures in the target language if the product has data in that language
  • Indicate in which languages the product data is available, and provide a way to see it
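
A possible reading of these display rules, using the _lang and languages fields proposed under "Technical design" further down this page. The function names and the "main_language" key are hypothetical, and falling back to the existing single-language field is an assumption, since the page does not say what to show when the target language is missing.

```python
def localized_text(product, field, target_language):
    """Return (text, language) to display for a non-taxonomized field."""
    per_language = product.get(f"{field}_lang", {})
    if target_language in per_language:
        return per_language[target_language], target_language
    # Assumption: otherwise show the existing field, which holds the
    # value entered in the product's main language.
    return product.get(field, ""), product.get("main_language")


def available_languages(product):
    """List the languages for which the product has data."""
    return sorted(product.get("languages", {}))


product = {
    "main_language": "fr",  # hypothetical key for the "main language" field
    "ingredients_text": "Chocolat, lait",
    "ingredients_text_lang": {"en": "Chocolate, milk", "fr": "Chocolat, lait"},
    "languages": {"en": "1", "fr": "1"},
}
```

Returning the language together with the text lets the page indicate when the displayed value is not in the language the visitor asked for.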


Interface design

  • How to make data entry not too overwhelming?
  • Solution 1: tabs to switch between languages
    • + button to add a language tab
    • tab displays only the fields that need to be completed in other languages:
      • images
      • fields like generic name, ingredients
  • Solution 2: multiply each field by the number of languages, display all languages for one field together
    • + button to dynamically add a new language

Technical design

Taxonomize all or most multilingual tag fields

  • Packaging, origins of ingredients and purchase places are tag fields (they contain comma separated values) that are not yet taxonomized
  • The existing taxonomy system will take care of the mapping between different languages
  • The taxonomies need to be created or completed
  • To be determined: what to do with brands?
    • Typically not translated
      • But exceptions exist
    • Could benefit from being taxonomized in order to have a hierarchy


Selection/crop of images in more than one language

  • For product front, ingredients and nutrition facts

Entry of fields in more than one language

  • Keep all existing fields as-is
    • including the "main language" field
    • e.g. ingredients, generic_name etc.
    • "ingredients_text":"Chocolate, milk"
    • For compatibility and to enable incremental implementation and deployment
  • Create new fields suffixed by _lang, each containing a hash keyed by language code
    • "ingredients_text_lang":{"en":"Chocolate, milk", "fr":"Chocolat, lait"}
  • Create a new "languages" field
    • Hash whose keys are the languages for which we have a value in at least one field
    • "languages":{"en":"1", "fr":"1"}
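
The way these fields fit together can be sketched as follows. This is one possible implementation under stated assumptions, not the deployed code: set_text, update_languages and the "main_language" key are hypothetical, and mirroring the main-language value into the existing field is an assumption made to illustrate the compatibility goal.

```python
def update_languages(product):
    """Recompute the "languages" hash from all *_lang fields."""
    languages = {}
    for key, value in product.items():
        if key.endswith("_lang") and isinstance(value, dict):
            for code, text in value.items():
                if text:  # a language counts only if it has a non-empty value
                    languages[code] = "1"
    product["languages"] = languages
    return product


def set_text(product, field, language, text):
    """Store text for one language in the field's *_lang hash."""
    product.setdefault(f"{field}_lang", {})[language] = text
    # Assumption: the existing field keeps mirroring the main-language
    # value, so readers that only know the old field keep working.
    if language == product.get("main_language"):
        product[field] = text
    return update_languages(product)


product = {"main_language": "en", "ingredients_text": "Chocolate, milk"}
set_text(product, "ingredients_text", "en", "Chocolate, milk")
set_text(product, "ingredients_text", "fr", "Chocolat, lait")
```

After the two calls, the product holds the ingredients_text_lang and languages values shown in the examples above, while ingredients_text is unchanged.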