Internationalization/Multilingual products
Latest revision as of 13:46, 14 August 2024
Get in touch
- Slack channel: #multilingualproducts (https://openfoodfacts.slack.com/messages/C0KDDRFEY/)
GitHub
- https://github.com/openfoodfacts/openfoodfacts-server/labels/%F0%9F%8C%8D%20Multilingual%20products
Multilingual products
This page is to discuss how to handle products that have text in multiple languages on their label.
Problems addressed
- Single products with more than one language on the label
- Bilingual products
- Products where all the texts on the label are in two (or more) languages
- Common in bilingual countries (Belgium, Canada, Luxembourg, ...)
- e.g. most Cora products have all the text in French and in Dutch: http://world.openfoodfacts.org/brand/cora
- Products with some information in multiple languages
- e.g. ingredients and nutrition facts in many languages
Note that the information can differ between languages (this happens on beer labels, for example).
Problems not addressed
- Products that have different labels in different languages, with the same barcode
- will be addressed by Project:Product versions and history
- Products that have different labels in different languages, with a different barcode
- we simply store them as different products
- we might consider linking the products in some way in a future project
- Translating ingredients
- will be addressed by ingredients taxonomy
Current status
Data entry
- We currently have a field to indicate the "main" language of a product
- The intention of this field was to put the language that is the most prominent on the label.
- There are a few products where the split is 50% / 50% but those are not very common.
- Values entered in the product edit form are considered to be in the "main" language of the product
- Values entered in fields for which we have a taxonomy (categories, countries, labels, traces) are mapped according to the "main" language of the product
- e.g. entering "jus de fruits" in the categories field when the main language is set to French results in en:fruit-juices being assigned.
So "main language" really means the language that is used to enter the fields. --Aleene (talk) 17:18, 15 November 2015 (CET)
- Values for nutrition facts are global and assigned to a canonical field
- Except when the nutrient's name is unknown (i.e. not in our current nutrient taxonomy)
- The default nutrients shown depend on the country (EU nutrition table vs US/CA nutrition table)
- Images are selected/cropped for the "main" language of the product (product front, ingredients and nutrition facts). Often images for the other languages are missing --Aleene (talk) 17:18, 15 November 2015 (CET)
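The mapping of taxonomized values described above can be illustrated with a minimal sketch. It is not the Product Opener implementation: the `CATEGORY_SYNONYMS` table and the `canonicalize_tag` function are hypothetical, and the handling of unknown values (keeping them under the main language prefix) is an assumption.

```python
# Hypothetical, tiny excerpt of a categories taxonomy: for each language,
# a normalized synonym maps to a canonical tag id.
CATEGORY_SYNONYMS = {
    "fr": {"jus de fruits": "en:fruit-juices"},
    "en": {"fruit juices": "en:fruit-juices"},
}


def canonicalize_tag(value: str, main_language: str) -> str:
    """Map a user-entered value to a canonical tag id.

    The value is interpreted in the product's main language. Unknown
    values are kept, prefixed with the main language (an assumption),
    so that nothing the user typed is lost.
    """
    # Lower-case and collapse whitespace before the lookup.
    normalized = " ".join(value.strip().lower().split())
    synonyms = CATEGORY_SYNONYMS.get(main_language, {})
    if normalized in synonyms:
        return synonyms[normalized]
    return f"{main_language}:{normalized.replace(' ', '-')}"


print(canonicalize_tag("Jus de fruits", "fr"))  # en:fruit-juices
```

The same text entered with a different main language does not match the French synonym, which is why the main language matters at data entry.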
Data display
- Fields for which we have a taxonomy (categories, countries, labels, traces) are displayed in the target language. The target language is set by the subdomain: on http://world.openfoodfacts.org categories are shown in English even for French products, while on http://es.openfoodfacts.org they are shown in Spanish.
- Nutrition facts are displayed in the target language (if available in the taxonomy --Aleene (talk) 17:23, 15 November 2015 (CET))
- The nutrition facts are displayed according to the country (different order and presentation for EU nutrition tables vs US/CA nutrition tables). This is set by the system. The user cannot specify which table format is on the label. (I bought a product with a US label in a French shop --Aleene (talk) 17:26, 15 November 2015 (CET))
- Fields without a taxonomy are displayed in the language (the main language) they were entered in
- common name, quantity, packaging, brands, origin of ingredients, manufacturing or processing places, city/state/country, stores, link to the product page, best before date (shouldn't we attempt a common formatting for fractions and dates, depending on the main language? --Aleene (talk) 17:29, 15 November 2015 (CET))
- ingredients
- Images for product front, ingredients and nutrition facts are for the main language of the product (and not the target language). (These images are the same as the ones at data entry, i.e. no special processing for display has been applied --Aleene (talk) 17:34, 15 November 2015 (CET))
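As a rough illustration of how the target language follows from the subdomain, here is a minimal sketch. The `SUBDOMAIN_LANGUAGE` table and the `target_language` function are hypothetical; only the two example hosts (`world` giving English, `es` giving Spanish) come from the text above, and the English default is an assumption.

```python
from urllib.parse import urlparse

# Assumption: a hypothetical table from subdomain to display language.
SUBDOMAIN_LANGUAGE = {"world": "en", "es": "es"}


def target_language(url: str, default: str = "en") -> str:
    """Return the display language implied by the URL's subdomain."""
    host = urlparse(url).hostname or ""
    subdomain = host.split(".")[0]
    # Unknown subdomains fall back to the default (an assumption).
    return SUBDOMAIN_LANGUAGE.get(subdomain, default)


print(target_language("http://es.openfoodfacts.org"))
```

Note that the result depends only on the site being visited, not on the main language of the product being displayed.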
Solution
Data entry
- Taxonomize as many fields as possible
- Enable users to enter data for more than one language:
- Select different images or different parts of the images for product front, ingredients and nutrition facts (Usually multi-lingual nutrition facts are found in the same table, but undoubtedly exceptions exist. But do not require multiple image selections when it is not necessary --Aleene (talk) 17:55, 15 November 2015 (CET))
- Enter data (text) for more than one language
- Allow users to enter a value in a language other than the main language by using a prefix, for example 'nl:'. This is useful for country-specific labels, cf. 'Glasbak', 'Ecoponto azul', etc. --Aleene (talk) 17:55, 15 November 2015 (CET)
- We keep the "main language" field
- Indicates the most prominent language on the label (and thus the data entry language used. Not sure whether 'the most prominent' should be a requirement. If I buy a prominent Arabic/English product in Egypt, I will only be able to enter the English part of the label --Aleene (talk) 17:55, 15 November 2015 (CET))
- When the split is 50% / 50% (e.g. some products sold in Belgium with one side in Dutch and another side in French), picking either is fine.
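The language prefix suggested above could be parsed along the following lines. This is only a sketch: the `split_language_prefix` function is hypothetical, and it assumes that a prefix is exactly two lower-case letters followed by a colon.

```python
import re

# Assumption: a prefix is two lower-case letters followed by a colon.
_PREFIX = re.compile(r"^([a-z]{2}):\s*(.+)$")


def split_language_prefix(value: str, main_language: str) -> tuple[str, str]:
    """Return (language, text) for a value entered in a tag field.

    Values without a prefix are assumed to be in the main language.
    """
    value = value.strip()
    match = _PREFIX.match(value)
    if match:
        return match.group(1), match.group(2)
    return main_language, value


print(split_language_prefix("nl:Glasbak", "fr"))  # ('nl', 'Glasbak')
```

With this convention a Dutch label such as 'Glasbak' can be entered on a product whose main language is French without being misread as a French value.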
Data display
- Taxonomized fields will be displayed in the target language. (What about country specific labels? Target language (and thus translation) is nice, but the corresponding label should be displayed as well. It helps a non-native speaker to interpret the labels --Aleene (talk) 18:00, 15 November 2015 (CET))
- Display other fields and pictures in the target language if the product has data in that language. (No translation!? --Aleene (talk) 18:00, 15 November 2015 (CET))
- Indicate in which languages the product data is available, and provide a way to see each language
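The display rules above might be sketched as follows. This is an illustration under assumptions, not the actual implementation: it presumes per-language values stored in a hash under `<field>_lang` and a `languages` hash as proposed in the technical design, and the `main_language` key and both function names are hypothetical. No translation is attempted; when the target language is missing, the sketch falls back to the main language.

```python
def text_for_display(product: dict, field: str, target_language: str) -> tuple[str, str]:
    """Return (text, language) to show for a non-taxonomized field."""
    per_language = product.get(f"{field}_lang", {})
    if per_language.get(target_language):
        return per_language[target_language], target_language
    # Fall back to the main language, then to the legacy field.
    main_language = product.get("main_language", "en")
    text = per_language.get(main_language) or product.get(field, "")
    return text, main_language


def available_languages(product: dict) -> list[str]:
    """List the languages for which the product has data."""
    return sorted(product.get("languages", {}))


product = {
    "main_language": "fr",  # hypothetical key name
    "ingredients_text": "Chocolat, lait",
    "ingredients_text_lang": {"en": "Chocolate, milk", "fr": "Chocolat, lait"},
    "languages": {"en": "1", "fr": "1"},
}
print(text_for_display(product, "ingredients_text", "en"))
print(available_languages(product))
```

Returning the language along with the text lets the interface tell the reader when the displayed text is not in the language they asked for.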
Interface design
- How to make data entry not too overwhelming?
- Interface should take into account touch interfaces (touch targets large enough) --Aleene (talk) 18:57, 15 November 2015 (CET).
- Not all language-dependent fields need to be present on a label, so do not multiply all fields for all languages --Aleene (talk) 18:57, 15 November 2015 (CET)
- Solution 1: tabs to switch between languages (slide-down menu? One tab per product or one tab per relevant field? --Aleene (talk) 18:04, 15 November 2015 (CET))
- + button to add a language tab
- tab displays only the fields that need to be completed in other languages:
- images
- fields like generic name, ingredients
- Solution 2: multiply each field by the number of languages, display all languages for one field together
- + button to dynamically add a new language
Technical design
Taxonomize all or most multilingual tag fields
- Packaging, origins of ingredients and purchase places are tag fields (they contain comma-separated values) that are not yet taxonomized. (Origin of ingredients is a two-part tag: ingredient+origin. How to keep them apart? --Aleene (talk) 19:05, 15 November 2015 (CET))
- The existing taxonomy system will take care of the mapping between different languages (country and language specific labels? --Aleene (talk) 19:05, 15 November 2015 (CET))
- The taxonomies need to be created or completed
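- To be determined: what to do with brands?
- Could benefit from being taxonomized in order to have a hierarchy
- A hierarchy could also help to normalize brands, i.e. remove errors --Aleene (talk) 19:05, 15 November 2015 (CET)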
Selection/crop of images in more than one language
- For product, ingredients and nutrition facts
Entry of fields in more than one language
- Keep all existing fields as-is
- including the "main language" field
- e.g. ingredients, generic_name etc.
- "ingredients_text":"Chocolate, milk"
- For compatibility and to enable incremental implementation and deployment
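- Create new fields suffixed by _lang, with a hash
- "ingredients_text_lang":{"en":"Chocolate, milk", "fr":"Chocolat, lait"}
- So multiple languages can be found in the same field. --Aleene (talk) 19:18, 15 November 2015 (CET)
- Create new languages field
- Hash that contains languages for which we have values for at least one field
- "languages":{"en":"1", "fr":"1"}
- What is the purpose of this field? --Aleene (talk) 19:18, 15 November 2015 (CET)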
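One possible reading of the languages field is that it is derived from the `_lang` fields rather than entered by hand. The sketch below shows that derivation under assumptions: each `_lang` field is a hash from language code to text, the languages hash uses "1" as its value, and the `compute_languages` function is hypothetical.

```python
def compute_languages(product: dict) -> dict:
    """Collect every language that has a non-empty value in any *_lang field."""
    languages = {}
    for name, value in product.items():
        if name.endswith("_lang") and isinstance(value, dict):
            for code, text in value.items():
                if text:  # empty strings do not count as data
                    languages[code] = "1"
    return languages


product = {
    "ingredients_text_lang": {"en": "Chocolate, milk", "fr": "Chocolat, lait"},
    "generic_name_lang": {"nl": "Chocolade", "fr": ""},
}
print(compute_languages(product))
```

Deriving the hash this way means it is the union of the languages found across fields, so fields are free to cover different sets of languages.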
Could there be conflicts between the ingredients_text and ingredients_text_lang fields? And how should these be resolved? Could the number of entries in the *_lang hashes differ between fields? Could their content differ too (e.g. "it" and "en" in one field and "fr" and "nl" in another)? --Aleene (talk) 19:18, 15 November 2015 (CET)
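These questions are left open here. As one way to make them concrete, the sketch below detects the two situations without deciding how to resolve them. The function names and the `main_language` key are hypothetical, and treating a legacy field as conflicting only when it differs from the main-language entry of its `_lang` hash is an assumption.

```python
def find_conflicts(product: dict, fields: list[str]) -> list[str]:
    """Return the fields whose legacy value differs from the main-language entry."""
    main_language = product.get("main_language")  # hypothetical key name
    conflicts = []
    for field in fields:
        legacy = product.get(field)
        localized = product.get(f"{field}_lang", {}).get(main_language)
        if legacy and localized and legacy.strip() != localized.strip():
            conflicts.append(field)
    return conflicts


def language_coverage(product: dict, fields: list[str]) -> dict:
    """Show which languages each *_lang field covers, so differences are visible."""
    return {field: sorted(product.get(f"{field}_lang", {})) for field in fields}


product = {
    "main_language": "en",
    "ingredients_text": "Chocolate, milk",
    "ingredients_text_lang": {"en": "Chocolate, whole milk", "it": "Cioccolato, latte"},
    "generic_name": "Chocolate bar",
    "generic_name_lang": {"fr": "Tablette de chocolat", "nl": "Chocoladereep"},
}
print(find_conflicts(product, ["ingredients_text", "generic_name"]))
print(language_coverage(product, ["ingredients_text", "generic_name"]))
```

A report like this could feed a manual review step, or a rule such as "the `_lang` entry wins", but that choice is not settled on this page.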