Translating OFF english

From Open Food Facts wiki

Alternative title: OFF cares about the world

OFF intends to support anyone in the world. This implies not only that it should contain all products, but also that it should be able to speak and understand all languages. Let's start with a short introduction to how this will be achieved.

Multilingual interface

Let's start with the interface. If you access OFF through the URL world.openfoodfacts.org, you get the international English interface. At the top left you can select the country relevant to you. If you select Spain, the interface changes to Spanish. Note that the URL also changes to es.openfoodfacts.org, so if you are interested only in Spain, you can use this URL directly.

Next to the country selection field, you will now see a language field. As multiple languages are spoken in Spain, Open Food Facts lets you select your preferred language. For Spain this can be Catalan, Euskara or Galego (or just English). If you select `Català`, for instance, you will see the language change and the URL change to es-ca.openfoodfacts.org.
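The host names mentioned above follow a visible pattern: a country code, optionally followed by a hyphen and a language code. A minimal Python sketch of that pattern, assuming it holds for other country and language pairs as well (the function name `off_host` is hypothetical and not part of any OFF API):

```python
from typing import Optional


def off_host(country: str = "world", language: Optional[str] = None) -> str:
    """Build a host name such as es-ca.openfoodfacts.org.

    The pattern is inferred from the examples in the text; it is an assumption,
    not a documented rule.
    """
    subdomain = country.lower()
    if language:
        # The language code is appended after a hyphen, as in "es-ca".
        subdomain = f"{subdomain}-{language.lower()}"
    return f"{subdomain}.openfoodfacts.org"


print(off_host())            # world.openfoodfacts.org
print(off_host("es"))        # es.openfoodfacts.org
print(off_host("es", "ca"))  # es-ca.openfoodfacts.org
```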

On a closer look at the main page in Catalan, you will notice that not everything is translated; some texts remain in English. Why is that? Because someone still has to do the translation. So we are looking for native speakers, in this case of Catalan, who can help us translate from English to Catalan.

OFF uses the translation service provided by Crowdin. If English is not your native language, look up your native language on Crowdin and check whether translations are still needed. If so, we would greatly appreciate your help. It will make OFF more accessible for speakers of your language.

Notice, by the way, that the URLs are translated as well. For instance: https://es-ca.openfoodfacts.org/producte/20944599/kikos-snackday. The word `product` is replaced with the Catalan word `producte`.

Naturally, the translations also work for right-to-left languages such as Arabic. Just check out the entry page for Egypt.

This multi-language approach is not complete yet. Languages for specific countries might be missing (Cantonese in Hong Kong, for instance), as might support for countries that use multiple scripts, such as Kazakhstan with Cyrillic, Arabic and Latin script. To get this working, we need help from people familiar with these scripts and languages.

If you use a Latin script, you might not have noticed it, but the country and language selectors are translated as well. So if you are stuck on the page for Thailand, switch to English first. The country names then switch to English as well, and you will be able to find your country. English is available on every country-specific page.

Multilingual products

Supporting multiple languages, scripts and countries is one thing. The most important thing is to have products from anywhere. Each country page should show only the products that are sold in that country. This is based on the countries field available for each product. If you select the Spain entry point, you get only the products that are sold in Spain (23,617 products on 15 June 2019).

Some products (with the same barcode) are sold in multiple countries. The countries field of such a product lists all of them. A product sold in Germany and Austria thus has both countries listed in its countries field and shows up on the OFF pages of both Germany and Austria.
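The filtering described above can be sketched in a few lines of Python. The record layout, the barcodes and the function name `products_sold_in` are all hypothetical; only the idea of a per-product countries field comes from the text.

```python
# Hypothetical product records; only the idea of a countries field comes from the text.
products = [
    {"barcode": "1001", "countries": ["Germany", "Austria"]},
    {"barcode": "1002", "countries": ["Spain"]},
    {"barcode": "1003", "countries": ["Austria"]},
]


def products_sold_in(country, records):
    """Return the records whose countries field contains the given country."""
    return [record for record in records if country in record["countries"]]


# A product listed for two countries shows up on both country pages.
print([p["barcode"] for p in products_sold_in("Germany", products)])  # ['1001']
print([p["barcode"] for p in products_sold_in("Austria", products)])  # ['1001', '1003']
```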

Luckily, Germany and Austria use the same language: German. If that product is also sold in Switzerland, not only must this country be added to the product, but also the other languages spoken in Switzerland.

A product sold in Switzerland has at least two languages printed on it and often three (German, French, Italian). An Italian-speaking Swiss user wants to see not only the interface in Italian, but also the product information. Whether that is possible depends on the information available on the package; if Italian is missing, one of the other languages is shown.

To make this possible, OFF had to support multilingual products. For each product, the product name, the generic name, the ingredients and the nutritional values can be added in multiple languages. Naturally, this extra data should be supported by corresponding images.

An Italian-speaking user in Switzerland whose device or computer is set to Italian will now see not only an Italian interface, but also the product information in Italian (if available on the product) and the supporting Italian images.
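The selection of a display language can be sketched as follows. This is an illustration only: the record layout, the sample product names and the function name `localized` are hypothetical, and falling back to a main language stored on the product is an assumption about how "one of the other languages" is chosen.

```python
# Hypothetical record: one product with text in two of the Swiss languages.
product = {
    "main_language": "de",
    "product_name": {
        "de": "Haselnussschokolade",
        "fr": "Chocolat aux noisettes",
    },
}


def localized(record, field, preferred_languages):
    """Return (language, text) for the first preferred language that has data."""
    values = record[field]
    for language in preferred_languages:
        if language in values:
            return language, values[language]
    # Assumption: when none of the preferred languages is available,
    # fall back to the product's main language.
    main = record["main_language"]
    return main, values[main]


print(localized(product, "product_name", ["fr"]))  # ('fr', 'Chocolat aux noisettes')
print(localized(product, "product_name", ["it"]))  # ('de', 'Haselnussschokolade')
```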

Taxonomies

Product name and ingredients are not the only fields of a product. There are several others, which are useful to the user.

There are fields that do not have to be translated, such as the brand of a product. There are probably exceptions to this, but this approach is good enough at the moment. This is more or less true for the product size and portion size as well. One could later envisage a translation of gram into Chinese or Russian. The manufacturer code cannot be translated either.

There are fields that can have a limited number of values and can be translated, such as the countries where a product is sold. A user on the Spain page will indicate his country as `España`, whereas a Dutch user should see `Spanje`. This kind of translation is supported by a taxonomy. A taxonomy is a list of words, each with a translation into every other language. This approach is used for countries and languages.
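As a rough illustration, such a taxonomy can be modelled as a mapping from an entry to its name in each language. The data layout, the entry identifiers and the helper names `translate` and `find_entry` below are assumptions made for this sketch, not the format OFF actually uses.

```python
# Hypothetical taxonomy: entry id -> {language code: name in that language}.
COUNTRIES = {
    "spain": {"en": "Spain", "es": "España", "nl": "Spanje"},
    "netherlands": {"en": "Netherlands", "es": "Países Bajos", "nl": "Nederland"},
}


def find_entry(taxonomy, name):
    """Find the entry id for a name given in any language (case-insensitive)."""
    wanted = name.casefold()
    for entry_id, names in taxonomy.items():
        if any(n.casefold() == wanted for n in names.values()):
            return entry_id
    return None


def translate(taxonomy, entry_id, language, fallback="en"):
    """Return the entry's name in the requested language, else in the fallback."""
    names = taxonomy[entry_id]
    return names.get(language, names[fallback])


# What a Spanish user enters as "España" is shown to a Dutch user as "Spanje".
entry = find_entry(COUNTRIES, "España")
print(translate(COUNTRIES, entry, "nl"))  # Spanje
```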

Fortunately, there is only a limited number of countries and a limited (though large) number of languages. The same is true for the allergens and traces found on a package. These are also kept in a taxonomy. Although the number of entries in this taxonomy is limited, each entry has a range of synonyms. Each entry in the ingredients list is matched against this allergen taxonomy, so OFF can show you the allergens in a product. Thus butter and buttermilk are synonyms of the allergen milk. If you see that we do not detect an ingredient in your language as an allergen, let us know, so we can add it as a synonym.
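The matching of ingredients against allergen synonyms can be sketched like this. The data and the function name `detect_allergens` are hypothetical; the milk synonyms come from the text, while the nuts entry is an invented example. The sketch uses exact, case-insensitive matching only, which is a simplifying assumption.

```python
# Hypothetical allergen taxonomy: allergen -> set of synonyms (lower case).
ALLERGENS = {
    "milk": {"milk", "butter", "buttermilk"},  # synonyms mentioned in the text
    "nuts": {"nuts", "hazelnuts"},             # assumed entry, for illustration
}


def detect_allergens(ingredients):
    """Return the sorted allergens whose synonyms occur in the ingredient list."""
    found = set()
    for ingredient in ingredients:
        name = ingredient.strip().casefold()
        for allergen, synonyms in ALLERGENS.items():
            if name in synonyms:
                found.add(allergen)
    return sorted(found)


print(detect_allergens(["Wheat flour", "Butter", "sugar", "buttermilk"]))  # ['milk']
```

An ingredient that is missing from the synonym sets is simply not detected, which is why reports of missing synonyms matter.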

The nutrients listed on a product are also in a taxonomy, so you should see the nutrients in your language. However, many translations are still missing from the taxonomy.

The same is true for the additives. OFF also has a taxonomy of possible additives that can be found in ingredient lists. OFF wants to show which additives are present in a product, so it matches all ingredients against this additives taxonomy. And remember: let us know when we miss something.

That leaves several fields that are in the process of being taxonomized. The furthest developed is the categories field. Each product is assigned to a category by a contributor. This category is used for the calculation of the Nutri-Score and allows products within the same category to be compared.

The number of possible categories should be limited, otherwise it is impossible to compare products. And categories across languages should be the same. Thus a category taxonomy is useful. It is a list of all possible groupings of products, with their translations. There can be synonyms for each entry. Again if you access the French or Dutch site, you will see the translated category.

The category taxonomy is a hierarchical taxonomy. This allows the user not only to compare brown long-grain rices from Thailand, but also the category rices in general. You can help with translating the categories.
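A hierarchical taxonomy can be sketched as a set of child-to-parent links. The category names and links below are invented for illustration (only the rice example is taken from the text), and the helper names `ancestors` and `belongs_to` are hypothetical.

```python
# Hypothetical child -> parent links in a category hierarchy.
PARENT = {
    "brown long-grain rices from Thailand": "brown long-grain rices",
    "brown long-grain rices": "long-grain rices",
    "long-grain rices": "rices",
    "white rices": "rices",
}


def ancestors(category):
    """Return the chain of parent categories, nearest first."""
    chain = []
    while category in PARENT:
        category = PARENT[category]
        chain.append(category)
    return chain


def belongs_to(category, group):
    """True if the category is the group itself or lies anywhere below it."""
    return category == group or group in ancestors(category)


print(ancestors("brown long-grain rices from Thailand"))
# ['brown long-grain rices', 'long-grain rices', 'rices']
print(belongs_to("white rices", "rices"))             # True
print(belongs_to("white rices", "long-grain rices"))  # False
```

With such links, a product in a very specific category can still be compared with everything in a broader one.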

There are also taxonomies for packaging and labels. Work on these is still in progress. There is also a taxonomy for ingredients, but that is not seen by the user.

Product entry

Multilingual products only work well when a contributor has created and edited a product in the right way. It is therefore worth describing the contributor's side of multilingual products.

Supporting multiple languages also introduces the concept of a primary, or main, language. This language is shown when the user has no preferred language(s) set. The main language is the language shown on the front of the package. Often there is just one language on the front.

Often one sees English used on the front, but not for the ingredients (should English then be an extra language, or not?). Or multiple languages are used on the front, all with equal emphasis. In that case the choice of the main language is arbitrary.

Usually the contributor will add or edit products in his own language. If he uses the site or app, it will probably also be set to his language. So a Dutch user will view the site in Dutch and use it to work on Dutch products. He will create products with Dutch as the main language and add Dutch product names, Dutch packaging information (`glasbak`), Dutch ingredient lists, etc. This will be true for most countries.

Even countries like Belgium and Switzerland will work like this. (How does this work in Norway?). If a package has an additional language, it can be added together with the name, generic name and ingredients. And please also add images for the specific language information, if the image is not yet available.

For some fields the user can use suggestions based on the taxonomies (categories, nutrients, countries), but is free to add his own entries.

If you are a multilingual user, as I am, life is a bit more complex. Your computer, locale and product language might all be different. So how do you enter a Dutch product on a computer set to English if you are living in France? (This is a real use case.)

First, when adding a new product, the main language should be checked and, if needed, adjusted. Filling in names and ingredients is straightforward, as the language to use is specified. Filling in the nutrients is language independent, so it presents no issues. Adding any additional language is straightforward as well.

The other fields are more difficult. One can use the English language of the interface, and so specify Milk or Nuts as allergens. However, the package might specify items that cannot or should not be translated. In that case a language prefix can be used. So enter `nl:Glasbak` in the packaging field, or `fr:Sans colorants` in the labels field. If the taxonomy recognizes these entries, they will be translated into the interface language.
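How such a prefixed entry might be split into a language and a value is sketched below. The function name `split_language_prefix` is hypothetical, and the rule that a prefix is exactly two letters followed by a colon is an assumption based on the two examples above; entries without a prefix are taken to be in the interface language.

```python
def split_language_prefix(entry, interface_language="en"):
    """Split 'nl:Glasbak' into ('nl', 'Glasbak').

    Assumption: a prefix is exactly two letters followed by a colon.
    Entries without such a prefix are attributed to the interface language.
    """
    prefix, separator, rest = entry.partition(":")
    if separator and len(prefix) == 2 and prefix.isalpha():
        return prefix.lower(), rest.strip()
    return interface_language, entry.strip()


print(split_language_prefix("nl:Glasbak"))         # ('nl', 'Glasbak')
print(split_language_prefix("fr:Sans colorants"))  # ('fr', 'Sans colorants')
print(split_language_prefix("Milk"))               # ('en', 'Milk')
```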