Data fields

From Open Food Facts wiki
Revision as of 15:55, 26 October 2021 by Laramba (Common name: example image)

A data field is a piece of structured information that has at least one specific use. For example, the "product name" field allows us to easily recognize the main name printed on the packaging.

Open Food Facts manages different kinds of data fields:

  1. fields that can be completed by users, such as the name of the product, the brand, etc.
  2. fields that are always computed by machines, such as the name of the contributor or the date of the contribution
  3. fields that are sometimes computed based on other fields, such as the Nutri-Score, the Nova score, etc.

This page deals only with "Fields completed by users". All these fields can be entered or modified by hand by the users. [to be completed]

Product name

The product name is the main name printed on the packaging. It can be a registered trademark such as Nutella. This field is important because it is one of the most widely used.

The product name should follow these rules:

  • It shouldn't contain the number of portions or a quantity, unless that is part of the name. Bad examples are "1 Onglet", "10 Burgers" and "1L Sirop cerise"; "100% Cacao" and "1848 Lait noisette" are acceptable because the number belongs to the name of the product.
  • It shouldn't contain registered trademark symbols such as ®, or HTML code such as &quot;.
  • It shouldn't be in capital letters, unless they are used on the product.
  • It shouldn't contain brands, unless the brand is included in the name ("Kinder Bueno" is good, while "Kronenbourg 1664", "Stella" or "Vodka Smirnoff" are not).
  • It shouldn't contain the price.

At the beginning of 2020, more than 95% of Open Food Facts products had a product name.

The product name shouldn't include any other information such as the brand of the product, the weight, etc.

In the database, the technical name for this field is product_name.

Good examples

Bad examples

  • Petit déjeuner Nesquik => you don't have to explain, just put the name from the packaging
  • Nutella by Ferrero => you shouldn't fill in the brand here, there's a field for that :)
  • Nesquik® => don't use the symbols ®, ™, © or similar in the product name field, or in any other field.
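
The rules above can be sketched as a small checker. This is only an illustrative heuristic, not the validation that Open Food Facts actually runs: the function name `product_name_warnings` and the warning texts are hypothetical, and a leading number is only flagged for human review because it may legitimately be part of the name.

```python
import html
import re

# Symbols the page says to avoid in product names.
FORBIDDEN_SYMBOLS = "®™©"


def product_name_warnings(name: str) -> list[str]:
    """Return warnings for a candidate product name (hypothetical helper)."""
    warnings = []
    if any(symbol in name for symbol in FORBIDDEN_SYMBOLS):
        warnings.append("contains a trademark or copyright symbol")
    # html.unescape changes the string only if it contains an HTML entity.
    if html.unescape(name) != name:
        warnings.append("contains HTML code")
    letters = [char for char in name if char.isalpha()]
    if len(letters) > 1 and all(char.isupper() for char in letters):
        warnings.append("is written entirely in capital letters")
    # A leading number may be a quantity ("10 Burgers") or part of the name
    # ("100% Cacao"), so this is only a hint for a human reviewer.
    if re.match(r"\d", name):
        warnings.append("starts with a number: check that it is part of the name")
    return warnings
```

For instance, `product_name_warnings("Nutella")` returns an empty list, while a name containing ® gets a single warning about the symbol.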

Common name

 
Example in Coca-Cola Cherry: "Sparkling Cherry Flavour Soft Drink with Vegetables Extracts". This information belongs in the "Common name" field.

The common name defines the product. It is the name used when you don't want to, or can't, use the product name. This is the place where you say "Cocoa and hazelnut spread" instead of "Nutella". This name is very useful for our AI (artificial intelligence): it helps to guess the category of the product.

The common name may match the product category, but not always [examples].

In the database, the technical name for this field is generic_name.

Quantity

This is the quantity of the product, with the corresponding number of portions or the unit. The best way to fill it in is to enter the value as indicated on the product. Don't forget the units! If the quantity in grams can be deduced, it can be used for calculations such as the carbon impact.

Examples:

  • 230g
  • 230 g
  • 6 (for 6 eggs)
  • 3 x 150g (for a product with 3 boxes, each of 150g)
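
As a rough illustration of how a quantity in grams might be deduced from values like those above, here is a minimal sketch. It is not Product Opener's actual parser: the function name `parse_quantity`, the regular expression and the small unit table are all assumptions made for this example.

```python
import re

# Conversion factors to grams or millilitres for a few common units
# (an illustrative subset, not Open Food Facts' actual list).
UNIT_FACTORS = {"kg": 1000.0, "g": 1.0, "mg": 0.001, "l": 1000.0, "cl": 10.0, "ml": 1.0}

QUANTITY_RE = re.compile(
    r"^\s*(?:(?P<count>\d+)\s*[x×*]\s*)?"  # optional multiplier, e.g. "3 x"
    r"(?P<value>\d+(?:[.,]\d+)?)\s*"        # number, with a comma or point decimal
    r"(?P<unit>[a-z]+)?\s*$",               # optional unit
    re.IGNORECASE,
)


def parse_quantity(text):
    """Return (total, base_unit), where base_unit is 'g', 'ml' or None.

    Returns None when the text is not understood (hypothetical helper).
    """
    match = QUANTITY_RE.match(text)
    if match is None:
        return None
    count = int(match.group("count") or 1)
    value = float(match.group("value").replace(",", "."))
    unit = match.group("unit")
    if unit is None:
        return (count * value, None)  # a bare count, e.g. "6" for 6 eggs
    unit = unit.lower()
    if unit not in UNIT_FACTORS:
        return None
    base_unit = "g" if unit in ("kg", "g", "mg") else "ml"
    return (count * value * UNIT_FACTORS[unit], base_unit)
```

With this sketch, "230g" and "230 g" both give 230 grams, and "3 x 150g" gives 450 grams.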

A complete wiki page is dedicated to product quantities.

In the database, the technical name for this field is quantity.

See quantities that are not recognized.

See issues related to quantity.

Packaging

This is the packaging of the product. Multiple values are allowed. There is no taxonomy for this field, so you can enter anything you find relevant including:

  • the substance of the packaging: glass, metal, plastic, etc.
  • the shape: bottle, can, etc.

You can write it in your own language. Don't hesitate to add as much data as you find relevant.

Two draft taxonomies have been proposed; don't hesitate to comment or help.

In the database, there are two fields related to packaging:

  • packaging: the list of values entered in the packaging field; e.g.: Bottle,Glass
  • packaging_tags: an array of different packaging tags

See issues related to packaging.

Brands

These are the brands of the product. The main brand, generally clearly displayed on the front of the pack, should be entered first. A product can have other brands.

There's no taxonomy for brands for the moment, so just do your best and don't spend too much time entering brands.

A complete wiki page is dedicated to brands.

In the database, this field is called brands.

See: issues related to brands.

Categories

[to be completed]

This field lists the specific category of a product and its related parents. This field is very important: it needs to be completed for the Nutri-Score computation, as this computation distinguishes between types of products (beverages, ...).

→ Indicate only the most specific category. "Parents" categories will be automatically added.

Examples: Sardines in olive oil, Orange juice from concentrate

Labels

[to be completed]

This field is shown as "Labels, certifications, awards" on the website.

Example: Organic.

See the Labels page for more information.

Manufacturing or processing places

[to be completed]

This field lists the places where the product has been manufactured or processed.

EMB code

[to be completed]

This field is dedicated to various codes related to packaging marks, identification marks or health marks.

Open Food Facts had gathered 25,000+ codes as of 2021-08-18. These codes make it possible to locate the places on a map, for example: https://world.openfoodfacts.org/packager-code/fr-72-264-002-ce

We use it to produce the world map of products: https://cestemballepresdechezvous.fr/ (fr) and https://madenear.me/ (en) [down on 2021-08-19].

Countries where sold

This field contains all the countries where the product is sold. If the field contains France, the product will be listed on the https://fr.openfoodfacts.org/ website.

The list of all existing countries can be found here: https://world.openfoodfacts.org/countries

In 2021, the United Nations recognizes 193 countries, but Open Food Facts also recognizes other territories, such as the overseas regions and territories French Guiana, Guadeloupe, French Polynesia, etc. This field is taxonomized (source code). However, Open Food Facts accepts all values, and some people or faulty tools enter values outside the taxonomy, which leads to bad values.

In the database, this field is called countries. It is used to compute the countries_tags field, the normalized version of countries, which makes it possible to show the name of a country in the desired language thanks to the countries taxonomy. E.g.: when Italy is entered from a page in English (say, world.openfoodfacts.org), countries_tags becomes en:italy.
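
The normalization from countries to countries_tags can be pictured with the following simplified sketch. It makes several assumptions that are not stated on this page: that the countries value is comma-separated, that normalization lowercases the text, strips accents and replaces other characters with hyphens, and that the helper names `to_tag` and `countries_to_tags` are hypothetical. It does not use the real taxonomy, so unlike Open Food Facts it cannot map a translated or synonymous name to its canonical entry.

```python
import re
import unicodedata


def to_tag(value: str, lang: str = "en") -> str:
    """Build a 'lang:slug' tag from a free-text value (simplified assumption)."""
    # Strip accents, lowercase, and turn runs of other characters into hyphens.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_value = decomposed.encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_value.lower()).strip("-")
    return f"{lang}:{slug}"


def countries_to_tags(countries: str, lang: str = "en") -> list[str]:
    """Split a comma-separated countries value into a list of tags."""
    return [to_tag(part, lang) for part in countries.split(",") if part.strip()]
```

Under these assumptions, `countries_to_tags("Italy")` gives `["en:italy"]`, matching the example above.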

Ingredients

This field lists the ingredients of the product. This field is one of the most important as it is used for:

  • Nova calculation
  • Nutri-Score calculation: it is used to calculate the proportion of fruit and nuts
  • evaluation of some food preferences such as vegetarian, vegan, etc.
  • automatic identification of allergens (both substances and traces, see below)
  • etc.

Thanks to automatic Optical Character Recognition (OCR), this field can be filled in by software. But OCR is not always accurate, and you should always verify the result.

Ingredients analysis also produces an array of all ingredients, which allows translation into other languages. To allow analysis and translation, the Open Food Facts community has built a taxonomy. You can contribute to it.

In the database, this field is called ingredients_[country code].

See issues related to ingredients.

Substances or products causing allergies or intolerances

The substances are ingredients that are actually in the product, which could cause common allergies. This field can be filled by hand, but is also completed by automatic ingredients analysis.

Examples:

  • Milk
  • Gluten
  • Nuts

In the database, this field is an array of tags called allergens_tags.

See issues related to allergens_tags.

Traces

The traces are ingredients which are not used for the product itself but are present in the factory or the production process: the product might contain traces of these ingredients. Traces are really important if you are allergic.

This field can be filled by hand, but is also completed by automatic ingredients analysis.

Examples:

  • Milk
  • Gluten
  • Nuts

In the database, the technical name for this field is traces.

See issues related to traces.

Best before date (expiration date)

The expiration date is a way to track product changes over time and to identify the most recent version. This data is meant for manual use: at this moment (2020-03), the Open Food Facts apps and website don't make any use of this field. An issue is open about excluding very old products from averages, and this field could be useful for that.

Be aware that, for the moment, this field is NOT normalized, so it probably contains dates in various formats that can be ambiguous (31/12/2019, 12/31/2019, 13 mai 2018, etc.).
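
To make the ambiguity concrete, the sketch below tries two numeric layouts on the same string and returns every date it could mean. The list of candidate formats and the function name `possible_dates` are assumptions for this illustration; real values in the field may use many other layouts.

```python
from datetime import date, datetime

# Two plausible numeric layouts for the same kind of string (an assumption).
CANDIDATE_FORMATS = ("%d/%m/%Y", "%m/%d/%Y")


def possible_dates(text: str) -> set[date]:
    """Return every calendar date the string could mean under the candidate formats."""
    results = set()
    for fmt in CANDIDATE_FORMATS:
        try:
            results.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # the string does not fit this layout
    return results
```

A value such as "31/12/2019" yields a single date because 31 cannot be a month, whereas "03/04/2019" yields two candidates (3 April and 4 March), and "13 mai 2018" yields none with these two formats.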

It is possible to see:

In the database and in Product Opener software, the technical name for this field is expiration_date.

Serving size

Serving size has a specific goal: to let the Open Food Facts app compute the amount of each nutrient per serving. If a candy weighs 5 g, that weight can be chosen as the serving size: if these candies have 66 g of sugar per 100 g, each candy contains about 3 g of sugar (5 × 66 / 100 = 3.3 g). Allowed units are: kg, g, mg, µg, oz, l, dl, cl, ml, fl.oz, fl oz, г, мг, кг, л, дл, кл, мл, 毫克, 公斤, 毫升, 公升, 吨.

Spelled-out units such as "grammes" or "liter" are NOT recognized.

Decimals can be written with a comma (,) or a point (.).

In the database and in the Product Opener software, the technical name for this field is serving_size. Based on serving_size, Open Food Facts computes a serving_quantity value (a float) that is used with the figures given per 100 g or 100 ml. The serving_quantity can be found in the API or the data exports.

See issues related to serving_size.

Good examples

  • 60 g (preferred, for readability reasons)
  • 30g
  • 35G
  • 90 ml
  • 1L

Possible examples (though not recommended)

  • cookie 25g
  • One Slice (50g)
  • 97 g (0.5 cup)

Bad examples

  • 30 gr => gr is not a correct unit
  • 9 candies and 2 biscuits => it's not possible to calculate a ratio because we don't know the weight of this portion
  • 30 => there is no unit
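
The examples in this section can be reproduced with a short sketch. It is not the Product Opener implementation: it covers only part of the allowed units, it assumes that serving_quantity is the serving size expressed as a number of grams or millilitres, and the function names and the regular expression are hypothetical.

```python
import re

# Factors to grams or millilitres for a subset of the allowed units.
TO_BASE = {"kg": 1000.0, "g": 1.0, "mg": 0.001,
           "l": 1000.0, "dl": 100.0, "cl": 10.0, "ml": 1.0}

# Longest units first, so that "ml" is tried before "l".
UNITS = "|".join(sorted(TO_BASE, key=len, reverse=True))
# A number, optional spaces, then a known unit that ends at a word boundary,
# so that "30 gr" is rejected while "One Slice (50g)" is accepted.
SERVING_RE = re.compile(rf"(\d+(?:[.,]\d+)?)\s*({UNITS})\b", re.IGNORECASE)


def serving_quantity(serving_size):
    """Return the serving size in g or ml, or None if no known unit is found."""
    match = SERVING_RE.search(serving_size)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    return value * TO_BASE[match.group(2).lower()]


def per_serving(value_per_100, serving_size):
    """Scale a value given per 100 g or 100 ml to one serving."""
    quantity = serving_quantity(serving_size)
    if quantity is None:
        return None
    return value_per_100 * quantity / 100
```

Here `per_serving(66, "5 g")` reproduces the candy example (about 3.3 g of sugar), while "30 gr", "30" and "9 candies and 2 biscuits" all return None because no recognized unit follows a number.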