Data fields: Difference between revisions

From Open Food Facts wiki
(More information on countries.)
(Change in hierarchy levels so we can add more sub-levels and have a better division between sections)

Revision as of 15:41, 26 October 2021

A data field is a piece of structured information that has at least one specific use. For example, the "product name" field allows us to easily recognize the main name printed on the packaging.

Open Food Facts manages different kinds of data fields:

  1. fields that can be completed by users, such as the name of the product, the brand, etc.
  2. fields that are always computed by machines such as the name of the contributor or the date of the contribution
  3. fields that are sometimes computed based on other fields, such as the Nutri-Score, the Nova score, etc.

This page deals only with "Fields completed by users". All these fields can be entered or modified by hand by users. [to be completed]

Product name

The product name is the main name printed on the packaging. It can be a registered trademark such as Nutella. This field is important because it is one of the most used.

The product name shouldn't contain:

  • the number of portions or a quantity, unless it is part of the name: "1 Onglet", "10 Burgers" and "1L Sirop cerise" are bad examples, while "100% Cacao" and "1848 Lait noisette" are acceptable, since the number is part of the product's name;
  • registered trademark symbols such as ®, or HTML code such as &quot;;
  • capital letters throughout, except if they are used on the product;
  • brands, except if the brand is included in the name ("Kinder Bueno" is good while "Kronenbourg 1664", "Stella" or "Vodka Smirnoff" are not);
  • the price.

At the beginning of 2020, more than 95% of Open Food Facts products had a product name.

The product name shouldn't include any other information such as the brand of the product, the weight, etc.

In the database, the technical name for this field is product_name.

Good examples

  • Nesquik (https://world.openfoodfacts.org/product/3033710065967/nesquik-poudre-cacaotee-boite-nestle)

Bad examples

  • Petit déjeuner Nesquik => you don't have to explain, just put the name from the packaging
  • Nutella by Ferrero => don't enter the brand here, there's a field for that :)
  • Nesquik® => don't use the symbols ®, ™, © or similar in the product name field, or in any other field.
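
Some of the rules above are mechanical enough to check automatically. The following is a minimal sketch of such a check, not code from Product Opener: the function name `check_product_name` and the warning texts are hypothetical, and only the symbol, HTML and capital-letter rules are covered.

```python
import re

# Hypothetical rule set derived from the guidelines above; not Product Opener code.
TRADEMARK_SYMBOLS = ("®", "™", "©")
HTML_ENTITY = re.compile(r"&(?:[a-zA-Z]+|#\d+);")


def check_product_name(name):
    """Return a list of human-readable warnings for a candidate product name."""
    warnings = []
    if any(symbol in name for symbol in TRADEMARK_SYMBOLS):
        warnings.append("contains a trademark or copyright symbol")
    if HTML_ENTITY.search(name):
        warnings.append("contains HTML code")
    letters = [c for c in name if c.isalpha()]
    # All-caps names are acceptable only when printed that way on the product.
    if len(letters) > 1 and all(c.isupper() for c in letters):
        warnings.append("is entirely in capital letters")
    return warnings
```

A name such as Nesquik produces no warning, while Nesquik® is flagged for the symbol. Rules about brands or quantities need product knowledge and are left to the contributor.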

Common name

The common name defines the product. It is the name used when you don't want to, or can't, use the product name. This is where you enter Cocoa and hazelnut spread instead of Nutella. This name is very useful for our AI (artificial intelligence): it helps to guess the category of the product.

The common name may be equivalent to the product category, but not always [examples].

In the database, the technical name for this field is generic_name.

Quantity

This is the quantity of the product, with the corresponding number of portions or unit. The best way to fill it is to enter the value as indicated on the product. Don't forget the units! If we can deduce the quantity in grams it can be used to calculate some things such as the carbon impact.

Examples:

  • 230g
  • 230 g
  • 6 (for 6 eggs)
  • 3 x 150g (for a product with 3 boxes, each of 150g)
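
To illustrate how a total quantity could be deduced from values like those above, here is a minimal sketch. It is not the parser used by Open Food Facts: the `parse_quantity` function and its pattern are hypothetical, cover only the shapes shown in the examples, and accepting a decimal comma is an assumption.

```python
import re

# Hypothetical pattern covering only the example shapes listed above.
QUANTITY = re.compile(
    r"^\s*(?:(?P<count>\d+)\s*[xX]\s*)?"  # optional multiplier, e.g. "3 x"
    r"(?P<value>\d+(?:[.,]\d+)?)\s*"      # numeric value (decimal comma is an assumption)
    r"(?P<unit>[a-zA-Z]+)?\s*$"           # optional unit, e.g. "g"
)


def parse_quantity(text):
    """Return (total_value, unit) or None; unit is None for bare counts."""
    match = QUANTITY.match(text)
    if match is None:
        return None
    value = float(match.group("value").replace(",", "."))
    count = int(match.group("count") or 1)
    return count * value, match.group("unit")
```

With this sketch, 230g and 230 g both give 230 with unit g, 6 gives a bare count, and 3 x 150g gives a total of 450 g.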

A complete wiki page is dedicated to product quantities.

In the database, the technical name for this field is quantity.

See quantities that are not recognized.

See issues related to quantity.

Packaging

This is the packaging of the product. Multiple values are allowed. There is no taxonomy for this field, so you can enter anything you find relevant including:

  • the substance of the packaging: glass, metal, plastic, etc.
  • the shape: bottle, can, etc.

You can write it in your own language. Don't hesitate to add as much data as you find relevant.

Two draft taxonomies have been proposed; don't hesitate to comment or help.

In the database, there are two fields related to packaging:

  • packaging: the list of values entered in the packaging field; e.g.: Bottle,Glass
  • packaging_tags: an array of different packaging tags
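
The difference between the two fields can be pictured with a short sketch. The normalization shown here (trimming, lowercasing, hyphenating spaces, removing duplicates) is an assumption for illustration only, and `packaging_to_tags` is a hypothetical helper, not the Product Opener implementation.

```python
def packaging_to_tags(packaging):
    """Split a comma-separated packaging value into lowercase, hyphenated tags."""
    tags = []
    for value in packaging.split(","):
        # Assumed normalization: trim, lowercase, join words with hyphens.
        tag = "-".join(value.strip().lower().split())
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

Under these assumptions, the value Bottle,Glass becomes a list of two tags, bottle and glass.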

See issues related to packaging.

Brands

These are the brands of the product. The main brand, generally clearly displayed on the front of the pack, should be entered first. A product can have other brands:

  • when a product is a brand sold by a big company: Actimel is sold by Danone, see https://world.openfoodfacts.org/product/4009700036810/actimel-granatapfel

There's no taxonomy for brands for the moment, so just do your best and don't spend too much time entering brands.

A complete wiki page is dedicated to brands.

In the database, this field is called brands.

See: issues related to brands.

Categories

[to be completed]

This field lists the specific category of a product and its related parents. This field is very important: it needs to be completed for the Nutri-Score computation, as this computation distinguishes between types of products (beverages, ...).

→ Indicate only the most specific category. "Parent" categories will be added automatically.

Examples: Sardines in olive oil, Orange juice from concentrate

See also:

Labels

[to be completed]

This field is shown as "Labels, certifications, awards" on the website.

Example: Organic.

See the Labels page for more information.

See also:

Manufacturing or processing places

[to be completed]

This field lists the places where the product has been manufactured or processed.

EMB code

[to be completed]

This field is dedicated to various codes related to packaging marks, identification marks or health marks.

As of 2021-08-18, Open Food Facts had gathered more than 25,000 codes. These codes make it possible to locate the places on a map: https://world.openfoodfacts.org/packager-code/fr-72-264-002-ce

We use it to produce the world map of products: https://cestemballepresdechezvous.fr/ (fr) and https://madenear.me/ (en) [down on 2021-08-19].

Countries where sold

This field contains all the countries where the product is sold. If the field contains France, the product will be listed on the https://fr.openfoodfacts.org/ website.

The list of all existing countries can be found here: https://world.openfoodfacts.org/countries

In 2021, the United Nations recognizes 193 countries, but Open Food Facts also recognizes other territories, such as overseas regions and territories (French Guiana, Guadeloupe, French Polynesia, etc.). This field is taxonomized (source code). However, Open Food Facts accepts all values, and some people or buggy tools enter other values, which leads to bad data.

In the database, this field is called countries. It is used to compute the countries_tags field, the normalized version of countries, which makes it possible to show the name of a country in the desired language, thanks to the countries taxonomy. E.g., when Italy is entered from a page in English (say, world.openfoodfacts.org), countries_tags becomes en:italy.
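
The normalization described above can be sketched as a lookup in a taxonomy. Everything in this example is hypothetical: the two-entry `COUNTRIES_TAXONOMY`, the function names, and the fallback for unknown values are assumptions made for illustration, not the real taxonomy or Product Opener behavior.

```python
# Tiny hypothetical excerpt of a countries taxonomy: canonical tag -> names per language.
COUNTRIES_TAXONOMY = {
    "en:italy": {"en": "Italy", "fr": "Italie"},
    "en:france": {"en": "France", "fr": "France"},
}


def to_country_tag(value, lang):
    """Map a country name typed in language `lang` to its canonical tag."""
    wanted = value.strip().lower()
    for tag, names in COUNTRIES_TAXONOMY.items():
        if names.get(lang, "").lower() == wanted:
            return tag
    # Assumption: unknown values are kept, prefixed with the entry language.
    return f"{lang}:{wanted.replace(' ', '-')}"


def display_name(tag, lang):
    """Return the country name for `tag` in the desired language, or the tag itself."""
    return COUNTRIES_TAXONOMY.get(tag, {}).get(lang, tag)
```

Here, Italy entered in English and Italie entered in French both map to en:italy, and that single tag can then be displayed in either language. A value missing from the taxonomy still yields a tag, which mirrors how bad values can end up in the data.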

Ingredients

This field lists the ingredients of the product. This field is one of the most important as it is used for:

  • Nova calculation
  • Nutri-Score calculation: it is used to calculate the proportion of fruit and nuts
  • evaluation of some food preferences such as vegetarian, vegan, etc.
  • automatic identification of allergens (both substances and traces, see below)
  • etc.

Thanks to automatic Optical Character Recognition (OCR), this field can be filled in by software. But OCR is not always accurate, so you should always verify the result.

Ingredients analysis also produces an array of all ingredients, which allows translation into other languages. To allow analysis and translation, the Open Food Facts community has built a taxonomy. You can contribute to it.

In the database, this field is called ingredients_[country code].

See issues related to ingredients.

Substances or products causing allergies or intolerances

The substances are ingredients that are actually in the product, which could cause common allergies. This field can be filled by hand, but is also completed by automatic ingredients analysis.

Examples:

  • Milk
  • Gluten
  • Nuts

In the database, this field is an array of tags called allergens_tags.

See issues related to allergens_tags.

Traces

Traces are ingredients that are not used in the product itself but are present in the factory or the production process: the product might contain traces of these ingredients. Traces are really important if you are allergic.

This field can be filled by hand, but is also completed by automatic ingredients analysis.

Examples:

  • Milk
  • Gluten
  • Nuts

In the database, the technical name for this field is traces.

See issues related to traces.

Best before date (expiration date)

The expiration date is a way to track product changes over time and to identify the most recent version. This data is meant for manual use. As of 2020-03, the Open Food Facts apps and website do not use this field. An issue is open to exclude very old products from averages; this field could be useful for that.

Be aware that, for the moment, this field is NOT normalized, so it probably contains dates in various formats that can be ambiguous (31/12/2019, 12/31/2019, 13 mai 2018, etc.).
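
The ambiguity can be shown by trying several formats on the same text. This is only a sketch: the two candidate formats and the `possible_dates` helper are assumptions, since the field itself is free text.

```python
from datetime import datetime

# Candidate formats are an assumption; the field itself is free text.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%m/%d/%Y")


def possible_dates(text):
    """Return every distinct date the text could mean under the candidate formats."""
    dates = []
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # the text does not fit this format
        if parsed not in dates:
            dates.append(parsed)
    return dates
```

A value such as 31/12/2019 has a single reading because there is no 31st month, but 03/04/2019 has two (3 April or 4 March), and 13 mai 2018 matches neither numeric format.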

It is possible to see:

In the database and in Product Opener software, the technical name for this field is expiration_date.

Serving size

Serving size has a specific goal: to let the Open Food Facts app make a proportional calculation of each nutrient per serving. If a candy weighs 5 g, that weight can be chosen as the serving size: if these candies have 66 g of sugar per 100 g, each candy has about 3.3 g. Allowed units are: kg, g, mg, µg, oz, l, dl, cl, ml, fl.oz, fl oz, г, мг, кг, л, дл, кл, мл, 毫克, 公斤, 毫升, 公升, 吨.
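
The proportional calculation in the candy example is a simple rule of three. The helper name `nutrient_per_serving` is hypothetical and is used here only to spell out the arithmetic.

```python
def nutrient_per_serving(per_100g, serving_quantity_g):
    """Scale a per-100 g nutrient value to one serving of the given weight in grams."""
    return per_100g * serving_quantity_g / 100


# Candy example: 66 g of sugar per 100 g, one candy weighs 5 g.
sugar_per_candy = nutrient_per_serving(66, 5)
```

For the candy, this gives roughly 3.3 g of sugar per serving.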

Spelled-out units such as grammes, liter, etc., are NOT recognized.

Decimals can be written with a comma (,) or a point (.).

In the database and in Product Opener software, the technical name for this field is serving_size. Based on serving_size, Open Food Facts computes a serving_quantity float number for 100g or 100ml. The serving_quantity can be found in the API or the data exports.

See issues related to serving_size.

Good examples

  • 60 g (preferred, for readability reasons)
  • 30g
  • 35G
  • 90 ml
  • 1L

Possible examples (while not recommended)

  • cookie 25g
  • One Slice (50g)
  • 97 g (0.5 cup)

Bad examples

  • 30 gr => gr is not a correct unit
  • 9 candies and 2 biscuits => it's not possible to calculate a ratio because we don't know the weight of this portion
  • 30 => there is no unit
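
The good, possible and bad examples above can be reproduced with a small parser. This is a sketch under stated assumptions, not the code in Food.pm: the `parse_serving_size` function and its regular expression are hypothetical, and the unit list is copied from the allowed units quoted earlier.

```python
import re

# Units quoted in the text above, compared case-insensitively.
ALLOWED_UNITS = {
    "kg", "g", "mg", "µg", "oz", "l", "dl", "cl", "ml", "fl.oz", "fl oz",
    "г", "мг", "кг", "л", "дл", "кл", "мл", "毫克", "公斤", "毫升", "公升", "吨",
}

# A number (comma or point decimals) followed by "fl oz"/"fl.oz" or a run of letters.
NUMBER_AND_UNIT = re.compile(r"(\d+(?:[.,]\d+)?)\s*(fl[. ]?oz|[^\W\d_]+)", re.IGNORECASE)


def parse_serving_size(text):
    """Return (value, unit) for the first number with an allowed unit, else None."""
    for number, unit in NUMBER_AND_UNIT.findall(text):
        if unit.lower() in ALLOWED_UNITS:
            return float(number.replace(",", ".")), unit.lower()
    return None
```

With this sketch, 60 g, 35G and One Slice (50g) all yield a value and a unit, while 30 gr, 9 candies and 2 biscuits, and 30 yield nothing, so no per-serving ratio could be computed for them.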