Data fields: Difference between revisions

From Open Food Facts wiki
(Add Ingredients field + substances and traces enhancement)
(→‎Ingredients: add taxonomy)

Revision as of 15:32, 16 March 2020

Data fields

A "data field" is a structured piece of information that has at least one specific use. For example, the "product name" field allows us to easily recognize the main name printed on the packaging.

Open Food Facts manages different kinds of data fields:

  1. fields that can be completed by users, such as the name of the product, the brand, etc.
  2. fields that are always computed by machines, such as the name of the contributor or the date of the contribution
  3. fields that are sometimes computed based on other fields, such as the Nutri-Score, the Nova score, etc.

Fields completed by users

All these fields can be entered or modified by hand by users. [to be completed]

Product name

The product name is the main name printed on the packaging. It can be a registered trademark such as Nutella. This field is important: it is one of the most used.

Guidelines for the product name:

  • It shouldn't contain the number of portions or a quantity, unless that is part of the name. Bad examples: "1 Onglet", "10 Burgers", "1L Sirop cerise". Acceptable, because the number belongs to the name of the product: "100% Cacao", "1848 Lait noisette".
  • It shouldn't contain registered trademark symbols (®) or HTML code such as &quot;.
  • It shouldn't be in capital letters, unless they are used on the product.
  • It shouldn't contain the brand, unless the brand is part of the name: "Kinder Bueno" is good, while "Kronembourg 1664", "Stella" or "Vodka Smirnoff" are not.
  • It shouldn't contain the price.

At the beginning of 2020, more than 95% of Open Food Facts products had a product name.

The product name shouldn't include any other information such as the brand of the product, the weight, etc.

Good examples:

Bad examples:

  • Petit déjeuner Nesquick => you don't have to explain, just put the name from the packaging
  • Nutella by Ferrero => you shouldn't fill the brand here, there's a field for that :)

In the database, the technical name for this field is product_name.
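
Some of the rules above can be checked mechanically. The following is a minimal sketch, not part of Open Food Facts: the function name `product_name_warnings` and the warning messages are hypothetical, and the price check is a rough heuristic. Rules that depend on the packaging (whether a number or a brand is really part of the name) are deliberately left to a human.

```python
import re


def product_name_warnings(name: str) -> list[str]:
    """Return warnings for a candidate product name (hypothetical helper)."""
    warnings = []

    # Trademark symbols such as ® should not appear in the name.
    if re.search(r"[®™]", name):
        warnings.append("contains a trademark symbol")

    # HTML entities (&quot;, &#39;) or tags are leftovers, not part of a name.
    if re.search(r"&[a-zA-Z]+;|&#\d+;|<[^>]+>", name):
        warnings.append("contains HTML code")

    # All-caps names are discouraged unless printed that way on the product,
    # so this is only a hint for the contributor to double-check.
    letters = [c for c in name if c.isalpha()]
    if len(letters) > 1 and all(c.isupper() for c in letters):
        warnings.append("is entirely in capital letters")

    # Rough heuristic: an amount next to a currency sign looks like a price.
    if re.search(r"\d+(?:[.,]\d+)?\s*[€$£]|[€$£]\s*\d", name):
        warnings.append("seems to contain a price")

    return warnings


for candidate in ["Nutella", "Nutella®", "KINDER BUENO", "100% Cacao"]:
    print(candidate, "->", product_name_warnings(candidate))
```

A check like this can only suggest a second look: "100% Cacao" passes because the sketch does not try to judge quantities at all.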


Common name

The common name describes what the product is. It is the name used when you don't want to or can't use the product name. This is the place where you say "Cocoa and hazelnut spread" instead of "Nutella". This name is very useful for our AI (artificial intelligence): it helps to guess the category of the product.

The common name may match the product category, but not always [examples].

In the database, the technical name for this field is generic_name.


Quantity

This is the quantity of the product, with the corresponding number of portions or unit, for example "230g" or "6" (for 6 eggs).

In the database, the technical name for this field is quantity.

See issues related to quantity.


Ingredients

This field lists the ingredients of the product. This field is one of the most important as it is used for:

  • Nova calculation
  • Nutri-Score calculation: it is used to calculate the proportion of fruit and nuts
  • evaluation of compatibility with dietary preferences such as vegetarian, vegan, etc.
  • automatic identification of allergens (both substances and traces, see below)
  • etc.

Thanks to automatic Optical Character Recognition (OCR), this field can be filled in by software. But OCR is not always reliable, and you should always verify the result.

Ingredient analysis also produces an array of all ingredients, which allows translation into other languages. To allow analysis and translation, the Open Food Facts community has built a taxonomy (https://github.com/openfoodfacts/openfoodfacts-server/blob/master/taxonomies/ingredients.txt). You can contribute to it.

In the database, this field is called ingredients_[country code].

See issues related to ingredients.

Substances or products causing allergies or intolerances

Substances are ingredients that are actually in the product and that can cause common allergies or intolerances. This field can be filled by hand, but is also completed by automatic ingredient analysis.

Examples:

  • Milk
  • Gluten
  • Nuts

In the database, this field is an array of tags called allergens_tags.

See issues related to allergens_tags.


Traces

Traces are ingredients that are not used in the product itself but are present in the factory or the production process: the product might contain traces of these ingredients. Traces are really important if you are allergic. This field can be filled by hand, but is also completed by automatic ingredient analysis.

Examples:

  • Milk
  • Gluten
  • Nuts

In the database, the technical name for this field is traces.

See issues related to traces.
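
The difference between substances and traces can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the product record is invented, the tag values (`en:milk`, etc.) are assumed, the `traces` field is assumed to hold a comma-separated string, and `allergen_report` is a hypothetical helper, not an Open Food Facts function.

```python
# Hypothetical product record. The tag values and the comma-separated
# format of "traces" are assumptions made for this sketch.
product = {
    "product_name": "Example spread",
    "allergens_tags": ["en:milk", "en:nuts"],
    "traces": "en:gluten",
}


def allergen_report(product: dict, avoid: set[str]) -> dict[str, list[str]]:
    """Split the allergens a user avoids into 'contains' and 'may_contain'."""
    substances = set(product.get("allergens_tags", []))
    traces = {t.strip() for t in product.get("traces", "").split(",") if t.strip()}
    return {
        # Ingredients that are actually in the product.
        "contains": sorted(avoid & substances),
        # Present only through the factory or the production process.
        "may_contain": sorted((avoid & traces) - substances),
    }


report = allergen_report(product, {"en:milk", "en:gluten", "en:soybeans"})
print(report)
```

With this invented record, milk is reported under "contains" and gluten under "may_contain", while soybeans appear in neither list.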


Best before date (expiration date)

The expiration date is a way to track product changes over time and to identify the most recent version. It is intended for manual use: at the moment (2020-03), the Open Food Facts apps and website don't use this field. An issue is open about excluding very old products from averages; this field could be useful for that.

Be aware that, for the moment, this field is NOT normalized, so it probably contains dates in various formats that can be ambiguous (31/12/2019, 12/31/2019, 13 mai 2018, etc.).
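
To see why the lack of normalization matters, here is a minimal sketch that tries a few date layouts on the same raw string. The list of candidate formats and the function name `possible_dates` are assumptions chosen for this example; real data contains more variants, such as month names in French ("13 mai 2018"), which this sketch does not handle.

```python
from datetime import date, datetime

# Candidate layouts for this sketch only (an assumption, not an official list).
CANDIDATE_FORMATS = ["%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d"]


def possible_dates(raw: str) -> set[date]:
    """Return every date the raw string could mean under the candidate formats."""
    found = set()
    for fmt in CANDIDATE_FORMATS:
        try:
            found.add(datetime.strptime(raw.strip(), fmt).date())
        except ValueError:
            # The string does not fit this layout; try the next one.
            pass
    return found


for raw in ["31/12/2019", "12/31/2019", "03/04/2019", "13 mai 2018"]:
    print(raw, "->", sorted(possible_dates(raw)))
```

A value such as "03/04/2019" yields two different dates (3 April and 4 March), so it cannot be interpreted safely without knowing which convention the contributor used.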

It is possible to see:

In the database and in Product Opener software, the technical name for this field is expiration_date.


Serving size

Serving size has a specific goal: to let the Open Food Facts app compute the amount of each nutrient per serving. For example, if a candy weighs 5 g, that weight can be chosen as the serving size: if these candies have 66 g of sugar per 100 g, each candy has about 3 g of sugar (66 × 5 / 100 = 3.3 g).

Allowed units are: kg, g, mg, µg, oz, l, dl, cl, ml, fl.oz, fl oz, г, мг, кг, л, дл, кл, мл, 毫克, 公斤, 毫升, 公升, 吨.

Spelled-out units such as "grammes" or "liter" are NOT recognized.

Decimals can be written with a comma (,) or a point (.).

Good:

  • 60 g (preferred, for readability reasons)
  • 30g
  • 35G
  • 90 ml
  • 1L

Possible (while not recommended):

  • cookie 25g
  • One Slice (50g)
  • 97 g (0.5 cup)

Bad:

  • 30 gr => gr is not a correct unit
  • 9 candies and 2 biscuits => it's not possible to calculate a ratio because we don't know the weight of this portion
  • 30 => there is no unit
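
The rules and examples above can be approximated with a short sketch. This is not the Product Opener implementation: the regular expression, the function names `parse_serving_size` and `nutrient_per_serving`, and the conversion table are assumptions written for illustration. Only the list of units comes from this page, and matching is assumed to be case-insensitive because "35G" and "1L" are listed as good.

```python
import re

# Units copied from the list of allowed units above.
UNITS = ["kg", "g", "mg", "µg", "oz", "l", "dl", "cl", "ml", "fl.oz", "fl oz",
         "г", "мг", "кг", "л", "дл", "кл", "мл", "毫克", "公斤", "毫升", "公升", "吨"]

# A number (comma or point as decimal separator), optional spaces, then a unit
# that is not immediately followed by another letter (so "30 gr" is rejected).
SERVING_RE = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*("
    + "|".join(re.escape(u) for u in UNITS)
    + r")(?![^\W\d_])",
    re.IGNORECASE,
)

# Conversion to grams or millilitres, limited to a few metric units for brevity.
TO_BASE = {"kg": 1000, "g": 1, "mg": 0.001, "l": 1000, "dl": 100, "cl": 10, "ml": 1}


def parse_serving_size(text: str):
    """Return (quantity, unit) for the first recognized quantity, else None."""
    match = SERVING_RE.search(text)
    if match is None:
        return None
    quantity = float(match.group(1).replace(",", "."))
    return quantity, match.group(2).lower()


def nutrient_per_serving(per_100: float, serving_size: str):
    """Scale a per-100 g (or per-100 ml) value to one serving, else None."""
    parsed = parse_serving_size(serving_size)
    if parsed is None or parsed[1] not in TO_BASE:
        return None
    quantity, unit = parsed
    return per_100 * quantity * TO_BASE[unit] / 100


for text in ["60 g", "35G", "1L", "One Slice (50g)", "30 gr", "30"]:
    print(repr(text), "->", parse_serving_size(text))

# The candy example: 66 g of sugar per 100 g, 5 g per candy.
print(nutrient_per_serving(66, "5 g"))
```

Searching for the first quantity anywhere in the text is what makes values like "One Slice (50g)" usable, which is also why a bare, unambiguous "60 g" remains the preferred form.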

In the database and in Product Opener software, the technical name for this field is serving_size.

See issues related to serving_size.