Revision as of 13:43, 23 March 2020
Data fields
A "data field" is a piece of structured information that has at least one specific use. For example, the "product name" field allows us to easily recognize the main name printed on the packaging.
Open Food Facts manages different kinds of data fields:
- fields that can be completed by users, such as the name of the product, the brand, etc.
- fields that are always computed by machines, such as the name of the contributor or the date of the contribution
- fields that are sometimes computed based on other fields, such as the Nutri-Score, the Nova score, etc.
Fields completed by users
All these fields can be entered or modified by hand by users. [to be completed]
Product name
The product name is the main name printed on the packaging. It can be a registered trademark such as Nutella. This field is important because it is one of the most used.
Unless it is part of the name, the product name shouldn't contain the number of portions or a quantity: bad examples are "1 Onglet", "10 Burgers" and "1L Sirop cerise", whereas "100% Cacao" and "1848 Lait noisette" (name of a product) are acceptable. It also shouldn't contain registered trademark symbols (®) or HTML code; it shouldn't be in capital letters unless they are used on the product; it shouldn't contain brands unless they are part of the name ("Kinder Bueno" is good, while "Kronenbourg 1664", "Stella" or "Vodka Smirnoff" are not); and it shouldn't contain the price.
At the beginning of 2020, more than 95% of Open Food Facts products had a product name.
The product name shouldn't include any other information such as the brand of the product, the weight, etc.
Good examples:
- Nesquik
Bad examples:
- Petit déjeuner Nesquik => you don't have to explain, just put the name from the packaging
- Nutella by Ferrero => you shouldn't fill in the brand here, there's a field for that :)
In the database, the technical name for this field is product_name.
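The naming rules above can be turned into a rough automatic check. The following is only a sketch under stated assumptions: the function name `product_name_warnings` and its heuristics are hypothetical and are not part of Product Opener. It can only flag names for human review, since a number that is genuinely part of the name (as in "1848 Lait noisette") looks the same to the code as a quantity.

```python
import re

# Hypothetical pattern: a leading number, optionally followed by a unit,
# then whitespace and more text (e.g. "10 Burgers", "1L Sirop cerise").
LEADING_QUANTITY = re.compile(
    r"\s*\d+(?:[.,]\d+)?\s*(?:kg|g|ml|cl|l)?\s+\S", re.IGNORECASE
)

def product_name_warnings(name):
    """Return a list of reasons why a product name may need review."""
    warnings = []
    if "®" in name:
        warnings.append("registered trademark symbol")
    if re.search(r"<[^>]+>", name):
        warnings.append("HTML code")
    if name.isupper():
        warnings.append("all capital letters")
    if LEADING_QUANTITY.match(name):
        warnings.append("starts with a number or quantity")
    return warnings
```

An empty list means that none of these heuristics fired; it does not prove that the name is correct.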
Common name
The common name defines the product. It is the name used when you don't want to or can't use the product name. This is the place where you say "Cocoa and hazelnut spread" instead of "Nutella". This name is very useful for our AI (artificial intelligence): it helps to guess the category of the product.
The common name may be equivalent to the product category, but not always [examples].
In the database, the technical name for this field is generic_name.
Quantity
This is the quantity of the product as indicated on the packaging, with the corresponding unit or number of portions; for example "230g" or "6" (for 6 eggs).
If the quantity can be deduced in grams, it can be used for calculations such as the carbon impact.
In the database, the technical name for this field is quantity.
See quantities that are not recognized.
See issues related to quantity.
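As an illustration of what "deducing the quantity in grams" can mean, here is a minimal sketch. It is not the Product Opener parser: the function name `quantity_in_grams` and the small unit table are assumptions made for this example, and only a few mass units are handled.

```python
import re

# Hypothetical conversion table; a real parser would support more units.
GRAMS_PER_UNIT = {"kg": 1000.0, "g": 1.0, "mg": 0.001}

QUANTITY_PATTERN = re.compile(
    r"\s*(\d+(?:[.,]\d+)?)\s*(kg|mg|g)\s*", re.IGNORECASE
)

def quantity_in_grams(quantity):
    """Return the quantity in grams, or None if it cannot be deduced."""
    match = QUANTITY_PATTERN.fullmatch(quantity)
    if match is None:
        return None  # e.g. "6" (for 6 eggs) has no mass unit
    value = float(match.group(1).replace(",", "."))
    return value * GRAMS_PER_UNIT[match.group(2).lower()]
```

With this sketch, "230g" yields 230.0, while "6" yields None because a count of eggs cannot be converted to grams without more information.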
Ingredients
This field lists the ingredients of the product. This field is one of the most important as it is used for:
- Nova calculation
- Nutri-Score calculation: it is used to calculate the proportion of fruit and nuts
- evaluation of some food preferences such as vegetarian, vegan, etc.
- automatic identification of allergens (both substances and traces, see below)
- etc.
Thanks to automatic Optical Character Recognition (OCR), this field can be filled in by software. However, OCR is not always accurate, so you should always verify the result.
Ingredients analysis also produces an array of all ingredients, which allows translation into other languages. To allow analysis and translation, the Open Food Facts community has built a taxonomy. You can contribute to it.
In the database, this field is called ingredients_[country code].
See issues related to ingredients.
Substances or products causing allergies or intolerances
These substances are ingredients that are actually in the product and that can cause common allergies or intolerances. This field can be filled in by hand, but it is also completed by automatic ingredients analysis.
Examples:
- Milk
- Gluten
- Nuts
In the database, this field is an array of tags called allergens_tags.
See issues related to allergens_tags.
Traces
Traces are ingredients that are not used in the product itself but are present in the factory or the production process: the product might contain traces of these ingredients. Traces are really important if you are allergic. This field can be filled in by hand, but it is also completed by automatic ingredients analysis.
Examples:
- Milk
- Gluten
- Nuts
In the database, the technical name for this field is traces.
Best before date (expiration date)
The expiration date is a way to track product changes over time and to identify the most recent version. This data is meant for manual use: at the moment (2020-03), the Open Food Facts apps and website do not use this field. An issue is open about excluding very old products from averages, and this field could be useful for that.
Be aware that, for the moment, this field is NOT normalized, so it probably contains dates in various formats that can be ambiguous (31/12/2019, 12/31/2019, 13 mai 2018, etc.).
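To make the ambiguity concrete, the sketch below tries two candidate formats on the same text and returns every date it could denote. The function name `possible_dates` and the list of formats are assumptions chosen for illustration; the field itself accepts free text, so a value such as "13 mai 2018" matches neither format.

```python
from datetime import datetime

# Formats chosen for illustration only: day/month/year and month/day/year.
CANDIDATE_FORMATS = ["%d/%m/%Y", "%m/%d/%Y"]

def possible_dates(text):
    """Return every distinct date that `text` could denote, sorted."""
    found = set()
    for fmt in CANDIDATE_FORMATS:
        try:
            found.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # the text does not fit this format
    return sorted(found)
```

A value like "31/12/2019" has a single reading because 31 cannot be a month, but "05/06/2019" has two readings (5 June or 6 May), so it cannot be interpreted safely without knowing which convention the contributor used.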
It is possible to see:
- how many products have an expiration date (a bit more than 10% at the beginning of 2020)
- and how many don't
In the database and in Product Opener software, the technical name for this field is expiration_date.
Serving size
Serving size has a specific goal: to let the Open Food Facts app calculate the amount of each nutrient per serving. If a candy weighs 5 g, that weight can be chosen as the serving size: if these candies have 66 g of sugar per 100 g, each candy has about 3.3 g of sugar (5 g × 66 / 100). Allowed units are: kg, g, mg, µg, oz, l, dl, cl, ml, fl.oz, fl oz, г, мг, кг, л, дл, кл, мл, 毫克, 公斤, 毫升, 公升, 吨. Other spellings such as grammes, liter, etc., are NOT recognized.
Decimals can be written with a comma (,) or a point (.).
Good:
- 60 g (preferred, for readability reasons)
- 30g
- 35G
- 90 ml
- 1L
Possible (while not recommended):
- cookie 25g
- One Slice (50g)
- 97 g (0.5 cup)
Bad:
- 30 gr => gr is not a correct unit
- 9 candies and 2 biscuits => it's not possible to calculate a ratio because we don't know the weight of this portion
- 30 => there is no unit
In the database and in Product Opener software, the technical name for this field is serving_size.
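The proportional calculation described in this section can be sketched as follows. This is an illustration under stated assumptions, not the Product Opener implementation: the function names, the regular expression and the unit table are hypothetical, only a subset of the allowed units is covered, and volumes are expressed in millilitres so that they can be used with values given per 100 ml.

```python
import re

# Subset of the allowed units, mapped to grams (mass) or millilitres (volume).
UNIT_FACTORS = {
    "kg": 1000.0, "g": 1.0, "mg": 0.001,
    "l": 1000.0, "dl": 100.0, "cl": 10.0, "ml": 1.0,
}

# A number (comma or point as decimal separator) followed by a known unit.
SERVING_PATTERN = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*(kg|mg|g|dl|cl|ml|l)\b", re.IGNORECASE
)

def parse_serving_size(serving_size):
    """Return the serving size in g or ml, or None if no amount is found."""
    match = SERVING_PATTERN.search(serving_size)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    return value * UNIT_FACTORS[match.group(2).lower()]

def nutrient_per_serving(per_100, serving_size):
    """Scale a per-100 g (or per-100 ml) value to one serving."""
    amount = parse_serving_size(serving_size)
    if amount is None:
        return None  # no ratio can be calculated without a weight
    return per_100 * amount / 100.0
```

For example, `nutrient_per_serving(66, "5 g")` reproduces the candy calculation above, while "30 gr", "30" and "9 candies and 2 biscuits" all return None because no recognized amount with a unit can be extracted from them.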