Data fields: Difference between revisions
Revision as of 14:21, 23 August 2021
Data fields
A "data field" is a piece of structured information that has at least one specific use. For example, the "product name" field allows us to easily recognize the main name printed on the packaging.
Open Food Facts manages different kinds of data fields:
- fields that can be completed by users, such as the name of the product, the brand, etc.
- fields that are always computed by machines such as the name of the contributor or the date of the contribution
- fields that are sometimes computed based on other fields, such as the Nutri-Score, the Nova score, etc.
Fields completed by users
All these fields can be entered or modified by hand by the users. [to be completed]
Product name
The product name is the main name printed on the packaging. It can be a registered trademark such as Nutella. This field is important, as it is one of the most used pieces of data.
The product name:
- shouldn't contain the number of portions or a quantity, unless it is part of the name: bad examples are "1 Onglet", "10 Burgers", "1L Sirop cerise", whereas "100% Cacao" and "1848 Lait noisette" are fine because the number is part of the product's name;
- shouldn't contain registered trademark symbols (®) or HTML code;
- shouldn't be in capital letters, except if they are used on the product;
- shouldn't contain brands, except if the brand is included in the name ("Kinder Bueno" is good while "Kronembourg 1664", "Stella" or "Vodka Smirnoff" are not);
- shouldn't contain the price.
At the beginning of 2020, more than 95% of Open Food Facts products had a product name.
The product name shouldn't include any other information such as the brand of the product, the weight, etc.
Good examples:
- Nesquick
Bad examples:
- Petit déjeuner Nesquick => you don't have to explain, just put the name from the packaging
- Nutella by Ferrero => you shouldn't fill the brand here, there's a field for that :)
- Nesquick® => don't use symbols ®, ™, © or similar in the product name data field or even in other fields.
In the database, the technical name for this field is product_name.
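Some of the rules above can be checked mechanically. The following is a minimal sketch only: product_name_warnings is a hypothetical helper, not part of Product Opener, and it covers just three of the rules (symbols, HTML code, all-capital names). The "more than three letters" threshold is an arbitrary assumption.

```python
import re

# Hypothetical helper, not part of Product Opener: it flags a few of the
# problems listed above so that a contributor can review the name by hand.
FORBIDDEN_SYMBOLS = ("®", "™", "©")
HTML_TAG = re.compile(r"<[^>]+>")


def product_name_warnings(name: str) -> list[str]:
    """Return human-readable warnings for a candidate product name."""
    warnings = []
    if any(symbol in name for symbol in FORBIDDEN_SYMBOLS):
        warnings.append("contains a trademark or copyright symbol")
    if HTML_TAG.search(name):
        warnings.append("contains HTML code")
    letters = [char for char in name if char.isalpha()]
    # Assumed threshold: only flag names with more than three letters.
    if len(letters) > 3 and all(char.isupper() for char in letters):
        warnings.append("is written entirely in capital letters")
    return warnings


print(product_name_warnings("Nesquick®"))
print(product_name_warnings("Nesquick"))
```

A name in capital letters may still be legitimate if it is printed that way on the product, so these warnings are hints for a human reviewer rather than errors.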
Common name
The common name defines the product. It is the name used when you don't want to, or can't, use the product name. This is the place where you say "Cocoa and hazelnuts spreads" instead of "Nutella". This name is very useful for our AI (artificial intelligence): it helps to guess the category of the product.
The common name may be equivalent to the product category, but not always [examples].
In the database, the technical name for this field is generic_name.
Quantity
This is the quantity of the product, with the corresponding number of portions or unit. The best way to fill it is to enter the value as indicated on the product. Don't forget the units! If we can deduce the quantity in grams it can be used to calculate some things such as the carbon impact.
Examples:
- 230g
- 230 g
- 6 (for 6 eggs)
- 3 x 150g (for a product with 3 boxes, each of 150g)
A complete wiki page is dedicated to product quantities.
In the database, the technical name for this field is quantity.
See quantities that are not recognized.
See issues related to quantity.
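To illustrate how a quantity in grams could be deduced from the examples above, here is a minimal sketch. It is not the parser used by Open Food Facts: quantity_in_grams, the unit table and the accepted syntax (an optional "N x" multiplier, then a number and a weight unit) are all assumptions made for this example.

```python
import re

# Assumed, simplified unit table (weights only); the real field accepts more.
GRAMS_PER_UNIT = {"kg": 1000.0, "g": 1.0, "mg": 0.001}

# Optional "N x" multiplier, then a number and a unit, e.g. "3 x 150g".
QUANTITY = re.compile(
    r"^\s*(?:(\d+)\s*[x×]\s*)?(\d+(?:[.,]\d+)?)\s*(kg|mg|g)\s*$",
    re.IGNORECASE,
)


def quantity_in_grams(quantity: str):
    """Return the total weight in grams, or None if it cannot be deduced."""
    match = QUANTITY.match(quantity)
    if match is None:
        return None
    count, value, unit = match.groups()
    grams = float(value.replace(",", ".")) * GRAMS_PER_UNIT[unit.lower()]
    return grams * int(count) if count else grams


for example in ("230g", "230 g", "6", "3 x 150g"):
    print(example, "->", quantity_in_grams(example))
```

In this sketch, a value without a unit, such as 6 for six eggs, yields None: the value is still valid for the field, but no weight can be deduced from it.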
Packaging
This is the packaging of the product. Multiple values are allowed. There is no taxonomy for this field, so you can enter anything you find relevant including:
- the substance of the packaging: glass, metal, plastic, etc.
- the shape: bottle, can, etc.
You can write it in your own language. Don't hesitate to add as much data as you find relevant.
Two draft taxonomies have been proposed; don't hesitate to comment or help.
In the database, there are two fields related to packaging:
- packaging: the list of values entered in the packaging field; e.g.: Bottle,Glass
- packaging_tags: an array of different packaging tags
See issues related to packaging.
Brands
These are the brands of the product. The main brand, generally clearly displayed on the front of the pack, should be entered first. A product can have other brands:
- when a product is a brand sold by a big company: Actimel is sold by Danone, see https://world.openfoodfacts.org/product/4009700036810/actimel-granatapfel
- when a product is sold with its brand translated in two languages: Nature Valley is sometimes written Val Nature; see https://world.openfoodfacts.org/product/0065633280267/barre-granola-erable-et-cassonade-nature-valley
There's no taxonomy for brands for the moment, so just do your best and don't spend too much time entering brands.
A complete wiki page is dedicated to brands.
In the database, this field is called brands.
See: issues related to brands.
Categories
[to be completed]
This field lists the specific category of a product and its related parents. This field is very important: it must be completed for the Nutri-Score computation, as this computation distinguishes between types of products (beverages, ...).
→ Indicate only the most specific category. "Parent" categories will be added automatically.
Examples: Sardines in olive oil, Orange juice from concentrate
Labels
[to be completed]
"Labels, certifications, awards" on the website.
Example: Organic.
See the Labels page for more information.
Manufacturing or processing places
[to be completed]
This field lists the places where the product has been manufactured or processed.
EMB code
[to be completed]
This field is dedicated to various codes related to packaging marks, identification marks or health marks:
- EC identification and health marks in use in the European Community to identify food producers or food packagers. Example: FR 72.264.002 CE.
- EMB codes (fr): packaging codes in use in France. Example: EMB 72264.
Open Food Facts had gathered 25,000+ codes as of 2021-08-18. These codes make it possible to locate the places on a map: https://world.openfoodfacts.org/packager-code/fr-72-264-002-ce
We use it to produce the world map of products: https://cestemballepresdechezvous.fr/ (fr) and https://madenear.me/ (en) [down on 2021-08-19].
Countries where sold
This field contains all the countries where the product is sold. If the field contains France, the product will be listed on the https://fr.openfoodfacts.org/ website.
The list of all existing countries can be found here: https://world.openfoodfacts.org/countries
In 2021, the United Nations recognized 193 countries, but Open Food Facts also recognizes other territories, such as overseas regions and territories (French Guiana, Guadeloupe, French Polynesia, etc.). This field is taxonomized (source code). However, Open Food Facts accepts any value, and some people or buggy tools enter values that are not in the taxonomy, which leads to bad values.
In the database, this field is called countries. It is used to compute the countries_tags field, the normalized version of countries, which makes it possible to show the name of a country in the desired language, thanks to the countries taxonomy. E.g.: when Italy is entered from a page in English (say, world.openfoodfacts.org), countries_tags becomes en:italy.
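The normalization described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: TAXONOMY is a tiny hypothetical excerpt (the real countries taxonomy is a text file with its own format), to_country_tag and display_name are hypothetical helpers, and the handling of unknown values is a guess at how values outside the taxonomy could end up as bad tags.

```python
# Tiny, hypothetical excerpt of a countries taxonomy:
# canonical tag -> country name in each language.
TAXONOMY = {
    "en:italy": {"en": "Italy", "fr": "Italie"},
    "en:france": {"en": "France", "fr": "France"},
}


def to_country_tag(value: str, lang: str) -> str:
    """Map a country name entered in language `lang` to a canonical tag."""
    wanted = value.strip().lower()
    for tag, names in TAXONOMY.items():
        if names.get(lang, "").lower() == wanted:
            return tag
    # Assumption: unknown values are kept, prefixed with the page language.
    return f"{lang}:{wanted.replace(' ', '-')}"


def display_name(tag: str, lang: str) -> str:
    """Show a tag in the desired language, falling back to the raw tag."""
    return TAXONOMY.get(tag, {}).get(lang, tag)


tag = to_country_tag("Italy", "en")
print(tag, "->", display_name(tag, "fr"))
```

Because the tag is independent of the language used at entry time, the same product can be displayed with "Italy" on an English page and "Italie" on a French one.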
Ingredients
This field lists the ingredients of the product. This field is one of the most important as it is used for:
- Nova calculation
- Nutri-Score calculation: it is used to calculate the proportion of fruit and nuts
- evaluation of some food preferences such as vegetarians, vegans, etc.
- automatic identification of allergens (both substances and traces, see below)
- etc.
Thanks to automatic Optical Character Recognition (OCR), this field can be filled in by software. But OCR is not always accurate, and you should always verify the result.
Ingredients analysis also produces an array of all ingredients, which allows translation into other languages. To allow analysis and translation, the Open Food Facts community has built a taxonomy. You can contribute to it.
In the database, this field is called ingredients_[country code].
See issues related to ingredients.
Substances or products causing allergies or intolerances
The substances are ingredients that are actually in the product, which could cause common allergies. This field can be filled by hand, but is also completed by automatic ingredients analysis.
Examples:
- Milk
- Gluten
- Nuts
In the database, this field is an array of tags called allergens_tags.
See issues related to allergens_tags.
Traces
Traces are ingredients that are not used in the product itself but are present in the factory or the production process: the product might contain traces of these ingredients. Traces are really important if you are allergic.
This field can be filled by hand, but is also completed by automatic ingredients analysis.
Examples:
- Milk
- Gluten
- Nuts
In the database, the technical name for this field is traces.
Best before date (expiration date)
The expiration date is a way to track product changes over time and to identify the most recent version. This data is meant for manual use: at the moment (2020-03), the Open Food Facts apps and website don't use this field. An issue is open about excluding very old products from averages; this field could be useful for that.
Be aware that, for the moment, this field is NOT normalized, so it probably contains dates in various formats that can be ambiguous (31/12/2019, 12/31/2019, 13 mai 2018, etc.).
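The ambiguity can be made concrete with a short sketch. This is an illustration only, not something Open Food Facts does: possible_dates is a hypothetical helper, and the list of candidate formats is an assumption (free-text dates such as "13 mai 2018" are deliberately not covered).

```python
from datetime import date, datetime

# Candidate formats are an assumption; free-text dates are not covered.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d")


def possible_dates(raw: str) -> set[date]:
    """Return every date the raw value could mean under the candidate formats."""
    found = set()
    for fmt in CANDIDATE_FORMATS:
        try:
            found.add(datetime.strptime(raw.strip(), fmt).date())
        except ValueError:
            continue  # The value does not fit this format.
    return found


for raw in ("31/12/2019", "12/31/2019", "01/02/2019", "13 mai 2018"):
    print(raw, "->", sorted(possible_dates(raw)))
```

A value with a single interpretation can be used safely, a value with two interpretations (day and month both 12 or less) is ambiguous, and an empty result means the value needs manual review.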
It is possible to see:
- how many products do have an expiration date (a bit more than 10% at the beginning of 2020)
- and how many don't
In the database and in Product Opener software, the technical name for this field is expiration_date.
Serving size
Serving size has a specific goal: to let the Open Food Facts app make a proportional calculation of each nutrient per serving. If a candy weighs 5 g, this weight can be chosen as the serving size: if these candies have 66 g of sugar per 100 g, each candy has about 3 g of sugar. Allowed units are: kg, g, mg, µg, oz, l, dl, cl, ml, fl.oz, fl oz, г, мг, кг, л, дл, кл, мл, 毫克, 公斤, 毫升, 公升, 吨.
grammes, liter, etc., are NOT recognized.
Decimals can be written with a comma (,) or a point (.).
Good:
- 60 g (preferred, for readability reasons)
- 30g
- 35G
- 90 ml
- 1L
Possible (while not recommended):
- cookie 25g
- One Slice (50g)
- 97 g (0.5 cup)
Bad:
- 30 gr => gr is not a correct unit
- 9 candies and 2 biscuits => it's not possible to calculate a ratio because we don't know the weight of this portion
- 30 => there is no unit
In the database and in Product Opener software, the technical name for this field is serving_size. Based on serving_size, Open Food Facts computes a serving_quantity float number for 100g or 100ml. The serving_quantity can be found in the API or the data exports.
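The good, possible and bad examples above, and the candy calculation, can be reproduced with a minimal sketch. It is not the Product Opener implementation. It assumes that the numeric serving quantity is the serving size converted to g or ml, it supports only a subset of the allowed units, and the function names and conversion table are hypothetical.

```python
import re

# Subset of the allowed units, with assumed conversion factors to g or ml.
UNIT_FACTORS = {
    "kg": 1000.0, "g": 1.0, "mg": 0.001,
    "l": 1000.0, "dl": 100.0, "cl": 10.0, "ml": 1.0,
}

# A number (comma or point decimals) followed by a unit that is not the
# beginning of a longer word, so that "30 gr" is rejected.
SERVING = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*(kg|mg|g|dl|cl|ml|l)(?!\w)", re.IGNORECASE
)


def serving_quantity(serving_size: str):
    """Return the serving size in g or ml, or None if no unit is recognized."""
    match = SERVING.search(serving_size)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value.replace(",", ".")) * UNIT_FACTORS[unit.lower()]


def per_serving(value_per_100: float, quantity: float) -> float:
    """Scale a nutrient value given per 100 g (or 100 ml) to one serving."""
    return value_per_100 * quantity / 100.0


for example in ("60 g", "1L", "One Slice (50g)", "30 gr", "30"):
    print(example, "->", serving_quantity(example))

# The candy example: 66 g of sugar per 100 g, with a 5 g serving.
print(per_serving(66.0, serving_quantity("5 g")))
```

The sketch takes the first number followed by a recognized unit, which is why "97 g (0.5 cup)" still works while "9 candies and 2 biscuits" gives no result.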