Ingredients/Parsing
Latest revision as of 16:08, 7 August 2024
This page collects what we know about ingredient parsing.
rietsuiker*, plantaardige olie* (zonnebloem, palm), 13% _hazelnoot_*, 7.5% magere cacaopoeder*, magere _melk_poeder*, emulgator (_soja_lecithine), vanille*. *Van biologische oorsprong.
EU Regulation based parsing
Comma separator
- Ingredients are separated by a comma (,), generally followed by a single space.
rietsuiker, plantaardige olie
- Other symbols, such as a bullet (•), can also act as separators.
• Polyquaternium-67 • Propylene Glycol •
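The separator rules above can be sketched as a small splitter. This is only an illustration, not the parser Open Food Facts actually uses: the function name `split_ingredients` and the `SEPARATORS` set are assumptions, and the sketch only splits at the top level so that commas inside parentheses or brackets stay with their parent ingredient.

```python
# Hypothetical separator set: the comma and the bullet seen in the examples above.
SEPARATORS = {",", "•"}


def split_ingredients(text):
    """Split an ingredients string on top-level separators only."""
    parts = []
    current = []
    depth = 0  # current nesting level of parentheses / square brackets
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        if ch in SEPARATORS and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty pieces caused by leading or trailing separators.
    return [p for p in parts if p]


print(split_ingredients("rietsuiker, plantaardige olie (zonnebloem, palm), vanille"))
```

One known limitation of this sketch: a decimal comma such as "7,5%" would be treated as a separator, so a real parser needs to handle numbers before splitting.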
Dot
Indicates the end of the ingredients list. It can also mark the start of a new main ingredient, usually when it is followed by a new ingredient name and a colon:
ingredients: .... . Cocosmilk: .... . Curry: .... .
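A minimal sketch of this dot-and-colon pattern, assuming that a dot only ends a section when it is followed by whitespace or the end of the text (so decimals such as 7.5 are left alone). The function name `split_sections` and the sample ingredient names are made up for illustration.

```python
import re


def split_sections(text):
    """Split text into (section_name, body) pairs at sentence-ending dots."""
    sections = []
    # A dot followed by whitespace or end of text closes a section;
    # a dot followed by a digit (as in "7.5") does not match.
    for chunk in re.split(r"\.(?:\s+|$)", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, colon, body = chunk.partition(":")
        if colon:
            sections.append((name.strip(), body.strip()))
        else:
            # No colon: treat the chunk as an unnamed section.
            sections.append((None, chunk))
    return sections


print(split_sections("ingredients: water, sugar. Cocosmilk: coconut, water. Curry: spices 7.5%."))
```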
Asterisk(s) ('*') or Hash(es) ('#') at the start of an ingredient
Introduces an annotation that applies to every ingredient ending with the same number of asterisks (or hashes). For instance, in the example above, "*Van biologische oorsprong" (Dutch for "of organic origin") marks the starred ingredients as organic.
Asterisk(s) or Hash(es) ('#') at the end of an ingredient
Indicates that this ingredient has an annotation (see above)
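The link between a trailing marker and its footnote could be resolved along these lines. This is a sketch under assumptions: the helper names `strip_marker`, `parse_footnote` and `annotate` are hypothetical, and only a single footnote is handled.

```python
import re


def strip_marker(ingredient):
    """Return (name, marker), where marker is the trailing run of '*' or '#'."""
    m = re.match(r"^(.*?)\s*([*#]+)$", ingredient.strip())
    if m:
        return m.group(1), m.group(2)
    return ingredient.strip(), ""


def parse_footnote(text):
    """Return (marker, note) for a footnote such as '*Van biologische oorsprong.'"""
    m = re.match(r"^([*#]+)\s*(.+?)\.?$", text.strip())
    if m:
        return m.group(1), m.group(2)
    return None


def annotate(ingredients, footnote):
    """Attach the footnote text to each ingredient carrying the same marker."""
    parsed = parse_footnote(footnote)
    result = []
    for ingredient in ingredients:
        name, marker = strip_marker(ingredient)
        note = parsed[1] if parsed and marker == parsed[0] else None
        result.append((name, note))
    return result


print(annotate(["rietsuiker*", "emulgator"], "*Van biologische oorsprong."))
```

Comparing the whole marker string, rather than only checking for its presence, is what lets "*" and "**" point at different footnotes.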
Parentheses
Can indicate sub-components, which can sometimes be mapped to an E-number. For instance: emulgator (sojalecithine), Dutch for emulsifier (soy lecithin), maps to E322.
Can indicate a percentage if nothing else is present, e.g. (34%), in which case the parentheses can be ignored.
Can indicate an alternative name, typically written as (=alternative name). It can be ignored, since alternative names are in the taxonomies, unless the original name is not yet known (in which case, report it).
Sometimes alternated with square brackets ([]) for nested details.
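The three uses of parentheses listed above can be told apart with a few checks. The sketch below assumes a single level of parentheses or brackets at the end of the ingredient; the function name `parse_parenthesis` and the dictionary keys are hypothetical, and the "suiker (=sucrose)" input is a made-up example.

```python
import re


def parse_parenthesis(ingredient):
    """Classify a trailing (...) or [...] group on one ingredient."""
    m = re.match(r"^(.*?)\s*[(\[](.*)[)\]]\s*$", ingredient.strip())
    if not m:
        return {"name": ingredient.strip()}
    name = m.group(1).strip()
    inner = m.group(2).strip()
    # Case 1: the group holds only a percentage, e.g. "(34%)".
    if re.fullmatch(r"\d+(?:[.,]\d+)?\s*%", inner):
        value = float(inner.rstrip("% ").replace(",", "."))
        return {"name": name, "percent": value}
    # Case 2: an alternative name, e.g. "(=alternative name)".
    if inner.startswith("="):
        return {"name": name, "alt_name": inner[1:].strip()}
    # Case 3: a comma-separated list of sub-components (nesting not handled here).
    subs = [s.strip() for s in inner.split(",") if s.strip()]
    return {"name": name, "sub": subs}


print(parse_parenthesis("plantaardige olie (zonnebloem, palm)"))
```

Nested groups such as "a (b [c, d])" would need a depth-aware split instead of the plain `split(",")` used for the third case.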
Colon
Can indicate sub-components, similar to parentheses.
Percentage
Indicates the quantity of the ingredient, as a percentage of the product.
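Pulling the percentage out of an ingredient could look like the sketch below. It assumes the value may appear before or after the name and may use either a decimal point or a decimal comma; the name `extract_percentage` is hypothetical.

```python
import re

# A number with an optional decimal part, followed by a percent sign.
PERCENT = re.compile(r"(\d+(?:[.,]\d+)?)\s*%")


def extract_percentage(ingredient):
    """Return (name_without_percentage, value or None)."""
    m = PERCENT.search(ingredient)
    if not m:
        return ingredient.strip(), None
    value = float(m.group(1).replace(",", "."))
    # Remove the matched percentage and keep the rest as the name.
    name = (ingredient[:m.start()] + ingredient[m.end():]).strip()
    return name, value


print(extract_percentage("7.5% magere cacaopoeder*"))
```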
Order
Items must be listed in descending order of quantity, from largest to smallest.
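The ordering rule gives a parser two cheap tools: a sanity check on the stated percentages, and an upper bound for ingredients that have none. The sketch below assumes percentages are given as a list in label order, with `None` where no value is stated; both function names are hypothetical.

```python
def is_descending(percentages):
    """Check that the stated percentages never increase along the list."""
    known = [p for p in percentages if p is not None]
    return all(a >= b for a, b in zip(known, known[1:]))


def upper_bounds(percentages):
    """Give each ingredient an upper bound: the last stated value at or before it."""
    bounds = []
    limit = 100.0  # nothing can exceed 100% of the product
    for p in percentages:
        if p is not None:
            limit = p
        bounds.append(limit)
    return bounds


# Percentages from the example list at the top of the page (None = not stated).
stated = [None, None, 13.0, 7.5, None, None, None]
print(is_descending(stated), upper_bounds(stated))
```

A list that fails `is_descending` suggests either a parsing error or a label that does not follow the rule, so it is worth flagging rather than silently accepting.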
Some examples
- http://world.openfoodfacts.org/product/3760122960121/petits-sables-ronds-et-bons-aux-gouttes-de-chocolat-michel-et-augustin
- http://th.openfoodfacts.org/product/8850487049106/%E0%B8%9A%E0%B9%8A%E0%B8%A7%E0%B8%A2%E0%B8%94%E0%B8%AD%E0%B8%87-%E0%B9%81%E0%B8%A1%E0%B9%88%E0%B8%9B%E0%B8%A3%E0%B8%B0%E0%B8%99%E0%B8%AD%E0%B8%A1
- http://uk.openfoodfacts.org/product/5038862238977/innocent-super-smoothie-antioxidant
- http://jp.openfoodfacts.org/product/4540118002726/%E9%9B%AA%E5%A1%A9%E3%81%A1%E3%82%93%E3%81%99%E3%81%93%E3%81%86
- http://world.openfoodfacts.org/product/0000027533048/luxury-christmas-pudding-asda
- http://world.openfoodfacts.org/product/4607005408112/72-%D0%BA%D0%B0%D0%BA%D0%B0%D0%BE-%D0%BF%D0%BE%D0%B1%D0%B5%D0%B4%D0%B0-%D0%B2%D0%BA%D1%83%D1%81%D0%B0
Links
- http://world.openfoodfacts.org/ingredients (will crash your browser)
- http://world.openfoodfacts.org/files/ingredients.20151117.txt (lighter text version of the above)
- https://files.slack.com/files-pri/T02KVRT1Q-F192GGZE3/download/top500ingredients.xls (normalised Excel version for top 500 of the above)
- Ingredients taxonomy (Taxonomisation start)
- Wikidata (We could generate a list from Wikidata)
- List of ingredients for multilingual products: http://world.openfoodfacts.org/language/multilingual/ingredients
- http://ec.europa.eu/food/safety/labelling_nutrition/labelling_legislation/index_en.htm
- http://ec.europa.eu/dgs/health_food-safety/dgs_consultations/food/docs/consult_20150104_allergy-intolerance_guidance.pdf
- http://www.fda.gov/Food/GuidanceRegulation/GuidanceDocumentsRegulatoryInformation/LabelingNutrition/ucm2006828.htm