Ingredients/Parsing

From Open Food Facts wiki
* http://world.openfoodfacts.org/files/ingredients.20151117.txt (lighter text version of the above)  
* https://files.slack.com/files-pri/T02KVRT1Q-F192GGZE3/download/top500ingredients.xls (normalised Excel version for top 500 of the above)
* [[Ingredients taxonomy]] (Taxonomisation start)
* Wikidata (We could generate a list from Wikidata)
* List of ingredients for multilingual products: http://world.openfoodfacts.org/language/multilingual/ingredients

Revision as of 15:39, 11 June 2020

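This page collects what we know about ingredient parsing. The Dutch ingredient list below serves as a running example: underscores mark allergens, following the Open Food Facts convention, and the footnote "*Van biologische oorsprong" translates to "*Of organic origin".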


 rietsuiker*, plantaardige olie* (zonnebloem, palm), 13% _hazelnoot_*, 7.5% magere cacaopoeder*, magere _melk_poeder*, emulgator (_soja_lecithine), vanille*.
*Van biologische oorsprong.

EU Regulation based parsing

Comma separator

  • Ingredients are separated by a comma (,) generally followed by a single space
rietsuiker, plantaardige olie
  • They can also be separated by other symbols, such as a bullet (•)
 • Polyquaternium-67 • Propylene Glycol • 
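The separator rules above can be sketched in Python as follows. This is a minimal illustration, not the Open Food Facts implementation: the function name `split_top_level` and the choice of separators (comma and bullet only) are assumptions. Separators inside parentheses or square brackets are kept, so sub-component lists stay attached to their parent. Decimal commas such as "7,5%" are not handled here.

```python
# Separators assumed for this sketch: comma and the bullet character.
SEPARATORS = ",•"

def split_top_level(text):
    """Split an ingredient list on separators that are not inside brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        if ch in SEPARATORS and depth == 0:
            # A top-level separator closes the current ingredient.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty chunks left by leading or trailing separators.
    return [p for p in parts if p]

print(split_top_level("rietsuiker, plantaardige olie (zonnebloem, palm), 13% hazelnoot"))
print(split_top_level(" • Polyquaternium-67 • Propylene Glycol • "))
```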

Dot

Indicates the end of the ingredients list. It can also mark the start of a new main ingredient, usually when the dot is followed by a new ingredient name and a colon:

ingredients: .... . Cocosmilk: .... . Curry: .... .
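One possible way to detect such sections is sketched below in Python. The function name `split_sections` and the regular expression are assumptions made for illustration. A dot only counts as a section break when it is followed by whitespace and then a short name ending in a colon, so a decimal such as "7.5%" is left alone. The sample text is made up for the sketch.

```python
import re

# A dot, some whitespace, then (lookahead) a name without punctuation ending in ":".
SECTION_BREAK = re.compile(r"\.\s+(?=[^.,:()]+:)")
HEADER_NAME = re.compile(r"[^.,:()]+")

def split_sections(text):
    """Map each 'Name: ...' section to its body; unnamed text goes under ''."""
    sections = {}
    for chunk in SECTION_BREAK.split(text.strip()):
        chunk = chunk.strip().rstrip(".").strip()
        if not chunk:
            continue
        name, sep, body = chunk.partition(":")
        if sep and HEADER_NAME.fullmatch(name):
            sections[name.strip()] = body.strip()
        else:
            # No recognisable header: keep the chunk as an unnamed section.
            sections[""] = chunk
    return sections

sample = "ingredients: suiker, 7.5% cacao. Cocosmilk: kokos, water. Curry: kruiden."
print(split_sections(sample))
```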

Asterisk(s) ('*') or Hash(es) ('#') at the start of an ingredient

Indicates an annotation that applies to every ingredient ending with the same number of asterisks or hashes. For instance, in the example above the ingredients marked with a single asterisk are annotated as being of organic origin.

Asterisk(s) ('*') or Hash(es) ('#') at the end of an ingredient

Indicates that this ingredient has an annotation (see above)
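Matching markers to their annotations could look like the Python sketch below. The names `parse_footnotes` and `annotate` are hypothetical. The sketch assumes annotation lines start with their marker, and it accepts a marker anywhere in the ingredient text so that "plantaardige olie* (zonnebloem, palm)" from the example is also covered.

```python
import re

MARKER = re.compile(r"[*#]+")
FOOTNOTE = re.compile(r"^\s*([*#]+)\s*(.+?)\s*$")

def parse_footnotes(lines):
    """Map each leading marker (e.g. '*', '**', '#') to its annotation text."""
    notes = {}
    for line in lines:
        m = FOOTNOTE.match(line)
        if m:
            notes[m.group(1)] = m.group(2)
    return notes

def annotate(ingredient, notes):
    """Return (name without marker, annotation or None)."""
    m = MARKER.search(ingredient)
    if not m:
        return ingredient.strip(), None
    # Remove the marker and look up the annotation with the same marker.
    name = (ingredient[:m.start()] + ingredient[m.end():]).strip()
    return name, notes.get(m.group(0))

notes = parse_footnotes(["*Van biologische oorsprong."])
print(annotate("rietsuiker*", notes))
print(annotate("plantaardige olie* (zonnebloem, palm)", notes))
print(annotate("vanille", notes))
```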

Parentheses

Can indicate sub-components, which can sometimes be mapped to an E-number. For instance, emulgator (sojalecithine) maps to E322.

Can indicate a percentage if no other data is present, e.g. (34%), in which case the parentheses can be ignored.

Can indicate an alternative name, typically written as (=alternative name). This can be ignored, since alternative names are in the taxonomy, unless the original name is not yet known, in which case it should be reported.

Sometimes alternate with square brackets ([]) in nested details.
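Nested sub-components can be turned into a tree with a small recursive parser. The Python sketch below is an illustration only: the node layout (`name` and `sub` keys) and the function names are assumptions, parentheses and square brackets are treated as interchangeable, and mismatched bracket types are not checked. A bracketed percentage such as (34%) would come out as a sub-node and would need separate handling.

```python
OPEN, CLOSE = "([", ")]"

def _flush(nodes, name, sub):
    """Append a node if there is a name or any sub-components."""
    label = "".join(name).strip()
    if label or sub:
        nodes.append({"name": label, "sub": sub})

def _parse_list(text, pos):
    """Parse a comma-separated list starting at pos, up to a closing bracket."""
    nodes, name, sub = [], [], []
    while pos < len(text):
        ch = text[pos]
        if ch in OPEN:
            # Recurse: the nested list becomes the sub-components of this item.
            sub, pos = _parse_list(text, pos + 1)
            continue
        if ch in CLOSE:
            pos += 1
            break
        if ch == ",":
            _flush(nodes, name, sub)
            name, sub = [], []
        else:
            name.append(ch)
        pos += 1
    _flush(nodes, name, sub)
    return nodes, pos

def parse_nested(text):
    """Return a list of {'name', 'sub'} nodes for an ingredient list."""
    nodes, pos = [], 0
    while pos < len(text):  # keep going past any stray closing bracket
        more, pos = _parse_list(text, pos)
        nodes.extend(more)
    return nodes

print(parse_nested("emulgator (sojalecithine), vanille"))
```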

Colon

Can indicate sub-components, similar to parentheses.

Percentage

Indicates the quantity of the ingredient in the product, as in 13% hazelnoot in the example above.
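Extracting the percentage from a single ingredient could be sketched in Python as below. The function name `extract_percent` and the pattern are assumptions. The sketch accepts a bare percentage or one wrapped in parentheses, and it treats a decimal comma like a decimal point, which is an assumption not stated on this page.

```python
import re

NUMBER = r"\d+(?:[.,]\d+)?"
# Either "(34%)" with its parentheses, or a bare "13%".
PERCENT = re.compile(rf"\(\s*({NUMBER})\s*%\s*\)|({NUMBER})\s*%")

def extract_percent(ingredient):
    """Return (name without the percentage, percentage as float or None)."""
    m = PERCENT.search(ingredient)
    if not m:
        return ingredient.strip(), None
    raw = m.group(1) or m.group(2)
    value = float(raw.replace(",", "."))
    # Cut the percentage out and normalise the remaining whitespace.
    rest = ingredient[:m.start()] + " " + ingredient[m.end():]
    return " ".join(rest.split()), value

print(extract_percent("13% hazelnoot"))
print(extract_percent("7.5% magere cacaopoeder"))
print(extract_percent("tomaat (34%)"))
```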

Order

Items are required to be listed in descending order of quantity, from largest to smallest.
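The ordering rule gives a parser two useful checks, sketched below in Python under the assumption that percentages have already been extracted into a list with None for ingredients that declare none. The function names are hypothetical. `is_descending` flags lists whose declared percentages increase, and `bounds` derives a loose range for each undeclared ingredient from its declared neighbours.

```python
def is_descending(percentages):
    """True if the declared percentages never increase down the list."""
    declared = [p for p in percentages if p is not None]
    return all(a >= b for a, b in zip(declared, declared[1:]))

def bounds(percentages):
    """Return a (min, max) range per ingredient, using the descending order."""
    result = []
    for i, p in enumerate(percentages):
        if p is not None:
            result.append((p, p))
            continue
        # At most the nearest declared value before, at least the nearest after.
        upper = next((q for q in reversed(percentages[:i]) if q is not None), 100.0)
        lower = next((q for q in percentages[i + 1:] if q is not None), 0.0)
        result.append((lower, upper))
    return result

declared = [None, 13.0, 7.5, None]
print(is_descending(declared))
print(bounds(declared))
```

The ranges are deliberately loose. They could be tightened, for example by using the fact that all percentages sum to at most 100.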

Some examples

Links