** Test sets need to be created
*** Run spellcheckers on real ingredient lists from OFF and review the corrections
== Ingredients analysis ==
=== Steps for ingredients analysis ===
==== Ingredients pre-parsing ====
* The ingredients list is transformed to make it easier to parse:
** Remove or normalize unusual characters
** Expand abbreviations
** Split enumerations
*** e.g. "Vitamins A, B and C" -> "Vitamin A, Vitamin B, Vitamin C"
** Normalize additive E-numbers (E330, e330, e-330, INS 330, SIN330, etc.)
** Additive classes and additive splits
*** e.g. "Colour caramel" -> "Colour: Caramel"
** Split some "A of B, C and D" patterns (not all cases are handled)
*** e.g. "Huile de palme, colza et tournesol" (palm, rapeseed and sunflower oil) -> "huile de palme, huile de colza, huile de tournesol"
* Current solution:
** Perl code and regular expressions
*** lib/ProductOpener/Ingredients.pm - preparse_ingredients_text() |
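The E-number normalization step can be illustrated with a single regular expression. This is a minimal Python sketch, not the Perl code in preparse_ingredients_text(). The pattern and the function name normalize_e_numbers are hypothetical, and the sketch assumes that an "E", "INS" or "SIN" prefix followed by 3 or 4 digits and an optional letter suffix covers the spellings listed above. It does not attempt every variant real ingredient lists may contain.

```python
import re

# Hypothetical pattern: a standalone "E", "INS" or "SIN" prefix (any case),
# optional spaces and/or a hyphen, 3-4 digits, then an optional letter suffix
# (e.g. the "d" in E150d). \b stops matches inside words such as "cheese".
E_NUMBER = re.compile(r"\b(?:e|ins|sin)\s*-?\s*(\d{3,4})([a-z]?)\b", re.IGNORECASE)


def normalize_e_numbers(text: str) -> str:
    """Rewrite e330, e-330, INS 330, SIN330, ... as the canonical E330."""
    return E_NUMBER.sub(lambda m: "E" + m.group(1) + m.group(2).lower(), text)


print(normalize_e_numbers("acidifiant: e330, E-330, INS 330, SIN330, e 150d"))
```

Running it prints `acidifiant: E330, E330, E330, E330, E150d`. Canonicalizing early means later steps only need to recognize a single spelling per additive.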
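The additive class split ("Colour caramel" -> "Colour: Caramel") can be sketched as matching a known class name at the start of an ingredient and inserting a colon. This is a minimal Python sketch. The ADDITIVE_CLASSES list is a tiny hypothetical sample (the real taxonomy is larger and multilingual), and split_additive_class is an illustrative name, not a function from Ingredients.pm.

```python
import re

# Hypothetical sample of additive class names (lowercase, English only).
ADDITIVE_CLASSES = ["colour", "emulsifier", "acidity regulator", "preservative"]

# Longest names first so multi-word classes win over shorter prefixes.
_alternatives = "|".join(sorted(map(re.escape, ADDITIVE_CLASSES), key=len, reverse=True))
# Class name, then either an optional colon with optional spaces or plain
# whitespace, then the rest of the ingredient.
_CLASS_RE = re.compile(r"^(" + _alternatives + r")\s*(?::\s*|\s+)(.+)$", re.IGNORECASE)


def split_additive_class(ingredient: str) -> str:
    """Turn 'Colour caramel' into 'Colour: caramel'; other text is unchanged."""
    match = _CLASS_RE.match(ingredient.strip())
    if not match:
        return ingredient
    return f"{match.group(1)}: {match.group(2)}"


print(split_additive_class("Colour caramel"))
print(split_additive_class("Acidity regulator:citric acid"))
```

This prints `Colour: caramel` and `Acidity regulator: citric acid`. A plain prefix match like this would also turn "Preservative free" into "Preservative: free", so a real implementation probably needs extra checks on what follows the class name.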