Ingredients Extraction and Analysis


=== Steps for ingredients lists ===
Sample product for examples: https://fr.openfoodfacts.org/produit/5000112558265/coca-cola-zero


==== Picture taking ====
* Taken with mobile app, uploaded to OFF server


Result:
[[File:21.jpg|200px|none|Ingredients photo]]
* https://fr.openfoodfacts.org/images/products/500/011/255/8265/21.jpg
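The image URL above encodes the barcode 5000112558265 as the folder path 500/011/255/8265. A minimal Python sketch of that mapping follows; it assumes the 3/3/3/rest split seen in this one example holds for other long barcodes, and the helper names `product_folder` and `image_url` are hypothetical, not taken from the OFF code base.

```python
import re

# Base URL copied from the example above.
BASE = "https://fr.openfoodfacts.org/images/products"


def product_folder(barcode: str) -> str:
    """Split a barcode into the folder path seen in the example image URL.

    Assumption: the 3/3/3/rest split (500/011/255/8265) applies to any
    barcode with at least 10 digits; shorter codes may be stored differently.
    """
    if not re.fullmatch(r"\d{10,}", barcode):
        raise ValueError(f"expected a numeric barcode of 10+ digits, got {barcode!r}")
    return "/".join([barcode[0:3], barcode[3:6], barcode[6:9], barcode[9:]])


def image_url(barcode: str, image_id: str) -> str:
    """Build the URL of one uploaded image, e.g. image_id="21" for 21.jpg."""
    return f"{BASE}/{product_folder(barcode)}/{image_id}.jpg"


print(image_url("5000112558265", "21"))
# prints https://fr.openfoodfacts.org/images/products/500/011/255/8265/21.jpg
```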


==== Ingredients list cropping ====
* Or done on web site at a later time, possibly by another user
** Cropping slightly easier than on mobile
[[File:Ingredients fr.115.full.jpg|200px|none|Cropped ingredients]]


==== OCR ====


* Launched after cropping, done through the server
* Current solution:
** Google Cloud Vision
** Cloud Vision returns a JSON object which is stored on the server
 
Result:
* https://static.openfoodfacts.org/images/products/500/011/255/8265/ingredients.22.full.json
* "rédients:eaugazeitiee colorant:caramelE15M difiants: acide phosphorique et Citrate de sodium: édulcorants: aspartame etacésulfame-K;extraitsvegétaux Contientunesourcedephénylalanine."
* Your mileage will vary a lot
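The stored OCR result can be reduced to plain text before cutting. This sketch assumes the stored file keeps the layout of a Google Cloud Vision annotate response (a `responses` list whose entries carry `fullTextAnnotation.text` and/or `textAnnotations`, the first of which holds the whole detected text); the function name `ocr_text` and the sample data are made up for illustration.

```python
import json


def ocr_text(response: dict) -> str:
    """Return the full OCR text from a Cloud Vision style annotate response.

    Assumed layout:
    {"responses": [{"fullTextAnnotation": {"text": ...},
                    "textAnnotations": [{"description": ...}, ...]}]}
    """
    responses = response.get("responses") or [{}]
    first = responses[0]
    full = first.get("fullTextAnnotation", {}).get("text")
    if full:
        return full
    annotations = first.get("textAnnotations") or []
    # The first textAnnotation holds the whole detected text; later ones are single words.
    return annotations[0]["description"] if annotations else ""


# Tiny hand-made sample in that layout (not the real stored file).
sample = json.loads("""
{"responses": [{"textAnnotations": [
  {"locale": "fr", "description": "rédients:eaugazeitiee colorant:caramelE15M"},
  {"description": "rédients:eaugazeitiee"}
]}]}
""")
print(ocr_text(sample))  # prints rédients:eaugazeitiee colorant:caramelE15M
```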


==== Ingredients list cutting ====
*** Run spellcheckers on actual ingredients lists from OFF, review corrections


Desired result:
* "Eau gazéifiée ; colorant : E150d ; acidifiants : acide phosphorique, citrate de sodium ; édulcorants : aspartame, acésulfame-K ; arômes naturels (extraits végétaux), dont caféine."
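The spellchecking step mentioned above can be approximated, for illustration only, by fuzzy-matching each OCR token against a list of known ingredient words with Python's `difflib`, ignoring accents. This is a stand-in technique, not the spellchecker used by OFF; the vocabulary, the 0.8 cutoff and the function names are assumptions.

```python
import difflib
import unicodedata

# Hypothetical vocabulary; a real one could come from the ingredients taxonomy
# or from already corrected ingredients lists.
VOCABULARY = ["eau", "gazéifiée", "colorant", "acidifiants", "acide",
              "phosphorique", "citrate", "sodium", "édulcorants",
              "aspartame", "acésulfame-K", "extraits", "végétaux"]


def strip_accents(word: str) -> str:
    """Lowercase and drop combining accents, so 'gazéifiée' compares as 'gazeifiee'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def correct(token: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Return the closest known word, or the token unchanged if nothing is close enough."""
    by_key = {strip_accents(word): word for word in vocabulary}
    matches = difflib.get_close_matches(strip_accents(token), list(by_key), n=1, cutoff=cutoff)
    return by_key[matches[0]] if matches else token


# Fragments based on the raw OCR result above, already split into words.
for token in ["gazeitiee", "difiants", "vegétaux", "caramelE15M"]:
    print(token, "->", correct(token, VOCABULARY))
```

Merged words such as "etacésulfame-K" or "extraitsvegétaux" would need a separate word-segmentation step before this kind of lookup; the sketch does not attempt it.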


== Ingredients analysis ==
** Perl code and regular expressions + multilingual ingredients taxonomy
*** lib/ProductOpener/Ingredients.pm - extract_ingredients_from_text()
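The real parser is Perl code (`extract_ingredients_from_text()`), so the following Python sketch only mimics the shape of its output on the desired text: it splits on `;`, `,`, `:`, `(` and `)`, ranks the pieces in order and looks each one up in a tiny hypothetical taxonomy copied from the example result. It ignores nesting, percentages and everything else the real code handles.

```python
import re

# Hypothetical mini-taxonomy (lowercased name -> id), copied from the example
# result; the real taxonomy is multilingual and far larger.
TAXONOMY = {
    "eau gazéifiée": "en:carbonated-water",
    "colorant": "en:colour",
    "e150d": "en:e150d",
    "acidifiants": "en:acid",
    "acide phosphorique": "en:e338",
    "citrate de sodium": "en:sodium-citrate",
    "édulcorants": "en:sweetener",
    "aspartame": "en:e951",
    "acésulfame-k": "en:e950",
    "arômes naturels": "en:natural-flavouring",
    "extraits végétaux": "en:vegetable-extract",
}

DESIRED = ("Eau gazéifiée ; colorant : E150d ; acidifiants : acide phosphorique, "
           "citrate de sodium ; édulcorants : aspartame, acésulfame-K ; "
           "arômes naturels (extraits végétaux), dont caféine.")


def split_ingredients(text: str) -> list[dict]:
    """Flat split on ; , : ( ) and rank the pieces in order of appearance."""
    pieces = [p.strip(" .") for p in re.split(r"[;,:()]", text)]
    return [
        {"rank": rank, "text": piece, "id": TAXONOMY.get(piece.lower())}  # None if unknown
        for rank, piece in enumerate((p for p in pieces if p), start=1)
    ]


for item in split_ingredients(DESIRED):
    print(item["rank"], item["text"], item["id"])
```

Unlike the real result, "dont caféine" stays unrecognized (`None`) here, because the sketch does not strip the French word "dont" ("including") before the lookup.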
Result:
* Insert /api/v0 after the host name to get JSON results through the API: https://fr.openfoodfacts.org/api/v0/produit/5000112558265/coca-cola-zero
* ingredients:
<pre>
"ingredients": [
  {
    "vegetarian": "yes",
    "text": "Eau gazéifiée",
    "id": "en:carbonated-water",
    "rank": 1,
    "vegan": "yes"
  },
  {
    "rank": 2,
    "id": "en:colour",
    "text": "colorant"
  },
  {
    "vegetarian": "yes",
    "id": "en:e150d",
    "rank": 3,
    "text": "e150d",
    "vegan": "yes"
  },
  {
    "text": "acidifiants",
    "rank": 4,
    "id": "en:acid"
  },
  {
    "vegan": "yes",
    "vegetarian": "yes",
    "rank": 5,
    "id": "en:e338",
    "text": "acide phosphorique"
  },
  {
    "text": "citrate de sodium",
    "rank": 6,
    "id": "en:sodium-citrate"
  },
  {
    "id": "en:sweetener",
    "rank": 7,
    "text": "édulcorants"
  },
  {
    "vegan": "yes",
    "id": "en:e951",
    "rank": 8,
    "text": "aspartame",
    "vegetarian": "yes"
  },
  {
    "vegetarian": "yes",
    "text": "acésulfame-K",
    "id": "en:e950",
    "rank": 9,
    "vegan": "yes"
  },
  {
    "vegetarian": "maybe",
    "rank": 10,
    "id": "en:natural-flavouring",
    "text": "arômes naturels",
    "vegan": "maybe"
  },
  {
    "id": "en:vegetable-extract",
    "rank": 11,
    "text": "extraits végétaux"
  },
  {
    "text": "dont caféine",
    "id": "en:caffeine",
    "rank": 12,
    "vegetarian": "yes",
    "vegan": "yes"
  }
],
</pre>
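Once the JSON is fetched, the per-ingredient `vegan` and `vegetarian` flags can be rolled up into a product-level value. This sketch assumes the v0 response nests the list under `product.ingredients`; the roll-up rule (any "no" gives "no", all "yes" gives "yes", anything else, including a missing flag, gives "maybe") is a hypothetical choice, not necessarily the one OFF applies. `api_url` just inserts `/api/v0` after the host name, as described above; both function names are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def api_url(product_url: str) -> str:
    """Insert /api/v0 right after the host name of a product page URL."""
    parts = urlsplit(product_url)
    return urlunsplit((parts.scheme, parts.netloc, "/api/v0" + parts.path,
                       parts.query, parts.fragment))


def summarize(ingredients: list[dict], flag: str) -> str:
    """Hypothetical roll-up of a per-ingredient flag ("vegan" or "vegetarian")."""
    values = [item.get(flag) for item in ingredients]
    if "no" in values:
        return "no"
    if values and all(value == "yes" for value in values):
        return "yes"
    return "maybe"  # a missing flag or a "maybe" leaves the answer open


# Hand-made excerpt of a v0 response, assuming the list sits under product.ingredients.
response = {"product": {"ingredients": [
    {"id": "en:carbonated-water", "rank": 1, "vegan": "yes", "vegetarian": "yes"},
    {"id": "en:colour", "rank": 2},
    {"id": "en:natural-flavouring", "rank": 10, "vegan": "maybe", "vegetarian": "maybe"},
]}}

print(api_url("https://fr.openfoodfacts.org/produit/5000112558265/coca-cola-zero"))
print(summarize(response["product"]["ingredients"], "vegan"))  # prints maybe
```

With this sample, the missing flag on `en:colour` and the "maybe" on natural flavouring both keep the result at "maybe".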


== End to end metrics ==