
Ingredients Extraction and Analysis: Difference between revisions

ingredients percent estimation
Line 186:
** Perl code and regular expressions + multilingual ingredients taxonomy
*** lib/ProductOpener/Ingredients.pm - extract_ingredients_from_text()
Ingredient percent analysis
* Goal: for each ingredient and sub-ingredient, we compute the minimum and maximum absolute percent
* Constraints:
** Some ingredients have a % specified
** Ingredients are listed in descending order of quantity
* Current solution
** Perl code in lib/ProductOpener/Ingredients.pm - compute_ingredients_percent_values()
** We use a recursive function to go through ingredients, sub-ingredients, sub-sub-ingredients etc.
** For each list of ingredients (or sub-ingredients), we assign starting min, max or specific percent values that comply with the constraints
*** If a % is specified, we use it
*** Otherwise, the min is set to 0 and the max to the max of the parent ingredient (or 100% if there is no parent)
** We then apply logic rules based on the constraints:
*** The max of an ingredient must be less than or equal to the max of the ingredient that appears just before it
*** The max of an ingredient must be less than or equal to the total max minus the sum of the minimums of all ingredients that appear before it
*** The max of the 3rd ingredient must be less than or equal to the max of the 1st ingredient divided by 2 (and similarly, the max of the 4th ingredient must be less than or equal to the max of the 1st ingredient divided by 3, etc.)
*** The min of an ingredient must be greater than or equal to the total min minus the sum of the maximums of all ingredients that appear before it, divided by the number of remaining ingredients (the current ingredient plus those that appear after it)
*** The min of an ingredient must be greater than or equal to the min of any ingredient that appears after it
*** The max of an ingredient must be less than or equal to the total max minus the sum of the minimums of all ingredients that appear after it, divided by the number of ingredients up to and including the current one
*** The min of the first ingredient in the list must be greater than or equal to the total min minus the sum of the maximums of all ingredients that appear after it
** We then reapply all those rules, using the updated min and max values of the ingredients, until no rule changes a value
* Issues:
** We may end up with impossible values if the ingredients list was not analyzed correctly (e.g. wrong nesting, bad % etc.)
*** In that case, we delete the min and max values
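
The rules above can be sketched for a single, flat list of ingredients. This is a minimal illustration in Python, not the Perl code in lib/ProductOpener/Ingredients.pm: the function name compute_percent_bounds, its parameters, and the eps tolerance are assumptions made for this example, and the recursion into sub-ingredients is left out.

```python
def compute_percent_bounds(specified, total_min=100.0, total_max=100.0, eps=1e-9):
    """Hypothetical sketch: min/max percent bounds for one flat list.

    specified: one entry per ingredient, in label order; a number if a % is
    given on the label, None otherwise.
    Returns a list of (min, max) tuples, or None if the values are impossible.
    """
    n = len(specified)
    # Starting values: the specified %, otherwise 0 .. total_max.
    mins = [p if p is not None else 0.0 for p in specified]
    maxs = [p if p is not None else total_max for p in specified]

    def raise_min(i, value):
        if value > mins[i] + eps:
            mins[i] = value
            return True
        return False

    def lower_max(i, value):
        if value < maxs[i] - eps:
            maxs[i] = value
            return True
        return False

    changed = True
    while changed:  # reapply the rules until none of them changes a value
        changed = False
        for i in range(n):
            if i >= 1:
                # max <= max of the previous ingredient
                changed |= lower_max(i, maxs[i - 1])
                # max <= total max - sum of the mins before
                changed |= lower_max(i, total_max - sum(mins[:i]))
            if i >= 2:
                # 3rd ingredient: max <= max of 1st / 2, 4th: / 3, etc.
                changed |= lower_max(i, maxs[0] / i)
            # min >= (total min - sum of the maxs before) / (current + after)
            changed |= raise_min(i, (total_min - sum(maxs[:i])) / (n - i))
            if i + 1 < n:
                # min >= min of any ingredient after
                changed |= raise_min(i, max(mins[i + 1:]))
            # max <= (total max - sum of the mins after) / (before + current)
            changed |= lower_max(i, (total_max - sum(mins[i + 1:])) / (i + 1))
        if n:
            # min of the first >= total min - sum of the maxs after
            changed |= raise_min(0, total_min - sum(maxs[1:]))
        # Impossible values (e.g. wrong nesting, bad %): give up.
        if any(lo > hi + eps for lo, hi in zip(mins, maxs)):
            return None
    return list(zip(mins, maxs))
```

For example, compute_percent_bounds([None] * 8) bounds the first of eight unspecified ingredients to 12.5 .. 100 and the second to 0 .. 50, and compute_percent_bounds([None, 30, None]) narrows the first ingredient to 40 .. 70 and the last to 0 .. 30. A list such as [10, 50], where the second ingredient is larger than the first, returns None.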


Result:
* Add /api/v0 to get JSON results through API: https://fr.openfoodfacts.org/api/v0/produit/5000112558265/coca-cola-zero
* Add /api/v2 to get JSON results through API: [https://fr.openfoodfacts.org/api/v0/produit/5000112558265/coca-cola-zero https://fr.openfoodfacts.org/api/v2/produit/5000112558265/coca-cola-zero]
* ingredients:
*  


<pre>
ingredients: [
{
vegetarian: "yes",
text: "Eau gazéifiée",
id: "en:carbonated-water",
rank: 1,
vegan: "yes"
},
{
rank: 2,
id: "en:colour",
text: "colorant"
},
{
vegetarian: "yes",
id: "en:e150d",
rank: 3,
text: "e150d",
vegan: "yes"
},
{
text: "acidifiants",
rank: 4,
id: "en:acid"
},
{
vegan: "yes",
vegetarian: "yes",
rank: 5,
id: "en:e338",
text: "acide phosphorique"
},
{
text: "citrate de sodium",
rank: 6,
id: "en:sodium-citrate"
},
{
id: "en:sweetener",
rank: 7,
text: "édulcorants"
},
{
vegan: "yes",
id: "en:e951",
rank: 8,
text: "aspartame",
vegetarian: "yes"
},
{
vegetarian: "yes",
text: "acésulfame-K",
id: "en:e950",
rank: 9,
vegan: "yes"
},
{
vegetarian: "maybe",
rank: 10,
id: "en:natural-flavouring",
text: "arômes naturels",
vegan: "maybe"
},
{
id: "en:vegetable-extract",
rank: 11,
text: "extraits végétaux"
},
{
text: "dont caféine",
id: "en:caffeine",
rank: 12,
vegetarian: "yes",
vegan: "yes"
}
],
</pre>

* ingredients: [
** {
*** id: "en:carbonated-water",
*** percent_estimate: 56.25,
*** percent_max: 100,
*** percent_min: 12.5,
*** text: "Eau gazéifiée",
*** vegan: "yes",
*** vegetarian: "yes" },
** {
*** id: "en:colour",
*** ingredients: [
**** {
***** id: "en:e150d",
***** percent_estimate: 21.875,
***** percent_max: 50,
***** percent_min: 0,
***** text: "e150d",
***** vegan: "yes",
***** vegetarian: "yes" }],
*** percent_estimate: 21.875,
*** percent_max: 50,
*** percent_min: 0,
*** text: "colorant" },
** {
*** id: "en:acid",
*** ingredients: [
**** {
***** id: "en:e338",
***** percent_estimate: 10.9375,
***** percent_max: 33.3333333333333,
***** percent_min: 0,
***** text: "acide phosphorique",
***** vegan: "yes",
***** vegetarian: "yes" }],
*** percent_estimate: 10.9375,
*** percent_max: 33.3333333333333,
*** percent_min: 0,
*** text: "acidifiants" },
** {
*** id: "en:sodium-citrate",
*** percent_estimate: 5.46875,
*** percent_max: 25,
*** percent_min: 0,
*** text: "citrate de sodium" },
** {
*** id: "en:sweetener",
*** ingredients: [
**** {
***** id: "en:e951",
***** percent_estimate: 2.734375,
***** percent_max: 20,
***** percent_min: 0,
***** text: "aspartame",
***** vegan: "yes",
***** vegetarian: "yes" }],
*** percent_estimate: 2.734375,
*** percent_max: 20,
*** percent_min: 0,
*** text: "édulcorants" },
** {
*** id: "en:e950",
*** percent_estimate: 1.3671875,
*** percent_max: 16.6666666666667,
*** percent_min: 0,
*** text: "acésulfame-K",
*** vegan: "yes",
*** vegetarian: "yes" },
** {
*** id: "en:natural-flavouring",
*** ingredients: [
**** {
***** id: "en:extract",
***** labels: "en:vegan",
***** percent_estimate: 0.68359375,
***** percent_max: 14.2857142857143,
***** percent_min: 0,
***** text: "extraits",
***** vegan: "en:yes",
***** vegetarian: "en:yes" }],
*** percent_estimate: 0.68359375,
*** percent_max: 14.2857142857143,
*** percent_min: 0,
*** text: "arômes naturels",
*** vegan: "maybe",
*** vegetarian: "maybe" },
** {
*** id: "en:caffeine",
*** percent_estimate: 0.68359375,
*** percent_max: 12.5,
*** percent_min: 0,
*** text: "dont caféine",
*** vegan: "yes",
*** vegetarian: "yes" }],
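
The top-level percent_estimate values in the example above are consistent with a simple scheme: walking the list in order, each ingredient takes the midpoint between its percent_min and the smaller of its percent_max and the percentage still unassigned, and the last ingredient takes whatever remains. The sketch below only reproduces that arithmetic. It is an observation about these numbers, not a description of how Product Opener computes them, and the function name estimate_percents is hypothetical.

```python
def estimate_percents(bounds, total=100.0):
    """Hypothetical sketch: one estimate per (percent_min, percent_max) pair."""
    estimates = []
    remaining = total
    for i, (lo, hi) in enumerate(bounds):
        if i == len(bounds) - 1:
            # The last ingredient takes what is left of the total.
            est = remaining
        else:
            # Midpoint of min and max, with the max capped by what is left.
            est = (lo + min(hi, remaining)) / 2
        estimates.append(est)
        remaining -= est
    return estimates


# (percent_min, percent_max) of the eight top-level ingredients listed above.
coca_cola_zero = [
    (12.5, 100), (0, 50), (0, 33.3333333333333), (0, 25),
    (0, 20), (0, 16.6666666666667), (0, 14.2857142857143), (0, 12.5),
]
print(estimate_percents(coca_cola_zero))
```

With these inputs the estimates start at 56.25 for carbonated water and halve at each step (21.875, 10.9375, 5.46875, ...), with the last two ingredients both at 0.68359375, so the estimates sum to 100.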


== End to end metrics ==