
Ingredients analysis and search features extraction: Difference between revisions

+ Ingredients category
 
(11 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== Summary ==


Ingredients analysis and search features extraction is one of the 4 sub-tasks of the [[Project:Personalized_Search]] funded by the NGI0 Discovery Fund managed by NLnet.
Ingredients analysis and search features extraction is the first of the 4 sub-tasks of the [[Project:Personalized_Search]] funded by the NGI0 Discovery Fund managed by NLnet.


This page documents the progress made in Q1 and Q2 2020.
Line 7: Line 7:
== Methods and infrastructure ==


Ingredients analysis has been gradually added to Open Food Facts in a very organic way (the first versions of additives and palm oil detections in French ingredients lists date from 2012). A lot of progress has been made over the years, but ingredients analysis has remained a complex, undocumented, and artisanal effort mostly focused on French, with only one developer coding it and very few people able to improve it.
[[Ingredients Extraction and Analysis]] has been gradually added to Open Food Facts in a very organic way (the first versions of additives and palm oil detections in French ingredients lists date from 2012). A lot of progress has been made over the years, but ingredients analysis has remained a complex, undocumented, and artisanal effort mostly focused on French, with only one developer coding it and very few people able to improve it.


The first focus of the project has thus been to develop methods and infrastructure to industrialize ingredients analysis and bring it to the next level for many more languages.
Line 23: Line 23:


* [[Ingredients Analysis Quality]] : definition of the ingredients analysis quality metrics and instructions to retrieve those metrics for a specific sub-set of products (e.g. all products sold in a specific country).
We have also set up a monitoring system so that we can see the evolution of ingredients analysis metrics over time:
[[File:Graphana-ingredients.png]]
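
As an illustration only, a metric of this kind could be computed along the following lines. This is a minimal sketch and not the definition used on the [[Ingredients Analysis Quality]] page: the field names <code>ingredients_n</code> and <code>unknown_ingredients_n</code>, the function name and the sample data are all assumptions made for the example.

```python
def unrecognized_share(products):
    """Return the fraction of ingredients that were not recognized.

    Each product is a dict with two assumed fields:
    - "ingredients_n": number of ingredients found in the list
    - "unknown_ingredients_n": number of those that were not recognized
    """
    total = sum(p.get("ingredients_n", 0) for p in products)
    unknown = sum(p.get("unknown_ingredients_n", 0) for p in products)
    # Avoid a division by zero when the sub-set has no parsed ingredients.
    return unknown / total if total else 0.0


# Hypothetical sub-set of products (e.g. products sold in one country).
sample = [
    {"ingredients_n": 8, "unknown_ingredients_n": 2},
    {"ingredients_n": 12, "unknown_ingredients_n": 0},
]
share = unrecognized_share(sample)
print(f"{share:.0%} of ingredients unrecognized")
```

Recording such a value at regular intervals for each language is what makes it possible to follow the evolution of the metrics over time.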


=== Visibility of ingredients analysis results and internals ===
Line 29: Line 33:


[[File:Ingredients-analysis-example.png|border]]
-> [https://world.openfoodfacts.org/product/0021130016556/select-dark-chocolate-with-orange-cranberry-signature See it live on the Open Food Facts web site]


=== Testing tool ===
Line 40: Line 46:
In early March 2020, we recorded the quality metrics of ingredients analysis for major European languages, so that we could have a point of reference that we can compare to at the end of the project.


* [[Ingredients Analysis Quality Evaluation - March 2020]]
* [[Ingredients Analysis Quality Evaluation - March 2020 - May 2020]]


== Improvements to ingredients analysis ==
Line 121: Line 127:
* [https://github.com/openfoodfacts/openfoodfacts-server/pull/3357 Handling enumerations of vitamins in Finnish]
* [https://github.com/openfoodfacts/openfoodfacts-server/pull/3254 A better match for the signs like * that indicate organic ingredients]
* [https://github.com/openfoodfacts/openfoodfacts-server/pull/3218] Better handling of abbreviations like L. Acidophilus]
* [https://github.com/openfoodfacts/openfoodfacts-server/pull/3218 Better handling of abbreviations like L. Acidophilus]
* [https://github.com/openfoodfacts/openfoodfacts-server/pull/3003 Better handling for symbols like °]
* [https://github.com/openfoodfacts/openfoodfacts-server/pull/3034 Ingredients parsing improvements and test web interface]
Line 127: Line 133:
== Results of improvements ==


* [[Ingredients Analysis Quality Evaluation - May 2020]]
In mid-May 2020, we recorded the quality metrics for ingredients analysis again, so that we could compare them to the early March 2020 metrics and measure the impact of the improvements we made.
 
* [[Ingredients Analysis Quality Evaluation - March 2020 - May 2020]]
 
We achieved significant improvements in both ingredients recognition and ingredients analysis in all tested languages. The link above has all the details and metrics country by country.
 
== Conclusion ==
 
From March to May 2020, we greatly improved ingredients recognition for several languages (mainly Dutch, German, English, Finnish, French, Italian and Spanish) and reduced the number of unrecognized ingredients.
 
Those improvements in ingredients recognition translated into very significant improvements in ingredients analysis (whether a product contains palm oil and is suitable for vegans and vegetarians). The result of this analysis is already visible on the Open Food Facts app and web site, and it will also be used for search filtering and ranking for the Personalized Search project.
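
To show why better ingredients recognition feeds directly into this analysis, here is a minimal sketch of how a product-level answer might be derived from per-ingredient properties. It is an assumption-laden illustration, not the actual Product Opener code: the <code>"yes"</code> / <code>"no"</code> / <code>"maybe"</code> values, the <code>vegan</code> and <code>vegetarian</code> keys, the function name and the priority rules are all hypothetical.

```python
def aggregate_status(ingredients, key):
    """Combine a per-ingredient property into one product-level status.

    Assumed rules (illustrative only):
    - any ingredient marked "no" makes the whole product "no";
    - otherwise, any ingredient without the property (for example an
      unrecognized ingredient) makes the result "unknown";
    - otherwise, any "maybe" makes the result "maybe";
    - otherwise the product is "yes".
    """
    values = [ingredient.get(key) for ingredient in ingredients]
    if not values:
        return "unknown"
    if "no" in values:
        return "no"
    if any(value is None for value in values):
        return "unknown"
    if "maybe" in values:
        return "maybe"
    return "yes"


# Hypothetical parsed ingredients list for a milk chocolate bar.
chocolate = [
    {"id": "cocoa-mass", "vegan": "yes", "vegetarian": "yes"},
    {"id": "sugar", "vegan": "yes", "vegetarian": "yes"},
    {"id": "milk-powder", "vegan": "no", "vegetarian": "yes"},
]
print(aggregate_status(chocolate, "vegan"))
print(aggregate_status(chocolate, "vegetarian"))
```

Under these assumed rules, a single unrecognized ingredient is enough to turn a definite answer into an unknown one, which is why reducing the number of unrecognized ingredients has such a strong effect on the analysis.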


== Report, dissemination and next steps ==
Those improvements are the result of several activities: cleaning the ingredients lists, improving and translating the ingredients taxonomy, creating an ingredient processing taxonomy and a feature to parse processing methods, and developing and deploying many small parsing features and improvements.


=== Report ===
While making those improvements, we also invested time and effort in the methods and infrastructure (documentation, metrics, testing tools etc.) so that it is easier and more efficient to contribute further improvements to ingredients analysis in existing languages, and to bring it to more languages.


=== Blog post ===
== Next steps for ingredients analysis ==


=== Next steps ===
There is still a lot of room to improve ingredients analysis and make it available in more languages.


=== Call for help to continue to improve ingredients analysis in more and more languages ===
If you would like to help, please join us in the [https://openfoodfacts.slack.com/archives/C06A7LENM #ingredients channel on the Open Food Facts Slack]. (Slack is the discussion tool we use to collaborate with project participants all around the world, [https://slack.openfoodfacts.org follow this link to join us on Slack]).


[[Category:Project:Personalized_Search]]
[[Category:ProductOpener]]
[[Category:Ingredients]]