Ingredients analysis and search features extraction
Revision as of 13:51, 13 May 2020
Summary
Ingredients analysis and search features extraction is one of the 4 sub-tasks of the Project:Personalized_Search funded by the NGI0 Discovery Fund managed by NLnet.
This page documents the progress made in Q1 and Q2 2020.
Methods and infrastructure
Ingredients analysis has been gradually added to Open Food Facts in a very organic way (the first versions of additives and palm oil detections in French ingredients lists date from 2012). A lot of progress has been made over the years, but ingredients analysis has remained a complex, undocumented, and artisanal effort mostly focused on French, with only one developer coding it and very few people able to improve it.
The first focus of the project has thus been to develop methods and infrastructure to industrialize ingredients analysis and bring it to the next level for many more languages.
Documentation
Ingredients analysis is a complex process with several tasks that are done in sequence, with the output of each task becoming the input of the next task. We have greatly improved the documentation to make it easier for more people to contribute improvements to the code, data and tests of each task.
- Ingredients Extraction and Analysis : a high-level diagram + detailed information on how we perform ingredients extraction and analysis.
- How to improve ingredients analysis : lists the concrete actions that can be taken to improve ingredients analysis in a specific language.
Metrics definition, reporting and monitoring
In order to prioritize the work to improve ingredients analysis and monitor our progress, we have defined quality metrics and created tools to report them.
- Ingredients Analysis Quality : definition of the ingredients analysis quality metrics and instructions to retrieve those metrics for a specific sub-set of products (e.g. all products sold in a specific country).
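As an illustration of what such a metric could look like, here is a minimal Python sketch that computes the share of recognized ingredients over a sub-set of products. The metric itself, the function name `recognized_share`, and the product fields (`countries`, `ingredients`, `known`) are all assumptions made for this example; the actual definitions are on the Ingredients Analysis Quality page.

```python
def recognized_share(products, country=None):
    """Return the fraction of ingredients flagged as known, or None if there are none.

    `products` is a list of dicts using a hypothetical structure:
    {"countries": [...], "ingredients": [{"text": ..., "known": bool}, ...]}
    """
    total = 0
    known = 0
    for product in products:
        # Optionally restrict the metric to products sold in one country.
        if country is not None and country not in product["countries"]:
            continue
        for ingredient in product["ingredients"]:
            total += 1
            if ingredient["known"]:
                known += 1
    return known / total if total else None


# Made-up sample data, for illustration only.
SAMPLE_PRODUCTS = [
    {
        "countries": ["france"],
        "ingredients": [
            {"text": "sucre", "known": True},
            {"text": "huile de palme", "known": True},
        ],
    },
    {
        "countries": ["germany"],
        "ingredients": [
            {"text": "zucker", "known": True},
            {"text": "xyz", "known": False},
        ],
    },
]
```

Calling `recognized_share(SAMPLE_PRODUCTS, country="germany")` restricts the computation to the second sample product, while omitting `country` covers all products.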
Visibility of ingredients analysis results and internals
To make it easier for more people to find, report and debug issues with ingredients analysis, we are now showing the result of ingredient analysis (whether a product is vegetarian, vegan or palm oil free) directly on the product page of the Open Food Facts web site, with a link to show exactly how we have parsed and analyzed the ingredients list.
Testing tool
In addition to seeing the details of the ingredients analysis for a specific product, users can also see the details of the analysis of an ingredient list they can type in, copy/paste, and modify in a simple web form. This tool greatly facilitates debugging and creating minimal tests to reproduce issues.
Assessment of the Ingredients Analysis Quality for major EU languages
In early March 2020, we recorded the quality metrics of ingredients analysis for major European languages, to establish a baseline that we can compare against at the end of the project.
Improvements to ingredients analysis
The quality of ingredients analysis directly depends on the quality of the input data (clean ingredients lists), the parsing features (whether we can recognize a given wording structure, from simple enumerations like "X and Y" to much more complex formulations), and the supporting data used by those features (the most important one being our multilingual ingredients taxonomy).
For this project, we worked to improve all 3 aspects.
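To make the interplay between these aspects concrete, here is a minimal Python sketch, not the actual Open Food Facts implementation. It splits a clean ingredients list on simple separators (including "X and Y") and looks each part up in a tiny taxonomy. The taxonomy entries, the identifiers, and the function names `split_ingredients` and `analyze` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical miniature taxonomy: canonical id -> known synonyms (lowercase).
TAXONOMY = {
    "en:sugar": {"sugar", "cane sugar"},
    "en:salt": {"salt", "sea salt"},
    "en:palm-oil": {"palm oil", "palm fat"},
}

# Reverse index so each synonym maps directly to its canonical id.
SYNONYM_TO_ID = {syn: cid for cid, syns in TAXONOMY.items() for syn in syns}


def split_ingredients(text):
    """Split a list on commas, semicolons and the standalone word 'and'."""
    parts = re.split(r",|;|\band\b", text.lower())
    # Trim surrounding spaces and periods, and drop empty fragments.
    return [p.strip(" .") for p in parts if p.strip(" .")]


def analyze(text):
    """Return (ingredient text, taxonomy id or None) for each parsed ingredient."""
    return [(name, SYNONYM_TO_ID.get(name)) for name in split_ingredients(text)]
```

In this sketch, an ingredient that maps to `None` is unrecognized. Each of the 3 aspects can cause that: a noisy input list, a wording structure the splitter does not handle, or a missing taxonomy entry.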