Ingredients analysis and search features extraction: Difference between revisions

From Open Food Facts wiki


=== Documentation ===
* [[Ingredients Extraction and Analysis]] : a high-level diagram + detailed information on how we perform ingredients extraction and analysis.
* [[How to improve ingredients analysis]] : lists the concrete actions that can be taken to improve ingredients analysis in a specific language.


=== Metrics definition ===
* [[Ingredients Analysis Quality]]


=== Metrics reporting and exploring ===


== Assessment of the Ingredients Analysis Quality for major EU languages ==
* [[Ingredients Analysis Quality]]


== Improvements to ingredients analysis ==

Revision as of 10:48, 13 May 2020

Summary

Ingredients analysis and search features extraction is one of the four sub-tasks of Project:Personalized_Search, which is funded by the NGI0 Discovery Fund managed by NLnet.

This page documents the progress made in Q1 and Q2 2020.

Methods and infrastructure

Ingredients analysis has been added to Open Food Facts gradually and organically: the first versions of additive and palm oil detection in French ingredients lists date from 2012. A lot of progress has been made over the years, but ingredients analysis has remained a complex, undocumented, and artisanal effort, mostly focused on French, with only one developer coding it and very few people able to improve it.

The first focus of the project has thus been to develop methods and infrastructure to industrialize ingredients analysis and bring it to the next level for many more languages.

Documentation

Metrics definition

Metrics reporting and exploring

Metrics monitoring

Visibility of ingredients analysis results and internals

Testing tool

Assessment of the Ingredients Analysis Quality for major EU languages

Improvements to ingredients analysis

Ingredients list cleaning

Wrong languages

Ingredients list cropping and new OCR extraction

Spelling correction

Ingredients taxonomy improvements

Ingredients processing taxonomy

Ingredients parsing features

Report, dissemination and next steps

Report

Blog post

Next steps

Call for help to continue to improve ingredients analysis in more and more languages