Nutrition facts table data extraction

From Open Food Facts wiki

Revision as of 10:35, 19 June 2020

Introduction

The Nutrition facts table data extraction project aims to automatically extract nutrition facts values from photos of nutrition facts tables.

Extracting nutrition facts from images would let us fill in values when they are missing and check the values entered by users.
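The two uses described above, filling gaps and checking user input, could be combined in a single pass. The following is a minimal Python sketch under several assumptions. Nutrient values are plain per-100 g numbers keyed by nutrient name. The 10% relative tolerance is arbitrary. The helper name merge_and_check is hypothetical and does not come from Robotoff or the Open Food Facts API.

```python
import math


def merge_and_check(user_values, extracted_values, rel_tol=0.1):
    """Fill missing user values from extracted ones and flag disagreements.

    Hypothetical helper: both arguments map a nutrient name to a value
    per 100 g; rel_tol is an illustrative tolerance, not a tuned one.
    """
    merged = dict(user_values)
    mismatches = []
    for nutrient, extracted in extracted_values.items():
        if nutrient not in user_values:
            # Missing value: fill it from the image extraction.
            merged[nutrient] = extracted
        elif not math.isclose(user_values[nutrient], extracted, rel_tol=rel_tol):
            # Values disagree by more than the tolerance: flag for review.
            mismatches.append((nutrient, user_values[nutrient], extracted))
    return merged, mismatches


user = {"fat": 9.5, "sugars": 21.0}
extracted = {"fat": 9.5, "sugars": 2.1, "salt": 1.1}
merged, mismatches = merge_and_check(user, extracted)
print(merged)      # {'fat': 9.5, 'sugars': 21.0, 'salt': 1.1}
print(mismatches)  # [('sugars', 21.0, 2.1)]
```

A flagged mismatch does not show which side is wrong. In this example, 21 vs 2.1 could be either a user typo or an OCR error on the decimal comma, so a human would still need to review it.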

Why it's very important

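  • We need nutrition facts to compute nutritional quality with the Nutri-Score
  • Typing in nutrition values takes about 2 minutes per product, and the process is tedious and error-prone
    • 1M products × 2 minutes = 2M minutes, roughly 33,000 hours: more than 10 years of full-time work without weekends or vacations
  • Being able to add complete data for a product quickly is key to reaching a critical mass of products in new countries

  • Presentation of the problem in French (for Share AI project): https://docs.google.com/presentation/d/1GQ5wBRtu48GJCEg6Ir4BddNnPqS7Ui3kmeCZ_TaRBi8/edit#slide=id.g70e2e194c1_2_97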
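The workload estimate in the list above can be checked with a few lines of Python. The 8-hour working day is an assumption, since the text only says "full time".

```python
# Back-of-the-envelope check of the manual data-entry workload.
PRODUCTS = 1_000_000
MINUTES_PER_PRODUCT = 2
HOURS_PER_DAY = 8      # assumed length of a full-time working day
DAYS_PER_YEAR = 365    # no weekends or vacations

total_hours = PRODUCTS * MINUTES_PER_PRODUCT / 60
years = total_hours / (HOURS_PER_DAY * DAYS_PER_YEAR)
print(f"{total_hours:,.0f} hours, about {years:.1f} years")
# prints "33,333 hours, about 11.4 years"
```

With these assumptions the total comes out at a little over 11 years, which is consistent with the order of magnitude given above.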

Data, evaluation and test sets

Data sets

How the model fits in the OFF infrastructure

Approaches

Two approaches have been tested so far:

  • Regexes applied to the OCR output. A new /predict/nutrient endpoint has been added to Robotoff. This approach works best when the nutritional information is not displayed as a table (plain text only).
  • A clustering approach that tries to estimate the number of columns and rows of the nutrition table using the K-means or DBSCAN algorithm. Not integrated into Robotoff.
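The regex approach can be illustrated with a minimal Python sketch. The patterns below are hypothetical and much simpler than whatever Robotoff's /predict/nutrient endpoint actually uses. Each pattern looks for a nutrient name (English or French) at the start of an OCR line, followed by a value that uses "." or "," as the decimal separator and a g or mg unit. NUTRIENT_PATTERNS and extract_nutrients are illustrative names, not Robotoff code.

```python
import re

# Hypothetical name patterns (English / French) for a few nutrients.
NUTRIENT_PATTERNS = {
    "fat": r"(?:fats?|mati[èe]res?\s+grasses?)",
    "saturated-fat": r"(?:saturates|saturated\s+fat|dont\s+acides\s+gras\s+satur[ée]s)",
    "carbohydrates": r"(?:carbohydrates?|glucides)",
    "sugars": r"(?:sugars?|(?:dont\s+)?sucres)",
    "proteins": r"(?:proteins?|prot[ée]ines?)",
    "salt": r"(?:salt|sel)",
}


def extract_nutrients(ocr_text):
    """Return {nutrient: (value, unit)} for lines like 'Fat 9.5 g' or 'Sel : 0,9 g'."""
    results = {}
    for nutrient, name in NUTRIENT_PATTERNS.items():
        # Anchoring at the start of a line keeps "fat" from matching "Saturated fat".
        pattern = re.compile(
            rf"^\s*{name}\s*:?\s*(\d+(?:[.,]\d+)?)\s*(g|mg)\b",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(ocr_text)
        if match:
            # Normalise the decimal comma used on many European labels.
            value = float(match.group(1).replace(",", "."))
            results[nutrient] = (value, match.group(2).lower())
    return results


ocr_text = """Nutrition information per 100 g
Energy 1046 kJ / 250 kcal
Fat 9.5 g
Saturates 3.2 g
Carbohydrate 30 g
Sugars 2,1 g
Protein 12 g
Salt 1.1 g"""
print(extract_nutrients(ocr_text))
```

This line-oriented matching also shows why the approach works best on plain-text labels. When OCR reads a table column by column, a nutrient name and its value often end up on different lines, and the patterns no longer match.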
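The clustering approach can be sketched in the same spirit. The snippet below is a self-contained, pure-Python DBSCAN over one-dimensional coordinates. It does not use scikit-learn and is not the code that was tested. It assumes the OCR step yields word bounding boxes, reduced here to hypothetical (x, y) center points. Clustering the x values estimates the number of columns, and clustering the y values estimates the number of rows. The eps value (maximum neighbour distance, assumed to be in pixels), min_samples, and the helper names are illustrative and would need tuning and validation on real images. For left-aligned label columns, the left edge of each box may be a better x coordinate than its center.

```python
def dbscan_1d(values, eps, min_samples):
    """Minimal DBSCAN on a list of numbers; returns one label per value (-1 = noise)."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if abs(values[j] - values[i]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


def count_clusters(values, eps, min_samples):
    """Number of clusters found, ignoring noise points."""
    return len({label for label in dbscan_1d(values, eps, min_samples) if label != -1})


def estimate_table_shape(word_centers, eps=15.0, min_samples=2):
    """Hypothetical helper: (x, y) word centers -> (number of columns, number of rows)."""
    xs = [x for x, _ in word_centers]
    ys = [y for _, y in word_centers]
    return count_clusters(xs, eps, min_samples), count_clusters(ys, eps, min_samples)


# Synthetic 3-column, 4-row table (label | per 100 g | per portion).
words = [
    (40, 20), (200, 21), (300, 19),
    (42, 50), (198, 52), (301, 49),
    (38, 80), (202, 81), (299, 79),
    (41, 110), (199, 111), (302, 109),
]
print(estimate_table_shape(words))  # (3, 4)
```

DBSCAN is used in this sketch rather than K-means because it does not need the number of clusters in advance, and that number is exactly what is being estimated. With K-means, one would typically try several values of k and keep the best according to a criterion such as the silhouette score.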

Previous attempts

  • Scoring of nutrition table detection methods (PDF, docx)
  • In-depth analysis of nutrition table detection (PDF)