Nutrition facts table data extraction/GSoC

From Open Food Facts wiki

Project 2: Automatically extract nutrition facts data from photos of food products

Description:

Nutrition facts are the most useful information about food products (for example, they are needed to compute nutritional scores such as the Nutri-Score, which lets people compare the nutritional quality of food products). They are also the most tedious information for users to enter manually, as it is easy to make mistakes when typing lots of numbers on a phone.

The goal of this project is to automatically extract nutrition facts data from a photo of the nutrition facts table, saving our contributors thousands of hours of manual work.

The extraction should be part of Robotoff, our system for extracting valuable information from the many images contributors send us. Robotoff uses a mix of classic techniques (such as regular expressions applied to the output of Google Cloud Vision OCR) and machine learning to generate “guesses”, which it applies automatically when it is confident about them and otherwise submits to users for validation.
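To make the "regular expressions on OCR output, then apply or ask" idea concrete, here is a minimal sketch. It is not Robotoff's actual code: the pattern, the `Prediction` class, the `extract_predictions` and `route` helpers, and the 0.9 threshold are all hypothetical and chosen only for illustration. It also assumes the OCR text arrives as plain lines with one nutrient per line, which real tables often do not respect.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold: predictions at or above it are applied automatically.
AUTO_APPLY_THRESHOLD = 0.9

# Illustrative pattern: a nutrient name, a few non-digit characters on the
# same line, then a number (dot or comma decimal separator) and a unit.
NUTRIENT_PATTERN = re.compile(
    r"(?P<name>energy|fat|carbohydrates?|sugars|proteins?|salt)"
    r"[^\d\n]{0,20}?"
    r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>kj|kcal|mg|g)\b",
    re.IGNORECASE,
)


@dataclass
class Prediction:
    nutrient: str
    value: float
    unit: str
    confidence: float


def extract_predictions(ocr_text: str, ocr_confidence: float) -> list[Prediction]:
    """Turn raw OCR text into predictions; the confidence is simply passed through."""
    predictions = []
    for match in NUTRIENT_PATTERN.finditer(ocr_text):
        predictions.append(
            Prediction(
                nutrient=match["name"].lower(),
                value=float(match["value"].replace(",", ".")),
                unit=match["unit"].lower(),
                confidence=ocr_confidence,
            )
        )
    return predictions


def route(prediction: Prediction) -> str:
    """Decide whether a prediction is applied directly or sent to a user."""
    return "apply" if prediction.confidence >= AUTO_APPLY_THRESHOLD else "ask_user"
```

Calling `extract_predictions("Fat 3,5 g\nSalt 0.8g", 0.95)` yields one prediction per line, and `route` sends each of them to "apply" because 0.95 is above the assumed threshold. In practice the confidence would come from the model or from the OCR engine rather than from a single number supplied by the caller.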

Expected outcomes:

  • Create a model to automatically recognize and crop the nutrition facts table from a photo of product packaging.
  • Create a model to extract the individual nutrition facts from the cropped photo of the nutrition facts table.
  • Create tests for the models.
  • Integrate the models into Robotoff.
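For the testing outcome, one possible approach is to score extracted values against hand-annotated ones. The sketch below is only an assumption about how such a check could look: the `precision_recall` function, its dictionary inputs (nutrient name mapped to value) and the tolerance are hypothetical, not an existing Robotoff test helper.

```python
def precision_recall(
    predicted: dict[str, float],
    expected: dict[str, float],
    tolerance: float = 1e-6,
) -> tuple[float, float]:
    """Compare extracted nutrient values with annotated ones for a single image."""
    # A prediction is correct when the nutrient was annotated and the values agree.
    correct = sum(
        1
        for name, value in predicted.items()
        if name in expected and abs(expected[name] - value) <= tolerance
    )
    # Convention chosen here: an empty prediction set contains no wrong value,
    # so its precision is 1.0; likewise recall is 1.0 when nothing was annotated.
    precision = correct / len(predicted) if predicted else 1.0
    recall = correct / len(expected) if expected else 1.0
    return precision, recall
```

Precision measures how many extracted values are right, while recall measures how many annotated values were found. Tracking the two separately makes it possible to favor a model that extracts fewer values but gets them right.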

Notes:

  • Accuracy is key here: nutrition information is precise and must stay that way. In this area, it is better to have less data, as long as it is right. There have been previous attempts to solve this problem, but none of them is operational.
  • The proposed algorithm may use OCR (we use Google Cloud Vision OCR) as part of the process.
  • The potential complexity lies in the layout analysis, as nutritional information is often presented as a table with multiple columns, row styles, etc.
  • The applicant must not concentrate on the research work alone, but must also take care of the integration into Robotoff, including any “business” rules to apply after extraction (on the Robotoff side only).
  • The technical stack is Python and TensorFlow (we use TensorFlow Serving), but other tools may be included in agreement with the Open Food Facts team.
  • Slack channels: #robotoff
  • Potential mentors: Alex Garel, Raphaël B
  • Project duration: 350 hours
  • GitHub: robotoff
  • Skills required: Machine Learning, Python
  • Difficulty rating: Hard
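The notes on layout analysis and on post-extraction “business” rules can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Word` class stands in for OCR words with pixel coordinates, `group_into_rows` is a naive heuristic that only handles straight, non-rotated tables, and the rules in `check_rules` are simple consistency checks on values in grams per 100 g, not the rules Robotoff applies.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word's bounding box, in pixels
    y: float  # vertical center of the word's bounding box, in pixels


def group_into_rows(words: list[Word], y_tolerance: float = 10.0) -> list[list[str]]:
    """Cluster words with close vertical centers, then order each row left to right."""
    rows: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        # Compare with the last word added to the current row.
        if rows and abs(rows[-1][-1].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


def check_rules(per_100g: dict[str, float]) -> list[str]:
    """Return the violated sanity rules for values expressed in grams per 100 g."""
    errors = []
    if any(value < 0 or value > 100 for value in per_100g.values()):
        errors.append("value outside 0-100 g per 100 g")
    # A sub-nutrient cannot exceed its parent; skip the check if the parent is missing.
    if per_100g.get("sugars", 0) > per_100g.get("carbohydrates", float("inf")):
        errors.append("sugars exceed carbohydrates")
    if per_100g.get("saturated-fat", 0) > per_100g.get("fat", float("inf")):
        errors.append("saturated fat exceeds fat")
    return errors
```

An extraction that fails such checks would be a natural candidate for user validation rather than automatic application. Multi-column tables (for example "per 100 g" next to "per serving") would additionally need a column assignment step, which this sketch does not attempt.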

Additional documents