Nutrition facts table data extraction
Latest revision as of 18:01, 16 August 2024
Introduction
The Nutrition facts table data extraction project aims to automatically extract nutrition facts values from photos of nutrition facts tables.
Extracting nutrition facts from images would make it possible to fill in missing values and to check values entered by users.
Why it's very important
- We need nutrition facts to compute nutritional quality with the Nutri-Score
- It takes 2 minutes per product to type in nutrition values, a process that is also tedious and error-prone
- 1M products x 2 minutes = 10 years of full-time work, without weekends or vacations
- Being able to add complete data for a product quickly is key to attain a critical mass of products in new countries
Data, evaluation and test sets
Data sets
- https://github.com/openfoodfacts/openfoodfacts-ai/blob/master/data-sets.md#nutrition-tables-cropping-and-nutrition-facts-extraction
How the model fits in the Open Food Facts infrastructure
Approaches
Two approaches have been tested so far:
- Regexes applied to the OCR output. A new /predict/nutrient endpoint has been added to Robotoff. This approach works best when the nutritional information is not displayed in a table ("text" only).
- A clustering approach that tries to estimate the number of columns and rows of the nutrition table using K-means or DBSCAN algorithms. Not integrated into Robotoff.
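The regex approach above can be sketched roughly as follows. This is a minimal illustration, not the actual code behind Robotoff's /predict/nutrient endpoint; the nutrient names and patterns are illustrative assumptions.

```python
import re

# Match "<nutrient name> <number> <unit>" fragments in raw OCR text.
# The nutrient vocabulary and unit list are simplified assumptions.
NUTRIENT_PATTERN = re.compile(
    r"(?P<name>energy|fat|carbohydrates?|sugars?|proteins?|salt)"
    r"\s*:?\s*"
    r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>kj|kcal|g|mg)",
    re.IGNORECASE,
)

def extract_nutrients(ocr_text: str) -> dict:
    """Extract (value, unit) pairs for known nutrient names from OCR text."""
    results = {}
    for match in NUTRIENT_PATTERN.finditer(ocr_text):
        name = match.group("name").lower()
        # Handle both decimal separators, e.g. "4,8" and "4.8".
        value = float(match.group("value").replace(",", "."))
        results[name] = (value, match.group("unit").lower())
    return results

text = "Energy: 1046 kJ Fat 3.5 g Carbohydrates 4,8 g Protein 3.2 g Salt 0.1 g"
print(extract_nutrients(text))
```

As the list above notes, this works best for "text only" labels; inside a table, the spatial layout breaks the simple name-then-value adjacency the regex relies on.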
Previous attempts
- Scoring of nutrition facts table detection methods (PDF, docx)
- In-depth analysis of nutrition facts table detection (pdf)
- Table extraction (V1): https://github.com/openfoodfacts/off-nutrition-table-extractor. Mainly focuses on the nutrition table detection part; the extraction process remains quite simple.
- Table extraction (v2): https://github.com/cgandon/openfoodfacts-nutriments
- Work by Sadok and Yichen as part of the Microsoft ShareAI program
- TableNet
- GraphNet
Other approaches
The previous attempts are not state of the art for the table extraction task. Other approaches we may consider:
- Detection of the table structure: identifying the rows, columns and cells of the table. Most of the time, rows correspond to nutrient labels (protein, carbohydrate, energy, ...) and columns to quantities (100 g, per portion, % of daily intake). By using pattern matching, we can label the columns and rows and extract the values of the nutrition table per 100 g or per portion.
- Using an end-to-end model to directly extract nutritional values. This may be done by scoring candidates for each field (protein_100g, protein_serving, carbohydrate_100g, ...) and selecting the highest-scoring token as the field value. This is similar to the Microsoft Form Recognizer API or the Google Document AI API.
Each approach requires a different kind of annotation, so choosing the most effective approach at the beginning of the project is crucial.
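The pattern-matching step mentioned in the structure-detection approach can be sketched as below: once column header strings have been recovered, simple patterns decide whether a column holds per-100g values, per-portion values, or % daily intake. The label names and patterns are illustrative assumptions, not an agreed specification.

```python
import re

# Map column header text to a semantic column label.
# Patterns are simplified assumptions for illustration.
COLUMN_PATTERNS = [
    ("per_100g", re.compile(r"(per\s*)?100\s*(g|ml)", re.IGNORECASE)),
    ("per_serving", re.compile(r"(per\s+)?(serving|portion)", re.IGNORECASE)),
    ("daily_intake", re.compile(r"%|daily|\bRI\b|\bGDA\b", re.IGNORECASE)),
]

def label_column(header_text: str) -> str:
    """Return the first matching semantic label for a column header."""
    for label, pattern in COLUMN_PATTERNS:
        if pattern.search(header_text):
            return label
    return "unknown"

print([label_column(h) for h in
       ["Per 100 g", "per portion (30g)", "% RI", "Typical values"]])
```

The same idea applies to rows, matching cell text against a taxonomy of nutrient names (protein, carbohydrate, energy, ...).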
Source: https://docs.google.com/drawings/d/1YKgnTEX1RBgsMQpO4JIt94sMwf1IC_fPWIvPnUjswTg/edit?usp=sharing
Planning
- Test of an end-to-end model, through an API as a starter: Form Recognizer or the Google Document AI API. If results are promising, we develop an in-house end-to-end model; otherwise, a layout model.
- Thorough literature review
- Architecture choice
- Annotation campaign
- Experimentation: model training and scoring
- Integration into Robotoff/Hunger Games
Ramzi: test of end-to-end model with Form Recognizer API.
Yichen: test of image preprocessing before Form Recognizer (layout model), test "manual" approaches.
- tested a pre-processing method (ported from Mathematica to Python)
- OCR results worsen after binarization
- testing a method to de-warp deformed images (by detecting horizontal and vertical lines)
- "manual" approaches: compare positions of OCR bounding boxes; use table borders if present
- manual horizontal or vertical scans of bounding boxes, using the angles of the bounding boxes
- Robotoff has Python classes for importing/analyzing Google Cloud Vision results
Raphaël: Literature review (to be continued)
Notes 24/07
- Ramzi: tested end-to-end model with learning
- missing one step, results expected next week
- Yasmine + Yichen: working on pre-processing
- Testing cylinder distortions -> flat rectangle
- OpenCV
- Tesseract performance very poor
Next week:
- Ramzi
- finish test end-to-end model
- increase size of data set to get significant results
- Yasmine + Yichen
- Cylinder unwrapping (classic + deep learning models)
- review literature
- create data set for Ramzi using the paid API "perfect label"
- table structure detection: associate bounding boxes with key-value pairs
- All
- Create test set with cylindrical photos + unwrapped photos
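The "associate bounding boxes to key-value pairs" task in the list above could look like the sketch below: given the OCR words of one table row ordered left to right, the leading non-numeric words form the nutrient name (key) and each numeric word is a column value. The (text, box) input format is an assumption for illustration.

```python
import re

# A word is treated as a value if it looks like "<number>[unit]".
# The unit list is a simplified assumption.
VALUE_RE = re.compile(r"^\d+(?:[.,]\d+)?\s*(?:g|mg|kj|kcal)?$", re.IGNORECASE)

def row_to_key_values(row_words):
    """row_words: list of (text, box) tuples, ordered left to right.

    Returns (key, values): the nutrient name and its per-column values,
    each value keeping its bounding box for later column assignment.
    """
    key_parts, values = [], []
    for text, box in row_words:
        if VALUE_RE.match(text):
            values.append((text, box))
        else:
            key_parts.append(text)
    return " ".join(key_parts), values

row = [("Saturated", (0, 0, 60, 10)), ("fat", (65, 0, 85, 10)),
       ("1.2g", (120, 0, 150, 10)), ("0.4g", (180, 0, 210, 10))]
print(row_to_key_values(row))
```

Each value's box can then be matched to a labeled column (per 100 g, per portion) by horizontal overlap with the column header's box.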
Resources
- Trove - a marketplace platform where AI developers who need photos can create projects looking for specific types of photos and crowdsource them from photo takers.
- Lobe - train a custom machine learning model using a simple visual interface with no code
- Step 1: Request an invite code from the team: lobeai@microsoft.com
- Step 2: Download Lobe - For Mac: https://aka.ms/DownloadLobeMac - For PC: https://aka.ms/DownloadLobeWindows
- AI Builder - import your photos right away into a Power App (or a Power Automate flow) so you can use the output in any real world application.
Quality metrics
- Quality Metrics & Golden sets: https://docs.google.com/document/d/1sJNZOq2Gs6gnNSCV2hPlnl4-3Z_SdQFQuorqoRw1cjA/edit
Get in touch
- Slack channel: #ai-nutrition-table (https://openfoodfacts.slack.com/messages/CNMKJHFT2/)