Student projects/2018/ENSAE Classification and Error Detection

Latest revision as of 10:14, 7 August 2024

Error Detection and Automatic Classification

This wiki page documents the data science project with ENSAE students (February to June 2018).

To edit this page:

  1. Create an account on https://world.openfoodfacts.org and log in to Open Food Facts with it
  2. Go back to the wiki page https://en.wiki.openfoodfacts.org/Student_projects/ENSAE/2018 and click "Log in" in the top right corner
  3. Click "Edit" at the top right
  4. Feel free to edit the page and re-organize it as you see fit

Participants:

  • Laetitia
  • Zoé

Goals

Most of the product photos and data on https://world.openfoodfacts.org are crowdsourced: users scan product barcodes, send photos of the product, its ingredients and its nutrition facts with the Open Food Facts mobile app, and enter the corresponding data on the Open Food Facts web site. This project aims to find errors in the data entered by users, and to speed up data entry by automatically classifying products or making suggestions to contributors.


Classification

Automatically classify products into product categories, brands and labels.


Error detection

Find errors in the values entered by users for the nutrition facts (energy, fat, carbohydrates, proteins, salt etc.)
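
As a starting point, some entry errors can be caught with simple plausibility rules before any statistics are involved. The sketch below is only an illustration, not the project's method: the dictionary keys (`fat`, `carbohydrates`, `proteins`, `salt`, `energy_kj`), the function name and the tolerance are assumptions made for this example, and all values are taken to be per 100 g. The energy estimate uses the usual conversion factors of 37 kJ per gram of fat and 17 kJ per gram of carbohydrates or proteins, and ignores fibre and alcohol, hence the generous tolerance.

```python
def check_nutrition_facts(facts, tolerance=0.25):
    """Return a list of problems found in per-100 g nutrition values.

    `facts` is a hypothetical dict with keys such as "fat", "carbohydrates",
    "proteins", "salt" (grams per 100 g) and "energy_kj" (kJ per 100 g).
    """
    problems = []
    macros = ("fat", "carbohydrates", "proteins", "salt")

    # Rule 1: a quantity per 100 g must lie between 0 and 100 g.
    for name in macros:
        value = facts.get(name)
        if value is not None and not 0 <= value <= 100:
            problems.append(f"{name} out of range: {value:g} g per 100 g")

    # Rule 2: the nutrients together cannot weigh more than the product.
    total = sum(facts.get(name) or 0 for name in macros)
    if total > 100:
        problems.append(f"nutrients sum to {total:g} g per 100 g")

    # Rule 3: declared energy should roughly match the macronutrients.
    energy = facts.get("energy_kj")
    if energy is not None and all(facts.get(n) is not None for n in macros[:3]):
        expected = (37 * facts["fat"]
                    + 17 * facts["carbohydrates"]
                    + 17 * facts["proteins"])
        # Relative tolerance plus 50 kJ of slack for fibre, alcohol, rounding.
        if abs(energy - expected) > tolerance * expected + 50:
            problems.append(
                f"energy {energy:g} kJ, about {expected:g} kJ expected")
    return problems


# Made-up example values: the second entry looks like a misplaced decimal point.
plausible = {"fat": 3.5, "carbohydrates": 4.8, "proteins": 3.2,
             "salt": 0.1, "energy_kj": 268}
suspicious = {"fat": 35, "carbohydrates": 48, "proteins": 32,
              "salt": 0.1, "energy_kj": 268}

for problem in check_nutrition_facts(suspicious):
    print(problem)
```

Rules like these only catch impossible or inconsistent values; a value that is possible but unusual for its kind of product needs a comparison with similar products.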


Data

  • https://world.openfoodfacts.org/data


Approaches, algorithms etc.

(I just listed some initial ideas -- Stephane)

Classification

  • use words extracted from OCR on product photos to classify products into categories, brands and labels
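
One possible way to try this idea is a bag-of-words classifier trained on the OCR text of products whose category is already known. The sketch below is a minimal multinomial Naive Bayes written with the standard library only; the class name, the tokenizer and the toy training texts are all invented for illustration, and a real experiment would more likely use an existing machine learning library and the actual OCR output.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of products
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_products = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_products in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior of the label plus smoothed log likelihood of each word.
            score = math.log(n_products / total_products)
            for token in tokens:
                score += math.log(
                    (counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stand-ins for OCR text from product photos (hypothetical data).
texts = [
    "whole milk pasteurized",
    "semi skimmed milk uht",
    "dark chocolate cocoa butter",
    "milk chocolate cocoa sugar",
]
labels = ["milks", "milks", "chocolates", "chocolates"]

classifier = NaiveBayesTextClassifier().fit(texts, labels)
print(classifier.predict("organic skimmed milk"))
print(classifier.predict("cocoa dark sugar"))
```

The same structure applies to brands and labels by changing what is used as the target label.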

Error detection

  • cluster products by category to find outliers
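
A simple variant of this idea, sketched below, groups products by the category they already have and flags values that are far from the median of their group. Note that this replaces a clustering algorithm with a plain group-by and a robust rule (the modified z-score based on the median absolute deviation); the dictionary keys `name` and `category`, the field name `fat_100g`, the threshold and the sample values are assumptions made for this example.

```python
from collections import defaultdict
from statistics import median


def find_outliers(products, field, threshold=3.5):
    """Flag products whose `field` is far from their category's median.

    Uses the modified z-score 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation within the category.
    """
    by_category = defaultdict(list)
    for product in products:
        if product.get(field) is not None:
            by_category[product["category"]].append(product)

    outliers = []
    for category, group in by_category.items():
        values = [p[field] for p in group]
        center = median(values)
        mad = median([abs(v - center) for v in values])
        if mad == 0:
            continue  # no spread in this category, the score is undefined
        for p in group:
            score = 0.6745 * abs(p[field] - center) / mad
            if score > threshold:
                outliers.append((p["name"], category, p[field]))
    return outliers


# Made-up fat values in g per 100 g; the last item of each category
# looks like a misplaced decimal point.
products = [
    {"name": "Yogurt A", "category": "yogurts", "fat_100g": 3.0},
    {"name": "Yogurt B", "category": "yogurts", "fat_100g": 3.5},
    {"name": "Yogurt C", "category": "yogurts", "fat_100g": 2.8},
    {"name": "Yogurt D", "category": "yogurts", "fat_100g": 3.2},
    {"name": "Yogurt E", "category": "yogurts", "fat_100g": 35.0},
    {"name": "Butter A", "category": "butters", "fat_100g": 80.0},
    {"name": "Butter B", "category": "butters", "fat_100g": 82.0},
    {"name": "Butter C", "category": "butters", "fat_100g": 81.0},
    {"name": "Butter D", "category": "butters", "fat_100g": 83.0},
    {"name": "Butter E", "category": "butters", "fat_100g": 8.2},
]

for name, category, value in find_outliers(products, "fat_100g"):
    print(f"{name} ({category}): {value}")
```

The median and the MAD are used instead of the mean and the standard deviation because a single extreme value shifts them much less, so the error itself does not hide the outlier.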


Measuring results

Source code

Open Food Facts uses GitHub for source control: https://github.com/openfoodfacts

We have some Python clients and libraries:

  • https://github.com/openfoodfacts/openfoodfacts-python
  • https://github.com/openfoodfacts/OpenFoodFacts-APIRestPython

Misc.

  • https://en.wikipedia.org/wiki/Weka_(machine_learning)