Student projects/2018/ENSAE Classification and Error Detection

From Open Food Facts wiki
Revision as of 15:48, 23 February 2018

Error Detection and Automatic Classification

This wiki page documents the data science project with ENSAE students (February to June 2018).

To edit this page:

  1. Create an account on https://world.openfoodfacts.org and log in to Open Food Facts with this account
  2. Go back to the wiki page https://en.wiki.openfoodfacts.org/Student_projects/ENSAE/2018 and click on "Log in" on the top right corner
  3. Click on "Edit" on the top right
  4. Feel free to edit the page and re-organize it as you see fit

Participants:

  • Laeticia
  • Zoé

Goals

Most of the product photos and data on https://world.openfoodfacts.org are crowdsourced by users who scan product barcodes, use the Open Food Facts mobile app to send photos of the product, ingredients and nutrition facts, and use the Open Food Facts web site to input the corresponding data. This project aims to find errors in the data entered by users, and to speed up data entry by automatically classifying products or making suggestions to contributors.


Classification

Automatically classify products into product categories, brands and labels.


Error detection

Find errors in the values entered by users for the nutrition facts (energy, fat, carbohydrates, proteins, salt etc.)
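
As a starting point, some entry errors can be caught with simple plausibility rules before any statistics are involved. The sketch below assumes each product is a dictionary of per-100 g values in grams; the key names, the `nutrition_errors` helper and the 1 g rounding tolerance are illustrative assumptions, not the actual Open Food Facts schema.

```python
def nutrition_errors(per_100g, tolerance=1.0):
    """Return a list of rule violations for one product.

    `per_100g` maps hypothetical nutrient names to grams per 100 g.
    Missing or None values are skipped rather than treated as errors.
    """
    nutrients = ["fat", "carbohydrates", "proteins", "salt"]
    values = {k: per_100g[k] for k in nutrients if per_100g.get(k) is not None}

    errors = []
    # Rule 1: a single nutrient cannot be negative or exceed 100 g per 100 g.
    for name, grams in values.items():
        if not 0 <= grams <= 100:
            errors.append(f"{name} out of range: {grams} g per 100 g")
    # Rule 2: the nutrients together cannot weigh more than the product
    # (a small tolerance, assumed here, allows for label rounding).
    if sum(values.values()) > 100 + tolerance:
        errors.append("nutrients sum to more than 100 g per 100 g")
    return errors


print(nutrition_errors({"fat": 60, "carbohydrates": 50, "proteins": 3}))
```

Rules like these only catch impossible values; plausible but wrong values need a comparison against similar products.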


Data


Approaches, algorithms etc.

(I just listed some initial ideas -- Stephane)

Classification

  • use words extracted from OCR on product photos to classify products into categories, brands and labels
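
One possible way to try this idea is a bag-of-words classifier. The sketch below is a minimal multinomial naive Bayes with add-one smoothing, written with the standard library only; the sample texts, the category names and the `tokenize`, `train` and `predict` helpers are made up for illustration, and a real experiment would more likely use an existing machine learning library and actual OCR output.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep purely alphabetic tokens; OCR noise is mostly dropped.
    return [w for w in text.lower().split() if w.isalpha()]


def train(samples):
    """samples: iterable of (ocr_text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of products
    vocab = set()
    for text, category in samples:
        words = tokenize(text)
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def predict(model, text):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        # Log prior plus log likelihood of each known word (add-one smoothing).
        score = math.log(doc_counts[category] / total_docs)
        for w in tokenize(text):
            if w in vocab:
                count = word_counts[category][w]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best


# Hypothetical training data standing in for OCR text and known categories.
samples = [
    ("organic whole milk pasteurized", "dairies"),
    ("greek yogurt milk cultures", "dairies"),
    ("sparkling water lemon flavour", "beverages"),
    ("orange juice from concentrate", "beverages"),
]
model = train(samples)
print(predict(model, "semi skimmed milk"))
```

The same structure would apply to brands and labels by changing the target in the training pairs.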

Error detection

  • cluster products by category to find outliers
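
A simple variant of this idea, sketched below, groups products by an already known category and flags values that sit far from the category median, using a modified z-score based on the median absolute deviation (MAD). This is a robust statistic rather than a clustering algorithm in the strict sense. The product dictionaries, the field names and the `find_outliers` helper are assumptions for illustration; 3.5 is a commonly used cutoff for this score, not a value from the project.

```python
from collections import defaultdict
from statistics import median


def find_outliers(products, field, threshold=3.5):
    """Return (code, category, value) for values far from their category median."""
    by_category = defaultdict(list)
    for p in products:
        if p.get(field) is not None:
            by_category[p["category"]].append(p)

    outliers = []
    for category, items in by_category.items():
        values = [p[field] for p in items]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread in this category, the score is undefined
        for p in items:
            # 0.6745 rescales the MAD so the score is comparable to a z-score.
            score = 0.6745 * (p[field] - med) / mad
            if abs(score) > threshold:
                outliers.append((p["code"], category, p[field]))
    return outliers


# Hypothetical data: the last yogurt looks like 4.7 typed as 47.
products = [
    {"code": "p1", "category": "yogurts", "sugars": 4.5},
    {"code": "p2", "category": "yogurts", "sugars": 5.0},
    {"code": "p3", "category": "yogurts", "sugars": 4.8},
    {"code": "p4", "category": "yogurts", "sugars": 5.2},
    {"code": "p5", "category": "yogurts", "sugars": 47.0},
]
print(find_outliers(products, "sugars"))
```

Using the median instead of the mean keeps a single extreme value from hiding itself by inflating the spread of its own category.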


Measuring results

Source code

Open Food Facts uses GitHub for source control: https://github.com/openfoodfacts

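We have some Python clients and libraries:

  • https://github.com/openfoodfacts/openfoodfacts-python
  • https://github.com/openfoodfacts/OpenFoodFacts-APIRestPython

Misc.

  • https://en.wikipedia.org/wiki/Weka_(machine_learning)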