Student projects/2018/ENSAE Classification and Error Detection
Revision as of 15:48, 23 February 2018
Error Detection and Automatic Classification
This wiki page documents the data science project with ENSAE students (February - June 2018).
To edit this page:
- Create an account on https://world.openfoodfacts.org and log in to Open Food Facts with this account
- Go back to the wiki page https://en.wiki.openfoodfacts.org/Student_projects/ENSAE/2018 and click on "Log in" in the top right corner
- Click on "Edit" on the top right
- Feel free to edit the page and re-organize it as you see fit
Participants:
- Laeticia
- Zoé
Goals
Most of the product photos and data on https://world.openfoodfacts.org are crowdsourced: users scan product barcodes, send photos of the product, ingredients and nutrition facts with the Open Food Facts mobile app, and enter the corresponding data on the Open Food Facts web site. This project aims to find errors in the data entered by users, and to speed up data entry by automatically classifying products or making suggestions to contributors.
Classification
Automatically classify products into product categories, brands and labels.
Error detection
Find errors in the values entered by users for the nutrition facts (energy, fat, carbohydrates, proteins, salt etc.).
Data
Approaches, algorithms etc.
(I just listed some initial ideas -- Stephane)
Classification
- use words extracted from OCR on product photos to classify products into categories, brands and labels
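As a starting point, the OCR idea above could be prototyped with a bag-of-words naive Bayes classifier. The sketch below is only an illustration in plain Python: the class name, the tokenizer and the tiny training set are made up for the example, and it assumes the OCR text of each product is already available as a string. A real experiment would more likely use a library such as scikit-learn and far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the OCR text and keep runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesCategoryClassifier:
    """Hypothetical helper: multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of products
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)
        return self

    def predict(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log prior + sum of smoothed log likelihoods of each word
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up OCR snippets and category labels, for illustration only.
ocr_texts = [
    "lait demi-écrémé uht",
    "whole milk pasteurised",
    "dark chocolate cocoa butter sugar",
    "chocolat noir cacao sucre",
]
categories = ["milks", "milks", "chocolates", "chocolates"]

classifier = NaiveBayesCategoryClassifier().fit(ocr_texts, categories)
print(classifier.predict("organic milk uht"))
print(classifier.predict("cocoa sugar"))
```

The same structure would apply to brands and labels by swapping the target labels; only the training pairs change.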
Error detection
- cluster products by category to find outliers
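One simple way to try the outlier idea above is to group products by category and flag values that sit far from the category median. The sketch below uses a robust z-score based on the median absolute deviation (MAD), which is one possible choice among many and not a method decided for this project. The function name, the 3.5 threshold, the record layout (`code`, `category`, `sugars_100g`) and the sample values are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def find_outliers(products, field, threshold=3.5):
    """Return (code, category, value) for values far from their category median.

    Uses the modified z-score 0.6745 * |x - median| / MAD, computed per category.
    """
    by_category = defaultdict(list)
    for product in products:
        if product.get(field) is not None:  # skip products with no value
            by_category[product["category"]].append(product)

    outliers = []
    for category, group in by_category.items():
        values = [p[field] for p in group]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # all values (nearly) identical: the score is undefined
        for p in group:
            score = 0.6745 * abs(p[field] - med) / mad
            if score > threshold:
                outliers.append((p["code"], category, p[field]))
    return outliers


# Made-up sample records; the last soda looks like a misplaced decimal point.
sample_products = [
    {"code": "0001", "category": "sodas", "sugars_100g": 10.6},
    {"code": "0002", "category": "sodas", "sugars_100g": 10.9},
    {"code": "0003", "category": "sodas", "sugars_100g": 11.0},
    {"code": "0004", "category": "sodas", "sugars_100g": 10.2},
    {"code": "0005", "category": "sodas", "sugars_100g": 106.0},
    {"code": "0006", "category": "juices", "sugars_100g": 9.5},
    {"code": "0007", "category": "juices", "sugars_100g": 10.0},
    {"code": "0008", "category": "juices", "sugars_100g": 10.4},
]

print(find_outliers(sample_products, "sugars_100g"))
# prints [('0005', 'sodas', 106.0)]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme entry would otherwise inflate the spread and hide itself.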
Measuring results
Source code
Open Food Facts uses GitHub for source control: https://github.com/openfoodfacts
We have some Python clients and libraries:
- https://github.com/openfoodfacts/openfoodfacts-python
- https://github.com/openfoodfacts/OpenFoodFacts-APIRestPython