* Grade scan products based on this data
* Display product recommendations / alternatives that better match the user preferences
=== Computer vision ===
Why it's important: all product data comes from photos of the product and labels. Today most of this data is entered manually. To scale, we need to extract more data from photos automatically.
Background: we currently do only basic OCR for ingredients. There is a lot of room for improvement.
==== Improve OCR for ingredients ====
* Create golden test sets to measure the accuracy of the current OCR and of any improvements
* Train OCR models targeted at ingredients
* Automatic cropping of ingredients lists
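As a rough illustration of how a golden test set could be used to measure OCR accuracy, the sketch below computes a character error rate (edit distance divided by reference length) over pairs of hand-checked text and OCR output. The metric choice, the function names and the sample strings are assumptions for illustration, not an existing part of the project.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(golden_set) -> float:
    """golden_set: iterable of (reference_text, ocr_text) pairs."""
    total_errors = 0
    total_chars = 0
    for reference, ocr_text in golden_set:
        total_errors += edit_distance(reference, ocr_text)
        total_chars += len(reference)
    return total_errors / total_chars if total_chars else 0.0


# Hypothetical golden set: (hand-checked ingredients, OCR output)
golden_set = [
    ("sugar, cocoa butter, milk", "sugar, cocoa buter, milk"),
    ("water, salt", "water, salt"),
]
cer = character_error_rate(golden_set)
```

Running the same golden set through the current OCR and through a candidate model gives two comparable numbers, so an improvement can be checked rather than assumed.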
==== OCR for Nutrition Facts tables ====
* Automatic recognition and cropping of nutrition facts table
* OCR for the nutrition facts table
==== Brands and labels detection ====
* Automatically recognize brands and labels
=== Data science ===
Why it's important: our product database is growing rapidly (10k new products every month in early 2018), so we need automated ways to extract and validate data.
Background: to date, we have done very little in this area.
==== Automatically classify products ====
* Detect field values from other field values or from a bag of words extracted by OCR
** Categories
** Brands (in some cases, the barcode prefix can be a strong feature)
** Labels
* When certain, detected values can be applied immediately
* When less certain, we can ask users to confirm suggestions
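The idea above can be sketched with the barcode-prefix feature for brands: count which brands already share a prefix, use the majority share as a crude confidence, and route the suggestion accordingly. The thresholds, the prefix length, the minimum support and all function names here are hypothetical values chosen for illustration.

```python
from collections import Counter, defaultdict

AUTO_APPLY = 0.95  # assumed threshold: apply the value immediately
ASK_USER = 0.60    # assumed threshold: ask users to confirm


def build_prefix_index(products, prefix_len=7):
    """products: iterable of (barcode, brand) pairs with known brands."""
    index = defaultdict(Counter)
    for barcode, brand in products:
        index[barcode[:prefix_len]][brand] += 1
    return index


def suggest_brand(index, barcode, prefix_len=7, min_support=3):
    """Return (brand, confidence); confidence is the majority share."""
    counts = index.get(barcode[:prefix_len])
    if not counts or sum(counts.values()) < min_support:
        return None, 0.0  # too little evidence for this prefix
    brand, n = counts.most_common(1)[0]
    return brand, n / sum(counts.values())


def route(confidence):
    """Decide what to do with a detected value."""
    if confidence >= AUTO_APPLY:
        return "apply"
    if confidence >= ASK_USER:
        return "ask_user"
    return "ignore"


# Made-up barcodes and brands
known = [
    ("1234567000011", "BrandA"),
    ("1234567000028", "BrandA"),
    ("1234567000035", "BrandA"),
    ("1234567000042", "BrandB"),
]
index = build_prefix_index(known)
brand, confidence = suggest_brand(index, "1234567000059")
action = route(confidence)
```

A real classifier would likely combine several features (OCR words, categories, other fields) and calibrate its confidence, but the same apply / ask / ignore routing would still fit.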
==== Automatically detect errors ====
* Bad nutrition facts
** e.g. by looking at outliers for products of the same category
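One possible way to look for such outliers is a robust score based on the median and the median absolute deviation (MAD) within each category. This is only a sketch: the field names (<code>code</code>, <code>category</code>, <code>sugars_100g</code>), the 3.5 cutoff and the minimum group size are assumptions, and the sample values are invented.

```python
from collections import defaultdict
from statistics import median


def find_outliers(products, field="sugars_100g", threshold=3.5, min_group=5):
    """Return codes of products whose value is far from the category median."""
    by_category = defaultdict(list)
    for product in products:
        if product.get(field) is not None:
            by_category[product["category"]].append(product)

    outliers = []
    for group in by_category.values():
        if len(group) < min_group:
            continue  # too few products to judge what is normal
        values = [p[field] for p in group]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # values are (almost) all identical, score undefined
        for p in group:
            # 0.6745 scales the MAD so the score is comparable to a z-score
            score = 0.6745 * abs(p[field] - med) / mad
            if score > threshold:
                outliers.append(p["code"])
    return outliers


# Invented sample: one soda with a likely misplaced decimal point
sodas = [
    {"code": "p1", "category": "sodas", "sugars_100g": 10.0},
    {"code": "p2", "category": "sodas", "sugars_100g": 10.5},
    {"code": "p3", "category": "sodas", "sugars_100g": 11.0},
    {"code": "p4", "category": "sodas", "sugars_100g": 10.6},
    {"code": "p5", "category": "sodas", "sugars_100g": 9.8},
    {"code": "p6", "category": "sodas", "sugars_100g": 106.0},
]
flagged = find_outliers(sodas)
```

The median and MAD are used instead of the mean and standard deviation because a single bad entry would otherwise distort the very statistics used to detect it.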