Reusing Open Food Facts Data
Revision as of 14:04, 16 April 2020
Open Food Facts data is released as Open Data: it can be reused freely by anyone, under the Open Database License (ODBL).
Where is the data?
There are several ways to get the data.
Searching for a selection of products?
Then use the advanced search. The Open Food Facts advanced search feature lets you download selections of the data. See: https://world.openfoodfacts.org/cgi/search.pl
When your search is done, you will be able to download the selection. Give it a try!
Searching for the whole database?
The whole database can be downloaded at https://world.openfoodfacts.org/data
It's very big: Open Food Facts hosts more than 1,200,000 products (as of April 2020), so you will probably need some technical skills to reuse the data.
Several kinds of exports are available there.
The MongoDB daily export
This is the most complete export; it is very big, and you need to know how to work with MongoDB.
The CSV daily export
It contains a subset of the database, but it is generally sufficient for most uses. It's a 2.3GB file (as of April 2020), so it can't be opened with LibreOffice or Excel on a machine with 8GB of RAM.
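Because the file is too large for a spreadsheet, it can help to look at the header and the first few rows before doing anything else. The following is a minimal sketch using only the Python standard library. It assumes the export is tab-separated (as the `-t` option in the csvkit tip suggests) and that quote characters should be read literally; the function name `peek_tsv` is hypothetical.

```python
import csv
from itertools import islice


def peek_tsv(path, n_rows=5):
    """Return (header, first n_rows rows) without loading the whole file."""
    with open(path, newline="", encoding="utf-8") as f:
        # Assumption: tab-separated fields, quotes treated as plain characters.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    # File name taken from the csvkit example on this page.
    header, rows = peek_tsv("en.openfoodfacts.org.products.csv", n_rows=3)
    print(len(header), "columns")
    for row in rows:
        print(row[:3])
```

Only the requested rows are read, so memory use stays small regardless of the file size.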
How to reuse?
CSV daily export
csvkit tips
csvkit is a very efficient tool for manipulating huge amounts of CSV data. Here are some useful tips for working with the Open Food Facts CSV export.
Selecting 2 columns. Selecting two or three columns can be useful in some cases. Extracting two columns produces a smaller CSV file which can be opened with common software such as LibreOffice or Excel. The following command creates a CSV file (brands.csv) containing two columns from Open Food Facts (code and brands); the -t option tells csvcut that the input file is tab-delimited. (It generally takes more than 2 minutes, depending on your computer.)
$ csvcut -t -c code,brands en.openfoodfacts.org.products.csv > brands.csv
Importing the CSV into PostgreSQL
See this article: https://blog-postgresql.verite.pro/2018/12/21/import-openfoodfacts.html (in French, but it should be understandable with Google Translate).
Python
Several articles explain how to explore Open Food Facts data with Python.
Step-by-step commands: http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/prepare_data_2017.html (also in French)
Python notebooks are a great way to learn about Open Food Facts data, as they mix code and results together:
- Find dozens of Python notebooks on Kaggle
- https://www.datasciencesociety.net/part-1-exploring-food-data/
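For readers who prefer not to install csvkit, the two-column extraction described in the csvkit tips can also be sketched in plain Python. This is an illustrative sketch, not an official Open Food Facts tool: the function name `extract_columns` is hypothetical, and the code assumes a tab-separated export whose quote characters should be read literally.

```python
import csv


def extract_columns(src_path, dst_path, columns):
    """Stream a tab-separated export and write only `columns` as a comma-separated CSV.

    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # Assumption: tab-separated input, quotes treated as plain characters.
        reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"columns not found in header: {missing}")
        # extrasaction="ignore" drops every column that was not requested.
        writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        count = 0
        for row in reader:  # one row at a time, so memory use stays small
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    n = extract_columns("en.openfoodfacts.org.products.csv", "brands.csv",
                        ["code", "brands"])
    print(n, "rows written to brands.csv")
```

The file is processed row by row, so the full export never has to fit in memory; the resulting brands.csv should be small enough for a spreadsheet.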
R
For people with R skills, there are more than 50 notebooks from the Kaggle community.