
Reusing Open Food Facts Data: Difference between revisions

m (typo)
(+python +R)
Line 29: Line 29:
[https://csvkit.readthedocs.io/en/latest/ csvkit] is a very efficient tool for manipulating huge amounts of CSV data. Here are some useful tips for working with the Open Food Facts CSV export.


'''Selecting 2 columns'''. Extracting only two or three columns is often useful: it produces a much smaller CSV file that common software such as LibreOffice or Excel can open. The following command creates a CSV file (brands.csv) containing two columns from Open Food Facts (code and brands). (It generally takes more than 2 minutes, depending on your computer.)


<code>
<code>
Line 37: Line 37:
==== Import CSV into PostgreSQL ====
See this article: https://blog-postgresql.verite.pro/2018/12/21/import-openfoodfacts.html (in French, but should be understandable with Google Translate).
==== Python ====
Several articles cover using Python to explore Open Food Facts data.
Step-by-step commands: http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/prepare_data_2017.html (also in French).
Python notebooks are a great way to learn about Open Food Facts data, as they mix code and results together:
* Find [https://www.kaggle.com/openfoodfacts/world-food-facts/kernels?sortBy=hotness&group=everyone&pageSize=20&datasetId=20&language=Python dozens of Python notebooks on Kaggle]
* https://www.datasciencesociety.net/part-1-exploring-food-data/
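
As a starting point before opening the notebooks, here is a minimal sketch of selecting two columns (code and brands) from the export using only the Python standard library. It assumes the export is tab-separated and that quote characters inside fields should be read literally; the function name <code>select_columns</code> and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io


def select_columns(src, dst, columns, delimiter="\t"):
    """Copy only `columns` from a delimited text stream to a CSV stream.

    Assumption: the input is tab-separated and quote characters are
    ordinary data (QUOTE_NONE), which may or may not match your export.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    writer = csv.DictWriter(dst, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        # Missing or short rows yield None; write an empty string instead.
        writer.writerow({name: row.get(name) or "" for name in columns})
        count += 1
    return count


# Tiny in-memory sample (hypothetical rows) standing in for the real export.
sample = (
    "code\tproduct_name\tbrands\n"
    "0001\tOat drink\tAcme\n"
    "0002\tDark chocolate\tExample Foods\n"
)
out = io.StringIO()
rows_written = select_columns(io.StringIO(sample), out, ["code", "brands"])
print(rows_written)
print(out.getvalue())

# For a real file, pass file objects opened with encoding="utf-8" and
# newline="" in place of the StringIO objects above.
```

Because rows are streamed one at a time, this approach should not need to hold the whole export in memory, which matters for a file of this size.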
==== R stat ====
For people with R skills, there are [https://www.kaggle.com/openfoodfacts/world-food-facts/kernels?sortBy=hotness&group=everyone&pageSize=20&datasetId=20&language=R more than 50 notebooks from the Kaggle community].