[https://csvkit.readthedocs.io/en/latest/ csvkit] is a very efficient tool for manipulating large amounts of CSV data. Here are some useful tips for working with the Open Food Facts CSV export.
''Selecting two columns.'' Selecting two or three columns can be useful in some cases. Extracting two columns produces a smaller CSV file that can be opened by common software such as LibreOffice or Excel. The following command creates a CSV file (brands.csv) containing two columns from Open Food Facts (code and brands). (It generally takes more than 2 minutes, depending on your computer.)
<code>
==== Import CSV into PostgreSQL ====
See this article: https://blog-postgresql.verite.pro/2018/12/21/import-openfoodfacts.html (in French, but it should be understandable with Google Translate).
==== Python ====
Several articles cover using the Python language to explore Open Food Facts data.
Step-by-step commands: http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/prepare_data_2017.html (also in French).
Python notebooks are a great way to learn about Open Food Facts data, as they mix code and results together:
* Find [https://www.kaggle.com/openfoodfacts/world-food-facts/kernels?sortBy=hotness&group=everyone&pageSize=20&datasetId=20&language=Python dozens of Python notebooks on Kaggle]
* https://www.datasciencesociety.net/part-1-exploring-food-data/
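
As a starting point before opening a notebook, here is a minimal sketch of the two-column extraction (code and brands) done with Python's standard <code>csv</code> module instead of csvkit. It is not taken from the articles above. It assumes that the export is tab-separated and UTF-8 encoded and that the column names are <code>code</code> and <code>brands</code>; the function name <code>extract_columns</code> and the file name in the commented usage are hypothetical.

```python
import csv
import io

# Some fields in a large export can be very long; raise the parser's limit.
csv.field_size_limit(2**31 - 1)


def extract_columns(src, dst, columns=("code", "brands"), delimiter="\t"):
    """Copy only the selected columns from src to dst.

    src and dst are open text file objects. Returns the number of data rows
    written. The tab delimiter is an assumption about the export format.
    """
    # QUOTE_NONE: treat quote characters as ordinary text in the input.
    reader = csv.DictReader(src, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst)  # writes a regular comma-separated file
    writer.writerow(columns)
    count = 0
    for row in reader:
        # Missing or empty values are written as empty strings.
        writer.writerow([row.get(name) or "" for name in columns])
        count += 1
    return count


# Small in-memory sample standing in for the real export.
sample = (
    "code\tproduct_name\tbrands\n"
    "123\tExample bar\tExampleBrand\n"
    "456\tOther product\t\n"
)
out = io.StringIO()
n = extract_columns(io.StringIO(sample), out)
print(out.getvalue())

# With a real file (hypothetical file name):
# with open("en.openfoodfacts.org.products.csv", encoding="utf-8", newline="") as src, \
#      open("brands.csv", "w", encoding="utf-8", newline="") as dst:
#     extract_columns(src, dst)
```

The function reads the input row by row, so the whole export never has to fit in memory.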
==== R stat ====
For people with R skills, there are [https://www.kaggle.com/openfoodfacts/world-food-facts/kernels?sortBy=hotness&group=everyone&pageSize=20&datasetId=20&language=R more than 50 notebooks from the Kaggle community].