Reusing Open Food Facts Data: Difference between revisions

Add jsonl export and examples with jq
==== R stat ====
For people with R stat skills, there are [https://www.kaggle.com/openfoodfacts/world-food-facts/kernels?sortBy=hotness&group=everyone&pageSize=20&datasetId=20&language=R more than 50 notebooks from the Kaggle community].
=== jsonl export ===
The JSONL export is a huge file: it is too large to open in common text editors or other everyday tools. However, some command-line tools can still do useful things with it, such as [https://stedolan.github.io/jq/manual/v1.6/ jq].
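In JSONL (JSON Lines), each line of the file is one complete JSON document, here one product. That is why line-oriented tools such as `wc` or `head` still work on it, and why jq can process it one record at a time instead of loading everything into memory. A minimal sketch on a hypothetical three-product sample (the file name `sample.jsonl` and its values are made up for illustration):

```shell
#!/bin/sh
set -eu

# Hypothetical sample: one JSON object (one product) per line.
cat > sample.jsonl <<'EOF'
{"code": "0000000000001", "product_name": "Example A"}
{"code": "0000000000002", "product_name": "Example B"}
{"code": "0000000000003"}
EOF

# One product per line, so counting lines counts products.
wc -l < sample.jsonl

# jq reads the input value by value; -c re-prints each object on a single line.
jq -c . sample.jsonl
```

The same `wc -l` on the real export gives the number of products without any JSON parsing.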
==== jq ====
* Start by decompressing the file (be careful: it takes about 17GB once decompressed):
$ gunzip openfoodfacts-products.jsonl.gz
* Work on a small subset for testing, e.g. the first 100 products:
$ head -n 100 openfoodfacts-products.jsonl > small.jsonl
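If disk space is tight, the test subset from the steps above can also be taken straight from the compressed file, so the 17GB uncompressed copy is never written. `gzip -dc` decompresses to standard output and `head` stops after 100 lines (products). The sketch below first builds a small hypothetical `demo.jsonl.gz` so it can run anywhere; the names `demo.jsonl.gz` and `demo-small.jsonl` are illustrative only.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for openfoodfacts-products.jsonl.gz: 1000 tiny products.
i=1
while [ "$i" -le 1000 ]; do
  printf '{"code": "%013d"}\n' "$i"
  i=$((i + 1))
done | gzip > demo.jsonl.gz

# Decompress on the fly and keep only the first 100 lines (products).
# demo.jsonl.gz stays in place and no full uncompressed copy is written.
gzip -dc demo.jsonl.gz | head -n 100 > demo-small.jsonl
```

With the real dump the last line becomes `gzip -dc openfoodfacts-products.jsonl.gz | head -n 100 > small.jsonl`. Unlike a plain `gunzip`, which replaces the `.gz` file with its decompressed content, this keeps the original archive.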
You can then start experimenting with jq. Here are some examples:
$ cat small.jsonl | jq . # pretty-print the whole file as indented JSON
$ cat small.jsonl | jq -r .code # print the code of every product
$ cat small.jsonl | jq -r '[.code,.product_name] | @csv' # output CSV lines containing code,product_name
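jq can also filter products and cope with fields that are missing from some records. A sketch, assuming products carry a `countries_tags` array (such as `["en:france"]`) and may lack `product_name`; these field names are assumptions here, so check them against the actual export first. `select(...)` keeps only matching records, the alternative operator `//` substitutes a default when a field is absent or null, and counting matches with `wc -l` avoids `jq -s`, which would load the whole file into memory.

```shell
#!/bin/sh
set -eu

# Hypothetical sample; "countries_tags" and "product_name" are assumed field names.
cat > demo-countries.jsonl <<'EOF'
{"code": "001", "product_name": "Yaourt", "countries_tags": ["en:france"]}
{"code": "002", "product_name": "Cheddar", "countries_tags": ["en:united-kingdom"]}
{"code": "003", "countries_tags": ["en:france", "en:belgium"]}
{"code": "004", "product_name": "No tags"}
EOF

# Keep products whose countries_tags contains "en:france", output as TSV.
# (.countries_tags // []) treats a missing array as empty instead of raising an error;
# (.product_name // "") substitutes an empty string when the name is missing.
jq -r 'select((.countries_tags // []) | any(. == "en:france"))
       | [.code, (.product_name // "")] | @tsv' demo-countries.jsonl > demo-france.tsv

# Count matches record by record instead of slurping the whole file with -s.
jq -c 'select((.countries_tags // []) | any(. == "en:france"))' demo-countries.jsonl | wc -l
```

On the real data, replace `demo-countries.jsonl` with `small.jsonl` or the full `openfoodfacts-products.jsonl`; since jq streams one record at a time, the same command works on both.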