
Reusing Open Food Facts Data: Difference between revisions

Add JSONL doc
(CSV export details)
(Add JSONL doc)
Line 18: Line 18:


==== The MongoDB daily export ====
==== The MongoDB daily export ====
It represents the most complete data; it's very big and you have to know how to deal with MongoDB.
It represents the most complete data, but you have to know how to deal with MongoDB. It's very big: more than 9GB uncompressed.


==== The JSONL daily export ====
==== The JSONL daily export ====
While still undocumented, there is a daily export of the whole database in jsonl format. It represents the same data as the MongoDB export. It's very big! More than 14GB uncompressed.
While still undocumented, there is a daily export of the whole database in [https://jsonlines.org/ JSONL format] (sometimes called LDJSON or NDJSON), where each line is a JSON object. It represents the same data as the MongoDB export. The gzip-compressed file is 2.7GB (as of 2020-09) and takes more than 14GB uncompressed.


You can find it at https://static.openfoodfacts.org/data/openfoodfacts-products.jsonl.gz
You can find it at https://static.openfoodfacts.org/data/openfoodfacts-products.jsonl.gz
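Because each line of the export is a standalone JSON object, the file can be read as a stream without uncompressing it to disk or loading it into memory. Here is a minimal Python sketch, assuming the file has already been downloaded locally. The function name <code>iter_products</code> is hypothetical, and skipping blank or malformed lines is a precaution rather than documented behaviour of the export.

```python
import gzip
import json


def iter_products(path):
    """Yield one product (a dict) per line of a gzip-compressed JSONL file.

    `path` is assumed to point at a local copy of the export,
    e.g. openfoodfacts-products.jsonl.gz.
    """
    # "rt" opens the gzip stream in text mode, so lines are decoded lazily.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore empty lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # precaution: skip lines that are not valid JSON
```

For example, <code>for p in iter_products("openfoodfacts-products.jsonl.gz"): print(p.get("code"), p.get("product_name"))</code> prints the barcode and name of each product while holding only one line in memory at a time.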
Line 58: Line 58:
==== Import CSV to SQLite ====
==== Import CSV to SQLite ====


The repository [https://github.com/fairdirect/foodrescue-content foodrescue-content] contains Ruby scripts that import Open Food Facts CSV data into a [https://www.sqlite.org/index.html SQLite] database with full table normalization. Only a few fields are imported so far, but this an be extended easily. Data imported so far includes:  
The repository [https://github.com/fairdirect/foodrescue-content foodrescue-content] contains Ruby scripts that import Open Food Facts CSV data into a [https://www.sqlite.org/index.html SQLite] database with full table normalization. Only a few fields are imported so far, but this can be extended easily. Data imported so far includes:  


* barcode number
* barcode number
Line 97: Line 97:
  $ cat openfoodfacts-products.jsonl | jq -r '[.code,.product_name] | @csv' > names.csv # output CSV file (names.csv) containing all products with code,product_name
  $ cat openfoodfacts-products.jsonl | jq -r '[.code,.product_name] | @csv' > names.csv # output CSV file (names.csv) containing all products with code,product_name


If you don't have enough disk place to uncompress the .gz file, you can use zcat directly on the compressed file. Example:
If you don't have enough disk space to uncompress the .gz file, you can use zcat directly on the compressed file. Example:
  $ zcat openfoodfacts-products.jsonl.gz | jq -r '[.code,.product_name] | @csv' # output CSV data containing code,product_name
  $ zcat openfoodfacts-products.jsonl.gz | jq -r '[.code,.product_name] | @csv' # output CSV data containing code,product_name
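
When jq is not available, the same extraction can be sketched in Python, reading the compressed file directly. This is an illustration under assumptions, not an official tool: the function name <code>export_names</code> is hypothetical, and the output quoting differs slightly from jq's <code>@csv</code> (Python's csv module only quotes fields that need it).

```python
import csv
import gzip
import json


def export_names(src_path, dst_path):
    """Write code,product_name rows from a gzipped JSONL file to a CSV file.

    Returns the number of rows written. Products without a product_name
    get an empty second column.
    """
    rows = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for line in src:
            if not line.strip():
                continue  # ignore empty lines
            product = json.loads(line)
            # csv.writer writes None as an empty field, like a missing name.
            writer.writerow([product.get("code", ""),
                             product.get("product_name", "")])
            rows += 1
    return rows
```

Calling <code>export_names("openfoodfacts-products.jsonl.gz", "names.csv")</code> would produce a file comparable to the one from the jq command, without needing the 14GB of disk space for the uncompressed export.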