(CSV export details) |
(Add JSONL doc) |
||
Line 18: | Line 18: | ||
==== The MongoDB daily export ==== | ==== The MongoDB daily export ==== | ||
It represents the most complete data; it's very big and you have to know how to deal with MongoDB. | It represents the most complete data; it's very big and you have to know how to deal with MongoDB. More than 9 GB uncompressed. | ||
==== The JSONL daily export ==== | ==== The JSONL daily export ==== | ||
While still undocumented, there is a daily export of the whole database in | While still undocumented, there is a daily export of the whole database in [https://jsonlines.org/ JSONL format] (sometimes called LDJSON or NDJSON) where each line is a JSON object. It represents the same data as the MongoDB export. The file is 2.7 GB (as of 2020-09), compressed with gzip, and takes more than 14 GB uncompressed. | ||
You can find it at https://static.openfoodfacts.org/data/openfoodfacts-products.jsonl.gz | You can find it at https://static.openfoodfacts.org/data/openfoodfacts-products.jsonl.gz | ||
Line 58: | Line 58: | ||
==== Import CSV to SQLite ==== | ==== Import CSV to SQLite ==== | ||
The repository [https://github.com/fairdirect/foodrescue-content foodrescue-content] contains Ruby scripts that import Open Food Facts CSV data into a [https://www.sqlite.org/index.html SQLite] database with full table normalization. Only a few fields are imported so far, but this | The repository [https://github.com/fairdirect/foodrescue-content foodrescue-content] contains Ruby scripts that import Open Food Facts CSV data into a [https://www.sqlite.org/index.html SQLite] database with full table normalization. Only a few fields are imported so far, but this can be extended easily. Data imported so far includes: | ||
* barcode number | * barcode number | ||
Line 97: | Line 97: | ||
$ cat openfoodfacts-products.jsonl | jq -r '[.code,.product_name] | @csv' > names.csv # output CSV file (names.csv) containing all products with code,product_name | $ cat openfoodfacts-products.jsonl | jq -r '[.code,.product_name] | @csv' > names.csv # output CSV file (names.csv) containing all products with code,product_name | ||
If you don't have enough disk | If you don't have enough disk space to uncompress the .gz file, you can use zcat directly on the compressed file. Example: | ||
$ zcat openfoodfacts-products.jsonl.gz | jq -r '[.code,.product_name] | @csv' # output CSV data containing code,product_name | $ zcat openfoodfacts-products.jsonl.gz | jq -r '[.code,.product_name] | @csv' # output CSV data containing code,product_name |
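In the same spirit as the zcat example, it can help to cut a small sample out of the compressed export before trying jq filters on the full file. The following is a minimal sketch, not an official tool: the function name sample_jsonl and the output file name sample.jsonl are hypothetical, and gzip -dc is used as the portable spelling of zcat.

```shell
# Hypothetical helper: write the first N records of the compressed export
# to a small JSONL file, without uncompressing the whole dump to disk.
sample_jsonl() {
  src="$1"    # path to the .jsonl.gz export
  count="$2"  # number of records (lines) to keep
  out="$3"    # destination JSONL file
  # head stops reading after $count lines, so only the beginning of the
  # archive is actually decompressed.
  gzip -dc "$src" | head -n "$count" > "$out"
}

# Example (assumes the export has been downloaded to the current directory):
# sample_jsonl openfoodfacts-products.jsonl.gz 1000 sample.jsonl
```

Since each line of the export is one JSON object, the resulting sample is itself valid JSONL and can be fed to the jq commands above in place of the full file.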