Reusing Open Food Facts Data: Difference between revisions

Revision as of 21:20, 19 September 2020

183 bytes added, 19 September 2020

Add JSONL doc

Charlesnepote

@@ Line 18: / Line 18: @@
 ==== The MongoDB daily export ====
-It represents the most complete data; it's very big and you have to know how to deal with MongoDB.
+It represents the most complete data, but you have to know how to deal with MongoDB. It's very big: more than 9GB uncompressed.
 ==== The JSONL daily export ====
-While still undocumented, there is a daily export of the whole database in jsonl format. It represents the same data as the MongoDB export. It's very big! More than 14GB uncompressed.
+While still undocumented, there is a daily export of the whole database in [https://jsonlines.org/ JSONL format] (sometimes called LDJSON or NDJSON), where each line is a JSON object. It represents the same data as the MongoDB export. The gzip-compressed file is 2.7GB (as of 2020-09) and takes more than 14GB once uncompressed.
 You can find it at https://static.openfoodfacts.org/data/openfoodfacts-products.jsonl.gz
@@ Line 58: / Line 58: @@
 ==== Import CSV to SQLite ====
-The repository [https://github.com/fairdirect/foodrescue-content foodrescue-content] contains Ruby scripts that import Open Food Facts CSV data into a [https://www.sqlite.org/index.html SQLite] database with full table normalization. Only a few fields are imported so far, but this an be extended easily. Data imported so far includes:
+The repository [https://github.com/fairdirect/foodrescue-content foodrescue-content] contains Ruby scripts that import Open Food Facts CSV data into a [https://www.sqlite.org/index.html SQLite] database with full table normalization. Only a few fields are imported so far, but this can be extended easily. Data imported so far includes:
 * barcode number
@@ Line 97: / Line 97: @@
   $ cat openfoodfacts-products.jsonl | jq -r '[.code,.product_name] | @csv' > names.csv # output CSV file (names.csv) containing all products with code,product_name
-If you don't have enough disk place to uncompress the .gz file, you can use zcat directly on the compressed file. Example:
+If you don't have enough disk space to uncompress the .gz file, you can use zcat directly on the compressed file. Example:
   $ zcat openfoodfacts-products.jsonl.gz | jq -r '[.code,.product_name] | @csv' # output CSV data containing code,product_name
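If jq is not available, the same streaming approach can be sketched in Python using only the standard library. This is a minimal illustration, not part of the Open Food Facts tooling: the function name jsonl_gz_to_csv and its parameters are hypothetical, and it assumes each non-empty line of the export is one JSON object, as described for the JSONL format above. The file is read line by line straight from the .gz archive, so the uncompressed data never has to be written to disk.

```python
import csv
import gzip
import json


def jsonl_gz_to_csv(src_path, dst_path, fields=("code", "product_name")):
    """Stream a gzip-compressed JSONL file into a CSV file.

    Hypothetical helper: writes one CSV row per JSON object and
    returns the number of rows written.
    """
    rows = 0
    # gzip.open in text mode decompresses on the fly, one line at a time
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for line in src:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            product = json.loads(line)
            # Fields missing from a product become empty cells
            writer.writerow([product.get(name, "") for name in fields])
            rows += 1
    return rows


# Example call (assumes the export was downloaded to the current directory):
# jsonl_gz_to_csv("openfoodfacts-products.jsonl.gz", "names.csv")
```

Passing a different tuple as fields selects other keys from each product; no header row is written, which matches the output of the jq commands above.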