==== jq ====
* start by decompressing the file (be careful: it takes about 14 GB after decompression):
  $ gunzip openfoodfacts-products.jsonl.gz
* work on a small subset to test. E.g. for 100 products:
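A minimal sketch for this step (each line of the JSONL file is one product, so taking the first 100 lines gives 100 products; the output file name products.100.jsonl is just an example):
  $ zcat openfoodfacts-products.jsonl.gz | head -n 100 > products.100.jsonl
The jq filters below can then be tried against this small file before running them on the full export.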
* combining multiple filters and exporting to CSV. E.g. products with a computed Nutri-Score that are among the 90% most scanned products in 2020, keeping the barcode and the number of scans:
  $ zcat openfoodfacts-products.jsonl.gz | jq -r '. | select(.misc_tags[]? == "en:nutriscore-computed" and .popularity_tags[]? == "top-90-percent-scans-2020") | [.code,.scans_n] | @csv' > displayed.ns.in.top90.2020.world.csv
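To get a quick idea of how many products matched, you can count the lines of the resulting CSV (one matching product per line):
  $ wc -l displayed.ns.in.top90.2020.world.csv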


* filtering barcodes that do not consist of 1 to 13 digits:
  $ zcat openfoodfacts-products.jsonl.gz | jq -r '. | select(.code|test("^[0-9]{1,13}$") | not) | .code' > ean_gt_13.csv
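If you also want to see which products carry these unexpected codes, the same test() filter can be combined with the @csv export shown above (a sketch; the output file name is arbitrary and product_name may be empty for some products):
  $ zcat openfoodfacts-products.jsonl.gz | jq -r 'select(.code|test("^[0-9]{1,13}$") | not) | [.code, .product_name] | @csv' > ean_gt_13_with_names.csv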
These operations can take quite a long time (more than 10 minutes, depending on your computer and your selection).
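To keep iterations short while refining a selection, you can first run the filter against the small subset created above (a sketch, assuming the hypothetical products.100.jsonl file from the earlier step):
  $ jq -r '. | select(.misc_tags[]? == "en:nutriscore-computed") | .code' products.100.jsonl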