(Multiple filter and CSV export) |
(Eg: Filtering barcodes which are different from a code containing 1 to 13 digits) |
||
Line 90: | Line 90: | ||
==== jq ==== | ==== jq ==== | ||
* start by decompressing the file (be careful: 14GB after decompression): | * start by decompressing the file (be careful: 14GB after decompression): | ||
$ gunzip openfoodfacts-products.jsonl.gz | $ gunzip openfoodfacts-products.jsonl.gz | ||
* work on a small subset to test. E.g. for 100 products: | * work on a small subset to test. E.g. for 100 products: | ||
Line 118: | Line 118: | ||
$ zcat openfoodfacts-products.jsonl.gz | jq -r '. | select(.misc_tags[]? == "en:nutriscore-computed" and .popularity_tags[]? == "top-90-percent-scans-2020") | [.code,.scans_n] | @csv' > displayed.ns.in.top90.2020.world.csv | $ zcat openfoodfacts-products.jsonl.gz | jq -r '. | select(.misc_tags[]? == "en:nutriscore-computed" and .popularity_tags[]? == "top-90-percent-scans-2020") | [.code,.scans_n] | @csv' > displayed.ns.in.top90.2020.world.csv | ||
Filter barcodes that do not consist of 1 to 13 digits: | |||
$ zcat openfoodfacts-products.jsonl.gz | jq -r '. | select(.code|test("^[0-9]{1,13}$") | not) | .code' > ean_gt_13.csv | |||
These operations can take a long time (more than 10 minutes, depending on your computer and your selection). | These operations can take a long time (more than 10 minutes, depending on your computer and your selection). |
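Because a full pass over the export is slow, it can help to check a filter on a few hand-written records first. The sketch below runs the same barcode filter on three made-up records (illustrative values, not real products) and assumes a jq version with regex support (1.5 or later) is installed. The variable names `sample` and `invalid_codes` are hypothetical.

```shell
# Three made-up sample records, one JSON object per line (JSONL).
sample='{"code":"3017620422003"}
{"code":"12345678901234"}
{"code":"ABC123"}'

# Same filter as the barcode example: keep only the codes that do NOT
# consist of 1 to 13 digits, and print them as raw strings.
invalid_codes=$(printf '%s\n' "$sample" | jq -r 'select(.code | test("^[0-9]{1,13}$") | not) | .code')

# Prints the 14-digit code and the non-numeric code, one per line.
printf '%s\n' "$invalid_codes"
```

Note that the filter reports every code that fails the pattern, including non-numeric ones, not only codes longer than 13 digits. jq's `test` also raises an error on non-string input, so a record with a missing `code` would stop the run. Whether such records occur in the export is not stated here.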