Food Traceability Codes/EU Food establishments
Revision as of 10:30, 21 May 2019
The goal of this project is to establish a list of food establishments, to gather data about those establishments, to map food products to food establishments (using identifying codes and/or company names and street addresses), to use the data in the Open Food Facts applications and to enable other applications.
Current Uses on Open Food Facts
- Map made with this data: http://madenear.me/
- Product page: http://world.openfoodfacts.org/packager-code/fr-40-288-002-ec
Regulation
The origin of a food product can be traced through marks on its label: the health mark (which identifies the processing company) and/or the packager number (which identifies the packaging company when its full name is not displayed):
- Health marks (estampille sanitaire in French) identify processing facilities that prepare, treat, transform, handle or store animal products or products of animal origin. In European countries, this mark is an oval on the package. It displays information about the plant that processed the product: two letters for the country (FR for France, UK for United Kingdom, DE for Germany, LV for Latvia ...), two or three characters for the region (the department number in France), three digits for the town (France uses the INSEE number, not the postal code), and final digits that identify the plant itself.
- Packager number: the identification code of the packaging company or the importer when its name is not displayed. Under certain circumstances, it can replace the name of the producer when production is subcontracted.
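As an illustration of the health mark structure described above, here is a minimal sketch that splits a French-style mark into its parts. The regular expression is an assumption based on codes such as fr-40-288-002-ec (the form used in the product page URL above); the names `HealthMark` and `parse_fr_health_mark` are hypothetical, and marks from other countries follow other layouts that this sketch does not cover.

```python
import re
from typing import NamedTuple, Optional


class HealthMark(NamedTuple):
    country: str      # two-letter country code, e.g. "FR"
    department: str   # department number (France)
    commune: str      # three-digit INSEE commune number
    plant: str        # plant identifier within the commune


# Assumed layout, French marks only: "FR 40.288.002 EC" or the
# URL form "fr-40-288-002-ec". Separators may be spaces, dots or dashes.
_FR_MARK = re.compile(
    r"^(?P<country>FR)[\s.-]*(?P<department>\d{2,3}|2[AB])[\s.-]+"
    r"(?P<commune>\d{3})[\s.-]+(?P<plant>\d{2,3})[\s.-]*(?:CE|EC)?$",
    re.IGNORECASE,
)


def parse_fr_health_mark(text: str) -> Optional[HealthMark]:
    """Return the parts of a French health mark, or None if it does not match."""
    match = _FR_MARK.match(text.strip())
    if match is None:
        return None
    return HealthMark(
        country=match["country"].upper(),
        department=match["department"].upper(),
        commune=match["commune"],
        plant=match["plant"],
    )
```

Returning None for anything that does not match keeps the sketch conservative: unrecognized codes can be set aside for manual review instead of being split incorrectly.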
Food establishments codes and sources
Codes listed on food product labels include:
- EU approvals / food establishment codes: http://ec.europa.eu/food/food/biosafety/establishments/list_en.htm
- French EMB codes: http://agriculture.gouv.fr/liste-des-etablissements-agrees-ce
- Greek GR codes: http://www.efet.gr/portal/page/portal/efetnew/enterprises/facilities
- Spanish RGSEAA codes: http://rgsa-web-aesan.msssi.es/rgsa/formulario_ue_js.jsp
- other countries packaging codes
Aggregated list of food establishments
- Code repository for lists, scripts and tools: https://github.com/openfoodfacts/eu-food-data
Technical details
- Files are here: https://bitbucket.org/openfoodfacts/product-opener/src/b2b3e84d40182d1162ec4060894a20f133a870b7/packager-codes/?at=master
- Script used to merge files: https://bitbucket.org/openfoodfacts/product-opener/src/3942937fec34ecda9bfc88a1c9cef0d55a4a9d27/cgi/update_packager_codes.pl?at=master&fileviewer=file-view-default
How to process official lists
- Use the French Agriculture Ministry files (listed at https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt) to extract the list of certifications (CE approval numbers).
- Define a common structure for the table, such as:
  - Type (name of the category used by the administration, i.e. name of the list)
  - Libelle autorisation/Approval description (for fish processing facilities)
  - Numero de departement/Department number
  - Numero agrement/Approval number
  - SIRET/Local Number
  - Processus/Process (for fish processing facilities)
  - Raison Sociale/Name
  - Adresse/Address
  - Code Postal/Postal Code
  - Commune/Town
  - Espece/Species (for wild game or fish processing facilities)
- Concatenate the files to obtain a list of all approved processes
- Build a pivot table to create the list of all approved companies
IMPORTANT NOTE: An approval number is unique. A company can have several approved processes, but they will all have the same approval number.
- Update the list with other official lists, such as derogatory approvals, ...
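The concatenation and grouping steps above can be sketched as follows. This is not the project's actual script: it assumes each official list has already been converted to a semicolon-separated file using the common column names given above, and the function name `merge_approval_lists` is hypothetical.

```python
import csv

# Column names taken from the common structure; the ";" delimiter is an assumption.
KEY = "Numero agrement/Approval number"
NAME = "Raison Sociale/Name"


def merge_approval_lists(sources):
    """Concatenate per-category lists and group rows by approval number.

    `sources` is an iterable of (type_name, file_object) pairs, one per
    official list. Returns {approval_number: {"name": ..., "types": [...]}}.
    """
    companies = {}
    for type_name, handle in sources:
        for row in csv.DictReader(handle, delimiter=";"):
            number = row[KEY].strip()
            if not number:
                continue  # skip rows without an approval number
            entry = companies.setdefault(
                number, {"name": row[NAME].strip(), "types": []}
            )
            # A company can appear in several lists under the same number.
            if type_name not in entry["types"]:
                entry["types"].append(type_name)
    return companies
```

Because the approval number is unique per company, it serves as the grouping key; the first name seen for a number is kept, and each list the company appears in is recorded once.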
France (DONE)
- Complete list of approved processes (as of July 2014): https://raw.githubusercontent.com/pmainguet/eu-food-data/master/fr/140802_Liste_Certif_EMB_CE_MinAgri
- Complete list of approved companies (as of July 2014): https://raw.githubusercontent.com/pmainguet/eu-food-data/master/fr/140802_Liste_EMB_CE_MinAgri.csv
United Kingdom (DONE)
To-do list:
- Find a way to manage the data available at http://www.food.gov.uk/enforcement/sectorrules
- Perhaps contact helpline@foodstandards.gsi.gov.uk, as no appropriate contact was found on http://www.food.gov.uk/about-us/contact-us/contact-details-by-topic/contact-us-uk . The goal is to find out whether there is a more API-friendly way to get the data.
To do
- Add derogatory approvals, see http://agriculture.gouv.fr/liste-des-etablissements
- Search for EMB list approvals
Methodology to extract data for European Food Establishments
General
Build URL Data Files
- All root directories for the EU approval lists by country can be found here: https://github.com/openfoodfacts/eu-food-data
- Build a URL file listing all the URLs needed. It is best to start from the FR template (FR-urls.txt, to be uploaded) to identify missing data. It is also important to match country-specific denominations with the OFF taxonomy for EU approval sections.
Build Raw CSV/TXT files
- Several formats are used across EU countries, and each needs a specific approach. See the details for each country below.
- For PDF files without a text layer (scanned images), use Tesseract and this Python script https://github.com/virantha/pypdfocr to produce a text PDF
- For text PDF files, use Tabula; its command-line tool makes it possible to script the extraction: https://github.com/tabulapdf/tabula-extractor/wiki/Using-the-command-line-tabula-extractor-tool
Get Templated CSV/TXT file
- Cleanup is needed to ensure data accuracy
- An XLS file is available for some countries (e.g. France) to speed up the process
- Sections need to be normalized to allow comparisons between countries
Geocode
- The best way is to query Nominatim from a Python script through the geopy library
- To check geocoding accuracy, it is best to make a pivot table of all cities in the target country and to add a lookup function in the main data file. Adding the name of the country to the address usually leads to better results.
- Local Authorities List in UK http://localweblist.net/
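A minimal sketch of the geocoding step, assuming geopy is installed. The row keys ("address", "postal_code", "town") and the function names are hypothetical; `Nominatim` and `RateLimiter` are geopy classes. The country name is appended to every query, as recommended above, and repeated queries are cached to limit the load on Nominatim.

```python
def make_nominatim_lookup(user_agent):
    """Return a callable mapping a query string to (lat, lon) or None."""
    # Imported lazily so geocode_rows can be used with any other lookup.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    # The public Nominatim service expects at most about one request per second.
    limited = RateLimiter(geolocator.geocode, min_delay_seconds=1)

    def lookup(query):
        location = limited(query)
        if location is None:
            return None
        return (location.latitude, location.longitude)

    return lookup


def geocode_rows(rows, country, lookup):
    """Add "latitude" and "longitude" keys to each row dict, in place."""
    cache = {}
    for row in rows:
        parts = (row.get("address"), row.get("postal_code"), row.get("town"), country)
        query = ", ".join(part for part in parts if part)
        if query not in cache:
            cache[query] = lookup(query)  # one request per distinct address
        coords = cache[query]
        row["latitude"], row["longitude"] = coords if coords else (None, None)
    return rows
```

A call might look like `geocode_rows(rows, "France", make_nominatim_lookup("my-app-name"))`, where the user agent string is a placeholder. Rows left with a latitude of None are the ones to check by hand or against a city lookup table.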
Polishing files
- A company has a single approval code but can hold it for several sections under the EU regulation (slaughterhouse, warehouse, fishery products - freezing vessel ...), so you must use a pivot table and some cleaning to get a file with the following columns: name of the establishment, approval code, type (concatenation of the names of the sections under which the company is approved), one column for each EU section, address, latitude, longitude
- The "type" column is used for search queries
- To get a nice visualization and make corrections, use http://blog.perrygeo.net/2013/09/30/leaflet-simple-csv/ to make a map of all listed companies. You can filter companies by section simply by using the search field
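The pivot described above can also be done in a script instead of a spreadsheet. The sketch below assumes the input has one row per (company, section) pair with the hypothetical keys "name", "approval_code", "section", "address", "latitude" and "longitude"; marking approved sections with "x" is an arbitrary choice.

```python
def pivot_by_section(rows):
    """Turn one-row-per-(company, section) data into one row per company."""
    sections = sorted({row["section"] for row in rows})

    # Group rows by approval code, remembering every section seen.
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["approval_code"], {"first": row, "sections": set()}
        )
        entry["sections"].add(row["section"])

    table = []
    for code, entry in grouped.items():
        first = entry["first"]
        out = {
            "name": first["name"],
            "approval_code": code,
            # Concatenated section names, used for search queries.
            "type": ", ".join(sorted(entry["sections"])),
        }
        # One column per EU section, marked when the company is approved for it.
        for section in sections:
            out[section] = "x" if section in entry["sections"] else ""
        out["address"] = first["address"]
        out["latitude"] = first["latitude"]
        out["longitude"] = first["longitude"]
        table.append(out)
    return table
```

Each returned dict keeps the column order listed above, so it can be passed directly to `csv.DictWriter` with the keys of the first row as field names.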
Find Additional Data on companies
- Infogreffe
- https://opencorporates.com/
Country | Owner | Source format | Status | Output | Deployed ? | Remarks |
---|---|---|---|---|---|---|
Austria | ||||||
Belgium | ||||||
Bulgaria | ||||||
Croatia | ||||||
Cyprus | ||||||
Czech Republic | ||||||
Denmark | ||||||
Estonia | ||||||
Finland | ||||||
France | ||||||
Germany | ||||||
Greece | ||||||
Hungary | ||||||
Ireland | ||||||
Italy | ||||||
Latvia | ||||||
Lithuania | ||||||
Luxembourg | ||||||
Malta | ||||||
Netherlands | ||||||
Poland | ||||||
Portugal | ||||||
Romania | ||||||
Slovakia | ||||||
Slovenia | ||||||
Spain | ||||||
Albania
Andorra
Austria
Belarus
Belgium
Bosnia and Herzegovina
Bulgaria
Croatia
Cyprus
Czech Republic
Denmark
Estonia
Faroe Islands
Finland
France (DONE)
- I just built a script that takes all the French approval information from the Agriculture Ministry and concatenates it into one file. The next step is to do the same for the UK. The step after that is to cleverly aggregate the duplicates (some companies have several health approvals under the same approval number)
https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/FR-script.py This script uses this file to get the list of URLs to retrieve: https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt
- If Nominatim limits queries, you can use data files from Wikipedia that give the lat/lng for each zip code: https://www.data.gouv.fr/fr/datasets/listes-des-communes-geolocalisees-par-regions-departements-circonscriptions-nd/
Germany (DONE)
- @vince has performed a first extraction based on the files in https://github.com/openfoodfacts/eu-food-data/tree/master/de with the help of the following command, which builds a deduplicated list of addresses to geocode:
cat export.csv | awk -F';' '{print "\"" $4 "\"; Deutschland"}' | sed "1d" | sort -u | iconv -f ISO-8859-15 -t UTF-8 > to_geocode.csv
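For readers less familiar with awk and sed, the same steps can be sketched in Python: decode the ISO-8859-15 export, drop the header line, take the fourth semicolon-separated field, append the country and keep the unique values in sorted order. The function name is hypothetical, and the sort order may differ slightly from `sort -u`, which depends on the shell locale.

```python
def addresses_to_geocode(raw_bytes, country="Deutschland"):
    """Build the unique, sorted list of address lines to send to the geocoder."""
    text = raw_bytes.decode("iso-8859-15")
    lines = text.splitlines()[1:]  # drop the header line, like sed "1d"
    queries = set()
    for line in lines:
        fields = line.split(";")
        # awk's $4 is the fourth field; it is empty when the line is too short.
        address = fields[3] if len(fields) > 3 else ""
        queries.add(f'"{address}"; {country}')
    return sorted(queries)
```

The result can then be written out as UTF-8, for example with `open("to_geocode.csv", "w", encoding="utf-8")`, reading the input with `open("export.csv", "rb").read()`.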
Gibraltar
Greece
Hungary
Iceland
Ireland
Isle of Man
Italy
Geocoded version by Tacite: https://openfoodfacts.slack.com/archives/C02LB7AV0/p1494497561131004
Source (no geocoding) is at http://www.salute.gov.it/consultazioneStabilimenti/ConsultazioneStabilimentiServlet?ACTION=gestioneSingoloPaese&naz=CL
Note: There is an alternative version with more than 6500 codes available that includes artisan raw milk codes that we're not likely to see on OFF for now. In an effort to get a working version with geocoding quickly, I didn't use it.
Kosovo
Latvia
Liechtenstein
Lithuania
Luxembourg
Macedonia
Malta
Moldova
Monaco
Montenegro
Netherlands (DONE)
- Data from the NVWA (Netherlands Food and Consumer Product Safety Authority)
- A nice map of all Dutch companies approved under the EU regulation: http://www.alteconomics.fr/Leaflet
Norway
Poland
Portugal
Romania
@lucaa is doing it
Russia
San Marino
Serbia
https://openfoodfacts.slack.com/files/bojackhorseman/F59PY4CR1/version1.csv
Slovakia
Slovenia
Spain
Sweden
Switzerland
Ukraine
United Kingdom (DONE)
- As the UK is divided into 4 regions (Northern Ireland, England, Wales and Scotland) that publish their data in different file formats, we use a 3-file script
- https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/UK-urls.txt => all UK urls
- https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/UK-methods.txt => list which method to use depending on the file type
- https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/UK-script.py => the script itself
- Source: Food Standards Agency - http://www.food.gov.uk/business-industry/meat/audit
Vatican city
Yugoslavia
Inspiration
Use of the food establishments list inside Open Food Facts
- Display information on packager page (company name, street address, type of facility etc.)
- Geo-code addresses and display on http://cestemballepresdechezvous.fr
See-also
Aggregated (but not open) list of EU codes: http://www.eucode.info/