Food Traceability Codes/EU Food establishments: Difference between revisions

From Open Food Facts wiki
m (UK stuff)
 
(18 intermediate revisions by 5 users not shown)
Line 1: Line 1:
The goal of this project is to establish a list of food establishments, to gather data about those establishments, to map food products to food establishments (using identifying codes and/or company names and street addresses), to use the data in the Open Food Facts applications and to enable other applications.
The goal of this project is to establish a list of food establishments, to gather data about those establishments, to map food products to food establishments (using identifying codes and/or company names and street addresses), to use the data in the Open Food Facts applications and to enable other applications.
== Current Uses on Open Food Facts ==
* Map made with this data: http://madenear.me/
* Product page: http://world.openfoodfacts.org/packager-code/fr-40-288-002-ec


==Regulation==
==Regulation==
Line 14: Line 18:
* French EMB codes: http://agriculture.gouv.fr/liste-des-etablissements-agrees-ce
* French EMB codes: http://agriculture.gouv.fr/liste-des-etablissements-agrees-ce
* Greek GR codes: http://www.efet.gr/portal/page/portal/efetnew/enterprises/facilities
* Greek GR codes: http://www.efet.gr/portal/page/portal/efetnew/enterprises/facilities
* Spanish RGSEAA codes: http://rgsa-web-aesan.msssi.es/rgsa/formulario_ue_js.jsp
* other countries packaging codes
* other countries packaging codes


== Aggregated list of food establishments ==
== Aggregated list of food establishments ==
* Code repository for lists, scripts and tools: https://github.com/openfoodfacts/eu-food-data
=== Technical details ===


* Code repository for lists, scripts and tools: https://github.com/openfoodfacts/eu-food-data
**Files are here: https://bitbucket.org/openfoodfacts/product-opener/src/b2b3e84d40182d1162ec4060894a20f133a870b7/packager-codes/?at=master
**Script used to merge files:https://bitbucket.org/openfoodfacts/product-opener/src/3942937fec34ecda9bfc88a1c9cef0d55a4a9d27/cgi/update_packager_codes.pl?at=master&fileviewer=file-view-default
===How to process official lists===
===How to process official lists===
*Use of French Agriculture Ministry files (hosted at https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt) to extract the list of certifications (CE approval numbers).
*Use of French Agriculture Ministry files (hosted at https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt) to extract the list of certifications (CE approval numbers).
Line 38: Line 46:
*Update list with other official lists, such as derogatory approvals, ...
*Update list with other official lists, such as derogatory approvals, ...


===France===
===France (DONE) ===
*Complete list of approved processes (as of July 2014): https://raw.githubusercontent.com/pmainguet/eu-food-data/master/fr/140802_Liste_Certif_EMB_CE_MinAgri
*Complete list of approved processes (as of July 2014): https://raw.githubusercontent.com/pmainguet/eu-food-data/master/fr/140802_Liste_Certif_EMB_CE_MinAgri
*Complete list of approved companies (as of July 2014): https://raw.githubusercontent.com/pmainguet/eu-food-data/master/fr/140802_Liste_EMB_CE_MinAgri.csv
*Complete list of approved companies (as of July 2014): https://raw.githubusercontent.com/pmainguet/eu-food-data/master/fr/140802_Liste_EMB_CE_MinAgri.csv
===United Kingdom===
===United Kingdom (DONE) ===
To-do list:
To-do list:
*find a way to manage the data available there http://www.food.gov.uk/enforcement/sectorrules
*find a way to manage the data available there http://www.food.gov.uk/enforcement/sectorrules
*perhaps contact helpline@foodstandards.gsi.gov.uk as no appropriate contact was found on http://www.food.gov.uk/about-us/contact-us/contact-details-by-topic/contact-us-uk . The goal is to see if there is a more API friendly way to get the data.
*perhaps contact helpline@foodstandards.gsi.gov.uk as no appropriate contact was found on http://www.food.gov.uk/about-us/contact-us/contact-details-by-topic/contact-us-uk . The goal is to see if there is a more API friendly way to get the data.
====To do====
===To do===
*Add derogatory approvals, see http://agriculture.gouv.fr/liste-des-etablissements
*Add derogatory approvals, see http://agriculture.gouv.fr/liste-des-etablissements
*Search for EMB list approvals
*Search for EMB list approvals
=Methodology to extract data for European Food Establishments=
==General==
===Build URL Data Files===
* All root directories for EU Approval list by country can be found here https://github.com/openfoodfacts/eu-food-data
* Build a file listing all the URLs needed. It is best to start from the FR template (FR-urls.txt, to be uploaded) to identify missing data. It is also important to match country-specific denominations with the OFF taxonomy for EU approval sections.
===Build Raw CSV/TXT files===
*Several formats are used across EU countries, and each needs a specific approach. See the country sections below for details.
*For PDF files without a text layer (scanned images), use Tesseract with this Python script https://github.com/virantha/pypdfocr to produce a text PDF
*For text PDF files, use Tabula; its command-line tool makes the extraction scriptable: https://github.com/tabulapdf/tabula-extractor/wiki/Using-the-command-line-tabula-extractor-tool
===Get Templated CSV/TXT file===
*Cleanup is needed to ensure data accuracy
*An XLS file is available for some countries (e.g. France) to speed up the process
*Section names need to be normalized so that countries can be compared
===Geocode===
*The best way is to query Nominatim through the geopy library in a Python script
*To check geocoding accuracy, it is best to make a pivot table of all cities in the target country and to add a lookup function in the main data file. Adding the name of the country to the address usually leads to better results.
*Local Authorities List in UK http://localweblist.net/
===Polishing files===
*A company has a single approval code but can hold it for several sections under the EU regulation (slaughterhouse, warehouse, fishery products - freezing vessel ...), so you must use a pivot table and some cleaning to get a file with the following columns: name of the establishment, approval code, type (concatenation of the names of the sections under which the company is approved), one column for each EU section, address, latitude, longitude
*The "type" column is used for search queries
*To get a nice visualization and make corrections, use http://blog.perrygeo.net/2013/09/30/leaflet-simple-csv/ to make a map of all listed companies. You can filter companies by section simply by using the search field
===Find Additional Data on companies===
*Infogreffe
*https://opencorporates.com/
{| class="wikitable"
! Country
! Owner
! Source format
! Status
! Output
! Deployed ?
! Remarks
|-
| Austria
|
|
|
|
|
|
|-
| Belgium
|
|
|
|
|
|
|-
| Bulgaria
|
|
|
|
|
|
|-
| Croatia
|
|
|
|
|
|
|-
| Cyprus
|
|
|
|
|
|
|-
|Czech Republic
|
|
|
|
|
|
|-
|Denmark
|
|
|
|
|
|
|-
| Estonia
|
|
|
|
|
|
|-
| Finland
|
|
|
|
|
|
|-
| France
|
|
|
|
|
|
|-
| Germany
|
|
|
|
|
|
|-
| Greece
|
|
|
|
|
|
|-
| Hungary
|
|
|
|
|
|
|-
| Ireland
|
|
|
|
|
|
|-
| Italy
|
|
|
|
|
|
|-
| Latvia
|
|
|
|
|
|
|-
|Lithuania
|
|
|
|
|
|
|-
| Luxembourg
|
|
|
|
|
|
|-
| Malta
|
|
|
|
|
|
|-
| Netherlands
|
|
|
|
|
|
|-
| Poland
|
|
|
|
|
|
|-
| Portugal
|
|
|
|
|
|
|-
| Romania
|
|
|
|
|
|
|-
| Slovakia
|
|
|
|
|
|
|-
| Slovenia
|
|
|
|
|
|
|-
| Spain
|
|
|
|
|
|
|}
==Albania==
==Andorra==
==Austria==
==Belarus==
==Belgium==
==Bosnia and Herzegovina==
==Bulgaria==
==Croatia==
==Cyprus==
==Czech Republic==
==Denmark==
==Estonia==
==Faroe Islands==
==Finland==
==France (DONE) ==
*I just built a script that takes all the French approval information from the Agriculture Ministry and concatenates it into one file. The next step is to do the same for the UK. The step after that is to aggregate the duplicates cleverly (some companies have several health approvals under the same approval number)
https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/FR-script.py
This script uses this file to get the list of URLs to retrieve: https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt
*If Nominatim limits queries, you can use data files from Wikipedia that give the lat/lng of each postal code https://www.data.gouv.fr/fr/datasets/listes-des-communes-geolocalisees-par-regions-departements-circonscriptions-nd/
==Germany (DONE) ==
*@vince has performed a first extraction based on the files in https://github.com/openfoodfacts/eu-food-data/tree/master/de using the following command
    cat export.csv | awk -F';' '{print "\"" $4 "\"; Deutschland"}' | sed "1d" | sort -u | iconv -f ISO-8859-15 -t UTF-8 > to_geocode.csv
==Gibraltar==
==Greece==
==Hungary==
==Iceland==
==Ireland==
==Isle of Man==
==Italy==
Geocoded version by Tacite : https://openfoodfacts.slack.com/archives/C02LB7AV0/p1494497561131004 <br/>
Source (no geocoding) is at http://www.salute.gov.it/consultazioneStabilimenti/ConsultazioneStabilimentiServlet?ACTION=gestioneSingoloPaese&naz=CL <br/>
Note: There is an alternative version with more than 6500 codes available that includes artisan raw milk codes that we're not likely to see on OFF for now. In an effort to get a working version with geocoding quickly, I didn't use it.
==Kosovo==
==Latvia==
==Liechtenstein==
==Lithuania==
==Luxembourg==
==Macedonia==
==Malta==
==Moldova==
==Monaco==
==Montenegro==
==Netherlands (DONE) ==
*Data from NVWA (Netherlands Food and Consumer Product Safety Authority)
**English (site): https://english.nvwa.nl/topics/approved-establishments
* EU
**English EU: https://webgate.ec.europa.eu/tracesnt/directory/publication/establishment/index#!/search?classificationSectionChapter=food&countryCode=NL&sort=classificationSection.translation
==Norway==
==Poland==
==Portugal==
==Romania==
@lucaa is doing it
==Russia==
==San Marino==
==Serbia==
https://openfoodfacts.slack.com/files/bojackhorseman/F59PY4CR1/version1.csv
==Slovakia==
==Slovenia==
==Spain==
==Sweden==
==Switzerland==
==Ukraine==
==United Kingdom (DONE) ==
*As the UK is divided into 4 regions (Northern Ireland, England, Wales and Scotland), each with a different file format, we use a 3-file script
**https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/UK-urls.txt => all UK urls
**https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/UK-methods.txt => list which method to use depending on the file type
**https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/UK-script.py => the script itself
*Source: Food Standards Agency - http://www.food.gov.uk/business-industry/meat/audit
==Vatican city==
==Yugoslavia==
==Inspiration==
*http://free.sourcemap.com/
[[Category:Food codes]]
[[fr:Projet:Codes_propriétaires]]


== Use of the food establishments list inside Open Food Facts ==
== Use of the food establishments list inside Open Food Facts ==
Line 53: Line 397:
* Display information on packager page (company name, street address, type of facility etc.)
* Display information on packager page (company name, street address, type of facility etc.)
* Geo-code addresses and display on http://cestemballepresdechezvous.fr
* Geo-code addresses and display on http://cestemballepresdechezvous.fr


== See-also ==
== See-also ==
Line 59: Line 405:


[[fr:Projet:Etablissements alimentaires]]
[[fr:Projet:Etablissements alimentaires]]
[[Category:Project]]
[[Category:European Union]]
[[Category:Data]]
[[Category:Food codes]]

Latest revision as of 13:20, 23 December 2023

The goal of this project is to establish a list of food establishments, to gather data about those establishments, to map food products to food establishments (using identifying codes and/or company names and street addresses), to use the data in the Open Food Facts applications and to enable other applications.

Current Uses on Open Food Facts

Regulation

The origin of a food product can be identified from marks on its label: the health mark (identifying the processing company) and/or the packager number (identifying the packaging company), used when the company's full name is not displayed:

  • Health marks (estampille sanitaire in French) identify processing facilities that prepare, treat, transform, handle or store animal products or products of animal origin. For European countries, this mark is an oval on the package. It carries the details of the plant that processed the product: two letters for the country (FR for France, UK for United Kingdom, DE for Germany, LV for Latvia ...), two or three characters for the region (the department number in France), three digits for the town (the INSEE number is used in France, not the postal code), and final digits that identify the plant itself.
  • Packager number: the identification code of the packaging company, or of the importer, when its name is not displayed. Under certain circumstances, it can replace the name of the producer when production is subcontracted.
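
As an illustration of the layout described above, here is a minimal Python sketch that splits a French-style code such as fr-40-288-002-ec (the packager code used on the product page cited earlier) into its parts. The regular expression, the function name parse_health_mark and the field names are assumptions made for this example; other countries use different layouts, so this is not a general parser.

```python
import re

# Assumed layout, modelled on the French pattern only:
# country (2 letters), department (2-3 characters), town (3 digits),
# plant number, then an optional EC/CE suffix.
HEALTH_MARK = re.compile(
    r"^(?P<country>[A-Z]{2})[ .-]?"
    r"(?P<department>[0-9A-Z]{2,3})[ .-]"
    r"(?P<town>\d{3})[ .-]"
    r"(?P<plant>\d{1,4})"
    r"(?:[ .-]?(?P<suffix>CE|EC))?$"
)


def parse_health_mark(code):
    """Return the parts of a French-style code as a dict, or None if it does not match."""
    match = HEALTH_MARK.match(code.strip().upper())
    return match.groupdict() if match else None


print(parse_health_mark("fr-40-288-002-ec"))
print(parse_health_mark("FR 40.288.002 CE"))
```

Codes that do not follow this pattern return None, which makes it easy to list the entries that need a country-specific rule.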

Food establishments codes and sources

Codes found on food product labels include:

Aggregated list of food establishments

Technical details

How to process official lists

  • Use of French Agriculture Ministry files (hosted at https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt) to extract the list of certifications (CE approval numbers).
  • Definition of a common structure for the table, such as:
    • Type (name of the category used by the administration, i.e. the name of the list)
    • Libelle autorisation/Approval description (for fish processing facilities)
    • Numero de departement/Department number
    • Numero agrement/Approval number
    • SIRET/Local Number
    • Processus/Process (for fish processing facilities)
    • Raison Sociale/Name
    • Adresse/Address
    • Code Postal/Postal Code
    • Commune/Town
    • Espece/Species (for wild game or fish processing facilities)
  • Concatenation of the files to obtain a list of all approved processes
  • Pivot table to create the list of all approved companies

IMPORTANT NOTE: An approval number is unique. A company can have several approved processes, but they all share the same approval number.

  • Update the list with other official lists, such as approvals granted by derogation, ...
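
The concatenation step and the uniqueness rule from the note above can be sketched in Python as follows. This is not the project's FR-script.py: the column names, the ';' delimiter, the function names and the sample rows are all invented for the example, and the real ministry files need their own header mapping first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names standing in for the common structure above.
COMMON_COLUMNS = ["type", "approval_number", "name", "address", "postal_code", "town"]


def concatenate_lists(sources):
    """Merge several ';'-delimited lists into one list of approved processes.

    `sources` is an iterable of (list_name, file_like) pairs; the list name
    fills the "type" column of every row read from that file.
    """
    processes = []
    for list_name, handle in sources:
        for record in csv.DictReader(handle, delimiter=";"):
            row = {column: (record.get(column) or "").strip() for column in COMMON_COLUMNS}
            row["type"] = list_name
            processes.append(row)
    return processes


def conflicting_approval_numbers(processes):
    """Return approval numbers that appear under more than one company name."""
    names = defaultdict(set)
    for row in processes:
        names[row["approval_number"]].add(row["name"])
    return sorted(number for number, seen in names.items() if len(seen) > 1)


# Made-up sample data: the same establishment listed in two official lists.
header = "approval_number;name;address;postal_code;town\n"
dairy = io.StringIO(header + "00.000.001;Example Co;1 Example Street;00000;Exampleville\n")
meat = io.StringIO(header + "00.000.001;Example Co;1 Example Street;00000;Exampleville\n")
processes = concatenate_lists([("dairy", dairy), ("meat", meat)])
print(len(processes), conflicting_approval_numbers(processes))
```

An approval number reported by conflicting_approval_numbers points to a cleanup problem (for example two spellings of one company name) to resolve before building the list of approved companies.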

France (DONE)

United Kingdom (DONE)

To-do list:

To do

Methodology to extract data for European Food Establishments

General

Build URL Data Files

  • The root directories of the EU approval lists, by country, can be found here: https://github.com/openfoodfacts/eu-food-data
  • Build a file listing all the URLs needed. It is best to start from the FR template (FR-urls.txt, to be uploaded) to identify missing data. It is also important to match country-specific denominations with the OFF taxonomy for EU approval sections.

Build Raw CSV/TXT files

Get Templated CSV/TXT file

  • Cleanup is needed to ensure data accuracy
  • An XLS file is available for some countries (e.g. France) to speed up the process
  • Section names need to be normalized so that countries can be compared

Geocode

  • The best way is to query Nominatim through the geopy library in a Python script
  • To check geocoding accuracy, it is best to make a pivot table of all cities in the target country and to add a lookup function in the main data file. Adding the name of the country to the address usually leads to better results.
  • Local Authorities List in UK http://localweblist.net/
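
A minimal sketch of the geocoding step, assuming rows are dicts with an "address" key. The helper names and the user agent string are hypothetical. geopy is imported only inside make_nominatim_geocoder so that the rest of the code can be tried with any callable that returns an object with latitude and longitude attributes (or None).

```python
def build_query(address, country):
    # Appending the country name usually improves Nominatim results.
    return f"{address}, {country}"


def geocode_rows(rows, geocode, country):
    """Add latitude/longitude to each row; both stay None when nothing is found."""
    for row in rows:
        location = geocode(build_query(row["address"], country))
        row["latitude"] = location.latitude if location else None
        row["longitude"] = location.longitude if location else None
    return rows


def make_nominatim_geocoder(user_agent):
    """Build a rate-limited Nominatim geocoder (requires geopy and network access)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # At most one request per second, to stay within the public usage limits.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

Typical use would be geocode_rows(rows, make_nominatim_geocoder("my-app-name"), "France"), where "my-app-name" is a placeholder for your own identifier. Rows left with None coordinates are the ones to check against the city lookup table.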

Polishing files

  • A company has a single approval code but can hold it for several sections under the EU regulation (slaughterhouse, warehouse, fishery products - freezing vessel ...), so you must use a pivot table and some cleaning to get a file with the following columns: name of the establishment, approval code, type (concatenation of the names of the sections under which the company is approved), one column for each EU section, address, latitude, longitude
  • The "type" column is used for search queries
  • To get a nice visualization and make corrections, use http://blog.perrygeo.net/2013/09/30/leaflet-simple-csv/ to make a map of all listed companies. You can filter companies by section simply by using the search field
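
The pivot described above can also be done without a spreadsheet. The following Python sketch assumes one input row per (establishment, section) pair, stored as dicts; the key names and the "x" marker are choices made for this example, not a project convention.

```python
def pivot_establishments(processes):
    """Collapse one-row-per-approved-section data into one row per approval code."""
    sections = sorted({row["section"] for row in processes})
    grouped = {}
    for row in processes:
        entry = grouped.setdefault(row["approval_code"], {"base": row, "sections": set()})
        entry["sections"].add(row["section"])

    fieldnames = ["name", "approval_code", "type", *sections, "address", "latitude", "longitude"]
    table = []
    for code, entry in grouped.items():
        base, approved = entry["base"], entry["sections"]
        record = {
            "name": base["name"],
            "approval_code": code,
            # Concatenated section names, used as the searchable "type" column.
            "type": ", ".join(sorted(approved)),
            "address": base["address"],
            "latitude": base["latitude"],
            "longitude": base["longitude"],
        }
        # One column per EU section, marked when the establishment is approved for it.
        for section in sections:
            record[section] = "x" if section in approved else ""
        table.append(record)
    return fieldnames, table
```

The returned fieldnames can be passed to csv.DictWriter to write the file in the column order listed above. Name and address are taken from the first row seen for each approval code, so inconsistent spellings should be cleaned beforehand.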

Find Additional Data on companies


Country Owner Source format Status Output Deployed ? Remarks
Austria
Belgium
Bulgaria
Croatia
Cyprus
Czech Republic
Denmark
Estonia
Finland
France
Germany
Greece
Hungary
Ireland
Italy
Latvia
Lithuania
Luxembourg
Malta
Netherlands
Poland
Portugal
Romania
Slovakia
Slovenia
Spain


Albania

Andorra

Austria

Belarus

Belgium

Bosnia and Herzegovina

Bulgaria

Croatia

Cyprus

Czech Republic

Denmark

Estonia

Faroe Islands

Finland

France (DONE)

  • I just built a script that takes all the French approval information from the Agriculture Ministry and concatenates it into one file. The next step is to do the same for the UK. The step after that is to aggregate the duplicates cleverly (some companies have several health approvals under the same approval number)

https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/FR-script.py This script uses this file to get the list of URLs to retrieve: https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt

Germany (DONE)

   cat export.csv | awk -F';' '{print "\"" $4 "\"; Deutschland"}' | sed "1d" | sort -u | iconv -f ISO-8859-15 -t UTF-8 > to_geocode.csv
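
The command above takes the fourth ';'-separated column of export.csv, appends the country name, drops the header line, removes duplicates and converts the result from ISO-8859-15 to UTF-8. For those who prefer to stay in Python, a rough equivalent might look like the sketch below. The function name is hypothetical, lines with fewer than four fields are skipped (awk would emit an empty value for them), and the sort order may differ from that of sort -u depending on the locale.

```python
def build_geocode_list(source_path, target_path, country="Deutschland"):
    """Write the unique '"<address>"; <country>' lines to geocode and return their count."""
    # Read the export as ISO-8859-15 and drop the header line.
    with open(source_path, encoding="iso-8859-15") as source:
        lines = source.read().splitlines()[1:]

    queries = set()
    for line in lines:
        fields = line.split(";")  # naive split, like awk -F';' (no quote handling)
        if len(fields) >= 4:
            queries.add(f'"{fields[3]}"; {country}')

    # Write the de-duplicated, sorted lines as UTF-8.
    with open(target_path, "w", encoding="utf-8") as target:
        for query in sorted(queries):
            target.write(query + "\n")
    return len(queries)
```

Calling build_geocode_list("export.csv", "to_geocode.csv") would produce the same kind of file as the shell pipeline.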

Gibraltar

Greece

Hungary

Iceland

Ireland

Isle of Man

Italy

Geocoded version by Tacite : https://openfoodfacts.slack.com/archives/C02LB7AV0/p1494497561131004

Source (no geocoding) is at http://www.salute.gov.it/consultazioneStabilimenti/ConsultazioneStabilimentiServlet?ACTION=gestioneSingoloPaese&naz=CL 

Note: There is an alternative version with more than 6500 codes available that includes artisan raw milk codes that we're not likely to see on OFF for now. In an effort to get a working version with geocoding quickly, I didn't use it.

Kosovo

Latvia

Liechtenstein

Lithuania

Luxembourg

Macedonia

Malta

Moldova

Monaco

Montenegro

Netherlands (DONE)

Norway

Poland

Portugal

Romania

@lucaa is doing it

Russia

San Marino

Serbia

https://openfoodfacts.slack.com/files/bojackhorseman/F59PY4CR1/version1.csv

Slovakia

Slovenia

Spain

Sweden

Switzerland

Ukraine

United Kingdom (DONE)

Vatican city

Yugoslavia

Inspiration

fr:Projet:Codes_propriétaires


Use of the food establishments list inside Open Food Facts


See-also

Aggregated (but not open) list of EU codes: http://www.eucode.info/

fr:Projet:Etablissements alimentaires