Food Traceability Codes/EU Food establishments

From Open Food Facts wiki
Revision as of 09:22, 5 February 2016 by Teolemon

The goal of this project is to establish a list of food establishments, to gather data about those establishments, to map food products to food establishments (using identification codes and/or company names and street addresses), to use the data in the Open Food Facts applications, and to make it available to other applications.

Current Uses on Open Food Facts

Regulation

It's possible to know the origin of food products thanks to several marks on the label: the health mark (identifying the processing company) and/or the packager number (identifying the packaging company) when the company's full name is not displayed:

  • Health marks (estampille sanitaire in French) identify facilities that prepare, treat, process, handle or store products of animal origin. In European countries, this mark is an oval printed on the package. It identifies the plant that processed the product: two letters for the country (FR for France, UK for the United Kingdom, DE for Germany, LV for Latvia, ...), two or three characters for the region (the département number in France), three digits for the town (in France the INSEE commune number is used, not the postal code), and finally digits identifying the plant itself. The oval also contains the abbreviation for "European Community" in the language of the country, e.g. CE in French or EG in German.
  • Packager number: the identification code of the packaging company, or of the importer, when its name is not displayed. Under certain circumstances, it can replace the name of the producer when production is subcontracted.
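As an illustration of the French health mark layout described above, the sketch below splits a code such as the made-up `FR 29.232.001 CE` into its parts. It is only a sketch: the accepted separators (dots, dashes or spaces), the optional two-letter EC abbreviation at the end, and the `HealthMark` / `parse_fr_health_mark` names are assumptions. Marks from other countries follow different layouts and are not handled.

```python
import re
from typing import NamedTuple, Optional


class HealthMark(NamedTuple):
    country: str     # "FR"
    department: str  # 2 or 3 characters, e.g. "29" or "2A"
    town: str        # 3-digit commune part of the INSEE code
    plant: str       # digits identifying the establishment itself


# Assumed layout: "FR 29.232.001 CE", with dots, dashes or spaces as separators
# and an optional two-letter EC abbreviation (CE, EG, ...) at the end.
_FR_MARK = re.compile(
    r"^\s*(?P<country>FR)[\s.-]*"
    r"(?P<department>\d{2,3}|2[AB])[\s.-]*"
    r"(?P<town>\d{3})[\s.-]*"
    r"(?P<plant>\d+)"
    r"(?:[\s.-]*[^\W\d_]{2})?\s*$",
    re.IGNORECASE,
)


def parse_fr_health_mark(code: str) -> Optional[HealthMark]:
    """Split a French health mark into its parts, or return None if it does not match."""
    m = _FR_MARK.match(code)
    if m is None:
        return None
    return HealthMark(
        country=m["country"].upper(),
        department=m["department"].upper(),  # "2a" -> "2A" for Corsica
        town=m["town"],
        plant=m["plant"],
    )


print(parse_fr_health_mark("FR 29.232.001 CE"))
# HealthMark(country='FR', department='29', town='232', plant='001')
```

Returning `None` rather than raising makes it easy to collect the codes that do not fit the assumed layout and review them by hand.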

Food establishments codes and sources

Codes listed on food product labels include:

Aggregated list of food establishments

Technical details

How to process official lists

  • Use of the French Agriculture Ministry files (list of URLs hosted at https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt) to extract the list of approvals (EC approval numbers).
  • Definition of a common structure for the table, such as:
    • Type (name of the category used by the administration, i.e. the name of the list)
    • Libelle autorisation/Approval description (for fish processing facilities)
    • Numero de departement/Department number
    • Numero agrement/Approval number
    • SIRET/Local Number
    • Processus/Process (for fish processing facilities)
    • Raison Sociale/Name
    • Adresse/Address
    • Code Postal/Postal Code
    • Commune/Town
    • Espece/Species (for wild game or fish processing facilities)
  • Concatenation of the files to obtain a list of all approved processes
  • Pivot table to create the list of all approved companies

IMPORTANT NOTE: An approval number identifies a single company. A company can have several approved processes, but they all share the same approval number.

  • Update the list with other official lists, such as approvals granted by derogation, ...
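The concatenation step could look like the following minimal sketch, which uses only the Python standard library. It assumes that each downloaded list is a semicolon-separated text whose headers are the French names of the common structure (`Numero agrement`, `Raison Sociale`, ...), and that the name of each list is known so it can fill the Type column. `COMMON_COLUMNS`, `FIELDNAMES`, `concatenate_lists` and `write_csv` are hypothetical names. The real headers should be checked against the ministry files.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Source header (assumed to match the French names above) -> common column name.
COMMON_COLUMNS = {
    "Libelle autorisation": "approval_description",
    "Numero de departement": "department",
    "Numero agrement": "approval_number",
    "SIRET": "siret",
    "Processus": "process",
    "Raison Sociale": "name",
    "Adresse": "address",
    "Code Postal": "postal_code",
    "Commune": "town",
    "Espece": "species",
}
FIELDNAMES = ["type"] + list(COMMON_COLUMNS.values())


def concatenate_lists(sources: Iterable[Tuple[str, str]],
                      delimiter: str = ";") -> List[Dict[str, str]]:
    """Merge several official lists into one table with the common structure.

    `sources` yields (list_name, csv_text) pairs. list_name fills the "type" column,
    and columns missing from a given list are left empty.
    """
    rows = []
    for list_name, text in sources:
        for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
            row = {field: "" for field in FIELDNAMES}
            row["type"] = list_name
            for source_col, common_col in COMMON_COLUMNS.items():
                value = record.get(source_col)  # None if the column or cell is missing
                if value is not None:
                    row[common_col] = value.strip()
            rows.append(row)
    return rows


def write_csv(rows: List[Dict[str, str]], path: str) -> None:
    """Write the concatenated table as a UTF-8, semicolon-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, delimiter=";")
        writer.writeheader()
        writer.writerows(rows)
```

Each output row is one approved process. The same approval number therefore appears once for every list it belongs to, and grouping those rows by `approval_number` gives the list of approved companies.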

France

United Kingdom

To-do list:

To do

Methodology to extract data for European Food Establishments

General

Build URL Data Files

  • The root directories of the EU approval lists, by country, can be found at https://github.com/openfoodfacts/eu-food-data
  • Build a URL file listing all the needed URLs. It is best to use the FR template (FR-urls.txt, to be uploaded) to identify missing data. It is also important to match country-specific denominations with the OFF taxonomy for EU approval sections.

Build Raw CSV/TXT files

Get Templated CSV/TXT file

  • Cleanup is needed to ensure data accuracy
  • An XLS file is available for some countries (e.g. France), which speeds up the process
  • Section names need to be normalized to allow comparisons between countries
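One possible way to normalize section names is a lookup table keyed on a simplified form of each country-specific denomination (lower case, no accents, single spaces). The entries of `SECTION_MAP` and the target labels below are placeholders, not the OFF taxonomy. In practice they would be filled in from that taxonomy for the EU approval sections. Returning `None` for unknown names keeps unmatched denominations visible so they can be reviewed manually.

```python
import re
import unicodedata
from typing import Dict, Optional

# Hypothetical mapping: simplified country-specific list name -> shared section label.
# Keys must already be in the simplified form produced by _key().
SECTION_MAP: Dict[str, str] = {
    "abattoirs": "slaughterhouse",
    "schlachthof": "slaughterhouse",
    "slaughterhouse": "slaughterhouse",
    "entrepots frigorifiques": "cold store",
    "cold store": "cold store",
}


def _key(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace so lookups are forgiving."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def normalize_section(raw: str) -> Optional[str]:
    """Return the shared section label, or None if the denomination is not mapped yet."""
    return SECTION_MAP.get(_key(raw))


def clean_approval_number(raw: str) -> str:
    """Upper-case and collapse whitespace in an approval number."""
    return re.sub(r"\s+", " ", raw).strip().upper()
```

For example, `normalize_section("  Entrepôts   frigorifiques ")` and `normalize_section("entrepots frigorifiques")` return the same label, so spelling differences between files do not create duplicate sections.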

Geocode

  • The best way is to use the geopy library to query Nominatim from a Python script
  • To check geocoding accuracy, it is best to make a pivot table of all cities in the target country and to add a lookup function in the main data file. Adding the name of the country to the address usually leads to better results.
  • Local authorities list for the UK: http://localweblist.net/
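The following sketch shows the geocoding step with geopy's `Nominatim` geocoder, throttled with geopy's `RateLimiter` because Nominatim's usage policy allows at most about one request per second. For the accuracy check, each town is geocoded once and cached as a lookup table. Any establishment that cannot be geocoded, or that lands more than `max_km` from its town centre, is flagged. The 20 km default threshold, the row keys (`address`, `postal_code`, `town`) and the `user_agent` string are assumptions to adapt. The geocoder is passed in as a plain function so the check can be tested without network access.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Tuple

LatLon = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLon]]


def build_query(address: str, postal_code: str, town: str, country: str) -> str:
    """Join the address parts; appending the country usually helps Nominatim."""
    parts = [address.strip(), f"{postal_code} {town}".strip(), country]
    return ", ".join(p for p in parts if p)


def make_nominatim_geocoder(user_agent: str) -> Geocoder:
    """Wrap geopy's Nominatim geocoder, throttled to one request per second."""
    # Imported here so the rest of the module can be used (and tested) without geopy.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geocode = RateLimiter(Nominatim(user_agent=user_agent).geocode, min_delay_seconds=1)

    def lookup(query: str) -> Optional[LatLon]:
        location = geocode(query)
        return None if location is None else (location.latitude, location.longitude)

    return lookup


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geocode_and_check(rows: List[Dict[str, Any]], geocode: Geocoder,
                      country: str, max_km: float = 20.0) -> List[Dict[str, Any]]:
    """Geocode every row in place and return the rows that look suspicious.

    Each town is geocoded once and cached (the lookup table of cities). A row is
    flagged when it could not be geocoded or lies more than max_km from its town.
    """
    town_centres: Dict[str, Optional[LatLon]] = {}
    suspicious = []
    for row in rows:
        point = geocode(build_query(row["address"], row["postal_code"], row["town"], country))
        row["latitude"], row["longitude"] = point if point else ("", "")
        town_key = f'{row["postal_code"]} {row["town"]}'
        if town_key not in town_centres:
            town_centres[town_key] = geocode(build_query("", row["postal_code"], row["town"], country))
        centre = town_centres[town_key]
        if point is None or centre is None or haversine_km(point, centre) > max_km:
            suspicious.append(row)
    return suspicious


# Usage (needs geopy installed and network access):
# flagged = geocode_and_check(rows, make_nominatim_geocoder("off-food-establishments"), "France")
```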

Polishing files

  • A company has one approval code but can hold it for several sections under the EU regulation (slaughterhouse, warehouse, fishery products - freezing vessel ...). You must therefore use a pivot table and some cleaning to get a file with the following columns: name of the establishment, approval code, type (concatenation of the section names under which the company is approved), one column for each EU section, address, latitude, longitude
  • The "type" column is used for search queries
  • To get a nice visualisation and make corrections, use http://blog.perrygeo.net/2013/09/30/leaflet-simple-csv/ to make a map of all listed companies. You can filter companies by section simply by using the search field
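The pivot described above can be sketched in plain Python as follows. The sketch assumes the input has one row per (company, section) pair, with hypothetical keys `approval_number`, `name`, `address`, `section` and optionally `latitude`/`longitude`. The output has one row per approval code, a `type` column joining all of that company's sections, and one `1`/`0` column per section.

```python
from typing import Dict, List


def output_columns(sections: List[str]) -> List[str]:
    """Column order: name, approval code, type, one column per section, address, coordinates."""
    return ["name", "approval_number", "type", *sections, "address", "latitude", "longitude"]


def pivot_by_approval(rows: List[Dict[str, str]], sections: List[str],
                      separator: str = ", ") -> List[Dict[str, str]]:
    """Collapse one row per approved section into one row per approval code."""
    companies: Dict[str, Dict[str, str]] = {}
    found: Dict[str, List[str]] = {}  # approval code -> sections, in first-seen order
    for row in rows:
        code = row["approval_number"]
        if code not in companies:
            # Name, address and coordinates are taken from the first row seen for this code.
            companies[code] = {
                "name": row["name"],
                "approval_number": code,
                "address": row["address"],
                "latitude": str(row.get("latitude", "")),
                "longitude": str(row.get("longitude", "")),
            }
            found[code] = []
        if row["section"] not in found[code]:
            found[code].append(row["section"])
    result = []
    for code, company in companies.items():
        company["type"] = separator.join(found[code])  # text searched by the map
        for section in sections:
            company[section] = "1" if section in found[code] else "0"
        result.append(company)
    return result
```

The rows can then be written with `csv.DictWriter(f, fieldnames=output_columns(sections))`, which also puts the columns in the order listed above.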

Find Additional Data on companies

France

  • I just built a script that takes all the French approval information from the Agriculture Ministry and concatenates it into one file. The next step is to do the same for the UK. The step after that is to aggregate the duplicates carefully (some companies have several health approvals under the same approval number).

The script https://github.com/openfoodfacts/eu-food-data/blob/master/scripts/FR-script.py uses the file https://github.com/openfoodfacts/eu-food-data/blob/master/fr/urls-fr.txt to get the list of URLs to retrieve.
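The download part of such a script might look like the sketch below. It is not the content of FR-script.py. It assumes that urls-fr.txt (its raw text, not the GitHub HTML page) contains one URL per line, with blank lines and `#` comments ignored. Each file is saved under the last segment of its URL path, so two URLs ending in the same file name would overwrite each other.

```python
import os
import urllib.parse
import urllib.request
from typing import List


def read_url_list(text: str) -> List[str]:
    """Return the URLs of a list file: one per line, blanks and '#' comments skipped."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def local_name(url: str) -> str:
    """Name a downloaded file after the last segment of the URL path."""
    return os.path.basename(urllib.parse.urlparse(url).path) or "index"


def download_all(urls: List[str], target_dir: str) -> List[str]:
    """Save every URL into target_dir and return the local paths."""
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for url in urls:
        path = os.path.join(target_dir, local_name(url))
        with urllib.request.urlopen(url, timeout=60) as response, open(path, "wb") as out:
            out.write(response.read())
        paths.append(path)
    return paths
```

For example, `download_all(read_url_list(text), "fr-raw")`, with `text` read from a local copy of urls-fr.txt. The directory name is arbitrary.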

UK

DE

   # Convert to UTF-8, skip the header row, quote the 4th field and append the country, then deduplicate.
   iconv -f ISO-8859-15 -t UTF-8 export.csv | awk -F';' 'NR > 1 {print "\"" $4 "\"; Deutschland"}' | sort -u > to_geocode.csv
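The same extraction can be written in Python with the `csv` module, which, unlike `awk -F';'`, does not split quoted fields that contain a semicolon. As in the shell version, the fourth field is assumed to hold the address. The function names are hypothetical, and the default file names simply mirror the pipeline (`export.csv`, `to_geocode.csv`).

```python
import csv
from typing import Iterable, List


def geocoding_queries(lines: Iterable[str], column: int = 3,
                      country: str = "Deutschland") -> List[str]:
    """Sorted, de-duplicated '"<field>"; <country>' lines, one per data row.

    column=3 is the 4th field, matching $4 in the awk version.
    """
    reader = csv.reader(lines, delimiter=";")
    next(reader, None)  # skip the header row
    queries = set()
    for fields in reader:
        if len(fields) > column:  # ignore short or empty rows
            queries.add(f'"{fields[column]}"; {country}')
    return sorted(queries)


def convert_export(src: str = "export.csv", dst: str = "to_geocode.csv") -> None:
    """Read the ISO-8859-15 export and write the queries as UTF-8, like the iconv step."""
    with open(src, newline="", encoding="iso-8859-15") as f:
        queries = geocoding_queries(f)
    with open(dst, "w", encoding="utf-8") as out:
        out.writelines(q + "\n" for q in queries)
```

Unlike the awk version, rows with fewer than four fields are skipped instead of producing an empty `""; Deutschland` line.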

NL

Inspiration

fr:Projet:Codes_propriétaires


Use of the food establishments list inside Open Food Facts


See also

Aggregated (but not open) list of EU codes: http://www.eucode.info/

fr:Projet:Etablissements alimentaires