Global stores taxonomy

From Open Food Facts wiki
 
Latest revision as of 09:52, 16 August 2024

See Global taxonomies for instructions.

The taxonomy was moved to GitHub: https://github.com/openfoodfacts/openfoodfacts-server/blob/master/taxonomies/stores.txt

Introduction

The stores taxonomy contains a list of locations where products (food products, beauty products, etc.) can be obtained. A related store brands taxonomy describes the brand names (store chains) of the individual stores.

Most of the data related to these taxonomies is obtained through third parties (Wikidata, OpenStreetMap). The main interest of OFF is the relation between products, stores and brands.

Store entries

An entry in the taxonomy describes a single location. This taxonomy can combine data pulled from various sources (receipts from Prices, OSM, Wikidata, etc.) or provided by the user (honour system). OFF should be careful not to copy too much data that is already maintained by other parties, and any data that is copied serves only as a snapshot in time.

Receipts

Example receipt heading

The basis could be the receipts that users have added on Prices. As with all OFF data, any information should be backed up by image proofs; in the case of a store, this could be a receipt. The store information available on the receipt can be extracted to the taxonomy, and what can be extracted depends on the receipt. In the example shown here, it is possible to extract the store name, the operator (useful for franchises), the address, the phone number, and the SIRET, NAF and TVA codes. Analysis of more receipts might reveal other country-specific information. Hopefully this information is enough to identify a store unambiguously.

Prices

The stores taxonomy can be based on OFF Prices, where the receipts corresponding to a shop are entered. Prices has a unique identifier for each shop, which should be used as the linking key. In addition, Prices links each shop to OpenStreetMap, which can be used to pull in additional information; some of this information is already cached by Prices.

OpenStreetMap

All stores are linked to OpenStreetMap (OSM) through their unique OSM id. The link refers to one of the basic object types in OSM:

  • Node - is used to refer to a single location, like BP Service;
  • (Closed) Way - is used to refer to a building, like IKEA;
  • Relation - is used to refer to multiple ways used by the shop, like Carrefour;

Using a Way to designate a shopping location might lead to inconsistencies, like this actual location: which shop on the street is meant? Using a Relation might lead to inconsistencies as well, like this actual location. This can be limited by only linking to OSM objects that have the tag shop.
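As a minimal sketch of the linking rule above, an OSM link can be modelled as an element type plus an id, and only objects carrying a shop tag are accepted. This assumes Python 3.9+, and the class and function names are hypothetical, not part of Product Opener or Prices. OSM ids are only unique within an element type, so the pair (type, id) serves as the key.

```python
from dataclasses import dataclass, field
from enum import Enum


class OsmType(Enum):
    """Basic OSM element kinds a store can be linked to."""
    NODE = "NODE"          # a single point, e.g. a petrol station
    WAY = "WAY"            # a (closed) way, e.g. a building outline
    RELATION = "RELATION"  # a group of ways, e.g. a shop spread over several buildings


@dataclass
class OsmObject:
    osm_type: OsmType
    osm_id: int
    tags: dict[str, str] = field(default_factory=dict)


def osm_key(obj: OsmObject) -> tuple[str, int]:
    """OSM ids are unique per element type, so the key combines both."""
    return (obj.osm_type.value, obj.osm_id)


def is_linkable_store(obj: OsmObject) -> bool:
    """Accept an OSM object as a store location only if it has a shop tag.

    This filters out streets or building groups that merely contain shops,
    which reduces (but does not eliminate) ambiguous links.
    """
    return "shop" in obj.tags
```

The tag check is deliberately simple. A Way or Relation that does carry a shop tag can still be ambiguous, so it only narrows the problem.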

Several of the fields are pulled from OSM (name, postcode, city, country, lat, lon). These need to be updated regularly so that they follow OSM. Locations that are deleted in OSM should still be kept in the taxonomy.
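One possible shape for the regular OSM sync is sketched here. It assumes entries are plain dictionaries keyed by the field names used in the example taxonomy entry. It also uses a hypothetical osm_deleted property to keep locations that disappear from OSM instead of dropping them.

```python
from typing import Optional

# Fields refreshed from OSM on every sync (names follow the example entry).
OSM_FIELDS = (
    "osm_display_name",
    "osm_address_postcode",
    "osm_address_city",
    "osm_address_country",
    "osm_lat",
    "osm_lon",
)


def sync_with_osm(entry: dict, osm_data: Optional[dict]) -> dict:
    """Return an updated copy of a store entry.

    osm_data is the latest OSM snapshot for the entry's OSM id, or None if the
    object no longer exists in OSM. Deleted locations are kept and flagged
    with a hypothetical osm_deleted property rather than removed.
    """
    updated = dict(entry)
    if osm_data is None:
        updated["osm_deleted"] = "yes"
        return updated
    for key in OSM_FIELDS:
        if key in osm_data:
            updated[key] = osm_data[key]
    updated.pop("osm_deleted", None)  # the object exists in OSM again
    return updated
```

Returning a copy keeps the previous snapshot available, which fits the idea that copied data is only a snapshot in time.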

Example Taxonomy Entry

An example of an entry (location, store, shop) in the taxonomy:
store_brand:en: en:Biocoop
price_id:en: 45
osm_id:en: 9815975601
osm_type:en: NODE
osm_display_name:en: en:"Biocoop, Rue de Tolbiac, Quartier de la Maison-Blanche, Paris 13e Arrondissement, Paris, Île-de-France, France métropolitaine, 75013, France"
osm_address_postcode:en: 75013
osm_address_city:en: en:Paris
osm_address_country:en: en:france
osm_lat:en: 48.8259597
osm_lon:en: 2.3511541
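The example above follows a property:language: value pattern. The parser sketch below assumes every line has that shape. It is not Product Opener's taxonomy parser. It only shows how such lines could be read into a nested dictionary.

```python
def parse_entry(text: str) -> dict[str, dict[str, str]]:
    """Parse "property:language: value" lines into {property: {language: value}}."""
    props: dict[str, dict[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first ": " only, so values may contain colons (en:Biocoop).
        key, sep, value = line.partition(": ")
        prop, colon, lang = key.rpartition(":")
        if not sep or not colon or not prop:
            raise ValueError(f"not a property line: {line!r}")
        props.setdefault(prop, {})[lang] = value
    return props


sample = """\
store_brand:en: en:Biocoop
price_id:en: 45
osm_type:en: NODE
osm_lat:en: 48.8259597
"""
entry = parse_entry(sample)
print(entry["store_brand"]["en"])  # en:Biocoop
```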

Store brand entries

The current implementation of OFF already supports a Stores field. It is a free text field where users can enter whatever they want, which results in a wide variety of entries. Mostly, however, it is used to enter the store brand, i.e. the name of the store as shown outside the store and on receipts.

Fields

  • store-brand - the name of the store as found on receipts.
  • wikidata - the entry on Wikidata that contains more information on the supermarket chain.
  • countries - a list of countries where the chain currently operates (has stores). This information can be pulled from Wikidata.
  • expiration-date-country - the date after which the chain no longer has any stores in a specific country. For instance, the chain Dia used to be active in France, and OFF still has products for it.
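The sketch below shows how these fields could be modelled. It assumes expiration dates are stored per country. The StoreBrand class is hypothetical. Because the fields record no start dates, any date before a country's expiration counts as active.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StoreBrand:
    store_brand: str            # name as found on receipts, e.g. "en:Biocoop"
    wikidata: str               # Wikidata item id of the chain
    countries: list[str]        # countries where the chain currently has stores
    # expiration-date-country: date after which the chain had no stores left there
    expiration: dict[str, date] = field(default_factory=dict)

    def active_in(self, country: str, on: date) -> bool:
        """Did the chain have stores in `country` on the given date, per these fields?"""
        if country in self.expiration:
            return on < self.expiration[country]
        return country in self.countries
```

For example, StoreBrand("en:Biocoop", "Q2904039", ["en:france"]) reports itself active in en:france. A chain with an expiration date for a country (such as Dia in France) is reported active there only before that date.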

Comments

Note that the store brand can also be imported from OSM, by way of the stores linked to receipts. In OSM it is indicated with the tag brand.

OSM also provides a link to Wikidata (for instance for Lidl).


Example Taxonomy Entry

en:store-brand: en:Biocoop
countries:en: en:france
wikidata:en: Q2904039

Lifecycle

The lifecycle for the two taxonomies is related, so they will be discussed together.

Kickstart

The kickstart describes the initial introduction of the taxonomies.

Store brands

The store brands taxonomy could be started with the current entries in the Stores field. Unfortunately, the quality of these entries seems rather low. It is preferable to start with a high-quality list pulled in from external sources, such as all supermarket chains listed on Wikidata.

As a subsequent step all existing entries could be mapped to this store brands taxonomy.
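One way to do that mapping is exact matching after normalising case, accents and punctuation. This is a hypothetical sketch. The brand labels would come from the high-quality external list (e.g. Wikidata supermarket chains), and unmatched values would still need manual review.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Lower-case, strip accents and collapse punctuation/whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def map_stores(free_text: list[str], brands: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Map raw Stores field values to brand ids; return (mapped, unmatched)."""
    index = {normalize(label): brand_id for label, brand_id in brands.items()}
    mapped: dict[str, str] = {}
    unmatched: list[str] = []
    for raw in free_text:
        brand_id = index.get(normalize(raw))
        if brand_id is None:
            unmatched.append(raw)  # left for manual review
        else:
            mapped[raw] = brand_id
    return mapped, unmatched
```

Exact matching after normalisation is conservative. It will miss misspellings, but it avoids silently attaching a product to the wrong chain.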

Stores

The stores taxonomy can initially be pulled from OFF Prices. The data only needs to be formatted in the correct way. The stores information provided by OFF Prices will then be used as a starting point.
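Formatting a Prices location as a taxonomy entry could look like the sketch below. It assumes the location is a dictionary whose keys mirror the field names in the example entry (id, osm_id, osm_type and so on). The real Prices API field names may differ. Values are copied verbatim, so the en: prefixes seen on city and country in the example would need an extra mapping step.

```python
# Optional OSM fields copied from a Prices location, in the order of the example entry.
OSM_KEYS = (
    "osm_id",
    "osm_type",
    "osm_display_name",
    "osm_address_postcode",
    "osm_address_city",
    "osm_address_country",
    "osm_lat",
    "osm_lon",
)


def prices_location_to_entry(location: dict, store_brand: str) -> str:
    """Render one Prices location as taxonomy property lines.

    `location` is assumed to carry the Prices id under "id" plus OSM fields
    named as in the example entry; missing or None values are skipped.
    """
    lines = [f"store_brand:en: {store_brand}", f"price_id:en: {location['id']}"]
    for key in OSM_KEYS:
        value = location.get(key)
        if value is not None:
            lines.append(f"{key}:en: {value}")
    return "\n".join(lines)
```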

Maintenance

Editing

Editing involves changing existing items in the taxonomies.

Extending

Extending involves adding new items to the taxonomies through all possible applications.

  • Stores Taxonomy
    • OFF Prices - new stores can be found through new location_id and can be double checked against OSM id;
    • Product Opener - use OSM id to check whether something is new;
    • API - use OSM id to check whether something is new;
  • Store Brands Taxonomy
    • OFF Prices - new store brands via OSM and Wikidata;
    • Product Opener - new store brands via OSM and Wikidata;
    • API - new store brands via OSM and Wikidata;
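For the stores taxonomy, checking whether a store is new amounts to a set lookup on the OSM id. The sketch below is hypothetical. It keys on (osm_type, osm_id), since OSM ids are only unique within each element type.

```python
def find_new_stores(taxonomy: list[dict], incoming: list[dict]) -> list[dict]:
    """Return incoming locations whose (osm_type, osm_id) is not yet in the taxonomy."""
    # Normalise ids to int: taxonomy files store text, other sources may return numbers.
    known = {(entry["osm_type"], int(entry["osm_id"])) for entry in taxonomy}
    new: list[dict] = []
    for location in incoming:
        key = (location["osm_type"], int(location["osm_id"]))
        if key not in known:
            known.add(key)  # also de-duplicates within the incoming batch
            new.append(location)
    return new
```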

Spinoff

Information on stores, operators and owners is not the main goal of OFF. OFF prefers that this information is maintained by other, more specialised parties. However, OFF does gather some of it, for instance through receipts (full address, phone number, opening hours, operator, SIRET, TVA, NAF). Users are encouraged to add this kind of information to OpenStreetMap or Wikidata.

Use cases

  • OFF Prices - To have the prices of products for individual stores and brands.
  • https://github.com/danslimmon/oscar - Throw out a package, add it to the shopping list, and potentially place an online or click-and-collect order (because the store is known and has a Wikidata entry and a URL).
  • Walk into a geofenced area (https://github.com/alltheplaces/alltheplaces / OpenStreetMap) and have Home Assistant push a notification to get the three items on your list from that Wikidata:brand store.
  • I need an (uncommon) ingredient or food; of the places where it is sold, which is closest to me?
  • I scanned an item. I'm standing inside an OSM shop/supermarket at (geofence). It has a Wikidata:brand. Would I like to add this as the store that sells this food?
  • Provides a way, via the name suggestion index, to get a clear logo for a given brand if it is a store brand with more data captured in Open Food Facts. Provides some insight into supply chains.
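The "closest place that sells it" use case needs a distance between coordinates such as the osm_lat and osm_lon fields. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km. That is good enough for ranking nearby stores, though not for precise geodesy. The store tuples are hypothetical input.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Store = Tuple[str, float, float]  # (store id, lat, lon)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))  # clamp rounding error


def closest_store(lat: float, lon: float, stores: Iterable[Store]) -> Optional[Store]:
    """Pick the nearest store among those known to sell the item, or None."""
    return min(stores, key=lambda s: haversine_km(lat, lon, s[1], s[2]), default=None)
```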

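Robotoff

  • We currently maintain a handmade list of stores that can be safely inferred from OCR and automatically applied: https://github.com/openfoodfacts/robotoff/blob/main/data/ocr/store_regex.txt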
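Robotoff keeps a handmade list of stores that can be safely inferred from OCR text and applied automatically. The matching idea can be sketched with a hypothetical pattern table. Robotoff's real list and its file format are not reproduced here. Word boundaries keep a store name from matching inside a longer word.

```python
import re

# Hypothetical pattern table for illustration only; Robotoff's real list and
# its file format are not reproduced here.
STORE_PATTERNS = {
    "Biocoop": re.compile(r"\bbiocoop\b", re.IGNORECASE),
    "Lidl": re.compile(r"\blidl\b", re.IGNORECASE),
}


def stores_in_ocr(text: str) -> list[str]:
    """Return the stores whose pattern matches the OCR text, in table order."""
    return [store for store, pattern in STORE_PATTERNS.items() if pattern.search(text)]
```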

Discussion points

  • What information should be stored by OFF, and what can be left to OSM or Wikidata?
  • What should be done with shops that go out of business?

Note

  • We need to associate stores with brands that are sold in those stores (Auchan for Auchan…)
  • We should try to add Wikidata relations (and thus OSM ones) whenever possible.
  • Chains - if a store is part of a chain with multiple stores, it can be added through a Wikidata link.