Data imports
- As of today (21/11/2016), almost all the data on Open Food Facts comes from crowdsourcing (through Open Food Facts or other apps that allow users to upload pictures and/or data).
- There are a few open databases, and some content submitted by manufacturers or distributors, that we could import into Open Food Facts. This page discusses how we could do it.
External databases that we have identified/imported
- Data imports/Switzerland (imported several times)
- Data imports/United States (imported several times)
- FR:Spéculoos (dropped)
- Producer data (on-going effort)
Technical Design
Objectives
- Traceability: we need a way to record where each piece of data comes from, when it was added, whether it has been modified since, etc.
- Needed for attribution
- Needed for transparency
Proposed design
Proposal A
For each field (e.g. ingredients, nutrients, product name, brands, etc.), add a corresponding [field name]_sources field (e.g. product_name_sources) containing an array of import entries. For example:
- "ingredients_text_en": "TOMATOES (TOMATOES AND FIRE ROASTED TOMATOES, TOMATO JUICE, CITRIC ACID, CALCIUM CHLORIDE), WHITE WINE VINEGAR, CARROTS, WATER, YELLOW ONION, HABANERO CHILI PEPPER (HABANERO CHILI PEPPERS, WATER, SALT, CITRIC ACID), MUSTARD (DISTILLED VINEGAR, WATER, MUSTARD SEED, SALT, TURMERIC, SPICES), ORGANIC CANE SUGAR, SALT, MODIFIED FOOD STARCH, GARLIC, SUNFLOWER OIL, HERBS AND SPICES.",
- "ingredients_text_en_sources": [
- {
- "id": "usda-ndb",
- "url": "https://ndb.nal.usda.gov/ndb/foods/show/58513?format=Abridged&reportfmt=csv&Qv=1", (direct product URL if available)
- "import_t": "423423", (timestamp of import date)
- "content": "TOMATOES, WHITE WINE VINEGAR, CARROTS, WATER, YELLOW ONION, HABANERO CHILI PEPPER, MUSTARD, ORGANIC CANE SUGAR, SALT, MODIFIED FOOD STARCH, GARLIC, SUNFLOWER OIL, HERBS AND SPICES."
- },
- { (another import later)
- "id": "usda-ndb",
- "url": "https://ndb.nal.usda.gov/ndb/foods/show/58513?format=Abridged&reportfmt=csv&Qv=1", (direct product URL if available)
- "import_t": "523423", (timestamp of import date)
- "content": "TOMATOES (TOMATOES AND FIRE ROASTED TOMATOES, TOMATO JUICE, CITRIC ACID, CALCIUM CHLORIDE), WHITE WINE VINEGAR, CARROTS, WATER, YELLOW ONION, HABANERO CHILI PEPPER (HABANERO CHILI PEPPERS, WATER, SALT, CITRIC ACID), MUSTARD (DISTILLED VINEGAR, WATER, MUSTARD SEED, SALT, TURMERIC, SPICES), ORGANIC CANE SUGAR, SALT, MODIFIED FOOD STARCH, GARLIC, SUNFLOWER OIL, HERBS AND SPICES."
- },
- ]
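To make Proposal A more concrete, here is a minimal Python sketch of how an importer might maintain a [field name]_sources array, and how the stored "content" copy could answer the "has it been modified since?" question from the objectives. The function names record_field_import and modified_since_import are hypothetical, and storing import_t as an integer Unix timestamp is an assumption (the example above shows it as a string); none of this is existing Open Food Facts code.

```python
import time
from typing import Any, Optional


def record_field_import(product: dict, field: str, value: str, source_id: str,
                        url: Optional[str] = None,
                        import_t: Optional[int] = None) -> None:
    """Set `field` and append an import entry to `<field>_sources` (Proposal A)."""
    entry: dict[str, Any] = {
        "id": source_id,
        "url": url,  # direct product URL, if available
        # Assumption: integer Unix timestamp of the import.
        "import_t": int(time.time()) if import_t is None else import_t,
        "content": value,  # copy of the imported value, kept for traceability
    }
    product.setdefault(f"{field}_sources", []).append(entry)
    product[field] = value


def modified_since_import(product: dict, field: str) -> bool:
    """True if the current value differs from the most recently imported content."""
    sources = product.get(f"{field}_sources", [])
    if not sources:
        return False  # never imported, so there is nothing to compare against
    latest = max(sources, key=lambda e: e["import_t"])
    return product.get(field) != latest["content"]


product: dict = {}
record_field_import(product, "ingredients_text_en", "TOMATOES, SALT.",
                    "usda-ndb", import_t=423423)
record_field_import(product, "ingredients_text_en",
                    "TOMATOES (TOMATOES, CITRIC ACID), SALT.",
                    "usda-ndb", import_t=523423)
product["ingredients_text_en"] = "Tomatoes, salt."  # a later crowdsourced edit
print(modified_since_import(product, "ingredients_text_en"))  # True
```

The cost of this design is visible in the sketch: every import stores a full copy of the field value, which is the repetition Proposal B tries to avoid.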
Proposal B
A version of proposal A with less repeated content for each imported field?
We add only one field called "sources" that references the fields that have been imported.
- "sources": [
- {
- "id": "usda-ndb",
- "url": "https://ndb.nal.usda.gov/ndb/foods/show/58513?format=Abridged&reportfmt=csv&Qv=1", (direct product URL if available)
- "import_t": "423423", (timestamp of import date)
- "fields": ["product_name", "ingredients", "nutrients"],
- "images": ["1", "2", "3"] (image IDs)
- },
- {
- "id": "usda-ndb",
- "url": "https://ndb.nal.usda.gov/ndb/foods/show/58513?format=Abridged&reportfmt=csv&Qv=1", (direct product URL if available)
- "import_t": "523423", (timestamp of import date)
- "fields": ["ingredients", "nutrients"],
- "images": ["4", "5", "6"] (image IDs)
- },
- ]
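A comparable sketch for Proposal B, again in Python and again only illustrative: one product-level "sources" array, where each entry lists the names of the fields and the image IDs that an import touched. The helper names record_import and sources_for_field, the integer import_t, and the sample field values are assumptions made for the example.

```python
from typing import Any, Iterable, Optional


def record_import(product: dict, source_id: str, import_t: int,
                  fields: dict, url: Optional[str] = None,
                  images: Optional[Iterable[str]] = None) -> None:
    """Apply imported field values and log one entry in `sources` (Proposal B)."""
    product.update(fields)
    entry: dict[str, Any] = {
        "id": source_id,
        "url": url,  # direct product URL, if available
        "import_t": import_t,  # assumption: integer timestamp of the import
        "fields": list(fields),  # field names only, no copy of the content
        "images": list(images or []),  # image IDs added by this import
    }
    product.setdefault("sources", []).append(entry)


def sources_for_field(product: dict, field: str) -> list:
    """Return every import entry that wrote to `field`, oldest first."""
    return [s for s in product.get("sources", []) if field in s["fields"]]


product: dict = {}
record_import(product, "usda-ndb", 423423,
              {"product_name": "Example product", "ingredients": "TOMATOES, SALT."},
              images=["1", "2", "3"])
record_import(product, "usda-ndb", 523423,
              {"ingredients": "TOMATOES (TOMATOES, CITRIC ACID), SALT."},
              images=["4", "5", "6"])

print([s["import_t"] for s in sources_for_field(product, "product_name")])  # [423423]
print([s["import_t"] for s in sources_for_field(product, "ingredients")])   # [423423, 523423]
```

Because entries here hold only field names, this structure records attribution and import time but cannot by itself show whether a value was edited after the import; that would presumably have to come from the product's revision history.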