Data imports

From Open Food Facts wiki


  • As of today (21/11/2016), almost all the data on Open Food Facts comes from crowdsourcing (through Open Food Facts or other apps that allow users to upload pictures and/or data).
  • There are a few open databases, as well as content submitted by manufacturers and distributors, that we could import into Open Food Facts. This page discusses how we could do it.

External databases that we have identified/imported

Technical Design

Objectives

  • Traceability: we need a way to record where each piece of data comes from, when it was added, whether it has been modified since, etc.
    • Needed for attribution
    • Needed for transparency

Proposed design

Proposal A

For each field (e.g. ingredients, nutrients, product name, brands, etc.), add a corresponding [field name]_sources field (e.g. product_name_sources) that holds an array of import entries, e.g.:

  • "ingredients_text_en":"TOMATOES (TOMATOES AND FIRE ROASTED TOMATOES, TOMATO JUICE, CITRIC ACID, CALCIUM CHLORIDE), WHITE WINE VINEGAR, CARROTS, WATER, YELLOW ONION, HABANERO CHILI PEPPER (HABANERO CHILI PEPPERS, WATER, SALT, CITRIC ACID), MUSTARD (DISTILLED VINEGAR, WATER, MUSTARD SEED, SALT, TURMERIC, SPICES), ORGANIC CANE SUGAR, SALT, MODIFIED FOOD STARCH, GARLIC, SUNFLOWER OIL, HERBS AND SPICES."
  • "ingredients_text_en_sources": [
    • {
    • },
    • { (another import later)
      • "id": "usda-ndb",
      • "url": "https://ndb.nal.usda.gov/ndb/foods/show/58513?format=Abridged&reportfmt=csv&Qv=1", (direct product url if available)
      • "import_t": "523423", (timestamp of import date)
      • "content": "TOMATOES (TOMATOES AND FIRE ROASTED TOMATOES, TOMATO JUICE, CITRIC ACID, CALCIUM CHLORIDE), WHITE WINE VINEGAR, CARROTS, WATER, YELLOW ONION, HABANERO CHILI PEPPER (HABANERO CHILI PEPPERS, WATER, SALT, CITRIC ACID), MUSTARD (DISTILLED VINEGAR, WATER, MUSTARD SEED, SALT, TURMERIC, SPICES), ORGANIC CANE SUGAR, SALT, MODIFIED FOOD STARCH, GARLIC, SUNFLOWER OIL, HERBS AND SPICES."
    • }
  • ]
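
To make Proposal A more concrete, here is a minimal Python sketch of how an import could append an entry to a field's _sources array. The function name add_field_source is hypothetical and not part of any existing Open Food Facts code. The choice to fill the field only when it is still empty is also an assumption, since this page does not yet say whether imported data should overwrite crowdsourced data.

```python
from typing import Any, Dict, Optional


def add_field_source(
    product: Dict[str, Any],
    field: str,
    content: str,
    source_id: str,
    import_t: int,
    url: Optional[str] = None,
) -> Dict[str, Any]:
    """Append one import entry to product[field + "_sources"] (Proposal A).

    Hypothetical helper: the entry keys (id, url, import_t, content)
    follow the example above; everything else is an assumption.
    """
    entry: Dict[str, Any] = {
        "id": source_id,
        "import_t": import_t,
        "content": content,
    }
    if url is not None:
        # Direct product URL, only stored when the source provides one.
        entry["url"] = url

    # Keep every import, oldest first, so the history stays traceable.
    product.setdefault(field + "_sources", []).append(entry)

    # Assumption: an import fills the field only if it has no value yet.
    product.setdefault(field, content)
    return entry
```

Calling it twice for the same field with different source ids leaves two entries in the _sources array, which is what gives us attribution for each import.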


Proposal B

A version of proposal A with less repeated content for each imported field?

We add only one field called "sources" that references the fields that have been imported.
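
One possible shape for Proposal B, sketched in Python. Each entry in "sources" describes one import and lists the names of the fields it provided, so the imported content is not repeated. The "fields" key and both function names are assumptions made for illustration; the page does not define them.

```python
from typing import Any, Dict, List, Optional


def add_import_source(
    product: Dict[str, Any],
    source_id: str,
    values: Dict[str, str],
    import_t: int,
    url: Optional[str] = None,
) -> Dict[str, Any]:
    """Record one import in product["sources"] (Proposal B).

    Hypothetical helper: "fields" lists the imported field names
    instead of duplicating each field's content.
    """
    entry: Dict[str, Any] = {
        "id": source_id,
        "import_t": import_t,
        "fields": sorted(values),  # field names only, no content
    }
    if url is not None:
        entry["url"] = url

    for field, content in values.items():
        # Assumption: an import fills a field only if it has no value yet.
        product.setdefault(field, content)

    product.setdefault("sources", []).append(entry)
    return entry


def sources_for_field(product: Dict[str, Any], field: str) -> List[str]:
    """Return the ids of all imports that provided the given field."""
    return [
        source["id"]
        for source in product.get("sources", [])
        if field in source.get("fields", [])
    ]
```

The trade-off compared to Proposal A is that the imported value itself is no longer stored, so we can tell which import touched a field but not what it contained if the field is edited later.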