GSOC 2022 - Taxonomy editor: Difference between revisions

From Open Food Facts wiki
(Add presentation to wiki)
(mark deprecated)
 
(4 intermediate revisions by 3 users not shown)
Taxonomy editor
<blockquote>'''<big>ARCHIVE</big>:''' this page is archived. Useful information is in the GitHub repository: https://github.com/openfoodfacts/taxonomy-editor/</blockquote>


=== Summary ===


Create an application that enables more contributors to taxonomies, while continuously checking quality.  
The Open Food Facts database contains a lot of information on food products, such as ingredients, labels, additives, etc. To link this information to useful properties such as Nutri-Score, Agribalyse and many more, taxonomies are used within the database.
 
A taxonomy in Open Food Facts is a raw text file containing a Directed Acyclic Graph (DAG) where each leaf node has one or more parent nodes. It is mainly used for classification and translation of various food products within the database. Hence, taxonomies are at the heart of data structures in the Open Food Facts database and must be maintained properly.
 
The taxonomy files in Open Food Facts are long (the ingredients.txt taxonomy alone has around 80,000 lines!) and cumbersome for contributors to edit.

This project provides a user-friendly interface for editing taxonomies with ease. The tool helps contributors visualize a node's translations, properties, parents and children on a single page. The editor allows users to perform CRUD (create, read, update, delete) operations on a taxonomy and on its nodes. A fast search mechanism for finding nodes within the taxonomy has also been implemented.

The Taxonomy Editor should help existing contributors edit taxonomies more easily and encourage more contributions from the wonderful community of Open Food Facts.


=== Description ===


'''Status''': Selected for GSOC 2022, Planning
'''Status''': Selected for GSOC 2022, Completed


'''People''':


* Timeline proposed by Aadarsh: https://drive.google.com/file/d/1KeNLc-2V1U_zcA-3QkvBm35LxYSItQZc/view?usp=sharing
* To be discussed


=== Resources / Contributing ===
* Project to track current tasks: https://github.com/orgs/openfoodfacts/projects/28


* [https://slack.openfoodfacts.org Slack] channels: #taxonomies
* [https://slack.openfoodfacts.org Slack] channels: #taxonomies #taxonomy-editor
* Existing documentation on taxonomies : [[Global taxonomies]] - [[Product Opener/Taxonomy Edition]]


* Initial proposal: https://drive.google.com/file/d/1KeNLc-2V1U_zcA-3QkvBm35LxYSItQZc/view?usp=sharing
* Initial project phases presentation (2022/05/24): https://docs.google.com/presentation/d/1mHyN7NGf7PBxH0wSJlhiFsHAruu8NJnlcrzNl4SIVFo/edit?usp=sharing
* Project scope discussion (2022/06/14): https://docs.google.com/presentation/d/1IngF82uWfE2BcYgQ0uaDWwxm_luqzv52Q61bvYNzh4U/edit?usp=sharing
* Meeting minutes: https://docs.google.com/document/d/1tdYkUmoRU8BxFPdCwtewoUi7PV8PmDlXtExOcPYyu-I/edit?usp=sharing
* Figma mockups: https://www.figma.com/file/7QxD2pOnVntjDPqbHHPGHv/Taxonomy-Editor?node-id=0%3A1


=== Archives ===


[[Category:Project]]
[[Category:Templates]]
[[Category:Deprecated]]

Latest revision as of 15:36, 18 April 2023

ARCHIVE: this page is archived. Useful information is in the GitHub repository: https://github.com/openfoodfacts/taxonomy-editor/

Summary

The Open Food Facts database contains a lot of information on food products, such as ingredients, labels, additives, etc. To link this information to useful properties such as Nutri-Score, Agribalyse and many more, taxonomies are used within the database.

A taxonomy in Open Food Facts is a raw text file containing a Directed Acyclic Graph (DAG) where each leaf node has one or more parent nodes. It is mainly used for classification and translation of various food products within the database. Hence, taxonomies are at the heart of data structures in the Open Food Facts database and must be maintained properly.
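
As a minimal sketch of the structure described above, a taxonomy can be held in memory as a mapping from each node to its parents. The node identifiers (such as `en:carrot`) and the helper names below are hypothetical illustrations, not the editor's actual data model or file format.

```python
from collections import defaultdict

# Hypothetical taxonomy fragment: node id -> list of parent ids.
# A node may have several parents, which makes this a DAG rather than a tree.
PARENTS = {
    "en:food": [],
    "en:vegetable": ["en:food"],
    "en:root-vegetable": ["en:vegetable"],
    "en:orange-food": ["en:food"],
    "en:carrot": ["en:root-vegetable", "en:orange-food"],
}


def children_index(parents):
    """Invert the parent mapping to get node id -> sorted list of child ids."""
    children = defaultdict(list)
    for node, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(node)
    return {node: sorted(kids) for node, kids in children.items()}


def ancestors(parents, node):
    """Return every node reachable by following parent links from `node`."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_acyclic(parents):
    """Check that no node is its own ancestor (the 'acyclic' in DAG)."""
    return all(node not in ancestors(parents, node) for node in parents)
```

Storing only parent links keeps each node's entry small; the child view needed to browse the hierarchy downwards can be derived on demand with `children_index`.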

The taxonomy files in Open Food Facts are long (the ingredients.txt taxonomy alone has around 80,000 lines!) and cumbersome for contributors to edit.

This project provides a user-friendly interface for editing taxonomies with ease. The tool helps contributors visualize a node's translations, properties, parents and children on a single page. The editor allows users to perform CRUD (create, read, update, delete) operations on a taxonomy and on its nodes. A fast search mechanism for finding nodes within the taxonomy has also been implemented.

The Taxonomy Editor should help existing contributors edit taxonomies more easily and encourage more contributions from the wonderful community of Open Food Facts.

Description

Status: Selected for GSOC 2022, Completed

People:

  • Aadarsh Anantha Ramakrishnan (GSOC intern)
  • Charles Népote, Stéphane Gigandet (mentors)

Impact (why)

Taxonomies are at the heart of Open Food Facts in many respects. They help identify components (ingredients, labels, brands, …) and link them to useful properties, which underpin the Nutri-Score, the Eco-Score, allergen identification and other features.

Each taxonomy is a DAG (directed acyclic graph) where leaves have one or more parents. Currently each taxonomy is stored as a raw text file in our repository: https://github.com/openfoodfacts/openfoodfacts-server/tree/main/taxonomies.

While effective for the application, this format is quite cumbersome for contributors to edit.

Expected outcomes (what)

We would like to have a tool (online or standalone) to edit taxonomies.

The tool should:

  • help quickly find an element with a search
  • help visualize the hierarchy of components
  • help visualize a component and its synonyms in multiple languages
  • indicate inherited properties for an element, and signal when there is more than one
  • enable editing of those names, synonyms and properties
  • run some validation on names, synonyms and properties (no duplicates, specific formats, etc.)
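
Three of the requirements above (search, inherited properties and duplicate detection) can be sketched as follows. This is an illustration under stated assumptions: the `NODES` layout, the property name `vegan` and all function names are hypothetical, and the real inheritance rules may differ (for example, a closer ancestor might override a more distant one, whereas this sketch simply collects every inherited value).

```python
from collections import defaultdict

# Hypothetical nodes: parents, per-language names/synonyms, and own properties.
NODES = {
    "en:vegetable": {
        "parents": [],
        "names": {"en": ["vegetable", "vegetables"], "fr": ["légume"]},
        "properties": {"vegan": "yes"},
    },
    "en:orange-food": {
        "parents": [],
        "names": {"en": ["orange food"]},
        "properties": {"vegan": "maybe"},
    },
    "en:carrot": {
        "parents": ["en:vegetable", "en:orange-food"],
        "names": {"en": ["carrot", "carrots"], "fr": ["carotte"]},
        "properties": {},
    },
}


def search(nodes, query):
    """Return ids of nodes with a name or synonym containing `query` (case-insensitive)."""
    needle = query.casefold()
    return sorted(
        node_id
        for node_id, node in nodes.items()
        if any(
            needle in name.casefold()
            for names in node["names"].values()
            for name in names
        )
    )


def inherited_properties(nodes, node_id):
    """Collect property values from all ancestors: property -> set of values.

    A property with more than one value is a conflict to signal to the user.
    """
    inherited = defaultdict(set)
    seen = set()
    stack = list(nodes[node_id]["parents"])
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for key, value in nodes[current]["properties"].items():
            inherited[key].add(value)
        stack.extend(nodes[current]["parents"])
    return dict(inherited)


def duplicate_names(nodes):
    """Find (language, lower-cased name) pairs used by more than one node."""
    owners = defaultdict(set)
    for node_id, node in nodes.items():
        for lang, names in node["names"].items():
            for name in names:
                owners[(lang, name.casefold())].add(node_id)
    return {key: sorted(ids) for key, ids in owners.items() if len(ids) > 1}
```

In this example, `en:carrot` inherits two different values for `vegan` from its two parents, which is the kind of situation the tool is expected to flag.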

As a bonus, it would be really interesting to know the impact of a modification on the application. For that, we could imagine simple APIs (one for each taxonomy) in the Open Food Facts application to show which products would be affected by a change. This feedback would help ensure that a change has no unexpected side effects.

Timeline

Resources / Contributing

Archives