Server-side product indexing and search: Difference between revisions

From Open Food Facts wiki
(11 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== Summary ==


Server-side product similarity indexing and search is the 2nd of the 4 sub-tasks of the [[Project:Personalized_Search]] funded by the NGI0 Discovery Fund managed by NLnet.
Server-side product indexing and search is the 2nd of the 4 sub-tasks of the [[Project:Personalized_Search]] funded by the NGI0 Discovery Fund managed by NLnet.


This page documents the progress made in Q2 2020.
This page documents the progress made in Q2, Q3 and Q4 2020.


== Overview ==
Line 11: Line 11:
Diagram source: https://vecta.io/app/edit/-M2XyVv8ZoaLNrW-zQoT


# The Open Food Facts mobile app (and 3rd party apps) make generic search requests that do not contain user preferences
# The server returns a large number of generic results
# The app uses the user preferences stored locally to personalize the search results


1 The Open Food Facts mobile app (and 3rd party apps) make generic search requests that do not contain user preferences
== Initial research and specifications (completed in May and June 2020) ==
1 The server returns a large number of generic results
1 The app uses the user preferences stored locally to personalize the search results


== Functional specs ==
At the start of the project, we evaluated the different options for server-side product indexing and search:


=== Search API ===
* [[Server-side product indexing and search - Functional Specs]]
* [[Server-side product indexing and search - Technical Specs]]


The search API is called by the app to retrieve a high number of generic search results (products) that match a query.
== New Search API (completed in August and September 2020) ==


==== Match (required) ====
The existing Open Food Facts search API is outdated and unnecessarily convoluted (it was built on top of the OFF web site search form), and it does not support some of the requirements of the Personalized Search project, in particular retrieving a given set of products by their barcodes.


The search query needs to contain at least one required criterion.
We have created a new [[Open Food Facts Search API Version 2]] that is simpler but also more powerful.


Here are potential criteria that we could consider:
Key features of the new Search API:
* Simplified parameter specification
* Tag parameters (e.g. categories, labels, ingredients) can be searched in any language
* Support for AND, NOT and OR queries for tags fields (e.g. product with all those labels, none of those labels, or one of those labels)
* Results can be sorted by product popularity (most scanned products first)
* New /api/v2/search (JSON) and /search (OFF web site) endpoints that accept the same parameters


===== Match on category =====
== New Product Attributes for all search features (completed from August to October 2020) ==


The query specifies a category tag (e.g. en:cookies).
We have created new [[Product Attributes]] that allow clients (apps, but also the OFF web site) to easily filter and rank search results according to the user's preferences, and to explain to users how well the products match their preferences.


Pros:
Key features of product attributes:
* Can be used to display product recommendations for a given product (using the most specific category of the product)
* All search criteria / features (e.g. nutritional quality, whether a product is vegan or contains a specific allergen etc.) are computed individually on the server side, and then made available to clients in the same normalized format
* Can be used for a search box that is limited to categories (e.g. suggest as you type restricted to known categories)
** Clients can implement local filtering and ranking with [[Client-side libraries for personalized product filtering and ranking|a very simple algorithm]]. All the complexity of the different search features is handled on the server side.
* Can be used for a per category search interface that displays 1st level choices, 2nd level etc. (e.g. first click on "Vegetables", then "Bananas" etc.)
** Clients can get localized names, descriptions etc. for each search feature in the language they choose.
* There could be ways to pre-compute per category results that could be used to speed up queries
** New search features (e.g. a new diet, a new environmental or social score) can be made available to clients without any code change on the clients.
Cons:
* Does not support freely typed search queries (e.g. with a product name or brand)


== Next milestone: Client-side libraries for personalized product filtering and ranking ==


===== Match on keywords =====
To make it even easier to use the new Search API and Product Attributes, we are developing [[Client-side libraries for personalized product filtering and ranking]] in JavaScript and Dart (for Flutter apps).

Allow users to enter keywords that are matched against the product name, brand, categories etc. without distinction. This is how the current search function on the OFF web site is implemented.

Pros:
* Can be used for a search box
Cons:
* Currently does not work well for products in multiple languages
* Difficult to do pre-computations to optimize queries

===== Match on product =====

Given a product as input, return "similar" products.

Pros:
* Can be used for product recommendations
* Similarity could be based on things other than the category (e.g. labels, ingredients, brands etc.)
Cons:
* It could be very expensive to retrieve similar products in real time or to pre-compute all of them
* Does not work for category search or freely typed search queries

==== Optional filters ====

The search query can contain filters to restrict the result set, e.g. the country where the products are sold.

Those filters may also include user preferences for apps that do not want to do the personalization locally.


=== Server-side indexing ===

Based on the type of queries we want to support, we may be able to, and may need to, pre-compute and store new information in the database to better support the queries.

==== Similarity ====

==== Non personalized sort order ====

It is unrealistic to return all possible results to the calling app for all queries. E.g. if the query is "cookies", we cannot return tens of thousands of results. So we will need a reasonable sort order that returns the most useful results for the app, without limiting the options for personalization of the results.

Possible criteria to include in the sort order:

===== Product quality =====

If we expect that the app will only offer "better" products and not worse products, we may include things like the Nutri-Score or NOVA in the sort order.

===== Popularity =====

We could also include a measure of product popularity (e.g. derived from scans) in the sort order.

== Technical specs ==


[[Category:Project:Personalized_Search]]
[[Category:ProductOpener]]
[[Category:Search]]

Latest revision as of 11:55, 11 October 2022

Summary

Server-side product indexing and search is the 2nd of the 4 sub-tasks of the Project:Personalized_Search funded by the NGI0 Discovery Fund managed by NLnet.

This page documents the progress made in Q2, Q3 and Q4 2020.

Overview

Diagram source: https://vecta.io/app/edit/-M2XyVv8ZoaLNrW-zQoT

  1. The Open Food Facts mobile app (and 3rd party apps) make generic search requests that do not contain user preferences
  2. The server returns a large number of generic results
  3. The app uses the user preferences stored locally to personalize the search results

Initial research and specifications (completed in May and June 2020)

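At the start of the project, we evaluated the different options for server-side product indexing and search:

  • Server-side product indexing and search - Functional Specs
  • Server-side product indexing and search - Technical Specs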

New Search API (completed in August and September 2020)

The existing Open Food Facts search API is outdated and unnecessarily convoluted (it was built on top of the OFF web site search form), and it does not support some of the requirements of the Personalized Search project, in particular retrieving a given set of products by their barcodes.

We have created a new Open Food Facts Search API Version 2 that is simpler but also more powerful.

Key features of the new Search API:

  • Simplified parameter specification
  • Tag parameters (e.g. categories, labels, ingredients) can be searched in any language
  • Support for AND, NOT and OR queries for tags fields (e.g. product with all those labels, none of those labels, or one of those labels)
  • Results can be sorted by product popularity (most scanned products first)
  • New /api/v2/search (JSON) and /search (OFF web site) endpoints that accept the same parameters
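
The features listed above can be illustrated with a minimal Python sketch that builds a request URL for the /api/v2/search endpoint. Only the endpoint path comes from this page. The host name, the field names (categories_tags, labels_tags, ingredients_tags), the separators used for AND, OR and NOT, and the sort_by and page_size parameters are assumptions made for illustration; the Open Food Facts Search API Version 2 page is the reference for the actual syntax.

```python
from urllib.parse import urlencode

# Assumption: host name chosen for illustration; only the path is from this page.
BASE_URL = "https://world.openfoodfacts.org/api/v2/search"


def tag_filter(all_of=(), any_of=(), none_of=()):
    """Build the value of one tag field.

    Assumed syntax (not confirmed by this page):
    ',' joins tags that must all match (AND),
    '|' joins alternatives (OR),
    a leading '-' excludes a tag (NOT).
    """
    parts = list(all_of)
    if any_of:
        parts.append("|".join(any_of))
    parts.extend("-" + tag for tag in none_of)
    return ",".join(parts)


def build_search_url(filters, sort_by=None, page_size=100):
    """Return a search URL; empty filters are skipped."""
    params = {field: value for field, value in filters.items() if value}
    if sort_by:
        params["sort_by"] = sort_by  # hypothetical parameter name
    params["page_size"] = page_size  # hypothetical parameter name
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(
    {
        "categories_tags": tag_filter(all_of=["en:cookies"]),
        "labels_tags": tag_filter(any_of=["en:organic", "en:fair-trade"]),
        "ingredients_tags": tag_filter(none_of=["en:palm-oil"]),
    },
    sort_by="popularity",
)
print(url)
```

Because the query contains no user preferences, the same URL can be sent by every user of an app, and the personalization happens afterwards on the device.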

New Product Attributes for all search features (completed from August to October 2020)

We have created new Product Attributes that allow clients (apps, but also the OFF web site) to easily filter and rank search results according to the user's preferences, and to explain to users how well the products match their preferences.

Key features of product attributes:

  • All search criteria / features (e.g. nutritional quality, whether a product is vegan or contains a specific allergen etc.) are computed individually on the server side, and then made available to clients in the same normalized format
    • Clients can implement local filtering and ranking with a very simple algorithm. All the complexity of the different search features is handled on the server side.
    • Clients can get localized names, descriptions etc. for each search feature in the language they choose.
    • New search features (e.g. a new diet, a new environmental or social score) can be made available to clients without any code change on the clients.
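
As a rough illustration of what local filtering and ranking on normalized attributes could look like, here is a minimal Python sketch. It is not the algorithm of the client-side libraries. It assumes that each attribute carries a match score from 0 to 100 and that the user assigns one of four importance levels to each attribute; the level names, the weights, the cut-off for mandatory attributes and the attribute ids are all hypothetical.

```python
# Hypothetical importance levels and weights, chosen for illustration only.
WEIGHTS = {"not_important": 0, "important": 1, "very_important": 2, "mandatory": 4}
MANDATORY_MIN_MATCH = 50  # hypothetical cut-off for mandatory attributes


def score_product(product, preferences):
    """Return a weighted average match (0-100), or None if the product
    fails an attribute the user marked as mandatory.

    A missing attribute is treated as a match of 0.
    """
    weighted_total = 0
    weight_sum = 0
    for attribute_id, importance in preferences.items():
        weight = WEIGHTS[importance]
        match = product["attributes"].get(attribute_id, 0)
        if importance == "mandatory" and match < MANDATORY_MIN_MATCH:
            return None
        weighted_total += weight * match
        weight_sum += weight
    return weighted_total / weight_sum if weight_sum else 0.0


def personalize(products, preferences):
    """Drop products that fail a mandatory attribute, rank the rest by score."""
    scored = [(score_product(p, preferences), p) for p in products]
    kept = [(score, p) for score, p in scored if score is not None]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in kept]


# Generic results as they might come back from the server (made-up data).
products = [
    {"code": "A", "attributes": {"vegan": 100, "nutritional_quality": 40}},
    {"code": "B", "attributes": {"vegan": 0, "nutritional_quality": 90}},
    {"code": "C", "attributes": {"vegan": 100, "nutritional_quality": 80}},
]
# Preferences stored locally on the device (made-up data).
preferences = {"vegan": "mandatory", "nutritional_quality": "important"}

ranked = personalize(products, preferences)
print([p["code"] for p in ranked])
```

The client code only reads scores and importance levels, so a new attribute computed on the server would be ranked the same way without any change to this logic.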

Next milestone: Client-side libraries for personalized product filtering and ranking

To make it even easier to use the new Search API and Product Attributes, we are developing Client-side libraries for personalized product filtering and ranking in JavaScript and Dart (for Flutter apps).