Server-side product indexing and search

From Open Food Facts wiki
Revision as of 16:11, 19 May 2020

Summary

Server-side product similarity indexing and search is the 2nd of the 4 sub-tasks of Project:Personalized_Search, funded by the NGI0 Discovery Fund managed by NLnet.

This page documents the progress made in Q2 2020.

Overview

Diagram source: https://vecta.io/app/edit/-M2XyVv8ZoaLNrW-zQoT


  1. The Open Food Facts mobile app (and 3rd party apps) makes generic search requests that do not contain user preferences
  2. The server returns a large number of generic results
  3. The app uses the user preferences stored locally to personalize the search results
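
The third step of the flow above can be sketched as a local re-ranking of the generic results. This is only an illustration: the `labels_tags` field, the shape of `preferences` (a tag mapped to a weight) and the `personalize` function are assumptions, not part of the spec.

```python
from typing import Any, Dict, List

# Hypothetical shape of one generic search result returned by the server.
Product = Dict[str, Any]


def personalize(results: List[Product], preferences: Dict[str, float]) -> List[Product]:
    """Re-rank generic results with preferences that never leave the device.

    `preferences` maps a tag to a weight (assumed format): positive weights
    push matching products up, negative weights push them down.
    """
    def score(product: Product) -> float:
        return sum(preferences.get(tag, 0.0) for tag in product.get("labels_tags", []))

    # sorted() is stable, so products with equal scores keep the server order.
    return sorted(results, key=score, reverse=True)
```

Because the sort is stable, the server-side order still decides between products that the preferences do not distinguish.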

Functional specs

Search API

The search API is called by the app to retrieve a large number of generic search results (products) that match a query.

Match (required)

The search query needs to contain at least one required criterion.

Here are potential criteria that we could consider:

Match on category

The query specifies a category tag (e.g. en:cookies).

Pros:

  • Can be used to display product recommendations for a given product (using the most specific category of the product)
  • Can be used for a search box that is limited to categories (e.g. suggest as you type restricted to known categories)
  • Can be used for a per-category search interface that displays 1st level choices, then 2nd level choices, etc. (e.g. first click on "Vegetables", then "Bananas" etc.)
  • There could be ways to pre-compute per category results that could be used to speed up queries

Cons:

  • Does not support freely typed search queries (e.g. with a product name or brand)
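
One possible form of per-category pre-computation is an index from each category tag to its top products, built offline. The sketch below is illustrative only: the `categories_tags` and `unique_scans_n` field names, the ranking by scan count and the `build_category_index` function are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def build_category_index(products: Iterable[Dict[str, Any]], limit: int = 100) -> Dict[str, List[str]]:
    """Pre-compute, for each category tag, the codes of its top products.

    Products are ranked by scan count (assumed field `unique_scans_n`),
    and only the first `limit` codes are kept per category.
    """
    by_category: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for product in products:
        for tag in product.get("categories_tags", []):
            by_category[tag].append(product)

    index: Dict[str, List[str]] = {}
    for tag, items in by_category.items():
        ranked = sorted(items, key=lambda p: p.get("unique_scans_n", 0), reverse=True)
        index[tag] = [p["code"] for p in ranked[:limit]]
    return index
```

A category query then becomes a single lookup such as `index["en:cookies"]`, at the cost of rebuilding or updating the index when products change.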


Match on keywords

Allow users to enter keywords that are matched against the product name, brand, categories etc., without distinguishing between these fields. This is how the current search function on the OFF website is implemented.

Pros:

  • Can be used for a search box

Cons:

  • Currently does not work well for products in multiple languages
  • Difficult to do pre-computations to optimize queries


Match on product

Given a product as input, return "similar" products.

Pros:

  • Can be used for product recommendations
  • Similarity could be based on things other than the category (e.g. labels, ingredients, brands etc.)

Cons:

  • It could be very expensive to retrieve similar products in real-time or to pre-compute all of them
  • Does not work for category search or freely typed search queries
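
As an illustration of similarity based on more than the category, the sketch below compares products by the Jaccard index of their tag sets. The choice of Jaccard, the field names (`categories_tags`, `labels_tags`, `brands_tags`) and the function names are assumptions; the spec does not settle on a similarity measure.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# Assumed tag fields to compare; any other tag field could be added.
FIELDS = ("categories_tags", "labels_tags", "brands_tags")


def tag_set(product: Dict[str, Any]) -> Set[str]:
    """Collect all tags of a product, prefixed by field name to avoid clashes."""
    return {f"{field}:{tag}" for field in FIELDS for tag in product.get(field, [])}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target: Dict[str, Any], candidates: Iterable[Dict[str, Any]], k: int = 10) -> List[Tuple[float, str]]:
    """Return up to k (score, code) pairs, most similar first."""
    target_tags = tag_set(target)
    scored = [
        (jaccard(target_tags, tag_set(c)), c["code"])
        for c in candidates
        if c["code"] != target["code"]
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

This version scans every candidate for each query, which makes the cost concern listed above concrete: it is linear in the number of products per request.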

Optional filters

The search query can contain filters to restrict the result set, e.g. by the country where the products are sold.

Those filters may also include user preferences, for apps that do not want to do the personalization locally.
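
The rule "at least one required match criterion, plus optional filters" could be modelled as follows. This is a hypothetical structure: the `SearchQuery` class, its attribute names and the `country` filter key are assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SearchQuery:
    # Match criteria: at least one of these must be set.
    category: Optional[str] = None      # e.g. "en:cookies"
    keywords: Optional[str] = None      # freely typed text
    product_code: Optional[str] = None  # "match on product"
    # Optional filters that only restrict the result set.
    filters: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject queries that contain filters but no match criterion."""
        if not any((self.category, self.keywords, self.product_code)):
            raise ValueError("at least one match criterion is required")
```

For example, `SearchQuery(category="en:cookies", filters={"country": "en:france"}).validate()` passes, while a query with only filters raises `ValueError`.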


Server-side indexing

Based on the types of queries we want to support, we may be able to, and may need to, pre-compute and store new information in the database to better support them.

Similarity

Non personalized sort order

It is unrealistic to return all possible results to the calling app for all queries. E.g. if the query is "cookies", we cannot return tens of thousands of results. So we will need to use a reasonable sort order that returns the most useful results for the app, without limiting the options for personalization of the results.

Possible criteria to include in the sort order:

Product quality

If we expect that the app will only offer "better" products and not worse products, we may include things like the Nutri-Score or NOVA in the sort order.

Availability

We can filter by country of availability, by store (or stores) of availability, and by number of scans (a proxy for wide availability).

Popularity

We could also include a measure of product popularity (e.g. derived from scans) in the sort order.
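
One way to combine the criteria above is a lexicographic sort key: Nutri-Score first, then NOVA group, then scan count. This is only a sketch of one possible ordering. The field names (`nutriscore_grade`, `nova_group`, `unique_scans_n`), the priority between the criteria and the choice to sort unknown values last are all assumptions.

```python
from typing import Any, Dict, Tuple

# Nutri-Score grades from best to worst; unknown grades are ranked after "e".
NUTRISCORE_RANK = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
UNKNOWN = 5  # placeholder rank, worse than any known Nutri-Score or NOVA value


def default_sort_key(product: Dict[str, Any]) -> Tuple[int, int, int]:
    """Smaller tuples sort first: better grade, lower NOVA group, more scans."""
    nutri = NUTRISCORE_RANK.get(product.get("nutriscore_grade"), UNKNOWN)
    nova = product.get("nova_group") or UNKNOWN
    scans = product.get("unique_scans_n") or 0
    return (nutri, nova, -scans)
```

Usage: `sorted(products, key=default_sort_key)` puts the products the app is most likely to offer at the start of the result set. A weighted score would be an alternative if no single criterion should strictly dominate the others.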

Technical specs