Search API V3: Difference between revisions

m
Minor changes
(Add links)
m (Minor changes)
Line 3: Line 3:


=== Goals ===
=== Goals ===
Create a new search API to facilitate:
Create new search APIs to facilitate:


* Autocomplete (currently unsupported)
* Autocomplete (currently unsupported)
Line 22: Line 22:


=== Configuration ===
=== Configuration ===
We will use Elasticsearch 8.3.3 (latest), deployed via Docker. We will use a replication factor of one, with shards split across two nodes.
We will use Elasticsearch 8.3.3 (latest), deployed via Docker. We will use a replication factor of one (i.e., two copies of the data in total), with shards split across two nodes.
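
As a rough illustration of the replication setting, the index settings might look like the sketch below. The shard count and the helper name ''total_copies'' are assumptions made for the example and are not part of this design; only the replica count of one comes from the text above.

```python
# Hypothetical index settings. number_of_replicas = 1 matches the
# replication factor described above; number_of_shards = 2 is an
# assumption chosen only so that primaries can be split across two nodes.
INDEX_SETTINGS = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
    }
}


def total_copies(index_settings: dict) -> int:
    """Return how many copies of each shard exist: one primary plus its replicas."""
    return 1 + index_settings["settings"]["number_of_replicas"]


print(total_copies(INDEX_SETTINGS))
```

Elasticsearch never allocates a replica on the same node as its primary, so with two nodes and one replica each node can hold a full copy of the data.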


From testing on an M1 MacBook Pro, we see:
From testing on an M1 MacBook Pro, we see:
Line 35: Line 35:


=== Monitoring ===
=== Monitoring ===
We will use [https://elasticvue.com/ elasticvue] to view information such as resource usage and sharding, and to perform debugging. Query information can be seen at elasticvue --> indices --> settings/cog --> Show stats.
We will use [https://elasticvue.com/ elasticvue] to view information such as resource usage and sharding, and to perform debugging. Query stats (to monitor ongoing use) can be seen at elasticvue --> indices --> settings/cog --> Show stats.


=== Data ===
=== Data ===
The core datatype will be the ''Product''.
The core datatype will be the ''Product''.


To enable API cases such as partial text search, a rich autocomplete, and the possibility of eventually serving as a unified read layer, all fields will be added to the index. Only product names, brands and categories will be indexed for autocomplete queries.
To enable API cases such as partial text search, a rich autocomplete, and the possibility of eventually serving as a unified read layer, [https://static.openfoodfacts.org/data/data-fields.txt all fields] will be added to the index. Only product names, brands and categories will be indexed for autocomplete queries.


An argument could be made for storing fewer fields, and reducing disk usage. However, as illustrated above, disk usage is quite reasonable.  
An argument could be made for storing fewer fields, and reducing disk usage. However, as illustrated above, disk usage is quite reasonable.  
Line 48: Line 48:


=== Importing Data ===
=== Importing Data ===
A Redis container will be created, which will serve as a queue/buffer for writing data. When the [https://github.com/openfoodfacts/openfoodfacts-server/blob/af59dc1155a096328e9dc4710985a12a8be878c3/lib/ProductOpener/Products.pm#L968 ''store_product'' method] is called on the main service, a new entry will be added to the queue, containing the full product definition. A field will indicate if this is an upsert or a delete.
A Redis container will be created, which will serve as a queue/buffer for writing data. This approach is preferable to a webhook because search service instability will not affect the main server, and we can better handle write spikes and DoS attacks.


The Search Service will consume from this queue, indexing (or deleting) each product as it receives messages.
Data will be added to the queue when the [https://github.com/openfoodfacts/openfoodfacts-server/blob/af59dc1155a096328e9dc4710985a12a8be878c3/lib/ProductOpener/Products.pm#L968 ''store_product'' method] is called on the main service. This data will contain the full product definition and a field will indicate if this is an upsert or a delete.
 
The Search Service will consume from this queue via the [https://redis-py.readthedocs.io/en/stable/ Redis Python API], indexing (or deleting) each product as it receives messages.
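
A minimal sketch of the consumer side, under stated assumptions: the queue key, index name, and message fields (''action'', ''code'', ''product'') are hypothetical, since the message format is not specified here. The action dictionaries follow the format accepted by ''elasticsearch.helpers.bulk''. The Redis call is passed in as a function so that the sketch runs without a Redis server.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical names: the queue key, index name and message fields are
# illustrative assumptions, not taken from the design.
QUEUE_KEY = "product_queue"
INDEX_NAME = "products"


def message_to_action(raw: str, index: str = INDEX_NAME) -> Dict:
    """Turn one queue message into an action in the elasticsearch.helpers.bulk format."""
    message = json.loads(raw)
    code = message["code"]
    if message["action"] == "delete":
        return {"_op_type": "delete", "_index": index, "_id": code}
    # Anything else is treated as an upsert: indexing with a fixed _id
    # creates the document or overwrites the existing one.
    return {
        "_op_type": "index",
        "_index": index,
        "_id": code,
        "_source": message["product"],
    }


def drain(pop: Callable[[], Optional[Tuple[str, str]]], limit: int = 100) -> List[Dict]:
    """Pop up to `limit` messages and return the matching bulk actions.

    `pop` mirrors redis-py's BLPOP: it returns a (key, value) pair,
    or None when the timeout expires with nothing in the queue.
    """
    actions: List[Dict] = []
    for _ in range(limit):
        item = pop()
        if item is None:
            break
        _key, raw = item
        actions.append(message_to_action(raw))
    return actions


# With real services, the wiring could look like this (not executed here):
#   client = redis.Redis(decode_responses=True)
#   actions = drain(lambda: client.blpop(QUEUE_KEY, timeout=5))
#   elasticsearch.helpers.bulk(es_client, actions)
```

Keeping the message-to-action translation separate from the Redis and Elasticsearch calls makes it straightforward to test the upsert/delete handling in isolation.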


A manual import script will also be written, to take the CSV file and [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html bulk import] items. To ensure that data from the manual import script is up to date (i.e., no gap between the time the script is run and the time the data is imported), we need to:
A manual import script will also be written, to take the CSV file and [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html bulk import] items. To ensure that data from the manual import script is up to date (i.e., no gap between the time the script is run and the time the data is imported), we need to:
Line 122: Line 124:
   
   
=== API Discussion ===
=== API Discussion ===
The barcode GET API is included to demonstrate how this service could easily replace our existing APIs, but is not intended to be used.  
The barcode GET API is included to demonstrate how this service could easily replace our existing APIs, but is not intended to be used for the moment.  


The remaining APIs have several commonalities:
The remaining APIs have several commonalities:
Line 129: Line 131:
* An optional ''response_fields'' parameter is provided, to limit the fields in the response further
* An optional ''response_fields'' parameter is provided, to limit the fields in the response further
* POST is used, to support a complex request body
* POST is used, to support a complex request body


The ''/search'' API is the most complex as it allows a series of filters to support the use cases in the [https://openfoodfacts.github.io/api-documentation/#3SEARCHRequests current API]. These filters will work like an intersection/AND query. Of interest are the:
The ''/search'' API is the most complex as it allows a series of filters to support the use cases in the [https://openfoodfacts.github.io/api-documentation/#3SEARCHRequests current API]. These filters will work like an intersection/AND query. Of interest are the: