(Add links) |
m (Minor changes) |
||
Line 3: | Line 3: | ||
=== Goals === | === Goals === | ||
Create | Create new search APIs to facilitate: | ||
* Autocomplete (currently unsupported) | * Autocomplete (currently unsupported) | ||
Line 22: | Line 22: | ||
=== Configuration === | === Configuration === | ||
We will use Elasticsearch 8.3.3 (latest), deployed via Docker. We will use a replication factor of one, with shards split across two nodes. | We will use Elasticsearch 8.3.3 (latest), deployed via Docker. We will use a replication factor of one (i.e., two copies of the data in total), with shards split across two nodes. | ||
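As a rough illustration of the replication setting described above, the sketch below builds an index settings body of the kind that could be supplied when the index is created. Only the replication factor of one comes from the text; the primary shard count of 4 and both helper names are assumptions made for this example.

```python
def build_index_settings(primary_shards: int = 4, replicas: int = 1) -> dict:
    """Return an Elasticsearch index settings body.

    primary_shards=4 is an illustrative assumption; replicas=1 matches
    the replication factor stated in the Configuration section.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_copies(replicas: int) -> int:
    """Each primary shard plus its replicas: one replica means two copies."""
    return 1 + replicas


settings = build_index_settings()
```

With one replica and two nodes, each primary shard and its replica can be placed on different nodes, which is what allows the data to survive the loss of a single node.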
From testing on an M1 MacBook Pro, we see: | From testing on an M1 MacBook Pro, we see: | ||
Line 35: | Line 35: | ||
=== Monitoring === | === Monitoring === | ||
We will use [https://elasticvue.com/ elasticvue] to see information such as resource usage, sharding, and perform debugging. Query | We will use [https://elasticvue.com/ elasticvue] to see information such as resource usage, sharding, and perform debugging. Query stats (to monitor ongoing use) can be seen at elasticvue --> indices --> settings/cog --> Show stats. | ||
=== Data === | === Data === | ||
The core datatype will be the ''Product''. | The core datatype will be the ''Product''. | ||
To enable API cases such as partial text search, a rich autocomplete, and the possibility of eventually serving as a unified read layer, all fields will be added to the index. Only product names, brands and categories will be indexed for autocomplete queries. | To enable API cases such as partial text search, a rich autocomplete, and the possibility of eventually serving as a unified read layer, [https://static.openfoodfacts.org/data/data-fields.txt all fields] will be added to the index. Only product names, brands and categories will be indexed for autocomplete queries. | ||
An argument could be made for storing fewer fields, and reducing disk usage. However, as illustrated above, disk usage is quite reasonable. | An argument could be made for storing fewer fields, and reducing disk usage. However, as illustrated above, disk usage is quite reasonable. | ||
Line 48: | Line 48: | ||
=== Importing Data === | === Importing Data === | ||
A Redis container will be created, which will serve as a queue/buffer for writing data. | A Redis container will be created, which will serve as a queue/buffer for writing data. This approach is preferable to a webhook because search service instability will not affect the main server, and we can better handle write spikes and DoS attacks. | ||
The Search Service will consume from this queue, indexing (or deleting) each product as it receives messages. | Data will be added to the queue when the [https://github.com/openfoodfacts/openfoodfacts-server/blob/af59dc1155a096328e9dc4710985a12a8be878c3/lib/ProductOpener/Products.pm#L968 ''store_product'' method] is called on the main service. This data will contain the full product definition and a field will indicate if this is an upsert or a delete. | ||
The Search Service will consume from this queue via the [https://redis-py.readthedocs.io/en/stable/ Redis Python API], indexing (or deleting) each product as it receives messages. | |||
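The consume step can be sketched as follows. This is a minimal illustration, not the service's actual code: the message field names (''action'', ''code'', ''product'') are assumptions, and a plain dictionary stands in for the Elasticsearch index so the example runs without any services. In the real service, each message would be read from Redis through the Redis Python API and written to Elasticsearch instead.

```python
import json
from typing import Any, Dict


def apply_message(raw: str, index: Dict[str, Dict[str, Any]]) -> str:
    """Apply one queue message to the index and return the action taken.

    The message layout is hypothetical: "action" is "upsert" or "delete",
    "code" is the product barcode, "product" is the full product definition.
    """
    message = json.loads(raw)
    action = message["action"]
    code = message["code"]
    if action == "upsert":
        # Insert the product, or overwrite it if it is already indexed.
        index[code] = message["product"]
    elif action == "delete":
        # Deleting a product that is not indexed is treated as a no-op.
        index.pop(code, None)
    else:
        raise ValueError(f"unknown action: {action!r}")
    return action


# In-memory stand-in for the search index.
index: Dict[str, Dict[str, Any]] = {}
apply_message(
    json.dumps({"action": "upsert", "code": "123", "product": {"product_name": "Oat milk"}}),
    index,
)
apply_message(json.dumps({"action": "delete", "code": "123"}), index)
```

Carrying the full product definition in each message, as described above, means the consumer never has to call back to the main server to process an upsert.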
A manual import script will also be written, to take the CSV file and [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html bulk import] items. To ensure that data from the manual import script is up to date (i.e., no gap between the time the script is run and the time the data is imported), we need to: | A manual import script will also be written, to take the CSV file and [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html bulk import] items. To ensure that data from the manual import script is up to date (i.e., no gap between the time the script is run and the time the data is imported), we need to: | ||
Line 122: | Line 124: | ||
=== API Discussion === | === API Discussion === | ||
The barcode GET API is included to demonstrate how this service could easily replace our existing APIs, but is not intended to be used. | The barcode GET API is included to demonstrate how this service could easily replace our existing APIs, but is not intended to be used for the moment. | ||
The remaining APIs have several commonalities: | The remaining APIs have several commonalities: | ||
Line 129: | Line 131: | ||
* An optional ''response_fields'' parameter is provided, to limit the fields in the response further | * An optional ''response_fields'' parameter is provided, to limit the fields in the response further | ||
* POST is used, to support a complex request body | * POST is used, to support a complex request body | ||
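To illustrate the ''response_fields'' parameter, here is a minimal sketch of how a response could be trimmed to the requested fields. The helper name and the behaviour when the parameter is omitted (return the product unchanged) are assumptions; the design only states that the parameter limits the fields in the response.

```python
from typing import Any, Dict, Iterable, Optional


def limit_fields(
    product: Dict[str, Any],
    response_fields: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Return a copy of the product restricted to response_fields.

    When response_fields is None, the product is returned unchanged
    (an assumed default). Unknown field names are silently ignored.
    """
    if response_fields is None:
        return dict(product)
    wanted = set(response_fields)
    return {key: value for key, value in product.items() if key in wanted}


product = {"code": "123", "product_name": "Oat milk", "brands": "Example"}
trimmed = limit_fields(product, ["code", "product_name"])
```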
The ''/search'' API is the most complex as it allows a series of filters to support the use cases in the [https://openfoodfacts.github.io/api-documentation/#3SEARCHRequests current API]. These filters will work like an intersection/AND query. Of interest are the: | The ''/search'' API is the most complex as it allows a series of filters to support the use cases in the [https://openfoodfacts.github.io/api-documentation/#3SEARCHRequests current API]. These filters will work like an intersection/AND query. Of interest are the: |