Folksonomy Engine/Technical specifications
Requirements
- Scalability
  - The number of property/value pairs could be huge: potentially millions in a few years; potentially tens of millions in 5 years.
  - Writes and reads must be fast and scale to at least thousands of requests per second.
- Development
  - The data model should be as simple as possible (really).
  - As an open source project, Open Food Facts welcomes contributions: the technical stack should be widely known to maximize the chance that other developers participate.
  - Open Food Facts' permanent dev team is small: the technical stack should be known by the team.
  - Open Food Facts tries to choose proven technologies and mature standards.
  - Backend and frontend should be clearly separated: it helps data reusability, and it allows backend and frontend devs to concentrate on their own work.
- Usages
  - Data should be easy to reuse by developers.
  - Dynamic API documentation would be really helpful.
  - All the data needs to be versioned, for transparency and history concerns.
Technical choices
Backend
- The whole Open Food Facts infrastructure is based on Debian Linux, which is efficient and widely known.
- We chose Python as the backend programming language:
  - Python is widely adopted (one of the main languages taught in schools).
  - It scales well.
  - Part of the team is already using Python.
- Building an API from scratch would have taken too long, so we chose the FastAPI framework, which brings interesting benefits:
  - Building an API with FastAPI is very easy: it's possible to build a simple API with self-generated documentation (OpenAPI standard); see the minimal sketch after this list.
  - Said to be fast: https://www.techempower.com/ benchmarks show it is ~3 times faster than very popular Python frameworks such as Django or Flask.
  - FastAPI can be run with or without a separate web server in front (standalone with uvicorn, or behind a reverse proxy).
  - RESTful compliance is largely handled by FastAPI itself.
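To illustrate the self-generated documentation, here is a minimal sketch of a FastAPI application. This is hypothetical code, not the actual Folksonomy Engine source; the /ping route only mirrors the endpoint used in the workload tests below.

# Minimal FastAPI sketch (hypothetical, not the actual Folksonomy Engine code).
from fastapi import FastAPI

app = FastAPI(title="Folksonomy API sketch")

@app.get("/ping")
async def ping():
    # The real /ping endpoint also performs a database request (see the tests below);
    # this sketch returns a static response to stay self-contained.
    return {"ping": "pong"}

# Run with, for instance: uvicorn sketch:app --host 0.0.0.0 --port 8000
# The interactive documentation is then served at /docs (OpenAPI schema at /openapi.json).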
Database
Folksonomy Engine has very different goals and requirements from the Open Food Facts backend (Perl + MongoDB):
- The data model should be very simple.
- It could lead to 10 times more records than the total number of products gathered by Open Food Facts.
The data model simplicity led us to choose a classical RDBMS known for its scalability: PostgreSQL. PostgreSQL also has interesting advantages and features that can be useful for Folksonomy Engine (fully open source, transactions, stored procedures, triggers, JSON...).
Data model
For performance concerns, we separated the data model implementation into two parts:
- one small table deals with the live data (current record versions)
- the other one stores all the past versions of the records.
Triggers at the PostgreSQL level are used to maintain integrity between the current and archived versions of the records.
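To make this more concrete, here is a hedged sketch of what such a split could look like, expressed as Python + psycopg2 executing the DDL. Table names, column types and the trigger body are assumptions for illustration (column names follow the fields used by the injection script at the bottom of this page); this is not the actual production schema.

# Hypothetical sketch of the two-table split described above (not the actual schema).
# One table holds the live records, a second one archives every version; a trigger
# copies each inserted/updated row into the archive table.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS folksonomy (
    product   TEXT        NOT NULL,
    k         TEXT        NOT NULL,
    v         TEXT        NOT NULL,
    owner     TEXT        NOT NULL DEFAULT '',
    version   INTEGER     NOT NULL DEFAULT 1,
    editor    TEXT,
    last_edit TIMESTAMPTZ NOT NULL DEFAULT now(),
    comment   TEXT        NOT NULL DEFAULT '',
    PRIMARY KEY (product, owner, k)            -- live data: one row per current key/value
);

CREATE TABLE IF NOT EXISTS folksonomy_versions (
    product   TEXT        NOT NULL,
    k         TEXT        NOT NULL,
    v         TEXT        NOT NULL,
    owner     TEXT        NOT NULL DEFAULT '',
    version   INTEGER     NOT NULL,
    editor    TEXT,
    last_edit TIMESTAMPTZ NOT NULL,
    comment   TEXT        NOT NULL DEFAULT '',
    PRIMARY KEY (product, owner, k, version)   -- archive: every version is kept
);

CREATE OR REPLACE FUNCTION folksonomy_archive() RETURNS trigger AS $$
BEGIN
    -- copy the new state of the row into the archive table
    INSERT INTO folksonomy_versions
        VALUES (NEW.product, NEW.k, NEW.v, NEW.owner, NEW.version,
                NEW.editor, NEW.last_edit, NEW.comment);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS folksonomy_archive_trg ON folksonomy;
CREATE TRIGGER folksonomy_archive_trg
    AFTER INSERT OR UPDATE ON folksonomy
    FOR EACH ROW EXECUTE FUNCTION folksonomy_archive();
"""

if __name__ == "__main__":
    # connection parameters are placeholders
    with psycopg2.connect("dbname=folksonomy") as conn, conn.cursor() as cur:
        cur.execute(DDL)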
Frontend
For its first version, at least (it could change), the whole frontend will be developed in JavaScript.
Workload test
As we said earlier, FastAPI is known to be very fast. But we need a few metrics to verify that it will be able to scale.
Folksonomy Engine won't be used for each product of the database. As of today (2021-10), Open Food Facts receives a mean of 0.77 requests per second, with peaks up to 100 req/s. Folksonomy Engine usage will mechanically be lower than Open Food Facts traffic.
Tests
Unless otherwise specified, all our tests were made on a Raspberry Pi 4: thanks to its very low price, the hardware -- and the tests -- are easily reproducible.
All tests have been run with:
- a standard installation, as described here
- no optimization at all
uvicorn launched with 2 workers:
$ uvicorn folksonomy.api:app --reload --host 192.168.0.42 --workers 2
We used the classical ab test tool from the Apache Foundation, and a custom script to test new key/value injection (see below).
Testing 10,000 queries to /ping (which includes a database request) gives the following results:
$ ab -n 10000 -c 10 http://192.168.0.42:8000/ping
Requests per second:    307.11 [#/sec] (mean)
Time per request:       32.562 [ms] (mean)
Testing 10,000 queries to a particular key/value gives the following results:
$ ab -n 10000 -c 10 http://192.168.0.42:8000/product/6389599748279
Requests per second:    227.45 [#/sec] (mean)
Time per request:       43.966 [ms] (mean)
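This is also the endpoint developers would use to reuse the data. A minimal Python sketch, assuming the requests library and assuming the endpoint returns a JSON list of key/value records (field names taken from the injection script below):

# Hypothetical usage sketch: read the free key/value pairs attached to one product.
# The response format (a JSON list of records) is an assumption for illustration.
import requests

SERVER = "http://192.168.0.42:8000"   # same test server as in the ab runs above
BARCODE = "6389599748279"

resp = requests.get(f"{SERVER}/product/{BARCODE}", timeout=10)
resp.raise_for_status()

for tag in resp.json():
    # expected fields: product, k, v, owner, version, editor, last_edit, comment
    print(f"{tag.get('k')} = {tag.get('v')} (version {tag.get('version')})")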
Publishing 2,000 new key/value pairs on 2,000 different products gives the following result:
$ time ./fe_test.sh
real    0m25,006s
user    0m30,612s
sys     0m21,014s
corresponding to 80 requests per second (2,000 requests in about 25 s). We obtained similar results for 20,000 pairs.
Given that our production servers are at least 10 to 20 times more powerful than a Raspberry Pi, which would put writes roughly in the 800 to 1,600 requests per second range, we can conclude that Folksonomy Engine is correctly built to scale up.
Key/value injection script
#!/usr/bin/bash
# This script creates NB key/value pairs via the Folksonomy Engine API
# DO NOT USE ON PRODUCTION

# Number of products to be created
NB=2000
# Server URL
S='http://192.168.0.42:8000'
# User name
U='charlesnepote'
# Password
read -p "OFF password: " PASS
#PASS=''

TOKEN=`curl -X 'POST' \
  "$S/auth" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d "grant_type=&username=$U&password=$PASS&scope=&client_id=&client_secret=" \
  | jq -r '.access_token'`
echo "Token: $TOKEN"

counter=1
while [ $counter -le $NB ]
do
  echo $counter
  ((counter++))
  # Create a random product code, key and value
  # https://stackoverflow.com/questions/32484504/using-random-to-generate-a-random-string-in-bash
  P=`LC_ALL=C tr -dc 0-9 </dev/urandom | head -c 13`
  echo "P: "$P
  K=`LC_ALL=C tr -dc a-z_ </dev/urandom | head -c 30`
  echo "K: "$K
  V=`LC_ALL=C tr -dc a-z0-9_ </dev/urandom | head -c 30`
  echo "V: "$V
  curl -X 'POST' \
    "$S/product" \
    -H 'accept: application/json' \
    -H "Authorization: Bearer $TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{
      "product": "'$P'",
      "k": "'$K'",
      "v": "'$V'",
      "owner": "",
      "version": 1,
      "editor": "'$U'",
      "last_edit": "2021-10-19T07:57:47.518Z",
      "comment": "Test"
    }' &
  echo ""
done

# wait for all the background curl calls to finish
wait