Currently recovery tests are done manually; it'd be great to have them as automated tests.
These are the main scenarios:
(the connection recovery worker is referred to as just "worker")
1. postgrest started with a pg connection, then pg becomes unavailable
- worker starts only after a request to postgrest, postgrest should respond with {"details":"no connection to the server\n","message":"Database client error. Retrying the connection."}
- if db-channel-enabled=true, worker starts immediately (not necessary to prove this in tests, though).
- pg becomes available, postgrest succeeds reconnecting, reloads the schema cache and responds with 200
- if db-load-guc-config=true, it should also re-read the in-db config.
- test with an ALTER ROLE postgrest_test_authenticator SET pgrst.db_schemas = 'public'; and try a GET /public_consumers, which should give a 404 if the in-db config isn't re-read.
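An automated version of scenario 1 needs to wait until postgrest has reconnected before asserting on the 200. A minimal sketch of a polling helper follows; the name `wait_until` and its parameters are hypothetical, and the actual check (for example, a GET that returns 200) is left to the caller.

```python
import time


def wait_until(check, timeout=5.0, interval=0.1):
    """Call check() repeatedly until it is truthy or the timeout elapses.

    Returns True if check() succeeded in time, False otherwise.
    `check` is any zero-argument callable supplied by the test.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a test, `check` could issue a request to postgrest and return whether the status is 200; the test then fails if `wait_until` returns False.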
2. unavailable pg, postgrest started
- worker starts immediately, postgrest should respond with a 503 {"message":"Database connection lost. Retrying the connection."}
- Bug: if db-channel-enabled=true, postgrest doesn't reply and curl gives Connection refused. This must be because of the mvarConnectionStatus MVar; it doesn't happen in scenario 1 though.
- pg becomes available, postgrest succeeds reconnecting, reloads the schema cache and responds with 200
- if db-load-guc-config=true, it should also re-read the in-db config.
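Both scenarios so far expect a JSON error body whose message says the connection is being retried. A small sketch of a check for that body follows; `is_retrying_response` and `RETRY_MESSAGES` are hypothetical names, and the two message strings are copied from the responses quoted above. The status code (503 in scenario 2) would be asserted separately.

```python
import json

# Messages quoted in scenarios 1 and 2 above.
RETRY_MESSAGES = {
    "Database client error. Retrying the connection.",
    "Database connection lost. Retrying the connection.",
}


def is_retrying_response(body):
    """Return True if body is a JSON object whose "message" is a retry message."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (or not a str/bytes value).
        return False
    return isinstance(payload, dict) and payload.get("message") in RETRY_MESSAGES
```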
3. SIGUSR1 - NOTIFY reload schema
- when these are done, no running requests using pg connections must be interrupted
- when postgrest has a pg connection, both SIGUSR1 and NOTIFY will reload the schema cache
- if db-load-guc-config=true, it should also re-read the in-db config.
- ensure SIGUSR1 starts the worker when db-channel-enabled=true (got it to lock before, and the worker was not starting, so this must be ensured)
- when postgrest loses the connection, and db-channel-enabled=false (only SIGUSR1)
- SIGUSR1 starts the worker, only one can run at a time. This is ensured by refIsWorkerOn and can be confirmed by sending several SIGUSR1 and noting just one Attempting to reconnect to the database in 1 seconds... message. If refIsWorkerOn is removed, there will be several Attempting to reconnect to the database in 1 seconds... messages.
- Not sure how to test this, maybe count the number of threads?
- pg becomes available, postgrest succeeds reconnecting, reloads the schema cache and responds with 200
- when postgrest loses the connection, and db-channel-enabled=true
- ensure the listener recovers, e.g. doing a NOTIFY 'reload cache/load config' should work after recovery.
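One possible way to automate the "only one worker" check without counting threads is to count the first-attempt log message, as done manually above. This is a sketch only: `count_first_reconnect_attempts` is a hypothetical helper, it assumes the test has captured postgrest's log output as text, and it assumes a single worker prints the "in 1 seconds" message only once per run.

```python
# Message quoted in the SIGUSR1 scenario above.
RECONNECT_FIRST_ATTEMPT = "Attempting to reconnect to the database in 1 seconds..."


def count_first_reconnect_attempts(log_text):
    """Count log lines that contain the first-attempt reconnect message."""
    return sum(
        1 for line in log_text.splitlines() if RECONNECT_FIRST_ATTEMPT in line
    )
```

After sending several SIGUSR1 signals, a test could assert that the count is exactly 1; a higher count would suggest more than one worker was started.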