
Pipeline retrieve deadlocks when streaming many insertions (possible race condition) #942

Open
jdmdmm opened this issue Jan 26, 2025 · 9 comments

Comments

@jdmdmm

jdmdmm commented Jan 26, 2025

When using the pipeline code with a nontransaction and doing many inserts per second, the pipeline's retrieve call deadlocks. (This does not happen at two inserts per second on a timer, and I have managed to get the streaming-insert mode to work once, so I suspect a race condition between the insert and the retrieve on the pipeline.) Here is a stack trace:

```
#0 0x00007ffff74b04cd in __GI___poll (fds=0x7ffff4d131e8, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff7f7c814 in ?? () from /lib/x86_64-linux-gnu/libpq.so.5
#2 0x00007ffff7f82740 in PQgetResult () from /lib/x86_64-linux-gnu/libpq.so.5
#3 0x00005555569b9ce5 in pqxx::connection::get_result (this=0x55555791b340) at xxx/builddbg/_deps/project_libpqxx-src/src/connection.cxx:959
#4 0x00005555569f7692 in pqxx::internal::gate::connection_pipeline::get_result (this=0x7ffff4d133c8)
    at xxx/builddbg/_deps/project_libpqxx-src/include/pqxx/internal/gates/connection-pipeline.hxx:15
#5 0x00005555569f231b in pqxx::pipeline::obtain_dummy (this=0x5555579c9300) at xxx/builddbg/_deps/project_libpqxx-src/src/pipeline.cxx:283
#6 0x00005555569f5078 in pqxx::pipeline::receive (this=0x5555579c9300, stop={...})
    at xxx/builddbg/_deps/project_libpqxx-src/src/pipeline.cxx:448
#7 0x00005555569f44ac in pqxx::pipeline::retrieve (this=0x5555579c9300, q={...})
    at xxx/builddbg/_deps/project_libpqxx-src/src/pipeline.cxx:395
#8 0x00005555569f00f6 in pqxx::pipeline::retrieve (this=0x5555579c9300) at xxx/builddbg/_deps/project_libpqxx-src/src/pipeline.cxx:153
```

@tt4g
Contributor

tt4g commented Jan 26, 2025

PQgetResult(), which appears at the top of the backtrace, is a libpq function, so the deadlock appears to occur inside libpq.
The libpq API is not thread-safe.
You say you are running the pipeline on a timer, but is the connection being accessed by only one thread at a time?

@jdmdmm
Author

jdmdmm commented Jan 27, 2025

I'm running the insertion routine on a timer in some scenarios, but the failing scenario has raw (real-time) sensor data being inserted in one thread while the results are periodically checked in another.

I tried using lock guards to ensure thread safety around every access to libpqxx (and thus to libpq), but I get the same issue.
I think the pipeline is simply choking on the sheer amount of data, although it is odd that very occasionally it works, which is what led me to suspect a race condition.

I implemented a workaround using internal queues (each serviced by its own thread) over multiple connections to the database, with a normal work transaction and an explicit commit, and I can make that keep up with 8 connections at 10-millisecond insertion intervals.

Real-time sensor data arrives faster than that.

@tt4g
Contributor

tt4g commented Jan 27, 2025

Since libpq is not thread-safe, libpqxx is not thread-safe either.
It is not safe to perform another operation on the same connection between the start and the completion of a transaction (that is, from the start of the pipeline until it completes or is cancelled).

I can make that work with 8 connections at 10 millisecond insertion intervals.

Are you saying that your project avoids this problem?

@jdmdmm
Author

jdmdmm commented Jan 27, 2025

As I understood it, the purpose of a pipeline is to be able to insert (for example) in one thread and read the results of that insert in another.
Is that not the intended usage pattern?

@jtv
Owner

jtv commented Jan 27, 2025

@jdmdmm: Not specifically. The idea of the pipeline is simply: if you're executing many queries in quick succession that don't need the results of the immediately preceding queries, then we can save some networking time by sending each query before the previous query is finished.

The pipeline class does that by concatenating queries and sending them to the server as one big string. That's pretty much all there is to it. Threading was never even a consideration - either you keep one connection on one thread, or you use locking.
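For illustration, the intended single-threaded use looks roughly like this (a sketch only; the readings table and the queries are placeholders):

```cpp
// Minimal sketch of pqxx::pipeline on a single thread.
// The "readings" table and its columns are made up for the example.
#include <pqxx/pqxx>

int main()
{
  pqxx::connection conn;            // connection parameters from the environment
  pqxx::nontransaction tx{conn};
  pqxx::pipeline pipe{tx};

  // Queue several queries without waiting for the results of earlier ones.
  auto id1 = pipe.insert("INSERT INTO readings (value) VALUES (42) RETURNING id");
  auto id2 = pipe.insert("INSERT INTO readings (value) VALUES (43) RETURNING id");
  pipe.complete();                  // send everything and insist on getting all results

  // Retrieve the results, still on the same thread, in any order.
  pqxx::result r2 = pipe.retrieve(id2);
  pqxx::result r1 = pipe.retrieve(id1);
}
```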

Anyway, is there a clearer, more structured description of what it is you're doing? I find this story a bit confusing, like I'm falling into the middle of a conversation that was already going on. It doesn't help that I'm having to read it on a phone. I can make guesses but my experience is that doing so often leads to more confusion.

And as @tt4g says, I guess it's possible that you have a non-threadsafe build of libpq... Have you checked?

@kiwixz

kiwixz commented Jan 29, 2025

Just FYI the libpq doc mentions:

It is best to use pipeline mode with libpq in non-blocking mode. If used in blocking mode it is possible for a client/server deadlock to occur. [15]

[15] The client will block trying to send queries to the server, but the server will block trying to send results to the client from queries it has already processed. This only occurs when the client sends enough queries to fill both its output buffer and the server's receive buffer before it switches to processing input from the server, but it's hard to predict exactly when that will happen.
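For reference, a rough sketch of the non-blocking pattern that passage describes, written directly against libpq's pipeline API (the connection string, table, and batch size are made up, and this is not how pqxx::pipeline works internally):

```cpp
// Sketch only: libpq 14+ pipeline mode with the connection in non-blocking mode,
// so the client never blocks on send while the server is blocked sending results back.
#include <libpq-fe.h>
#include <poll.h>
#include <cstdio>

int main()
{
  PGconn *conn = PQconnectdb("dbname=sensors");
  if (PQstatus(conn) != CONNECTION_OK)
  {
    std::fprintf(stderr, "%s", PQerrorMessage(conn));
    return 1;
  }

  PQsetnonblocking(conn, 1);   // sends no longer block; we flush explicitly below
  PQenterPipelineMode(conn);

  // Queue a batch of inserts without waiting for any results.
  for (int i = 0; i < 1000; ++i)
    PQsendQueryParams(conn, "INSERT INTO readings (value) VALUES (42)",
                      0, nullptr, nullptr, nullptr, nullptr, 0);
  PQpipelineSync(conn);        // marks the end of the batch

  // Interleave flushing our output with reading the server's results,
  // so neither side's buffer can fill up and stall the other.
  bool done = false;
  while (!done)
  {
    pollfd pfd{PQsocket(conn), POLLIN, 0};
    if (PQflush(conn) == 1)    // still data queued to send
      pfd.events |= POLLOUT;
    poll(&pfd, 1, -1);

    if (pfd.revents & POLLIN)
    {
      PQconsumeInput(conn);
      while (!done && !PQisBusy(conn))
      {
        PGresult *res = PQgetResult(conn);
        if (res == nullptr)
          continue;            // end of one query's results; on to the next
        if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
          done = true;         // everything up to the sync point has arrived
        PQclear(res);
      }
    }
  }

  PQexitPipelineMode(conn);
  PQfinish(conn);
}
```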

@jtv
Owner

jtv commented Jan 29, 2025

Thanks @kiwixz. The pqxx::pipeline class does not use libpq's pipeline mode but there may be a similar thing going on. The ideas are very similar.

(In case anybody wonders why libpqxx does not use libpq's pipeline mode... I wrote my pipeline class long before that feature was introduced into libpq. The libpq feature was inspired by a similar feature in the Java client, which in turn may or may not have been inspired by libpqxx — I have no idea. One day I hope to rewrite my class to become a thin wrapper for the libpq feature.)

@jdmdmm
Author

jdmdmm commented Jan 29, 2025

I was wondering whether I had any control over that, but it does look like the pipeline is operating in blocking mode, and I think heavy load just makes the issue show up more readily.
I've implemented a thread-safe version using multiple parallel connections and standard blocking transactions, which also batches many inserts into a single transaction at a time. It works very well so far, but it would be great to have something I can just use from the library.
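In outline, each connection gets its own worker thread that drains a queue and commits one batch per transaction. A rough sketch of one such worker (the Sample struct, queue type, table, and column names are all placeholders, not library API):

```cpp
// Sketch of one worker: its own connection, draining a queue, committing in batches.
#include <pqxx/pqxx>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

struct Sample { double t; double value; };

struct SampleQueue
{
  std::mutex m;
  std::condition_variable cv;
  std::deque<Sample> items;
  bool closed = false;

  void push(Sample s)
  {
    { std::lock_guard<std::mutex> lk{m}; items.push_back(s); }
    cv.notify_one();
  }

  // Blocks until at least one item arrives (or the queue is closed),
  // then drains everything queued so far into one batch.
  std::vector<Sample> pop_batch()
  {
    std::unique_lock<std::mutex> lk{m};
    cv.wait(lk, [&] { return !items.empty() || closed; });
    std::vector<Sample> batch{items.begin(), items.end()};
    items.clear();
    return batch;
  }
};

void insert_worker(SampleQueue &queue, char const *conn_string)
{
  pqxx::connection conn{conn_string};   // one connection per worker thread
  for (;;)
  {
    auto batch = queue.pop_batch();
    if (batch.empty())
      break;                             // queue closed and fully drained

    pqxx::work tx{conn};                 // one transaction per batch
    for (auto const &s : batch)
      tx.exec_params("INSERT INTO readings (t, value) VALUES ($1, $2)", s.t, s.value);
    tx.commit();
  }
}
```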

@jtv
Owner

jtv commented Jan 30, 2025

That's good to hear @jdmdmm — the pipeline class may simply not be a good fit for your use-case. If you're doing bulk inserts to single tables, also consider speeding it up using pqxx::stream_to. It's basically a type-safe wrapper for COPY FROM STDIN.
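For example, a minimal stream_to sketch (assuming libpqxx 7 and a hypothetical readings(t, value) table) would be along these lines:

```cpp
// Sketch of COPY-based bulk insert via pqxx::stream_to.
// The "readings" table and its columns are placeholders.
#include <pqxx/pqxx>

int main()
{
  pqxx::connection conn;
  pqxx::work tx{conn};

  auto stream = pqxx::stream_to::table(tx, {"readings"}, {"t", "value"});
  for (int i = 0; i < 100'000; ++i)
    stream.write_values(i * 0.01, 42.0);  // one row per call, converted for COPY
  stream.complete();                      // finish the COPY before committing
  tx.commit();
}
```

One COPY per transaction like this is generally much faster than issuing individual INSERTs row by row.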
