Remove the usage of txn metadata file #525
52df912
to
3292ab7
Compare
I see a
Full log from test execution: |
Hi @arajkumar; while I'm all for simplifying the code as much as possible, we need to make sure we don't over-simplify it. Unfortunately, I think that at the moment this PR oversimplifies some aspects. As with the previous PR you worked on, I am very worried about accepting a BEGIN in the middle of a transaction or a COMMIT without a BEGIN; I think we should be very cautious about that and only accept it when we have made absolutely sure it's the situation we believe it is. Another thing to keep in mind is that setting the
In these cases we still want correct processing of all the transactions. That's the reason why we have the xid metadata files for larger transactions (or split transactions): they allow us to skip a transaction as a whole even if the target endpos LSN is to be found in the middle of said transaction. We want to stop processing, but we also want to register the replay_lsn as that of the actual last COMMIT that was processed, so that in case we restart with a new endpos the split transaction is not skipped. I don't see how your current implementation addresses that problem... |
Thanks @dimitri for your feedback.
Partial transactions are a reality while switching from prefetch to replay mode. BEGIN after BEGIN is very much possible when you interrupt the write before the COMMIT message arrives. But I agree, COMMIT without a BEGIN should never come.
IIUC, the current code changes the commit type to synchronous if it sees that the endpos is in the middle of a transaction; it doesn't seem to skip the txn. pgcopydb/src/bin/pgcopydb/ld_apply.c Lines 698 to 723 in b1a0bcf
So, when there is no commit_lsn, we let the transaction proceed and execute the body (DML); if there is an ENDPOS in the middle, apply still honours it and aborts the transaction as usual. We never update the replay_lsn in the abort case; it stays at the previous value. When you change the endpos to a new value and resume pgcopydb, it would ignore all txns except the last one, which was aborted, and continue the apply from the beginning; it will ignore the old endpos. |
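To make the abort case above concrete, here is a minimal sketch in C. It is not pgcopydb's code: the message kinds, the `ApplyState` struct and the `replay` function are all hypothetical names invented for illustration. It only models the behaviour described in this comment: the replay position advances on COMMIT alone, and an ENDPOS seen in the middle of a transaction aborts it without touching that position.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message kinds; these are not pgcopydb's actual types. */
typedef enum { MSG_BEGIN, MSG_DML, MSG_COMMIT, MSG_ENDPOS } MsgKind;

typedef struct
{
	MsgKind kind;
	uint64_t lsn;
} Msg;

/* Hypothetical apply state, assumed for this sketch only. */
typedef struct
{
	uint64_t replayLSN;     /* LSN of the last COMMIT that was applied */
	bool inTransaction;     /* true between BEGIN and COMMIT/abort */
	bool reachedEndpos;     /* true once an ENDPOS message was seen */
	int appliedCommits;     /* number of transactions committed */
} ApplyState;

/* Replays messages until the input ends or an ENDPOS message is seen. */
static void
replay(ApplyState *st, const Msg *msgs, size_t count)
{
	for (size_t i = 0; i < count && !st->reachedEndpos; i++)
	{
		switch (msgs[i].kind)
		{
			case MSG_BEGIN:
				st->inTransaction = true;
				break;

			case MSG_DML:
				/* the statement would run inside the open transaction */
				break;

			case MSG_COMMIT:
				/* only a COMMIT advances the replay position */
				st->inTransaction = false;
				st->replayLSN = msgs[i].lsn;
				st->appliedCommits++;
				break;

			case MSG_ENDPOS:
				/* abort any open transaction; replayLSN is left untouched */
				st->inTransaction = false;
				st->reachedEndpos = true;
				break;
		}
	}
}
```

With a stream that commits one transaction at LSN 12 and then hits ENDPOS inside a second one, `replayLSN` stays at 12, so a later run with a new endpos would replay the aborted transaction from its BEGIN.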
235da85
to
5e4b474
Compare
I have thought this through and couldn't think of a scenario where this PR wouldn't work. So this looks good to me.
I have also tested it using the script with which I discovered the previous bugs, and it worked fine here. I'll sleep on it some more in case I find an edge case.
Been thinking more about it, and what I think makes this approach feasible now (when it wasn't before) is the addition of the explicit ENDPOS message in the stream. Given that, I think I see how everything works together in a way that we don't need the transaction metadata file anymore. It's a very nice simplification, thanks @arajkumar! I still would prefer us to keep the resume filtering logic in |
Thanks @dimitri @shubhamdhama for reviewing this PR.
Let me revert the changes made to ld_stream.c. But I think the main problem would be that we will end up seeing issues like #471. |
810316a
to
33e5d5c
Compare
I just did a quick review and the PR still allows BEGIN; BEGIN; without error, or COMMIT without BEGIN, assuming it must be the case you have in mind without making sure it is. It could be any other bug that leads to that situation and we need to protect against it. Can we remove the new processing in these areas too? Also I don't understand the new |
@dimitri This is to handle the case where the SWITCH and COMMIT have the same LSN. When we update For example, consider the following segment files: the SWITCH message in 000000040000018600000096.sql and the COMMIT message in 000000040000018600000097.sql have the same LSN. Without a separate field to track this, apply would ignore the COMMIT because previousLSN would be the same. 000000040000018600000096.sql
000000040000018600000097.sql
|
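The same-LSN problem described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the struct, the `lastAppliedCommit` field and the helper names are hypothetical (the actual field name is not visible in this thread), and the "skip when the commit LSN is not greater than the tracked LSN" comparison is assumed from the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker; field and function names are assumptions. */
typedef struct
{
	uint64_t previousLSN;        /* advanced by every message, SWITCH included */
	uint64_t lastAppliedCommit;  /* advanced only when a COMMIT is applied */
} LsnTracker;

/* A SWITCH message closes a segment file and moves previousLSN forward. */
static void
see_switch(LsnTracker *t, uint64_t lsn)
{
	t->previousLSN = lsn;
}

/* Single-field check: a COMMIT at the SWITCH's LSN looks already applied. */
static bool
commit_applies_single(const LsnTracker *t, uint64_t commitLSN)
{
	return commitLSN > t->previousLSN;
}

/* Separate-field check: only earlier COMMITs can cause a skip. */
static bool
commit_applies_separate(const LsnTracker *t, uint64_t commitLSN)
{
	return commitLSN > t->lastAppliedCommit;
}
```

When the SWITCH at the end of one segment and the COMMIT at the start of the next share an LSN, the single-field check rejects the COMMIT, while the separate-field check still applies it.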
Oh, that's a good find! It warrants some comments in the code to explain it, and I think it would be even better as its own separate bugfix PR. What do you think? |
This commit removes the usage of transaction metadata file. Initially, it was used by the apply process to bypass transactions that were already applied. However, this approach had its challenges. Specifically, in live replay mode, a transaction with numerous statements could fill the UNIX PIPE (an IPC primitive used in replay mode), leading to a potential deadlock. This is because the apply process would be waiting for the transaction metadata file. By eliminating the transaction metadata file, the apply process lets the transaction proceed and decides whether to apply or skip it based on the commit LSN during the commit phase. Signed-off-by: Arunprasad Rajkumar <[email protected]>
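As a rough illustration of the commit-phase decision described in this commit message, the sketch below lets a transaction run and only decides at COMMIT whether to keep it. The `TxnOutcome` enum and `finish_transaction` helper are hypothetical names, and the `<=` comparison against the last replayed LSN is an assumption about how "already applied" is detected, not a statement of what ld_apply.c does.

```c
#include <stdint.h>

/* Hypothetical outcome of the commit-phase decision. */
typedef enum { TXN_APPLIED, TXN_SKIPPED } TxnOutcome;

/*
 * Called when the COMMIT message arrives, after the transaction body has
 * already been sent. A commit LSN at or before the last replayed LSN is
 * treated as already applied (the caller would ROLLBACK); otherwise the
 * transaction is kept and the replay position moves forward.
 */
static TxnOutcome
finish_transaction(uint64_t commitLSN, uint64_t *replayLSN)
{
	if (commitLSN <= *replayLSN)
	{
		return TXN_SKIPPED;
	}

	*replayLSN = commitLSN;
	return TXN_APPLIED;
}
```

Because the decision needs nothing beyond the COMMIT message itself, the apply side never has to wait for a separate metadata file while the pipe fills up.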
33e5d5c
to
a503ea4
Compare
@dimitri It wasn't a problem earlier, as we checked the commitLSN in the BEGIN message. This change is only needed for the new implementation, which doesn't require the txn metadata file. I will try to add some comments. |
Thanks for all the work @arajkumar ! |
Fixes #493 #471