Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the usage of txn metadata file #525

Merged
merged 1 commit into from
Nov 6, 2023

Conversation

arajkumar
Copy link
Contributor

@arajkumar arajkumar commented Nov 1, 2023

This commit removes the usage of transaction metadata file. Initially, it was used by the apply process to bypass transactions that were already applied. However, this approach had its challenges. Specifically, in live replay mode, a transaction with numerous statements could fill the UNIX PIPE (an IPC primitive used in replay mode), leading to a potential deadlock. This is because the apply process would be waiting for the transaction metadata file.

By eliminating the transaction metadata file, the apply process lets the transaction proceed and decides whether to apply or skip it based on the commit LSN during the commit phase.

Fixes #493 #471

@arajkumar
Copy link
Contributor Author

CC: @shubhamdhama @dimitri

@arajkumar arajkumar force-pushed the no-txn-metadata-pr branch 3 times, most recently from 52df912 to 3292ab7 Compare November 2, 2023 09:03
@arajkumar
Copy link
Contributor Author

I see a test cdc-endpos-beteween-transaction fails intermittently because when resuming we no longer skip the messages(if we receive again) which are buffered into the file. The case is handled in the downstream, but when we compare the interim files, there will be an extra content some times.

diff -I BEGIN -I COMMIT -I KEEPALIVE -I SWITCH -I ENDPOS /usr/src/pgcopydb/000000010000000000000002.sql /var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.sql
9,10c9,14
< -- KEEPALIVE {"lsn":"0/244B9A8","timestamp":"2023-09-17 07:04:04.9770+0000"}
< -- ENDPOS {"lsn":"0/244B9A8"}
---
-- KEEPALIVE {"lsn":"0/244AD40","timestamp":"2023-11-02 09:06:05.139078+0000"}
-- ENDPOS {"lsn":"0/244AD40"}
BEGIN; -- {"xid":492,"lsn":"0/244AC18","timestamp":"2023-11-02 09:06:05.586617+0000"}
PREPARE 6a9e34e7 AS INSERT INTO public.category ("category_id", "name", "last_update") overriding system value VALUES ($1, $2, $3), ($4, $5, $6), ($7, $8, $9);
EXECUTE 6a9e34e7["1005","Mystery","2022-12-13 00:00:01+00","1006","Historical drama","2022-12-14 00:00:01+00","1008","Thriller","2022-12-15 00:00:01+00"];
-- ENDPOS {"lsn":"0/244AD40"}

Full log from test execution:
cdc-endpos-between-transaction.log

@dimitri
Copy link
Owner

dimitri commented Nov 2, 2023

Hi @arajkumar ; while I'm all for simplifying the code as much as possible, we need to make sure we don't over-simplify it. Unfortunately in this PR at the moment I think it's the case that we oversimplify some aspects. As with the previous PR you worked on I am very worried about accepting a BEGIN in the middle of a transaction or a COMMIT without a BEGIN; I think we should be very cautious about that and only accept it when we absolutely know and made sure it's the situation we believe it is.

Another thing to keep is mind is that setting the endpos can be done several times by the user:

  • set the endpos, pgcopydb finishes,
  • realize you want to replicate more,
  • reset endpos to the new (greater) LSN,
  • restart the follow mode

In these cases we still want correct processing of all the transactions. That's the reason why we have the xid metadata files for larger transactions (or split transactions): it allows to skip a transaction as a whole even if the target endpos LSN is to be found in the middle of said transaction. We want to stop processing but also we want to register the replay_lsn to the actual last COMMIT that was processed, so that in case we restart with a new endpos the split transaction is not skipped.

I don't see how your current implementation addresses that problem...

@arajkumar
Copy link
Contributor Author

arajkumar commented Nov 2, 2023

Thanks @dimitri for your feedback.

Hi @arajkumar ; while I'm all for simplifying the code as much as possible, we need to make sure we don't over-simplify it. Unfortunately in this PR at the moment I think it's the case that we oversimplify some aspects. As with the previous PR you worked on I am very worried about accepting a BEGIN in the middle of a transaction or a COMMIT without a BEGIN; I think we should be very cautious about that and only accept it when we absolutely know and made sure it's the situation we believe it is.

Partial transactions are a reality while switching from prefetch to replay mode. BEGIN after BEGIN is very much possible when you interrupt the write before the COMMIT message arrives. But I agree, COMMIT without a BEGIN should never come.

Another thing to keep is mind is that setting the endpos can be done several times by the user:

  • set the endpos, pgcopydb finishes,
  • realize you want to replicate more,
  • reset endpos to the new (greater) LSN,
  • restart the follow mode

In these cases we still want correct processing of all the transactions. That's the reason why we have the xid metadata files for larger transactions (or split transactions): it allows to skip a transaction as a whole even if the target endpos LSN is to be found in the middle of said transaction.

IIUC, the current code changes the commit type to synchronous incase if it sees the endpos is in the middle., doesn't seem to skip the txn.

bool commitLSNreachesEndPos =
context->endpos != InvalidXLogRecPtr &&
context->endpos <= metadata->txnCommitLSN;
GUC *settings =
commitLSNreachesEndPos ? applySettingsSync : applySettings;
if (commitLSNreachesEndPos)
{
log_notice("BEGIN transaction with COMMIT LSN %X/%X which is "
"reaching endpos %X/%X, synchronous_commit is on",
LSN_FORMAT_ARGS(metadata->txnCommitLSN),
LSN_FORMAT_ARGS(context->endpos));
}
if (!pgsql_set_gucs(pgsql, settings))
{
log_error("Failed to set the apply GUC settings, "
"see above for details");
return false;
}
context->transactionInProgress = true;
break;
}

We want to stop processing but also we want to register the replay_lsn to the actual last COMMIT that was processed, so that in case we restart with a new endpos the split transaction is not skipped.

I don't see how your current implementation addresses that problem...

So, when there is no commit_lsn, we let the transaction proceed and executes the body(DML), if there is an ENDPOS in the middle, it still honours it and abort the transaction as usual. We never updates the replay_lsn for the abort case, it stays with the previous value. When you change the endpos to new value and resume the pgcopydb, it would ignore all txs except the last one which was aborted and continue the apply from from the beginning, it will ignore the old endpos.

@arajkumar arajkumar force-pushed the no-txn-metadata-pr branch 2 times, most recently from 235da85 to 5e4b474 Compare November 3, 2023 05:23
Copy link
Contributor

@shubhamdhama shubhamdhama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought through this and couldn't think of scenario where this PR wouldn't work. So this looks good to me.
I have also tested it using the script by which I was able to discover previous bugs, but that worked fine here. I'll sleep on it for some more time if I find some edge case.

@dimitri
Copy link
Owner

dimitri commented Nov 3, 2023

Been thinking more about it, and what I think makes this approach feasible now (when it wasn't before) has been the addition of the explicit ENDPOS message in the stream. Given that, I think I see how everything works together in a way that we don't need the transaction metadata file anymore. It's a very nice simplification, thanks @arajkumar !

I still would prefer us to keep the resume filtering logic in ld_stream.c module for the moment, for two reasons. First, we know we can trust that code and we don't need to remove it for this idea to work, so we can review another PR later that removes that part. Second, it allows to still issue ERROR conditions when we see a BEGIN and a transaction is already in progress.

@arajkumar
Copy link
Contributor Author

arajkumar commented Nov 3, 2023

Thanks @dimitri @shubhamdhama for reviewing this PR.

I still would prefer us to keep the resume filtering logic in ld_stream.c module for the moment, for two reasons. First, we know we can trust that code and we don't need to remove it for this idea to work, so we can review another PR later that removes that part. Second, it allows to still issue ERROR conditions when we see a BEGIN and a transaction is already in progress.

Let me revert the changes made to ld_stream.c. But I think the main problem would be we will end up seeing issues like #471.

@arajkumar arajkumar changed the title Revisit the resume logic Remove the usage of txn metadata file Nov 3, 2023
@arajkumar arajkumar force-pushed the no-txn-metadata-pr branch 2 times, most recently from 810316a to 33e5d5c Compare November 3, 2023 13:37
@dimitri
Copy link
Owner

dimitri commented Nov 6, 2023

I just did a quick review and the PR still allows BEGIN; BEGIN; without error, or COMMIT without BEGIN, assuming it must be the case you have in mind without making sure it is. It could be any other bug that leads to that situation and we need to protect against it.

Can we remove the new processing in these areas too?

Also I don't understand the new context->switchLSN tracking and when it replaces previousLSN, but maybe we cas dispose of it while doing the previously mentioned cleanup?

@arajkumar
Copy link
Contributor Author

Also I don't understand the new context->switchLSN tracking and when it replaces previousLSN, but maybe we cas dispose of it while doing the previously mentioned cleanup?

@dimitri This is to handle the case where the SWITCH and COMMIT have the same LSN. When we update previousLSN in SWITCH, it would cause the COMMIT to be ignored as we check previousLSN < lsn_of_commit_message.

For example, consider the following segment files, SWITCH message in 000000040000018600000096.sql and COMMIT message in 000000040000018600000097.sql has same LSN. Without having separate field to track, apply would ignore the COMMIT because previousLSN will be same.

000000040000018600000096.sql

...
BEGIN; -- {"xid":9168338,"lsn":"186/96FF0738","timestamp":"2023-11-06 08:34:39.459848+0000"}
PREPARE 4f6bacc3 AS INSERT INTO public.diagnostics ("time", "tags_id", "fuel_state", "current_load", "status","additional_tags") overriding system value VALUES ($1, $2, $3, $4, $5, $6), ($7, $8, $9, $10, $11, $12), ($13, $14, $15, $16, $17, $18)
EXECUTE 4f6bacc3["2016-01-24 01:32:50+00","482","0.700000","234","0",null,"2016-01-2401:32:50+00","2034","0.700000","234","4",null,"2016-01-24 01:32:50+00","462","0.700000","234","3",null,"2016-01-2401:32:50+00","463","0.700000","234","0",null,"2016-01-24 01:32:50+00","1166","0.700000","234","2",null,"2016-01-2401:32:50+00","465","0.700000","234","0",null,"2016-01-24 01:32:50+00","2031","0.700000","234","2",null,"2016-01-2401:32:50+00","468","0.700000","234","0",null,"2016-01-24 01:32:50+00","469","0.700000","234","0",null,"2016-01-2401:33:00+00","331","0.700000","2515","1",null,"2016-01-24 01:33:00+00","482","0.700000","2515","0",null,"2016-01-2401:33:00+00","461","0.700000","2515","0",null,"2016-01-24 01:33:00+00","2034","0.700000","2515","3",null,"2016-01-2401:33:00+00","462","0.700000","2515","1",null,"2016-01-24 01:33:00+00","1166","0.700000","2515","3",null,"2016-01-2401:33:00+00","466","0.700000","2515",null,null,"2016-01-24 01:33:00+00","465","0.700000","2515","1",null,"2016-01-2401:33:00+00","468","0.700000","2515","0",null,"2016-01-24 01:33:00+00","469","0.700000","2515",null,null,"2016-01-2401:33:10+00","331","0.700000","1736","1"]
-- SWITCH WAL {"lsn":"186/970098C0"}

000000040000018600000097.sql

COMMIT; -- {"xid":9168338,"lsn":"186/970098C0","timestamp":"2023-11-06 08:34:39.466938+0000"}
...

@dimitri
Copy link
Owner

dimitri commented Nov 6, 2023

@dimitri This is to handle the case where the SWITCH and COMMIT have the same LSN. When we update previousLSN in SWITCH, it would cause the COMMIT to be ignored as we check previousLSN < lsn_of_commit_message.

Oh that's a good find! It warrants some comments in the code to explain that, and I think it would be even best as its own separate bugfix PR, what do you think?

This commit removes the usage of transaction metadata file. Initially, it was used by the apply process to bypass transactions that were already applied. However, this approach had its challenges. Specifically, in live replay mode, a transaction with numerous statements could fill the UNIX PIPE (an IPC primitive used in replay mode), leading to a potential deadlock. This is because the apply process would be waiting for the transaction metadata file.

By eliminating the transaction metadata file, the apply process lets the transaction proceed and decides whether to apply or skip it based on the commit LSN during the commit phase.

Signed-off-by: Arunprasad Rajkumar <[email protected]>
@arajkumar
Copy link
Contributor Author

@dimitri This is to handle the case where the SWITCH and COMMIT have the same LSN. When we update previousLSN in SWITCH, it would cause the COMMIT to be ignored as we check previousLSN < lsn_of_commit_message.

Oh that's a good find! It warrants some comments in the code to explain that, and I think it would be even best as its own separate bugfix PR, what do you think?

@dimitri It wasn't a problem earlier as we checked the commitLSN in the BEGIN message., this change is only needed for the new implementation which doesn't require txn metadata file. I will try to add some comment.

@dimitri dimitri added bug Something isn't working enhancement New feature or request labels Nov 6, 2023
@dimitri dimitri added this to the v0.14 milestone Nov 6, 2023
@dimitri dimitri merged commit 86e7de4 into dimitri:main Nov 6, 2023
15 checks passed
@dimitri
Copy link
Owner

dimitri commented Nov 6, 2023

Thanks for all the work @arajkumar !

@arajkumar
Copy link
Contributor Author

Thanks a lot @dimitri for your time and feedback.

Next, I'm planning to address #471.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pgcopydb hangs on while doing pgcopydb clone --follow
3 participants