-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add byte-level progress to pgcopydb list progress #839
Add byte-level progress to pgcopydb list progress #839
Conversation
b3f436b
to
f82a0be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR misses the mark on applying the “Separation of Concerns” principle, as detailed in the review. The callback context should handle its own specific data in a way that the CopyArgs does not have to know about.
Also the PR comment invokes the “Single Source of Truth” principle, but I read the code as a violation of that principle, wherein it was previously made in a way that the single source of truth would be the SQLite catalogs, so that external processes can have a look at the same information as the internal code. Maybe the code evolved in a way that makes this not true anymore, but I think I would like to get back in that direction. For instance, we could build a pgcopydb watch
command that acts like a htop
command for pgcopydb: this would be an external process that would ideally just use the SQLite contents.
if (!summary_pretty_print_timing(catalog, timing)) | ||
{ | ||
/* errors have already been logged */ | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that work now with respect to the duration_pretty
column that still exists in the SQLite catalogs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the existing sources, there seems to be ne need to keep the field in the table.
src/bin/pgcopydb/summary.c
Outdated
/* | ||
* Skip reading `ppDuration` from DB and create it from `durationMs` | ||
* in accordance with the single source of truth principle. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This principle was already implemented before, where the decision was made to compute the data before SQLite storage and then use what is in the SQLite database. This allows any external tooling to have access to the same data as our internal code. Including commands that run alongside the main pgcopydb
process rather than as a child process.
IOW I believe you introduced a regression here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem was that the data in these 2 fields were inconsistent. What are the external tools which use these fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is there a way to fix the bug wrt inconsistencies here, and keeping the data all in the SQLite database file?
- Could the inconsistency be an explanation for End report after a db clone shows higher numbers than expected #697?
- I mean “external” as in “not running within the process-tree with access to all the memory etc”, so that means the command
pgcopydb list progress
is already an “external” tool. All it has access to is the SQLite database file contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Probably there is. But is it a good idea to keep the 2 versions of the same data in the DB?
- I don't know, maybe.
- If we are talking about the
pgcopydb
then that means we have no problem here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we use our own version of SQLite and we know we have JSON support in SQL, we could rewrite all the SQLite to JSON transformation code and use a SQL query instead. It was my impression that having the data in two forms in the SQLite database was useful for that.
Another use-case is debugging from the SQLite data file when something strange happens. Then we might appreciate having the data available in pretty printed format for debugging.
The use-case I mentioned before still do exists, it's nice to be able to re-use the data from the SQLite data file either in numeric format or in pretty-printed format without having to re-compute the pretty-printed format every time. It's always the same with caching: do you want to use more CPU or more memory?
src/bin/pgcopydb/summary.c
Outdated
/* | ||
* Skip reading `ppBytes` from DB and create it from `bytes` | ||
* in accordance with the single source of truth principle. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before. What about external processes that might want to access to the telemetry?
src/bin/pgcopydb/table-data.c
Outdated
/* | ||
* copydb_mark_table_as_done creates the table doneFile with the expected | ||
* summary content. To create a doneFile we must acquire the synchronisation | ||
* semaphore first. The lockFile is also removed here. | ||
* copydb_save_copy_progress saves the duration and the number of | ||
* bytes transmitted about the copy progress into the catalog. | ||
*/ | ||
bool | ||
copydb_mark_table_as_done(CopyDataSpec *specs, | ||
copydb_save_copy_progress(CopyDataSpec *specs, | ||
CopyTableDataSpec *tableSpecs) | ||
{ | ||
DatabaseCatalog *sourceDB = &(specs->catalogs.source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this change. The summary_finish_table
function is responsible to maintain the done_time_epoch
field in the SQLite catalogs, and the rest of the code uses that information to determine if the table COPY is finished or not. Now we will update that done_time_epoch
field multiple times during the COPY. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed a change for this.
This explanation sounds misleading. If we have the 2 versions of the same data present, that is the symptom of the violation of single source of truth. It caused problems and I spent a lot of time to find this problem. Any application (not only pgcopydb) can make use of the data in the db represented as integer and display it however it wants. |
I think we're just using a different source of truth and then the same argument. Now that you know where the problem comes from, do you see any difficulty or architectural problem in using our SQLite database file as the source of truth? |
Yes, there seems to be a misunderstanding. I'm not objecting to using the sqlite as the source of truth, but to using 2 different fields for the same data but with different representations (int and string). |
src/bin/pgcopydb/summary.c
Outdated
/* | ||
* Skip reading `ppDuration` from DB and create it from `durationMs` | ||
* in accordance with the single source of truth principle. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we use our own version of SQLite and we know we have JSON support in SQL, we could rewrite all the SQLite to JSON transformation code and use a SQL query instead. It was my impression that having the data in two forms in the SQLite database was useful for that.
Another use-case is debugging from the SQLite data file when something strange happens. Then we might appreciate having the data available in pretty printed format for debugging.
The use-case I mentioned before still do exists, it's nice to be able to re-use the data from the SQLite data file either in numeric format or in pretty-printed format without having to re-compute the pretty-printed format every time. It's always the same with caching: do you want to use more CPU or more memory?
src/bin/pgcopydb/table-data.c
Outdated
ctx->bytesTransmittedBeforeSavingProgress += bytesTransmitted; | ||
ctx->bytesTransmitted += bytesTransmitted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When are the two values different? There is no comment about that, and the naming of the variable does not explain that either. Why do we need the two slots? Can we have a better name for bytesTransmittedBeforeSavingProgress
? I currently have two problems with it: I don't understand the name in relation with the code, and I would like a shorter name if that's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because you don't understand it doesn't mean others don't, right? So the variable keeps the number of bytes transmitted before saving the progress to the db and reset to zero on each save. If you have a better name, please feel free to suggest one.
Assuming that you mean the vars starting with bytesTransmitted...
one keeps the total as before (no change here) and the other as explained above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that attitude necessary @rimbi ? I can tell you it's not welcome.
Two things for you to consider:
- I am asking questions on aspects of your code that is not obvious to me when reviewing the PR. When things are not obvious, we need to either improve the code, or add good comments to help the reader understand what's happening with less efforts.
- I have read a lot of code, including some great code in many different programming languages. The great code that I have read always have these two properties: what/how it does is obvious, and it is elegant. Elegance has something personal to it, so I will merge code that I do not find elegant. I will not merge code that I do not find obvious, though. That's not personal.
I think that's useful (see other comments). I would like to keep that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is still a minor update to make to distribute responsibilities more cleanly (see new comment), and a major blocker to resolve.
You are stating that the human readable pretty-printed versions of the duration and bytes should not be maintained anymore, which I disagree with. But even if we agreed, the proper way to change that would be do open a separate PR and properly remove the two fields from the SQLite database and all the code that uses them, not just the code that maintains the values at UPDATE time!
In the context of this PR, please refrain from making a controversial half-baked change.
src/bin/pgcopydb/table-data.c
Outdated
.bytesTransmitted = 0, | ||
.lastSavingTimeMs = 0 | ||
}; | ||
INSTR_TIME_SET_CURRENT(context.lastSavingTimeMs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we deal with that responsibility from within the hook function, allowing/expecting the context value to be set to zero by the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand your reasoning: This pattern is quite common in the codebase. What is it that sets this piece of code apart form others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which pattern do you mean? I can't remember of another place where we use INSTR TIME portability macros for high-precision time measurement from libpq and also a callback function and a context...
Please use the INSTR TIME macros only in the callback function, so that the caller does not need to know why your callback implementation decided to use INSTR TIME macros instead of time(NULL)
for a 5 seconds duration where accuracy is not a concern, allowing us to change our mind later about how the callback function is implemented without having to also change the code from the caller that has to prepare the context.
A good init value would be zero, because in C name = { .specs = specs }
will leave the other fields properly initialized to zero. It is very easy in the callback function to initialize the lastSavingTimeMs to now when called the first time.
- Replaces size-based saving check with interval-based saving check
65c9b56
to
a3591b5
Compare
Closing for triage purposes. A part of this PR has been superseded in #858, other aspects need to be addressed in their own PR. |
Fixes: #388
Allows the user to see the progress of the copy operation by executing the
pgcopydb list progress
command. The user will see that the duration and the bytes transferred will be increasing.