-
Notifications
You must be signed in to change notification settings - Fork 303
Fixes #38685 - Clean duplicate erratum packages #11480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideIntroduces a new rake task to detect and remove duplicate Katello::ErratumPackage records, reassigning associated module stream links before deleting duplicates to ensure data integrity ahead of the bigint migration. Sequence diagram for duplicate erratum package cleanup tasksequenceDiagram
participant RakeTask as Rake Task
participant User as User (anonymous_api_admin)
participant EP as Katello::ErratumPackage
participant MSEP as Katello::ModuleStreamErratumPackage
RakeTask->>User: Set current user to anonymous_api_admin
RakeTask->>EP: Find duplicate groups by nvrea, erratum_id, name, filename
loop For each duplicate group
RakeTask->>EP: Get all duplicate IDs
RakeTask->>MSEP: For each duplicate, delete conflicting associations
RakeTask->>MSEP: Update remaining associations to point to kept ID
end
RakeTask->>EP: Delete duplicate records
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `lib/katello/tasks/repository.rake:123` </location>
<code_context>
+ update_mappings.each_slice(1000) do |batch|
+ batch.each do |old_id, new_id|
+ # Delete records where module_stream already has the target erratum_package
+ Katello::ModuleStreamErratumPackage
+ .where(erratum_package_id: old_id)
+ .where(
+ module_stream_id: Katello::ModuleStreamErratumPackage
+ .where(erratum_package_id: new_id)
+ .select(:module_stream_id)
+ )
+ .delete_all
+
+ # Update remaining records
</code_context>
<issue_to_address>
Deleting records before updating may cause data loss if not all relationships are covered.
Please ensure this deletion logic accounts for all relevant relationships and maintains referential integrity.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
2e31672 to
570af3e
Compare
|
We might want to alert the user that they need to reindex their DBs. Otherwise, if they clean up these dupes and proceed with the upgrade, the duplicates can get populated all over again. Somewhat related: theforeman/foreman_maintain#1032 If we'll trigger this rake task from the pre-hooks in the installer or foreman_maintain, perhaps the warning could be reported then instead. Essentially, they need to run |
|
There is value in alerting users here if they are ever running this outside of an upgrade..And the same warning will appear as part of upgrade when this task runs.. I'll add a message here.. |
4450067 to
1ce8ba1
Compare
|
@ianballou @pavanshekar This is good for a review.. |
lib/katello/tasks/repository.rake
Outdated
| def handle_duplicate_erratum_packages | ||
| # Alert users that they need to reindex their database to ensure the indexes are re-run and active. | ||
| puts "Please reindex your database to ensure indexes are rebuilt and active." | ||
| puts "This can be acheived by running `runuser -u postgres -- reindexdb -a`" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just one small typo:
| puts "This can be acheived by running `runuser -u postgres -- reindexdb -a`" | |
| puts "This can be achieved by running `runuser -u postgres -- reindexdb -a`" |
pavanshekar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acking... - works well! 👍
1ce8ba1 to
d6e7dbf
Compare
d6e7dbf to
60119a2
Compare
ianballou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
(cherry picked from commit dac6836)
(cherry picked from commit dac6836)
What are the changes introduced in this pull request?
This pull request introduces a rake task to clean up duplicate Katello::ErratumPackage records. This cleanup is a crucial preparatory step before the upcoming bigint migration to ensure data integrity and prevent potential issues with unique constraints.
Considerations taken when implementing this change?
What are the testing steps for this pull request?
To test this change, you can create dummy duplicate data and then run the cleanup task.
First, drop the unique constraint on erratum_packages to allow for duplicate data creation.
psql -d katello -U katello
DROP index katello_erratum_packages_eid_nvrea_n_fNext, create dummy duplicate Katello::ErratumPackage records from existing data in the Rails console.
Gather existing IDs for creating associations.
Create duplicate references to the newly created packages.
Finally, run the cleanup task and verify that the duplicates are removed and the associations are correctly maintained.
bundle exec rails katello:cleanup_duplicate_erratum_packages
To verify that duplicates got cleaned up by adding back the unique index:
CREATE UNIQUE INDEX katello_erratum_packages_eid_nvrea_n_f ON katello_erratum_packages USING btree (erratum_id, nvrea, name, filename);
Summary by Sourcery
Add a rake task to clean up duplicate Katello::ErratumPackage records by consolidating duplicates and updating associated module stream references in preparation for the bigint migration.
New Features:
Enhancements: