@sjha4 sjha4 commented Sep 9, 2025

What are the changes introduced in this pull request?

We introduced a rake task as part of #11480. However, the upgrade should not depend on that rake task: upstream doesn't have f-m, and the pre-upgrade hook in f-m can't access the rake task unless the task is backported in time. Changing the migration file is the easier change to deliver.

Considerations taken when implementing this change?

Moved the rake task logic into the migration file for the bigint migration, which in one case triggered the bad index error.

What are the testing steps for this pull request?

Refer to steps in #11480 (comment).

However, you'll want to run bundle exec rails db:rollback before/after you've created all the test data to be able to rerun the updated migration file with bundle exec rails db:migrate.

Summary by Sourcery

Incorporate duplicate erratum package cleanup into the bigint migration to ensure a smooth upgrade without relying on an external rake task

Enhancements:

  • Embed cleanup_duplicate_erratum_packages method in UseBigIntForErratumPackagesId migration
  • Detect and consolidate duplicate erratum package records by merging related module_stream associations
  • Remove redundant erratum package entries before altering the id column to bigint

sourcery-ai bot commented Sep 9, 2025

Reviewer's Guide

This migration now embeds the rake task logic directly to clean up duplicate erratum_packages and their associations before converting the id column to bigint, ensuring data consistency without relying on external tasks.

Entity relationship diagram for updated erratum package cleanup migration

erDiagram
  KATELLO_ERRATUM_PACKAGE {
    bigint id
    string nvrea
    integer erratum_id
    string name
    string filename
  }
  KATELLO_MODULE_STREAM_ERRATUM_PACKAGE {
    integer id
    bigint erratum_package_id
    integer module_stream_id
  }
  KATELLO_ERRATUM_PACKAGE ||--o{ KATELLO_MODULE_STREAM_ERRATUM_PACKAGE : "erratum_package_id"

File-Level Changes

Change: Embed duplicate cleanup into migration

Details:
  • Invoke cleanup_duplicate_erratum_packages at the start of the up method
  • Define cleanup_duplicate_erratum_packages to identify duplicate ErratumPackage records via grouping and having
  • Map duplicate IDs to a single kept record and collect IDs to delete
  • Batch-update ModuleStreamErratumPackage associations: remove conflicting entries and reassign old IDs to the kept ID
  • Delete all duplicate ErratumPackage records after association updates

Files: db/migrate/20250613210050_use_big_int_for_erratum_packages_id.rb
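
The mapping step described above can be sketched in plain Ruby, without ActiveRecord. This is only an in-memory illustration: the `ErratumPackage` struct, the `plan_duplicate_cleanup` helper, the choice of `(erratum_id, nvrea)` as the grouping key, and the rule of keeping the lowest id are all assumptions made for the example, not details confirmed by the migration.

```ruby
# In-memory sketch only; the real migration works on ActiveRecord models.
ErratumPackage = Struct.new(:id, :erratum_id, :nvrea)

# Returns [update_mappings, ids_to_delete].
# ASSUMPTION: duplicates share (erratum_id, nvrea) and the lowest id is kept.
def plan_duplicate_cleanup(packages)
  update_mappings = {} # duplicate id => kept id
  packages.group_by { |pkg| [pkg.erratum_id, pkg.nvrea] }.each_value do |group|
    next if group.size < 2 # plays the role of HAVING COUNT(*) > 1

    kept, *duplicates = group.sort_by(&:id)
    duplicates.each { |duplicate| update_mappings[duplicate.id] = kept.id }
  end
  [update_mappings, update_mappings.keys]
end

packages = [
  ErratumPackage.new(1, 10, "bash-5.1-1.x86_64"),
  ErratumPackage.new(2, 10, "bash-5.1-1.x86_64"),
  ErratumPackage.new(3, 11, "bash-5.1-1.x86_64"),
  ErratumPackage.new(4, 10, "bash-5.1-1.x86_64")
]
update_mappings, ids_to_delete = plan_duplicate_cleanup(packages)
puts update_mappings.inspect
puts ids_to_delete.inspect
```

Here records 2 and 4 are mapped to record 1, while record 3 is left alone because it belongs to a different erratum. Associations have to be repointed using the mapping before the ids in `ids_to_delete` are removed, which matches the order of the steps listed above.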

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `db/migrate/20250613210050_use_big_int_for_erratum_packages_id.rb:46` </location>
<code_context>
+
+    update_mappings.each_slice(1000) do |batch|
+      batch.each do |old_id, new_id|
+        Katello::ModuleStreamErratumPackage
+         .where(erratum_package_id: old_id)
+         .where(
+          module_stream_id: Katello::ModuleStreamErratumPackage
+                             .where(erratum_package_id: new_id)
+                             .select(:module_stream_id)
+         )
+         .delete_all
+
+        Katello::ModuleStreamErratumPackage
</code_context>

<issue_to_address>
The delete_all query may be inefficient due to nested where clauses.

This approach may trigger a subquery for each batch item, degrading performance with large datasets. Please consider optimizing by precomputing module_stream_ids or batching deletions.
</issue_to_address>
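
One way to read the "precompute module_stream_ids" suggestion is sketched below in plain Ruby. It is an in-memory illustration, not the migration's code: the `Association` struct and the `plan_association_changes` helper are hypothetical, and a real version would need to load the kept packages' module stream ids from the database (for example with a single query) instead of walking an array.

```ruby
require "set"

# In-memory stand-in for katello_module_stream_erratum_packages rows.
Association = Struct.new(:id, :erratum_package_id, :module_stream_id)

# Splits associations that point at duplicate packages into rows to delete
# (the kept package already has that module stream) and rows to reassign.
def plan_association_changes(associations, update_mappings)
  # One pass builds kept_id => Set of module_stream_ids, so each pair is
  # checked with a hash lookup instead of its own subquery.
  kept_ids = update_mappings.values.to_set
  streams_by_kept = Hash.new { |hash, key| hash[key] = Set.new }
  associations.each do |assoc|
    next unless kept_ids.include?(assoc.erratum_package_id)

    streams_by_kept[assoc.erratum_package_id] << assoc.module_stream_id
  end

  to_delete = []
  to_reassign = {} # association id => kept erratum_package_id
  associations.each do |assoc|
    new_id = update_mappings[assoc.erratum_package_id]
    next if new_id.nil? # row does not belong to a duplicate package

    # Set#add? returns nil when the stream is already linked to the kept id,
    # which also covers two duplicates that share the same module stream.
    if streams_by_kept[new_id].add?(assoc.module_stream_id)
      to_reassign[assoc.id] = new_id
    else
      to_delete << assoc.id
    end
  end
  [to_delete, to_reassign]
end

associations = [
  Association.new(1, 1, 100), # kept package 1 already has stream 100
  Association.new(2, 2, 100), # duplicate 2, conflicting stream
  Association.new(3, 2, 200), # duplicate 2, new stream for package 1
  Association.new(4, 4, 200), # duplicate 4, stream 200 is taken by row 3
  Association.new(5, 3, 100)  # unrelated package
]
to_delete, to_reassign = plan_association_changes(associations, { 2 => 1, 4 => 1 })
puts to_delete.inspect
puts to_reassign.inspect
```

The two resulting collections could then be applied in batches, one delete and one update per batch, instead of issuing a delete with a nested subquery for every mapping pair.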

sjha4 commented Sep 9, 2025

Did some performance testing on a DB with the following counts:

katello=# select count(1) from katello_module_stream_erratum_packages;
  count  
---------
 1100837
(1 row)

katello=# select count(1) from katello_erratum_packages;
 count  
--------
 716163
(1 row)

time bundle exec rails db:migrate
...
...
== 20250613210050 UseBigIntForErratumPackagesId: migrating ====================
-- execute("ALTER SEQUENCE katello_erratum_packages_id_seq AS bigint;")
   -> 0.0122s
-- change_column(:katello_erratum_packages, :id, :bigint)
   -> 0.7717s
== 20250613210050 UseBigIntForErratumPackagesId: migrated (824.5893s) =========

== 20250714190050 AddMissingRpmsEvrIndex: migrating ===========================
-- index_exists?(:katello_rpms, [:name, :arch, :evr])
   -> 0.0029s
-- add_index(:katello_rpms, [:name, :arch, :evr])
   -> 1.2450s
== 20250714190050 AddMissingRpmsEvrIndex: migrated (1.2481s) ==================


real	13m50.593s
user	0m2.335s
sys	0m0.344s
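
A quick arithmetic check on the figures in this log, as a small Ruby sketch. The numbers are copied from the output above; the variable names are only for illustration. The log does not break down the remaining time, so treating it as time spent in the embedded cleanup is an inference, not something the output states.

```ruby
# Figures copied from the migration log above.
migration_total = 824.5893         # UseBigIntForErratumPackagesId, seconds
logged_ddl      = 0.0122 + 0.7717  # ALTER SEQUENCE + change_column, seconds
wall_clock      = 13 * 60 + 50.593 # `real` reported by `time`, seconds

# Time inside the migration that the two logged statements do not cover.
unlogged = (migration_total - logged_ddl).round(4)
minutes, seconds = migration_total.divmod(60)
share = (migration_total / wall_clock * 100).round(1)

puts "time outside the logged statements: #{unlogged}s"
puts format("migration total: %dm%.1fs", minutes, seconds)
puts "share of wall-clock time: #{share}%"
```

The two schema statements account for well under a second, so nearly all of the run is spent elsewhere in this one migration.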

sjha4 commented Sep 10, 2025

After offline discussions, we decided against shipping this as a migration that runs for all upgrades. The rake task itself can live independently for support purposes without being an upgrade blocker. Closing this for the reasons above.

@sjha4 sjha4 closed this Sep 10, 2025