Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy-repo makes only two copy calls for a repo pair [RHELDST-28235] #327

Merged

Conversation

rajulkumar
Copy link
Collaborator

Copy-repo previously made copy calls for each content type. Lots of Pulp tasks were spawned with this creating a big queue and slowing down the operations. Hence, this updates copy-repo to make only two copy calls, one for rpm and other for non-rpm content types.

Non-rpm content type units are smaller in count and unit_fields are smaller in size, hence they are all clubbed in the same call ignoring the unit_fields.

Also, non-rpm content type units are copied first as they may have modulemd units, which should be copied before the corresponding modular rpm units to avoid those rpms being visible to user without modulemd units in case of failure or partial copy.

@rajulkumar rajulkumar requested a review from rohanpm as a code owner December 10, 2024 06:35
Copy link

codecov bot commented Dec 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (0a4e6ee) to head (1788ebd).
Report is 19 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #327   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           54        54           
  Lines         3012      3010    -2     
=========================================
- Hits          3012      3010    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rbikar
Copy link
Member

rbikar commented Dec 10, 2024

@rajulkumar

I don't think that currently the code guarantee that modulemd copies (non-rpm) are finished before rpm copies. Can you have a look, please? If I get it right, the copies are only enqueued in correct order but order of execution can be different.
But in reality we haven't that ensured even with the original approach as this command isn't run that frequently we didn't get any complaints about that.

Copy-repo previously made copy calls for each content type.
Lots of Pulp tasks were spawned with this creating a big queue
and slowing down the operations. Hence, this updates copy-repo
to make only two copy calls, one for rpm and other for non-rpm
content types.

Non-rpm content type units are smaller in count and unit_fields
are smaller in size, hence they are all clubbed in the same call
ignoring the unit_fields.

Also, non-rpm content type units are copied first as they may have
modulemd units, which should be copied before the corresponding
modular rpm units to avoid those rpms being visible to user without
modulemd units in case of failure or partial copy.
@rajulkumar rajulkumar force-pushed the reduce_copy_repo_calls branch from 0f7afa2 to 1788ebd Compare December 11, 2024 00:44
@rajulkumar
Copy link
Collaborator Author

@rbikar
I updated the copy_content method to ensure the criteria are processed and completed sequentially for repo pair.
Also, pylint gave "cell-var-from-loop" warning, so checked the units in the controller for the updates to repository memberships wrt to src and dest repos and it looked fine. Hence, disabled the warning there.

# ensure the criterias are processed and completed/resolved in order
# so that non-rpm copy completes before rpm copy
# pylint:disable=cell-var-from-loop
tasks_f = f_flat_map(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think we should be fine with this pylint warning, the issue is with those variables that change each loop and are used in lambda. And if you use it like this you will got all lambdas called with values of the last iteration:

fcs = []
for x in range(10):
    f = lambda: x
    fcs.append(f)

for func in fcs:
    print(func())

"""
OUTPUT:
9
9
9
9
9
9
9
9
9
9
"""

But iiuic we should be fine because we call it immediately and test didn't discover any issue with it.

@rajulkumar rajulkumar merged commit 49e0556 into release-engineering:master Dec 16, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants