EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer #535

Open: wants to merge 1 commit into base: master
55 changes: 40 additions & 15 deletions b2sdk/transfer/emerge/planner/planner.py
@@ -176,26 +176,51 @@ def _get_emerge_parts(self, intent_fragments_iterator):
missing_length = min_part_size - upload_buffer.length
else:
missing_length = 0
if missing_length > 0 and current_len - missing_length < min_part_size:
# current intent is *not* a "small copy", but upload buffer is small
# and current intent is too short with the buffer to reach the minimum part size
# so we append current intent to upload buffer
upload_buffer.append(current_intent, current_end)
else:
if missing_length > 0:
# we "borrow" a fragment of current intent to upload buffer
# to fill it to minimum part size
upload_buffer.append(
current_intent, upload_buffer.end_offset + missing_length
)
# completely flush the upload buffer
if missing_length > 0 and (current_len - missing_length) >= min_part_size:
# we borrow exact size of missing_length bytes, to fill the buffer
borrow_end = upload_buffer.end_offset + missing_length
upload_buffer.append(current_intent, borrow_end)
# then we flush the upload buffer
for upload_buffer_part in self._buff_split(upload_buffer):
yield self._get_upload_part(upload_buffer_part)
# split current intent (copy source) to parts and yield
# then we calculate the rest of current intent
remaining_length = current_end - borrow_end
start_offset = borrow_end
# split current intent to the parts as long as
# the remaining part is larger than min_part_size and yield
while remaining_length >= min_part_size:
part_end = start_offset + min_part_size
copy_parts = self._get_copy_parts(
current_intent,
start_offset=start_offset,
end_offset=part_end
)
for part in copy_parts:
yield part
start_offset = part_end
remaining_length = current_end - start_offset
# we add the last part less than min_part_size to the upload buffer
if remaining_length > 0:
upload_buffer = UploadBuffer(start_offset)
upload_buffer.append(current_intent, current_end)
for upload_buffer_part in self._buff_split(upload_buffer):
yield self._get_upload_part(upload_buffer_part)
upload_buffer = UploadBuffer(current_end)
else:
upload_buffer = UploadBuffer(current_end)
Comment on lines +208 to +210
Collaborator

Suggested change
upload_buffer = UploadBuffer(current_end)
else:
upload_buffer = UploadBuffer(current_end)
upload_buffer = UploadBuffer(current_end)

elif missing_length > 0:
# we "borrow" a fragment of current intent to upload buffer
# to fill it to minimum part size
Comment on lines +212 to +213
Collaborator

Suggested change
# we "borrow" a fragment of current intent to upload buffer
# to fill it to minimum part size
# we "borrow" a fragment of current intent to upload buffer
# to fill it to minimum part size

upload_buffer.append(
current_intent,
current_end
)
else:
# if there are no missing bytes then process the copy fragment without splitting
copy_parts = self._get_copy_parts(
current_intent,
start_offset=upload_buffer.end_offset,
end_offset=current_end,
end_offset=current_end
Collaborator

Please use trailing commas on multi-line statements, so that if someone later adds an argument, the diff consists of just one "added" line rather than two added and one removed.
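
As a rough illustration of the trailing-comma convention, the sketch below uses a hypothetical stand-in function, `get_copy_parts`, which is not the b2sdk method and exists only so the call can run on its own:

```python
def get_copy_parts(intent, start_offset, end_offset):
    """Hypothetical stand-in for illustration; not the b2sdk implementation."""
    return [(intent, start_offset, end_offset)]


# Every argument, including the last one, ends with a comma.
# Appending another keyword argument later then shows up in a diff
# as a single added line, with no change to the line above it.
parts = get_copy_parts(
    "intent",
    start_offset=0,
    end_offset=10,
)
```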

)
for part in copy_parts:
yield part
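
To make the offset arithmetic in this hunk easier to follow, here is a minimal standalone sketch. It is not the b2sdk code: `plan_ranges` and its tuple output are hypothetical, the real `UploadBuffer`, `_buff_split` and `_get_copy_parts` are replaced by plain integer ranges, and the rule for computing the missing length is an assumption, because the condition guarding `missing_length` lies outside the lines shown.

```python
def plan_ranges(buffer_start, buffer_end, current_end, min_part_size):
    """Hypothetical simplification of the branch logic in the hunk above.

    Returns (parts, pending): `parts` is a list of (kind, start, end) tuples,
    `pending` is the byte range left in the upload buffer, or None.
    """
    buffer_len = buffer_end - buffer_start
    # Assumption: bytes are "missing" only when the buffer is below the minimum.
    missing = max(min_part_size - buffer_len, 0)
    current_len = current_end - buffer_end
    parts = []

    if missing > 0 and current_len - missing >= min_part_size:
        # Borrow exactly `missing` bytes so the buffer reaches the minimum,
        # then flush it as one upload part.
        borrow_end = buffer_end + missing
        parts.append(("upload", buffer_start, borrow_end))
        start = borrow_end
        # Emit copy parts of min_part_size while enough bytes remain.
        while current_end - start >= min_part_size:
            parts.append(("copy", start, start + min_part_size))
            start += min_part_size
        # The diff as shown flushes a short tail right away as an upload part.
        if start < current_end:
            parts.append(("upload", start, current_end))
        return parts, None

    if missing > 0:
        # The intent is too short to lend bytes: it joins the buffer whole.
        return parts, (buffer_start, current_end)

    # Nothing is missing: copy the fragment without splitting.
    # (A full buffer is assumed to be flushed elsewhere.)
    parts.append(("copy", buffer_end, current_end))
    return parts, None


parts, pending = plan_ranges(0, 2, 18, min_part_size=5)
for kind, start, end in parts:
    print(kind, start, end)
```

With a 2-byte buffer, an intent ending at offset 18 and a minimum part size of 5, the sketch borrows 3 bytes, emits two 5-byte copy parts, and flushes the 3-byte tail as an upload part.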