Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve retrying downloads code and testing #1559

Merged
merged 20 commits into from
Feb 12, 2025
Merged

Improve retrying downloads code and testing #1559

merged 20 commits into from
Feb 12, 2025

Conversation

yarikoptic
Copy link
Member

Follow up to

We are into holidays already, so it is understood if review does not come. I will reserve the "right" to merge if there is user demand but likely we could just wait. If anyone sees "clearly" how to improve -- feel welcome to push directly.

@yarikoptic yarikoptic added the patch Increment the patch version when merged label Dec 20, 2024
Copy link

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 90.09009% with 11 lines in your changes missing coverage. Please review.

Project coverage is 88.65%. Comparing base (746650f) to head (2482103).
Report is 25 commits behind head on master.

Files with missing lines Patch % Lines
dandi/dandiarchive.py 14.28% 6 Missing ⚠️
dandi/dandiapi.py 33.33% 2 Missing ⚠️
dandi/utils.py 92.30% 2 Missing ⚠️
dandi/download.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1559      +/-   ##
==========================================
+ Coverage   88.34%   88.65%   +0.30%     
==========================================
  Files          78       78              
  Lines       10735    10829      +94     
==========================================
+ Hits         9484     9600     +116     
+ Misses       1251     1229      -22     
Flag Coverage Δ
unittests 88.65% <90.09%> (+0.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

dandi/consts.py Show resolved Hide resolved
dandi/download.py Outdated Show resolved Hide resolved
dandi/download.py Outdated Show resolved Hide resolved
dandi/download.py Show resolved Hide resolved
dandi/download.py Outdated Show resolved Hide resolved
Copy link
Member

@candleindark candleindark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, _check_if_more_attempts_allowed is equally, if not more, about sleeping/waiting before sending out another requests. I think it would be beneficial to rename it, or in some other way, to indicate this nature.

If we want to improve the retrying further, I think we can consider incorporating some well-known and recommended algorithms for retrying requests to a server while avoiding overwhelming it. https://chatgpt.com/share/676f8290-86cc-800a-9aa8-1ad952282d27.

dandi/download.py Outdated Show resolved Hide resolved
retry_after,
exc,
exc_ve,
)
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "Retry-After=[]" in this log is now insistent and sleep_amount may not be a meaningful number if retry_after failed to be parsed into a datetime object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I do not get what [] you have in mind since it can't be a list... please rereview when I push changes -- may be the issue is addressed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant that you have numerous logs with inconsistent labelling with "Retry-After=". Some of them you set it to the header value of the "Retry-After" (understandable), others you set it to sleep_amount which can originate from, but not exactly, the passed in header value for "Retry-After" and can default to -1.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check if it is all consistent now -- seems to me, always seems to point to retry_after.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Retry-After=[]s in log messages, where [] is used to indicate an expression of %d or %s, are consistent within the get_retry_after(), but this last one is still inconsistent with the ones in get_retry_after(). This last one prints the int value of sleep_amount, the result of get_retry_after(), and the others print whatever the value of the HTTP header Retry-After which can be a string of a specific date.

I think we can just remove the use of Retry-After= here. There is plenty of logged messages within get_retry_after()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this log statement should be removed entirely. Because of the containing if-statement, this log statement doesn't always get executed for all responses that have a Retry-After header. However, at the same time, I think the original intent is to execute this log statement for all responses with a Retry-After header.

We are better off if we just replace the entire if statement:

        if (sleep_amount := get_retry_after(exc.response)) is not None:
            lgr.debug(
                "%s - download failed due to response %d with "
                "Retry-After=%d: %s, will sleep and retry",
                path,
                exc.response.status_code,
                sleep_amount,
                exc,
            )

with

sleep_amount = get_retry_after(exc.response)

and unindent the next log statement as I recommended below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! done that, pushed

dandi/download.py Outdated Show resolved Hide resolved
dandi/download.py Show resolved Hide resolved
datetime.UTC  was introduced only in 3.11, so type checking which uses 3.9
ATM fails rightfully, as unittesting on 3.9 and 3.10
…ETRY_CODES are used

It also add additional checks/protection against odd retry-after results (too long negatives or positives),
which should address some concerns raised in prior code review
* origin/master:
  Update CHANGELOG.md [skip ci]
  Constrain version pin
  Further refine version pinning
  Pin `dandischema` to require the latest version
  Start Zarr download as soon as first page of entries is obtained
  Update CHANGELOG.md [skip ci]
  Use Python 3.10 to build docs
  Support dandischema v0.11.0
  Fix typo in `dandi move` docstring
  Don't use version 0.25.5 of `responses`
  Update URL for DANDI Docs
dandi/download.py Outdated Show resolved Hide resolved
dandi/download.py Outdated Show resolved Hide resolved
dandi/download.py Outdated Show resolved Hide resolved
dandi/utils.py Outdated Show resolved Hide resolved
dandi/utils.py Outdated Show resolved Hide resolved
dandi/utils.py Outdated Show resolved Hide resolved
dandi/tests/test_download.py Outdated Show resolved Hide resolved
dandi/tests/test_download.py Outdated Show resolved Hide resolved
dandi/tests/test_download.py Outdated Show resolved Hide resolved
@yarikoptic
Copy link
Member Author

rerunning windows which puked with

FAILED dandi/tests/test_dandiarchive.py::test_parse_api_url_glob[https://gui.dandiarchive.org/#/dandiset/001001/draft/files?location=sub-RAT123/*.nwb-parsed_url1] - requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

@yarikoptic yarikoptic requested a review from jwodder February 5, 2025 01:22
dandi/download.py Outdated Show resolved Hide resolved
dandi/download.py Outdated Show resolved Hide resolved
dandi/utils.py Outdated Show resolved Hide resolved
dandi/utils.py Outdated
elif sleep_amount < 0:
sleep_amount = None
lgr.warning(
"date in Retry-After=%r is in the past (current is %r). "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to do "Retry-After=%s" instead of "Retry-After=%r" if you want the HTTP header appears exactly the same as they are received, without the quotes.

dandi/utils.py Outdated Show resolved Hide resolved
retry_after,
exc,
exc_ve,
)
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Retry-After=[]s in log messages, where [] is used to indicate an expression of %d or %s, are consistent within the get_retry_after(), but this last one is still inconsistent with the ones in get_retry_after(). This last one prints the int value of sleep_amount, the result of get_retry_after(), and the others print whatever the value of the HTTP header Retry-After which can be a string of a specific date.

I think we can just remove the use of Retry-After= here. There is plenty of logged messages within get_retry_after()

Comment on lines 1137 to 1142
lgr.debug(
"%s - download failed on attempt #%d: %s, will sleep a bit and retry",
"%s - download failed on attempt #%d: %s, will sleep %f and retry",
path,
attempt,
exc,
sleep_amount,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just unindent these lines, and get the log statement out of the if-statement and remove the last log statement at line 1126. See the comments for the last log statement for more details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense, dedenting the log so we always log how long we are to sleep, not only if random amount due to no Retry-After. I did remove that other log

retry_after,
exc,
exc_ve,
)
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this log statement should be removed entirely. Because of the containing if-statement, this log statement doesn't always get executed for all responses that have a Retry-After header. However, at the same time, I think the original intent is to execute this log statement for all responses with a Retry-After header.

We are better off if we just replace the entire if statement:

        if (sleep_amount := get_retry_after(exc.response)) is not None:
            lgr.debug(
                "%s - download failed due to response %d with "
                "Retry-After=%d: %s, will sleep and retry",
                path,
                exc.response.status_code,
                sleep_amount,
                exc,
            )

with

sleep_amount = get_retry_after(exc.response)

and unindent the next log statement as I recommended below.

dandi/utils.py Outdated Show resolved Hide resolved
dandi/utils.py Outdated Show resolved Hide resolved
… Retry-After= etc

Co-authored-by: Isaac To <[email protected]>

Was amended locally later due to a dirty suggestion applied
@yarikoptic yarikoptic added the release Create a release when this pr is merged label Feb 12, 2025
@yarikoptic yarikoptic merged commit 527347c into master Feb 12, 2025
26 checks passed
@yarikoptic yarikoptic deleted the gh-1556 branch February 12, 2025 19:58
Copy link

🚀 PR was released in 0.66.7 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged release Create a release when this pr is merged released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants