-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve retrying downloads code and testing #1559
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1559 +/- ##
==========================================
+ Coverage 88.34% 88.65% +0.30%
==========================================
Files 78 78
Lines 10735 10829 +94
==========================================
+ Hits 9484 9600 +116
+ Misses 1251 1229 -22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, _check_if_more_attempts_allowed
is equally, if not more, about sleeping/waiting before sending out another requests. I think it would be beneficial to rename it, or in some other way, to indicate this nature.
If we want to improve the retrying further, I think we can consider incorporating some well-known and recommended algorithms for retrying requests to a server while avoiding overwhelming it. https://chatgpt.com/share/676f8290-86cc-800a-9aa8-1ad952282d27.
dandi/download.py
Outdated
retry_after, | ||
exc, | ||
exc_ve, | ||
) | ||
lgr.debug( | ||
"%s - download failed due to response %d with " | ||
"Retry-After=%d: %s, will sleep and retry", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "Retry-After=[]" in this log is now insistent and sleep_amount
may not be a meaningful number if retry_after
failed to be parsed into a datetime
object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I do not get what []
you have in mind since it can't be a list... please rereview when I push changes -- may be the issue is addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I meant that you have numerous logs with inconsistent labelling with "Retry-After="
. Some of them you set it to the header value of the "Retry-After"
(understandable), others you set it to sleep_amount
which can originate from, but not exactly, the passed in header value for "Retry-After"
and can default to -1.0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please check if it is all consistent now -- seems to me, always seems to point to retry_after
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Retry-After=[]
s in log messages, where []
is used to indicate an expression of %d
or %s
, are consistent within the get_retry_after()
, but this last one is still inconsistent with the ones in get_retry_after()
. This last one prints the int value of sleep_amount
, the result of get_retry_after()
, and the others print whatever the value of the HTTP header Retry-After
which can be a string of a specific date.
I think we can just remove the use of Retry-After=
here. There is plenty of logged messages within get_retry_after()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this log statement should be removed entirely. Because of the containing if-statement, this log statement doesn't always get executed for all responses that have a Retry-After
header. However, at the same time, I think the original intent is to execute this log statement for all responses with a Retry-After
header.
We are better off if we just replace the entire if statement:
if (sleep_amount := get_retry_after(exc.response)) is not None:
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
path,
exc.response.status_code,
sleep_amount,
exc,
)
with
sleep_amount = get_retry_after(exc.response)
and unindent the next log statement as I recommended below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! done that, pushed
datetime.UTC was introduced only in 3.11, so type checking which uses 3.9 ATM fails rightfully, as unittesting on 3.9 and 3.10
…ETRY_CODES are used It also add additional checks/protection against odd retry-after results (too long negatives or positives), which should address some concerns raised in prior code review
* origin/master: Update CHANGELOG.md [skip ci] Constrain version pin Further refine version pinning Pin `dandischema` to require the latest version Start Zarr download as soon as first page of entries is obtained Update CHANGELOG.md [skip ci] Use Python 3.10 to build docs Support dandischema v0.11.0 Fix typo in `dandi move` docstring Don't use version 0.25.5 of `responses` Update URL for DANDI Docs
…rm how long to sleep
Co-authored-by: John T. Wodder II <[email protected]>
Co-authored-by: John T. Wodder II <[email protected]>
Co-authored-by: John T. Wodder II <[email protected]>
rerunning windows which puked with
|
dandi/utils.py
Outdated
elif sleep_amount < 0: | ||
sleep_amount = None | ||
lgr.warning( | ||
"date in Retry-After=%r is in the past (current is %r). " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also want to do "Retry-After=%s" instead of "Retry-After=%r" if you want the HTTP header appears exactly the same as they are received, without the quotes.
dandi/download.py
Outdated
retry_after, | ||
exc, | ||
exc_ve, | ||
) | ||
lgr.debug( | ||
"%s - download failed due to response %d with " | ||
"Retry-After=%d: %s, will sleep and retry", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Retry-After=[]
s in log messages, where []
is used to indicate an expression of %d
or %s
, are consistent within the get_retry_after()
, but this last one is still inconsistent with the ones in get_retry_after()
. This last one prints the int value of sleep_amount
, the result of get_retry_after()
, and the others print whatever the value of the HTTP header Retry-After
which can be a string of a specific date.
I think we can just remove the use of Retry-After=
here. There is plenty of logged messages within get_retry_after()
dandi/download.py
Outdated
lgr.debug( | ||
"%s - download failed on attempt #%d: %s, will sleep a bit and retry", | ||
"%s - download failed on attempt #%d: %s, will sleep %f and retry", | ||
path, | ||
attempt, | ||
exc, | ||
sleep_amount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just unindent these lines, and get the log statement out of the if-statement and remove the last log statement at line 1126. See the comments for the last log statement for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes sense, dedenting the log so we always log how long we are to sleep, not only if random amount due to no Retry-After. I did remove that other log
dandi/download.py
Outdated
retry_after, | ||
exc, | ||
exc_ve, | ||
) | ||
lgr.debug( | ||
"%s - download failed due to response %d with " | ||
"Retry-After=%d: %s, will sleep and retry", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this log statement should be removed entirely. Because of the containing if-statement, this log statement doesn't always get executed for all responses that have a Retry-After
header. However, at the same time, I think the original intent is to execute this log statement for all responses with a Retry-After
header.
We are better off if we just replace the entire if statement:
if (sleep_amount := get_retry_after(exc.response)) is not None:
lgr.debug(
"%s - download failed due to response %d with "
"Retry-After=%d: %s, will sleep and retry",
path,
exc.response.status_code,
sleep_amount,
exc,
)
with
sleep_amount = get_retry_after(exc.response)
and unindent the next log statement as I recommended below.
… Retry-After= etc Co-authored-by: Isaac To <[email protected]> Was amended locally later due to a dirty suggestion applied
🚀 PR was released in |
Follow up to
Continue retrying downloads on retriable statuses #1558
retry on 429 as well
handle Retry-After as a Date not just seconds (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Date)
adjust logging of sleeping upon retry - do in a single place
We are into holidays already, so it is understood if review does not come. I will reserve the "right" to merge if there is user demand but likely we could just wait. If anyone sees "clearly" how to improve -- feel welcome to push directly.