forked from Netflix/metaflow
LOGMLE - Fix argo metaflow retry integration #1
Open: dhpikolo wants to merge 31 commits into master from logmle-debug-step-cli
Commits
ccdd76b add custom log (dhpikolo)
0427c73 use echo_always (dhpikolo)
9ab0b1c debug log to spot attempt number (dhpikolo)
8617d44 pass kwargs get_task_datastore() (dhpikolo)
cfc0624 use try-excpet to mitigate datastore error (dhpikolo)
47b8dd3 enable allow_not_done (dhpikolo)
f2126ec remove try/except block (dhpikolo)
966ac66 print datastore metadata (dhpikolo)
3576162 use flow_datastore intead (dhpikolo)
326d847 fix attrs (dhpikolo)
f4d0acd Update metaflow/cli_components/step_cmd.py (dhpikolo)
8f5044f Update metaflow/cli_components/step_cmd.py (dhpikolo)
f1aa3a6 add try except block (dhpikolo)
cc2a622 correct math (dhpikolo)
bba6504 do not use ca_store attrs (dhpikolo)
5603de0 improve logs (dhpikolo)
e605958 use getattr (dhpikolo)
f2829d2 set allow_not_done = True (dhpikolo)
5f7d50d use taskdatastores instead (dhpikolo)
8fc9b37 add logs (dhpikolo)
ca8736f Update metaflow/cli_components/step_cmd.py (dhpikolo)
98b7579 save task std logs (dhpikolo)
250b1ef Merge branch 'logmle-debug-step-cli' of https://github.com/dhpikolo/m… (dhpikolo)
7f89598 Revert "save task std logs" (dhpikolo)
44e05cc refactor + infer attempt in mflog.save_logs (dhpikolo)
b5bb050 change default to 0 (dhpikolo)
a405118 Merge remote-tracking branch 'origin' into logmle-debug-step-cli (dhpikolo)
b69f81d remove casting to int (dhpikolo)
10ec80f set default to None (dhpikolo)
8959ae5 move get latest done attempts before if-else (dhpikolo)
1277051 revert max_user_code_retries setting - since tests fail (dhpikolo)
@@ -21,7 +21,6 @@ def _read_file(path):

     # these env vars are set by mflog.mflog_env
     pathspec = os.environ["MF_PATHSPEC"]
-    attempt = os.environ["MF_ATTEMPT"]
     ds_type = os.environ["MF_DATASTORE"]
     ds_root = os.environ.get("MF_DATASTORE_ROOT")
     paths = (os.environ["MFLOG_STDOUT"], os.environ["MFLOG_STDERR"])

Review comment on the removed attempt line: Where does this come from? I am wondering if we can leave all the code in this file alone and try to set the …

@@ -37,8 +36,10 @@ def print_clean(line, **kwargs):
     flow_datastore = FlowDataStore(
         flow_name, None, storage_impl=storage_impl, ds_root=ds_root
     )
+    # Use inferred attempt - to save task_stdout.log and task_stderr.log
+    latest_done_attempt = flow_datastore.get_latest_done_attempt(run_id=run_id, step_name=step_name, task_id=task_id)
     task_datastore = flow_datastore.get_task_datastore(
-        run_id, step_name, task_id, int(attempt), mode="w"
+        run_id, step_name, task_id, latest_done_attempt, mode="w"
     )

     try:
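The second hunk stops reading the attempt from the environment and infers it from the datastore instead. The following is only a minimal, self-contained sketch of that idea, assuming each attempt is recorded together with a "done" marker. The function `infer_latest_done_attempt` and its `done_markers` argument are hypothetical stand-ins for illustration; they are not the signature or implementation of `flow_datastore.get_latest_done_attempt` in this PR.

```python
from typing import Mapping, Optional


def infer_latest_done_attempt(done_markers: Mapping[int, bool]) -> Optional[int]:
    """Return the highest attempt number whose done marker is set.

    `done_markers` (hypothetical) maps attempt number -> whether that
    attempt finished. Returns None when no attempt has finished; this
    default is an assumption based on the "set default to None" commit.
    """
    done = [attempt for attempt, is_done in done_markers.items() if is_done]
    return max(done) if done else None


# Attempts 0 and 1 finished; attempt 2 has not finished yet.
print(infer_latest_done_attempt({0: True, 1: True, 2: False}))
# No attempts recorded at all.
print(infer_latest_done_attempt({}))
```

Returning `None` rather than `0` keeps "nothing finished yet" distinguishable from "attempt 0 finished", which may matter to callers that open a datastore for writing.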
FYI, @colebaileygit
MAX_ATTEMPTS vs max_user_code_retries
I think max_user_code_retries is the number of times a task can be retried. It is the same value that the retry decorator carries (see the sample workflow-template).
MAX_ATTEMPTS, by contrast, is the total number of attempts a task can run [here], which is the number of retries plus the initial run.
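The relationship described in this comment can be written out as a small sketch. It follows the comment's reading only: the helper names `total_attempts` and `attempt_numbers` are hypothetical, and the zero-based attempt numbering is an assumption, not something stated in the PR.

```python
def total_attempts(max_user_code_retries: int) -> int:
    """Total runs of a task: the initial run plus every allowed retry."""
    if max_user_code_retries < 0:
        raise ValueError("max_user_code_retries must be non-negative")
    return max_user_code_retries + 1


def attempt_numbers(max_user_code_retries: int) -> list:
    """Attempt indices a task may use, assuming numbering starts at 0."""
    return list(range(total_attempts(max_user_code_retries)))


# With 3 retries configured, the task may run 4 times in total.
print(total_attempts(3))
print(attempt_numbers(3))
```

Under this reading, mixing the two values up by one is the kind of off-by-one error that would make a retried task look for the wrong attempt.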