LOGMLE - Fix argo metaflow retry integration #1
base: master
Changes from 25 commits
@@ -21,7 +21,6 @@ def _read_file(path):
 
     # these env vars are set by mflog.mflog_env
     pathspec = os.environ["MF_PATHSPEC"]
-    attempt = os.environ["MF_ATTEMPT"]
Review comment:
Where does this come from? I am wondering if we can leave all the code in this file alone and try to set the
     ds_type = os.environ["MF_DATASTORE"]
     ds_root = os.environ.get("MF_DATASTORE_ROOT")
     paths = (os.environ["MFLOG_STDOUT"], os.environ["MFLOG_STDERR"])
@@ -37,8 +36,10 @@ def print_clean(line, **kwargs):
     flow_datastore = FlowDataStore(
         flow_name, None, storage_impl=storage_impl, ds_root=ds_root
     )
+    # Use inferred attempt - to save task_stdout.log and task_stderr.log
+    latest_done_attempt = flow_datastore.get_latest_done_attempt(run_id=run_id, step_name=step_name, task_id=task_id)
     task_datastore = flow_datastore.get_task_datastore(
-        run_id, step_name, task_id, int(attempt), mode="w"
+        run_id, step_name, task_id, int(latest_done_attempt), mode="w"
Review comment:
This is needed because both the Kubernetes and Argo options run mflog.save_logs after the task has completed, which will upload
An example run without this change has no logs for Argo retried attempts, i.e. Attempt

Reply:
Why are we explicitly casting this to an

Reply:
Yeah, you are right. I don't think we need to cast it anymore; the function returns a count. I had it this way to keep the changes minimal.

Reply:
function returning
     )
 
     try:
Review comment:
Will remove it after testing is complete.
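
The review thread above turns on inferring the attempt number from what is already in the datastore, rather than reading it from MF_ATTEMPT. The body of get_latest_done_attempt is not shown in this diff, so the following is only a minimal sketch of the general idea, not the PR's implementation. The helper name infer_latest_done_attempt, the "<attempt>.DONE.lock" marker naming, and the sample file list are all assumptions made for illustration.

```python
import re

# Assumed marker layout (hypothetical): one "<attempt>.DONE.lock" file
# is written for every attempt that ran to completion.
DONE_MARKER = re.compile(r"^(\d+)\.DONE\.lock$")


def infer_latest_done_attempt(file_names, default=0):
    """Return the highest attempt number that has a done marker.

    Falls back to `default` when no attempt has finished yet.
    """
    done_attempts = [
        int(match.group(1))
        for match in map(DONE_MARKER.match, file_names)
        if match
    ]
    return max(done_attempts, default=default)


# Attempt 0 failed (no done marker); the workflow engine retried it
# and attempt 1 finished.
files = ["0.attempt.json", "1.attempt.json", "1.DONE.lock"]
print(infer_latest_done_attempt(files))
```

Because a helper like this already returns an int, the caller would not need an extra int(...) cast, which is the point raised in the review.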