Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove empty default values for inputs/outputs arguments #3744

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

maddenp-noaa
Copy link

@maddenp-noaa maddenp-noaa commented Jan 13, 2025

Description

Using a mutable object like [] as a default argument value can be dangerous (see the description of this PR for the tutorial, the Important warning from the Python docs, Google's style guide, or the sad stories from googling "python mutable default argument"). This PR removes those.

An immutable default-argument value is nominally safe, but erroneous in a case like that where inputs=() appears in the formal arguments but inputs[0] appears in the function body (i.e., () cannot possibly be a valid value). This PR removes those.

There were a handful of cases, like inputs=() followed by for input in inputs: ..., where an empty iterable would be safe and correct, but where it seemed like it wouldn't make sense to call an app this way. This PR removes those, but I'd be happy to restore those (or anything, really) if I misunderstood the intent.

Given that the common "pythonic" advice is to specify e.g. inputs=None, I updated some bits of code to deal with this correctly. I'll flag those via inline comments.

I was able to pass the make test tests except for htex_local_alternate_test, wqex_local_test, vineex_local_test, radical_local_test, and config_local_test. htex_local_alternate_test and config_local_test run but appear to hang, wqex_local_test and vineex_local_test appear to require Python modules I don't have installed, and radical_local_test fails with a complaint about mpiexec -- but the behavior is the same for me on master, so I'm not sure it has anything to do with the PR changes.

Changed Behaviour

I wouldn't expect existing workflows to behave differently.

Fixes

NA but I'd be happy to open an Issue if it'd be helpful.

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Update to human readable text: Documentation/error messages/comments
  • Code maintenance/cleanup

@benclifford
Copy link
Collaborator

looks like something breaks in the work queue executor here - i'll have a poke at that

@@ -301,7 +301,7 @@ New Functionality
.. code-block:: python

@bash_app
def cat(inputs=(), outputs=()):
def cat(inputs, outputs):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident, though, that Parsl treats args and kwargs the same with respect to these two special arguments. I'm going to dig a bit more and see if I can convince myself that these do not have to be keyword arguments in order to be recognized correctly, but I'm worried that the previous =() default argument habit might have been related to this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I expect that's where that style came from too - but it's not particularly nice and it would be good to figure it out and fix rather than behave not-like-normal-Python

Comment on lines +199 to 200
def reduce_app(inputs):
return sum(inputs)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's one of the cases where an empty iterable would actually work:

>>> inputs = tuple()
>>> sum(inputs)
0

But I'm not sure that's useful behavior for this app.

@@ -66,7 +66,7 @@ def wrapper(*args, **kwargs):
return wrapper


def _ftp_stage_in(working_dir, parent_fut=None, outputs=[], _parsl_staging_inhibit=True):
def _ftp_stage_in(working_dir, outputs, parent_fut=None, _parsl_staging_inhibit=True):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments with default values must follow those without.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can force args to be keyword args with * like this:

def a(x,y,*, z=10,q):

I've personally been encouraging the use of * in the codebase because it provides a cleaner separation between positional and keyword args, and forces people to use the keyword names, which makes long argument lists less bug-prone.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the call sites for the functions where I'd reordered the arguments all provide kwargs anyway, so I updated the arg lists to split args/kwargs with * and reverted the argument order.

@@ -788,7 +788,7 @@ def _add_input_deps(self, executor: str, args: Sequence[Any], kwargs: Dict[str,
logger.debug("Not performing input staging")
return args, kwargs, func

inputs = kwargs.get('inputs', [])
inputs = kwargs.get('inputs') or []
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in several places below, the old code would assign None to inputs if the app's formal arguments contained inputs=None, and then the enumerate() below would fail. The new code will assign [] to inputs in this case.

@@ -944,7 +944,7 @@ def append_failure(e: Exception, dep: Future) -> None:
append_failure(e, dep)

# Check for futures in inputs=[<fut>...]
if 'inputs' in kwargs:
if kwargs.get('inputs'):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above: If inputs=None is supplied, 'inputs' in kwargs will be True, but None will not be iterable below. Now, inputs has to be specified and be truthy (which None is not). Other bad but truthy values could still make it past this check, but that was previously the case, too.

@@ -12,7 +12,7 @@ TARBALL="cctools-$CCTOOLS_VERSION-x86_64-ubuntu20.04.tar.gz"

# If stderr is *not* a TTY, then disable progress bar and show HTTP response headers
[[ ! -t 1 ]] && NO_VERBOSE="--no-verbose" SHOW_HEADERS="-S"
wget "$NO_VERBOSE" "$SHOW_HEADERS" -O /tmp/cctools.tar.gz "https://github.com/cooperative-computing-lab/cctools/releases/download/release/$CCTOOLS_VERSION/$TARBALL"
wget $NO_VERBOSE $SHOW_HEADERS -O /tmp/cctools.tar.gz "https://github.com/cooperative-computing-lab/cctools/releases/download/release/$CCTOOLS_VERSION/$TARBALL"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wget, at least on my system, fails when NO_VERBOSE and SHOW_HEADERS are empty/unset, as the command like ends up looking like

wget '' '' -O /tmp/cctools.tar.gz "<the url>"

and e.g.

~ $ wget '' '' -O /dev/null https://google.com &>/dev/null; echo $?
1
~ $ wget -O /dev/null https://google.com &>/dev/null; echo $?
0

Due to the set -e above, the script then aborted. This change fixed it for me.

@@ -28,7 +28,7 @@ def test_immediate_datafuture():
"""

import time
fu = echo_slow_message("Hello world", sleep=1, outputs=["hello.1.txt"])
fu = echo_slow_message("Hello world", outputs=["hello.1.txt"], sleep=1)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reordered these to match the reordered echo_slow_message() argument list, above, though it wasn't technically necessary since they're kwargs.

@maddenp-noaa
Copy link
Author

A failing test calls this bash_app, which it updated in this PR to replace =() with =None in the formal args. There might be something I didn't catch, like an assumption that the value is an iterable where I didn't convert None to []. This error kind of sounds like it's actually trying to transfer a file back from the worker environment, but the same error also says name 'RemoteExceptionWrapper' is not defined, which sounds like an import issue on the worker. I'm might be out of my depth on this one.

It might be worth re-running the tests without the inputs=None, outputs=None arguments, which aren't used in the app body anyway, though I think their presence should do no harm.

@maddenp-noaa
Copy link
Author

The error I mentioned above might be a byproduct, not a root cause. Locally I'm seeing errors about importing the work_queue module, via make wqex_local_test, in different tests depending on run order. This might have been what you alluded to earlier.

Also, I found that my local hang for the make htex_local_alternate_test tests disappears, and the tests pass, if I comment out monitoring=MonitoringHub(...) from configs/htex_local_alternate.py. Maybe that's a service that needs to be made available externally?

I'm happy to help with anything, but may lack context to understand some failures. I'll put this on my back burner for now. I'd understand if you decided that this isn't worth pursuing at the moment, in which case it might make sense to roll back the parsl-tutorial change that removed the =[] default argument value, which might be bad advice without corresponding changes here.

@benclifford
Copy link
Collaborator

I think this is worth pursuing and probably the right thing next is for me to figure out why things are failing on this PR. The changes here are a reasonable target for how things "should" work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants