Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: Cmd argument in non-shell mode should not split the string #794

Open
hades opened this issue Feb 20, 2025 · 7 comments
Open

exec: Cmd argument in non-shell mode should not split the string #794

hades opened this issue Feb 20, 2025 · 7 comments

Comments

@hades
Copy link
Contributor

hades commented Feb 20, 2025

As discussed in cnfgmngmntcmp.

Currently, if shell is unset (the default), the cmd parameter is interpreted as a list of arguments separated by a non-empty sequence of spaces. (E.g. "ls -l /tmp" becomes ["ls", "-l", "/tmp"])[1].

While useful on the surface, this is an unusual behaviour, that I personally would least expect from this parameter, and most likely will be hard to make useful for complex commands.

Curiously, Ansible's ansible.builtin.command works in a similar way. However the mgmt implementation is not compatible: Ansible uses the Python's shlex module[2], which implements a parody of a working shell parser.

Additionally, if this ever works on Windows, this will become even more weird. Windows uses a single string to pass parameters to the processes, so the POSIX implementation is to concatenate the array of arguments and pass it as one string. So this mgmt resource will split the string, which then will be concatenated in libc, (which then most likely will be split again in the process itself).

My suggestions (in reverse order of preference) are:

  • replace Args, Cmd and Shell with the following parameters:
    • Command: to specify a command as a list of arguments, where the first argument is a path to the executable
    • ShellCommand: to specify a command as a string to be passed to a shell (sh -c "...")
    • Shell: to override the shell implementation
      Command and (Shell, ShellCommand) would be mutually exclusive. Shell becomes optional, and ShellCommand by itself invokes the default shell.
  • similarly to Ansible, add a new resource ("shell") to specifically use shell for running the commands. Change the Cmd parameter of the exec resource to accept a list of arguments
  • use a shell-like parser to imitate parsing of the Cmd parameter with more fidelity
  • document the current behaviour in detail

I'd be happy to provide a path for whatever suggestion of your choice, as long as you choose correctly.

[1] https://mgmtconfig.com/docs/resources/#exec
[2] https://github.com/python/cpython/blob/3.13/Lib/shlex.py

@purpleidea
Copy link
Owner

as long as you choose correctly

👀

We'll do our best. A few thoughts to aid the design discussion:

  1. Let's not worry about Windows being weird. I know it is weird, and so as a result, we should focus on what's optimal for Linux users. If that happens to be compatible with Windows, then great. If it's not, then we can always add a windows:exec resource.

  2. We want the default to be to avoid invoking the shell. We don't want that overheard by default if we can exec a subprocess directly.

  3. It would be most helpful if you added some pseudo-code exec examples of what you think is the "correct" approach. This will make it a bit less confusing and less ambiguous so I don't misunderstand your wishes!

Thank you =D

@hades
Copy link
Contributor Author

hades commented Feb 22, 2025

Here are some examples. For the current implementation:

# BAD
exec "exec1" {
  cmd => "/usr/bin/ls -l    /tmp",
}
exec "exec2" {
  cmd => "/usr/bin/ls -l    /tmp > /tmp/file_list.txt",
  shell => "/bin/bash",
}
exec "exec3" {
  cmd => "/usr/bin/ls -l    /tmp > /tmp/file_list.txt",
  shell => "/bin/zsh",
}

For the suggested changes:

# GOOD (Option 1)
exec "exec1" {
  command => ["/usr/bin/ls",  "-l", "/tmp",],
}
exec "exec2" {
  shellcommand => "/usr/bin/ls -l    /tmp > /tmp/file_list.txt",
}
exec "exec3" {
  shellcommand => "/usr/bin/ls -l    /tmp > /tmp/file_list.txt",
  shell => "/bin/zsh",
}

Or with the Ansible option:

# GOOD (Option 2)
exec "exec1" {
  cmd => ["/usr/bin/ls",  "-l", "/tmp",],
}
shellexec "exec2" {
  cmd => "/usr/bin/ls -l    /tmp > /tmp/file_list.txt",
}
shellexec "exec3" {
  cmd => "/usr/bin/ls -l    /tmp > /tmp/file_list.txt",
  shell => "/bin/zsh",
}

@purpleidea
Copy link
Owner

If I understand correctly, your main goal is that we express the "command" as a list rather than as a string, is that correct?

Keep in mind we still need a scenario where the user uses the "name" variable and doesn't override with a cmd param, so that should still have some useful semantics. (When we use cmd or args or whatever relevant params, then the name is only used as a unique identifier, and not a source of data for the resource.)

Lastly, assuming we agree on this change, are you willing to implement a polished version?

@hades
Copy link
Contributor Author

hades commented Feb 22, 2025

If I understand correctly, your main goal is that we express the "command" as a list rather than as a string, is that correct?

Well, yes. I want to emphasize that "the command" is inherently a list. Currently this list is represented as a space-delimited string, and I would like to represent it as a list.

Keep in mind we still need a scenario where the user uses the "name" variable and doesn't override with a cmd param, so that should still have some useful semantics.

Ah. I'm not sure I'm a big fan of this feature, but, sure, the name could be interpreted as a list of one item, i.e. these two are equivalent:

exec "/bin/eject" {
}

and

exec "/bin/eject" {
  command => ["/bin/eject",],
}

Is that what you have in mind?

Lastly, assuming we agree on this change, are you willing to implement a polished version?

Sure.

@purpleidea
Copy link
Owner

The behaviour of:

exec "/usr/bin/ls -l /tmp" {

}

must still work. And I expect it would be identical to doing:

exec "whatever" {
    cmd => ["/usr/bin/ls", "-l", "/tmp",],
}

Having said that, I'm okay with this change, but I'd love it if @frebib had a look to ACK it first.

Looking through the golang code, the "important" golang exec API is https://pkg.go.dev/os/exec#CommandContext (or the non ctx version) which both have argv[0] separate from the remaining args... So I suppose that's why I did it that way,

Is there a good reason why it separated those two instead of just having a single argv? The only argument I can think of is that it prevents someone who controls the argv array from controlling the binary path.

@hades
Copy link
Contributor Author

hades commented Feb 23, 2025

The behaviour of [...] must still work.

Well, all the arguments against the status quo would still apply to this shorthand, and we would need to decide what exactly is the semantics of that string. Is it Windows-like or Ansible-like? In other words, is

exec "/usr/bin/ls -l 'Documents and Settings'" {}

equivalent to

exec "whatever" {
    command => ["/usr/bin/ls", "-l", "'Documents", "and", "Settings'",],
}

or

exec "whatever" {
    command => ["/usr/bin/ls", "-l", "Documents and Settings",],
}

Neither is particularly attractive to me, and in general I personally see no value in making that shorthand work, especially given its unclear semantics.

Is there a good reason why it separated those two instead of just having a single argv?

golang works in mysterious ways.

@frebib
Copy link
Contributor

frebib commented Feb 23, 2025

The behaviour of [...] must still work.

Well, all the arguments against the status quo would still apply to this shorthand, and we would need to decide what exactly is the semantics of that string. Is it Windows-like or Ansible-like? In other words, is

exec "/usr/bin/ls -l 'Documents and Settings'" {}

equivalent to

exec "whatever" {
    command => ["/usr/bin/ls", "-l", "'Documents", "and", "Settings'",],
}

or

exec "whatever" {
    command => ["/usr/bin/ls", "-l", "Documents and Settings",],
}

Neither is particularly attractive to me, and in general I personally see no value in making that shorthand work, especially given its unclear semantics.

Echoing what I said in chat. I 100% agree with hades' take here. It's a footgun and a good piece of software will be opinionated enough to protect it's users even at the cost of apparent convenience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants