Improve APE shell script #85

Closed
tomberek opened this issue Feb 28, 2021 · 10 comments
Labels
contributions welcome We'll commit to review and maintenance if the people who need it write the changes.

Comments

@tomberek
Contributor

I'm helping package this up for NixOS where our executables in the store are read-only. This causes an issue with the small bootstrapping shell script to convert itself into a native ELF. We can self-bootstrap during build-time, but this makes the result non-portable.

When run as sh hello.com it just fails with Permission denied. When run as ./hello.com it enters an infinite loop (perhaps the mini-script should check the result of the redirect for failure before exec'ing itself?)

I'm not sure of the best way to proceed. A random list of thoughts:

  • include a conditional to check if self-modification was successful (prevent infinite loop)
  • allow the modified binary to be placed into TMP (runtime cost on each invocation)
  • known hash/location for modification (used by arx for a similar reason: https://github.com/solidsnack/arx)
  • dd/mmap into a memory location and exec into there?
  • allow for a mechanism to override the location of the new binary in some way? out=${COSMO_LOC_HASH-$(command -v "$0")} or similar?
  • something else?
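
The override idea in the list above relies on the `${VAR-default}` expansion. A minimal sketch of how it behaves, where `resolve_out` is a hypothetical helper and `COSMO_LOC_HASH` is only the variable name floated above, not something the APE script reads:

```shell
#!/bin/sh
# Hypothetical helper: pick the output location for the patched binary.
# $1 stands in for the result of `command -v "$0"` in the real script.
resolve_out() {
  # ${VAR-default} falls back only when VAR is unset. A VAR that is set but
  # empty is used as-is, unlike ${VAR:-default}, which also replaces empty.
  printf '%s\n' "${COSMO_LOC_HASH-$1}"
}
```

With `COSMO_LOC_HASH` unset, `resolve_out /bin/hello.com` yields the original path; with it set, the override wins. Whether an empty override should count as "unset" is a design choice, and switching to `:-` would change that.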

Referencing:

if [ -d /Applications ]; then
dd if="$o" of="$o" bs=8 skip="     351" count="      87" conv=notrunc 2>/dev/null
elif exec 7<> "$o"; then
printf '\177ELF\2\1\1\011\0\0\0\0\0\0\0\0\2\0\076\0\1\0\0\0\076\023\100\000\000\000\000\000\220\010\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0\0\0\0\100\0\070\0\004\000\0\0\000\000\000\000' >&7
exec 7<&-
fi
exec "$0" "$@"
R=$?

btw: thanks for the great work on this project

@tomberek
Contributor Author

Bringing the exec "$0" "$@" line into each branch of the previous conditional would allow the R=$? to do its work. That would stop the infinite loop: when neither branch succeeds, nothing is re-exec'd and the script proceeds to the exit $R line further down.
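
A minimal sketch of that idea, not the actual APE script: `patch_header` is a hypothetical helper, and the 4-byte string `HDR!` is a placeholder for the real 64-byte ELF header. The point is only that the re-exec is gated on the write having succeeded.

```shell
#!/bin/sh
# Hypothetical helper, not part of the APE script: overwrite the leading
# bytes of "$1" in place and report whether that worked.
patch_header() {
  o=$1
  # Bail out early if the file cannot be written (e.g. a read-only store).
  [ -w "$o" ] || return 1
  # 1<> opens the file read-write without truncating it, so only the
  # leading bytes change. 'HDR!' stands in for the real ELF header.
  printf 'HDR!' 1<>"$o"
}

# Usage idea: re-exec only after a successful patch, so a failed patch can
# never loop back into the same script.
#   if patch_header "$0"; then exec "$0" "$@"; fi
#   echo "$0: cannot self-modify" >&2; exit 1
```

Because `printf` is not a special builtin, a failed redirection here just returns a nonzero status instead of terminating the shell, which keeps the failure path under the script's control.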

@tomberek
Contributor Author

tomberek commented Mar 1, 2021

Or something like this, where the script:

  1. is executed as ./hello.com
  2. checks whether $COSMO_TEMP/hello exists...
  3. ...and if it does not, copies itself to $COSMO_TEMP/hello and updates the copy to ELF
  4. execs into $COSMO_TEMP/hello

where "hello" can also be a unique identifier or hash to prevent collisions.
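
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the APE script: `ensure_cached` is a hypothetical helper, the ELF patching in step 3 is left as a comment, and the copy goes through a temporary name so a half-written file is never visible at the final path.

```shell
#!/bin/sh
# Hypothetical helper: make sure a cached copy of "$1" exists at "$2".
ensure_cached() {
  src=$1 dst=$2
  # Step 2: nothing to do if the cached copy is already there.
  [ -e "$dst" ] && return 0
  mkdir -p "$(dirname "$dst")" || return 1
  # Step 3: copy under a per-process temporary name first.
  tmp=$dst.$$
  cp "$src" "$tmp" || return 1
  # (The "update to ELF" part of step 3 would patch "$tmp" here; omitted.)
  # rename(2) within one directory is atomic, so readers see all or nothing.
  chmod +x "$tmp" && mv "$tmp" "$dst"
}

# Step 4 would then be:
#   ensure_cached "$0" "$COSMO_TEMP/hello" && exec "$COSMO_TEMP/hello" "$@"
```

Note that this sketch never refreshes a stale copy, so a real version would need some way to notice that the original has changed.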

@feeley

feeley commented Mar 1, 2021

Hashing or using a "unique" identifier is a dangerous practice because if there's a collision it could overwrite an existing executable (with possible security consequences). It is less problematic to derive the name in $COSMO_TEMP from the path of the executable. So if the path is /home/bob/hello.com, the file created would be $COSMO_TEMP/home/bob/hello.com.
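
A minimal sketch of that path-derived naming, assuming `COSMO_TEMP` is set by the user (it is only a name proposed in this thread) and using a hypothetical helper `cache_path_for`:

```shell
#!/bin/sh
# Hypothetical helper: map an executable's path to its location under
# $COSMO_TEMP by mirroring the absolute path.
cache_path_for() {
  case $1 in
    /*) abs=$1 ;;        # already absolute
    *)  abs=$PWD/$1 ;;   # resolve relative paths against the current directory
  esac
  # ${VAR:?msg} aborts with msg if COSMO_TEMP is unset or empty.
  printf '%s%s\n' "${COSMO_TEMP:?COSMO_TEMP must be set}" "$abs"
}
```

With `COSMO_TEMP=/tmp/cosmo`, `cache_path_for /home/bob/hello.com` prints `/tmp/cosmo/home/bob/hello.com`. This sketch does not resolve symlinks or `..` components, so two spellings of the same file would map to two cache entries.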

@alisonatwork
Contributor

I guess on a lot of UNIXes the trick will be finding a temp place that is both 1) writeable by users (i.e. not /usr/bin or /bin) and 2) not mounted noexec (i.e. not /var/tmp or /tmp). I think if these binaries were to be distributed via a traditional package manager in a locked-down UNIX environment, the right way would be to have a preinstall step where you execute it once as root (to ELF-ify it) before dropping it into the standard binary location.
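
Both conditions can be probed from the shell. A rough sketch, with a hypothetical helper `can_exec_in` that drops a tiny script into the candidate directory and tries to run it:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "$1" is a writable directory on a
# filesystem that allows executing files (i.e. not mounted noexec).
can_exec_in() {
  dir=$1
  [ -d "$dir" ] && [ -w "$dir" ] || return 1
  probe=$dir/.exec-probe.$$
  printf '#!/bin/sh\nexit 0\n' > "$probe" || return 1
  chmod +x "$probe"
  # On a noexec mount this fails with "Permission denied".
  "$probe" 2>/dev/null
  rc=$?
  rm -f "$probe"
  return "$rc"
}

# Usage idea: take the first candidate that passes.
#   for d in "$HOME/bin" "${TMPDIR:-/tmp}"; do can_exec_in "$d" && break; done
```

The probe costs a file creation and a process spawn per candidate, which is one reason a one-time install step may be preferable to checking on every run.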

To me this project seems more interesting for more "unzip and go" style software distribution, where you download something as a normal user and then just run it from your downloads directory, or put it into bin under your home directory.

@jart
Owner

jart commented Mar 1, 2021

I'm willing to merge small improvements to the APE shell script, so long as the change isn't copying the whole executable to /tmp and executing that instead. I like the fact that the current design only requires changing 64 bytes, that it re-execs from the same location, and that subsequent executions happen in a purely native way.

The ideal thing to do would be patching the Linux kernel so that it recognizes the APE format and is able to load it directly into memory without handing off execution to /bin/sh. In that case, the shell script would serve the purpose of enabling us to continue to support older kernels.

Cosmopolitan provides alternative ways to meet your requirements too. As discussed in another issue you can do two things:

  1. You can ask the APE bootloader to generate ELF binaries by saying make CPPFLAGS=-DSUPPORT_VECTOR=113 which disables Windows + Metal + XNU support. See Compiling Lua #61 (comment)
  2. You can write a fast wrapper program that performs an atomic /tmp copy only for the times when you need it. See Compiling Lua #61 (comment) Having a native program do this is better than a shell script because it means we can use vfork() and copy_file_range() which are less hairy than what a shell script is able to do. Native programs are also able to supply the original path as argv[0] which helps make that operation less disruptive although it unfortunately can't override getauxval(AT_EXECFN).
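
For the first option, the value 113 reads as a bitmask of supported platforms. The sketch below shows the arithmetic under an assumption: the per-platform bit values are recalled from Cosmopolitan's headers and are not stated in this thread, so check them against your checkout. `support_vector` is a hypothetical helper.

```shell
#!/bin/sh
# ASSUMED bit values (not given in this thread; verify in the source tree).
LINUX=1 METAL=2 WINDOWS=4 XNU=8 OPENBSD=16 FREEBSD=32 NETBSD=64

# Hypothetical helper: OR together the platform bits passed as arguments.
support_vector() {
  v=0
  for bit in "$@"; do
    v=$((v | bit))
  done
  echo "$v"
}

# Everything except Windows, Metal and XNU:
support_vector "$LINUX" "$OPENBSD" "$FREEBSD" "$NETBSD"   # prints 113
```

Under these assumed values, leaving out Windows, Metal and XNU gives 1 + 16 + 32 + 64 = 113, which matches the number in the make command above.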

@jart added the "contributions welcome" label Mar 1, 2021
@jart changed the title from "Non self-modifying executable" to "Improve APE shell script" Mar 1, 2021
@tomberek
Contributor Author

tomberek commented Mar 1, 2021

Would memfd_create help here? I got this running:

use strict;
use warnings;

# Syscall numbers below are for Linux x86-64.
my $filename = "hello.com";
my $size = -s $filename;
defined $size or die "stat $filename: $!";

# execve wants "KEY=value" strings; argv[0] is the original path.
my @myenv = map { "$_=$ENV{$_}" } keys %ENV;
my @myargv = ($filename, @ARGV);
my $n = "ape";

my $fd = syscall(319,$n,0); die "memfd_create: $!" if -1==$fd;
my $rs = syscall(77,$fd,$size); die "ftruncate: $!" if -1==$rs;
my $ex = syscall(2,$filename,0); die "open: $!" if -1==$ex;
$rs = syscall(40,$fd,$ex,0,$size); die "sendfile: $!" if -1==$rs;
# $rs = syscall(326,$ex,0,$fd,0,$size,0); die "copy_file_range: $!" if -1==$rs;
$rs = syscall(3,$ex); die "close: $!" if -1==$rs;
$rs = syscall(8,$fd,0,0); die "lseek: $!" if -1==$rs;

my $hdr = "\177ELF\2\1\1\011\0\0\0\0\0\0\0\0\2\0\076\0\1\0\0\0\076\023\100\000\000\000\000\000\220\010\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0\0\0\0\100\0\070\0\004\000\0\0\000\000\000\000";
$rs = syscall(1,$fd,$hdr,64); die "write: $!" if -1==$rs;

# argv and envp are NULL-terminated pointer arrays; pack "p" maps undef to NULL.
my $path = "/proc/$$/fd/$fd";
my $argv = pack("p*",@myargv,undef);
my $envp = pack("p*",@myenv,undef);
syscall(59,$path,$argv,$envp); die "execve: $!";

This leaves no trace on filesystem. No need for tmpfs or TMP; only requires procfs.

@elimisteve
Copy link
Contributor

elimisteve commented Mar 2, 2021

@tomberek Is the idea that you're making the 64-byte modification to the version of the binary that's just been loaded into RAM, not making that same edit to it on-disk? (Then executing that modified, in-RAM version.)

@tomberek
Contributor Author

tomberek commented Mar 2, 2021

@elimisteve This is a mechanism by which you can load a file into memory (it SHOULD be able to do this via copy_file_range, splice, sendfile or some other quick mechanism; I don't know which is best), modify the 64 bytes, then exec into it. It's Perl, so not ideal.

Without patching the kernel, we'd need to either make the wrapper available somehow (self-extract a binary wrapper for any APE in order to re-use it? extract it each time or check PATH? it can be made super small), or find some other mechanism to make the correct syscalls from the shell context. I'm very likely re-exploring well-trodden ground.

My thought is that something like this can be a fallback if the script detects that the original is not writable, or can't find anywhere else to write. (trying to write it to TMP should be the first fallback? ew....)

@alisonatwork
Contributor

If the only solution is to end up writing it into the filesystem, my initial thought on a good location would be to do something similar to Go's UserCacheDir:

https://github.com/golang/go/blob/4c1a7ab49c4c68907bc7f7f7f776edd9116584a5/src/os/file.go#L393-L401

This uses XDG Base Directory Specification variables by default, which feels about as good as you can get on UNIX, especially from the shell. I think it's important not to write it to a shared directory, because you can't guarantee that the user running it can write to that destination, or execute files on the partition. Presumably no sysadmins are cruel enough to mount /home noexec.
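
A minimal shell sketch of that lookup on UNIX, modelled on the behaviour described above: prefer `$XDG_CACHE_HOME`, otherwise fall back to `$HOME/.cache`. The helper name `user_cache_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the UNIX branch of Go's os.UserCacheDir.
user_cache_dir() {
  if [ -n "${XDG_CACHE_HOME-}" ]; then
    printf '%s\n' "$XDG_CACHE_HOME"
  elif [ -n "${HOME-}" ]; then
    printf '%s\n' "$HOME/.cache"
  else
    # Neither variable is usable; let the caller decide what to do.
    return 1
  fi
}

# Usage idea:
#   dir=$(user_cache_dir) && mkdir -p "$dir/ape"
```

The `${VAR-}` form keeps the function safe under `set -u`, and the nonzero return leaves the no-HOME case to the caller rather than guessing a location.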

This reminds me a bit of Python's pyc files. PEP 3147 has a bit of discussion on alternatives.

jart pushed a commit that referenced this issue Mar 2, 2021
@jart
Owner

jart commented Mar 8, 2021

The goal with APE is to overcome arbitrary platform boundaries that create toil for developers. Using memfd_create to worm around an intentional choice to mount ${TMPDIR:-/tmp} as noexec is not the kind of thing we do here. XDG appears to be a Systemd thing, and it's a great example of the depths I was aiming to avoid in the boot sector. Now that we have a better failure condition, I'm happy to trust the administrator to understand what's happening and then work around it in their preferred manner.

5 participants