Feature request description
The current inference engine spec assumes the container entry point is a shell, and that the "binary" field holds the command line the shell should run.
However, in many cases (e.g. the upstream llama.cpp images) the entry point of the container image is the binary itself, so arguments are appended directly rather than passed through a shell.
The inference engine spec should allow both scenarios to be defined.
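As a rough illustration only (the field names here are assumptions, not part of the current spec), one way to express the two scenarios might be a flag indicating whether the image's entry point is the binary:

```yaml
# Scenario 1: entry point is a shell; "binary" is the full command line run by the shell
engine:
  binary: "llama-server --model /models/model.gguf --port 8080"
  entrypoint_is_binary: false   # hypothetical field, not in the current spec

# Scenario 2: entry point is the binary itself (e.g. upstream llama.cpp images);
# only the arguments are supplied
engine:
  args: ["--model", "/models/model.gguf", "--port", "8080"]
  entrypoint_is_binary: true    # hypothetical field, not in the current spec
```

This mirrors the Docker/OCI distinction between ENTRYPOINT and CMD, where arguments are appended to the entry point rather than interpreted by a shell.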
Suggest potential solution
No response
Have you considered any alternatives?
No response
Additional context
No response