Support calling into executor pods directly #2920
Conversation
Force-pushed from f7fcba5 to ab30996
WDYT @mengqiy - do you want to take this and build on it, or should we go another way?
```go
	options = append(options, engine.WithGRPCFunctionRuntime(c.ExtraConfig.FunctionRunnerAddress))
} else {
	ns := "porch-functions-system"
	options = append(options, engine.WithKubeFunctionRuntime(coreClient, ns))
```
This seems to read as though the runtimes are mutually exclusive. We still need the high-performance runtime for cached functions.
(since there appears to be no example of what the Kube function pod looks like, it is hard to tell what expected performance they will have)
They are currently mutually exclusive, and I changed this PR so that they aren't enabled by default (I think of it as a first step to start investigating this path).
I can add a "try multiple engines" runtime, but we should try to figure out what we need.
There are some example servers on my experimental branch, and they use something similar to your code, except the functions implement the server directly rather than being invoked as binaries.
Still lots of experiments to be done here, e.g. should we multiplex functions into an image? How long should we run a pod for? Is the performance good enough? My impression is that the performance is good enough, but I'm also (manually) pre-launching all the pods!
Experimental "multiple-engines" runtime, if we need it: #2937
porch/config/deploy/5-rbac.yaml (outdated)
```yaml
- apiGroups: ["flowcontrol.apiserver.k8s.io"]
  resources: ["flowschemas", "prioritylevelconfigurations"]
  verbs: ["get", "watch", "list"]
# Needed to launch / read executor pods
```
I'm not sure if it is possible to restrict to the porch-functions-system namespace. Do you happen to know?
Good call - it is. We would just create a RoleBinding rather than a ClusterRoleBinding. A RoleBinding can bind either to a ClusterRole or a Role, but then only binds in that Namespace.
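The namespace-scoped approach described above might look roughly like this. All names here are illustrative (this PR's actual manifests may differ); the key point is that a RoleBinding in `porch-functions-system` grants the pod verbs only in that namespace, even when binding a ServiceAccount from another namespace.

```yaml
# Hypothetical sketch: namespace-scoped pod permissions for the executor pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: porch-function-executor
  namespace: porch-functions-system
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: porch-function-executor
  namespace: porch-functions-system # rights apply only in this namespace
subjects:
  - kind: ServiceAccount
    name: porch-server        # illustrative service account name
    namespace: porch-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: porch-function-executor
```

A RoleBinding can also reference a shared ClusterRole instead of a Role; the grant is still confined to the binding's namespace.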
```go
klog.Infof("dialing grpc function runner %q", address)

// TODO: pool connections
cc, err := grpc.Dial(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
```
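The `TODO: pool connections` could be addressed by caching one client connection per address. This is a minimal sketch, not the PR's implementation: `Conn` and the injected dial function are hypothetical stand-ins for `*grpc.ClientConn` and `grpc.Dial`, kept abstract so the sketch is self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// Conn is a hypothetical stand-in for *grpc.ClientConn.
type Conn struct {
	Address string
}

// connPool caches one connection per address so repeated function
// evaluations against the same runner reuse the dialed connection.
type connPool struct {
	mu    sync.Mutex
	dial  func(address string) (*Conn, error)
	conns map[string]*Conn
}

func newConnPool(dial func(string) (*Conn, error)) *connPool {
	return &connPool{dial: dial, conns: map[string]*Conn{}}
}

// Get returns the cached connection for address, dialing once on first use.
func (p *connPool) Get(address string) (*Conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if cc, ok := p.conns[address]; ok {
		return cc, nil
	}
	cc, err := p.dial(address)
	if err != nil {
		return nil, err
	}
	p.conns[address] = cc
	return cc, nil
}

func main() {
	dials := 0
	pool := newConnPool(func(address string) (*Conn, error) {
		dials++
		return &Conn{Address: address}, nil
	})
	a, _ := pool.Get("function-runner:9445") // illustrative address
	b, _ := pool.Get("function-runner:9445")
	fmt.Println(dials, a == b) // prints "1 true": second Get reuses the connection
}
```

A real pool would also need eviction and health checking (gRPC connections can go stale when an executor pod is replaced), which this sketch deliberately omits.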
is the intention for this to work for arbitrary functions?
I understand we can use image rewriting (which will have issues with image signatures later on); including an example of a concrete function would help.
Yes, this works for arbitrary functions - if they implement the gRPC protocol. We could take a non-enlightened function and just overlay your existing function-runner onto it. But technically it's possible that a function might not be safe to run in parallel / multiple times in the same pod. I don't know if we want to require that functions are safe to run in parallel / multiple times (maybe supporting an opt-out mechanism). If we're going to require that, maybe we should just require them to build in gRPC support!
let's track this as a follow-up.
I'm working on a PoC that has a different flavor of pod evaluator. It doesn't force every KRM function to support gRPC.
It 1) creates a pod, 2) evaluates the function, and 3) brings the pod down. More concretely, it does the following:
- Create a pod with one regular container (containing the real KRM function) and one initContainer (containing a gRPC wrapper server). These two containers share an emptyDir volume.
- In the initContainer, copy the wrapper gRPC server binary to the emptyDir volume.
- In the regular container, run the gRPC server, which will invoke the KRM function binary.
I should be able to create a draft PR today.
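The pod shape described in that comment might look roughly like this. Every name here (images, binary paths, namespace) is an illustrative guess, not taken from the draft PR; the essential mechanism is the shared emptyDir that lets the init container hand the wrapper binary to the unmodified function image.

```yaml
# Hypothetical executor pod for the wrapper-server PoC described above.
apiVersion: v1
kind: Pod
metadata:
  name: krm-fn-executor
  namespace: porch-functions-system
spec:
  volumes:
    - name: wrapper
      emptyDir: {}
  initContainers:
    - name: copy-wrapper
      image: example.com/porch/grpc-wrapper:latest # hypothetical wrapper image
      command: ["cp", "/wrapper-server", "/wrapper/wrapper-server"]
      volumeMounts:
        - name: wrapper
          mountPath: /wrapper
  containers:
    - name: function
      image: gcr.io/kpt-fn/set-labels:v0.1 # an unmodified KRM function image
      # Override the entrypoint: serve gRPC, exec the function binary per request.
      command: ["/wrapper/wrapper-server", "--", "/usr/local/bin/function"]
      volumeMounts:
        - name: wrapper
          mountPath: /wrapper
```

This keeps existing function images "non-enlightened": they only need a runnable binary, and the wrapper provides the gRPC surface the engine dials.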
If suitable function pods are already running, we can call into them directly. This is essentially a proof-of-concept; in practice we will want to launch these pods ourselves / manage them etc.
Force-pushed from ab30996 to 3398b29
Split RBAC roles so that we only grant pod permissions on the porch-functions-system namespace