
Conversation

@justinsb
Contributor

If suitable function pods are already running, we can call into them
directly.

This is essentially a proof-of-concept; in practice we will want to
launch these pods ourselves / manage them etc.
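
Roughly, "calling into them directly" could look like this minimal sketch (illustration only; the label key and port are placeholders, not necessarily what this PR uses): find a running pod for the function image in the functions namespace and dial its pod IP over gRPC.

```go
// Sketch (not this PR's code): locate an already-running function pod by a
// hypothetical label and dial its pod IP over gRPC. The label key and port
// are placeholders.
package sketch

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// dialFunctionPod finds a Running pod labeled for the given function in
// porch-functions-system and opens a gRPC connection to it.
func dialFunctionPod(ctx context.Context, client kubernetes.Interface, functionLabel string) (*grpc.ClientConn, error) {
	pods, err := client.CoreV1().Pods("porch-functions-system").List(ctx, metav1.ListOptions{
		LabelSelector: "porch.kpt.dev/function=" + functionLabel, // placeholder label key
	})
	if err != nil {
		return nil, err
	}
	for _, pod := range pods.Items {
		if pod.Status.Phase == corev1.PodRunning && pod.Status.PodIP != "" {
			address := fmt.Sprintf("%s:9445", pod.Status.PodIP) // placeholder port
			// TODO: pool connections
			return grpc.Dial(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
		}
	}
	return nil, fmt.Errorf("no running function pod found for %q", functionLabel)
}
```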

@justinsb justinsb requested review from martinmaly and mengqiy on March 20, 2022 22:36
@justinsb justinsb force-pushed the execute_from_pods branch 4 times, most recently from f7fcba5 to ab30996 on March 22, 2022 12:09
@justinsb
Contributor Author

WDYT @mengqiy - do you want to take this and build on it, or should we go another way?

	options = append(options, engine.WithGRPCFunctionRuntime(c.ExtraConfig.FunctionRunnerAddress))
} else {
	ns := "porch-functions-system"
	options = append(options, engine.WithKubeFunctionRuntime(coreClient, ns))
Contributor

This seems to read as though the runtimes are mutually exclusive. We still need the high-performance runtime for cached functions.

(Since there appears to be no example of what the Kube function pod looks like, it is hard to tell what performance to expect from them.)

Contributor Author

They are currently mutually exclusive, and I changed this PR so that they aren't enabled by default (I think of it as a first step toward investigating this path).

I can add a "try multiple engines" runtime, but we should try to figure out what we need.

There are some example servers on my experimental branch, and they use something similar to your code, except the functions implement the server directly rather than being invoked as binaries.

Still lots of experiments to be done here, e.g. should we multiplex functions into an image? How long should we run a pod for? Is the performance good enough? My impression is that the performance is good enough, but I'm also (manually) pre-launching all the pods!

Contributor Author

Experimental "multiple-engines" runtime, if we need it: #2937

- apiGroups: ["flowcontrol.apiserver.k8s.io"]
  resources: ["flowschemas", "prioritylevelconfigurations"]
  verbs: ["get", "watch", "list"]
# Needed to launch / read executor pods
Contributor

I'm not sure if it is possible to restrict to the porch-functions-system namespace. Do you happen to know?

Contributor Author

Good call - it is possible. We would just create a RoleBinding rather than a ClusterRoleBinding; a RoleBinding can reference either a ClusterRole or a Role, but it only grants those permissions within its own Namespace.
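
For example, a sketch of such a binding expressed with the rbac/v1 Go types (all the names here are placeholders, not the manifests in this PR):

```go
// Sketch: a RoleBinding that grants a ClusterRole's permissions only inside
// the porch-functions-system namespace. Names are placeholders.
package sketch

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func executorPodRoleBinding() *rbacv1.RoleBinding {
	return &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "porch-function-executor", // placeholder name
			Namespace: "porch-functions-system",  // binding is scoped to this namespace
		},
		Subjects: []rbacv1.Subject{{
			Kind:      "ServiceAccount",
			Name:      "porch-server", // placeholder service account
			Namespace: "porch-system", // placeholder namespace
		}},
		// RoleRef may point at a ClusterRole, but because this is a RoleBinding
		// (not a ClusterRoleBinding) the permissions apply only in this namespace.
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "porch-function-executor", // placeholder ClusterRole
		},
	}
}
```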

klog.Infof("dialing grpc function runner %q", address)

// TODO: pool connections
cc, err := grpc.Dial(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
Contributor

Is the intention for this to work for arbitrary functions?
I understand we can use image rewriting (which will have issues with image signatures later on); including an example of a concrete function would help.

Contributor Author

Yes, this works for arbitrary functions - if they implement the gRPC protocol. We could take a non-enlightened function and just overlay your existing function-runner onto it. But technically it's possible that a function might not be safe to run in parallel / multiple times in the same pod. I don't know if we want to require that functions are safe to run in parallel / multiple times (maybe with an opt-out mechanism). If we're going to do that, maybe we should just require them to build in gRPC support!
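
A function pod that builds in gRPC support directly could be shaped roughly like the sketch below. This assumes a FunctionEvaluator-style evaluator service; the pb import stands in for generated gRPC stubs, and the service, message, and field names are assumptions rather than exact porch definitions.

```go
// Sketch of a function pod serving an evaluator gRPC service in-process.
// The pb import stands in for generated stubs from a hypothetical
// FunctionEvaluator proto; names and the port are placeholders.
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"

	pb "example.com/placeholder/evaluator" // hypothetical generated stubs
)

type server struct {
	pb.UnimplementedFunctionEvaluatorServer
}

// EvaluateFunction runs the KRM function logic in-process against the
// serialized ResourceList and returns the transformed ResourceList.
func (s *server) EvaluateFunction(ctx context.Context, req *pb.EvaluateFunctionRequest) (*pb.EvaluateFunctionResponse, error) {
	out, err := runFunction(req.ResourceList)
	if err != nil {
		return nil, err
	}
	return &pb.EvaluateFunctionResponse{ResourceList: out}, nil
}

// runFunction is a placeholder for the real function logic; it must be safe
// to call concurrently if one pod serves many evaluations.
func runFunction(resourceList []byte) ([]byte, error) {
	return resourceList, nil
}

func main() {
	lis, err := net.Listen("tcp", ":9445") // placeholder port
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	pb.RegisterFunctionEvaluatorServer(s, &server{})
	log.Fatal(s.Serve(lis))
}
```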

Contributor

let's track this as a follow-up.

Contributor

I'm working on a PoC that has a different flavor of pod evaluator. It doesn't force every KRM function to support gRPC.
It 1) creates a pod, 2) evaluates the function, 3) brings it down. More concretely, it does the following (sketched below):

  • Create a pod with 1 regular container (containing the real KRM function) and 1 initContainer (containing a gRPC wrapper server). These 2 containers share an emptyDir volume.
  • In the initContainer, it copies the wrapper gRPC server binary to the emptyDir volume.
  • In the regular container, we run the gRPC server, which will invoke the KRM function binary.

I should be able to create a draft PR today.
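
For reference, the pod described above could be shaped roughly like this sketch, expressed with the core/v1 Go types (image names, paths, the port, and the namespace are placeholders, not the draft PR's actual code):

```go
// Sketch of the pod shape described above: an initContainer copies a gRPC
// wrapper binary into a shared emptyDir, and the function container runs it.
// Image names, paths, and the namespace are placeholders.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func evaluatorPod(functionImage string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "function-evaluator-",
			Namespace:    "porch-functions-system",
		},
		Spec: corev1.PodSpec{
			Volumes: []corev1.Volume{{
				Name:         "wrapper",
				VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
			}},
			InitContainers: []corev1.Container{{
				Name:  "copy-wrapper",
				Image: "example.com/wrapper-server:latest", // placeholder wrapper image
				// Copy the wrapper gRPC server binary into the shared volume.
				Command:      []string{"cp", "/wrapper-server", "/wrapper/wrapper-server"},
				VolumeMounts: []corev1.VolumeMount{{Name: "wrapper", MountPath: "/wrapper"}},
			}},
			Containers: []corev1.Container{{
				Name:  "function",
				Image: functionImage, // the real KRM function image
				// Run the wrapper server, which invokes the function binary per request.
				Command:      []string{"/wrapper/wrapper-server"},
				VolumeMounts: []corev1.VolumeMount{{Name: "wrapper", MountPath: "/wrapper"}},
			}},
		},
	}
}
```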

@justinsb justinsb force-pushed the execute_from_pods branch from ab30996 to 3398b29 on March 25, 2022 13:34
@justinsb
Contributor Author

Split the RBAC roles so that we only grant pod permissions in the porch-functions-system namespace.

@mengqiy mengqiy closed this Apr 12, 2022