
Migrate extproc to dynamic modules when available in Envoy #90

Open
mathetake opened this issue Jan 13, 2025 · 24 comments
@mathetake
Member

mathetake commented Jan 13, 2025

As discussed in the community meeting (though I was not able to attend), the upcoming Envoy v1.34 (to be released in April) will support Dynamic Modules with all the features necessary to rewrite the extproc as a dynamic module. A dynamic module is a shared object, written in any language, that implements the ABI specified in a single pure C header. It has been in active development since last summer (see the PRs so far) and is almost ready to be used by real end users such as this project.
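
To make the "shared object implementing a pure C header ABI" idea concrete, below is a minimal sketch in Rust, built as a cdylib. The exported symbol names, signatures, and return conventions are placeholders invented for illustration, not the actual Envoy dynamic-module ABI; the real entry points are whatever the ABI header (or the Rust SDK generated from it) defines.

// Minimal sketch of a dynamic module as a shared library.
// Cargo.toml would declare: [lib] crate-type = ["cdylib"]
// Symbol names and signatures below are hypothetical placeholders; they only
// illustrate exposing C-ABI functions from a Rust shared object that the host
// process (Envoy) resolves via dlopen/dlsym.

/// Hypothetical init hook called once when the host loads the module.
/// Returning 0 signals success (a convention assumed for this sketch).
#[no_mangle]
pub extern "C" fn ai_module_on_program_init() -> i32 {
    // A real module would register filter factories / parse config here.
    0
}

/// Hypothetical per-request hook operating directly on host-owned memory,
/// which is where the "zero copy" advantage over ExtProc comes from.
#[no_mangle]
pub extern "C" fn ai_module_on_request_headers(header_bytes: *const u8, len: usize) -> i32 {
    if header_bytes.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: this sketch assumes the host guarantees `header_bytes` is valid
    // for `len` bytes for the duration of the call.
    let headers = unsafe { std::slice::from_raw_parts(header_bytes, len) };
    // Inspect the buffer in place; nothing is copied over the network.
    let _ = headers.len();
    0
}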

I am the author of this new feature in Envoy upstream, and I am confident we should adopt it for several reasons:

  • A dynamic module is faster than ExtProc: it runs almost identically to a native C++ filter. For example, it is zero-copy in the sense that it has direct access to Envoy's buffers (body and header bytes), whereas ExtProc has to send the entire buffer over the network, which carries a significant cost. This has been verified by initial benchmarks from multiple people.
  • A dynamic module makes deployment much simpler, which results in both simpler control-plane code and less hassle for users, e.g. no need to manage an extproc deployment themselves.

In addition, dynamic modules are already being worked on by multiple (large) companies such as Netflix, LinkedIn, etc., so I believe the only possible concern, "maturity", is not something that should block us from adopting it given the benefits stated above.

Since the "official" language Envoy will support initially is Rust (with C++ likely later), I would recommend rewriting it in Rust. Go can also be supported with some care, which is not critical, in the sense that the code must be written in a way that won't break the Go runtime. As long as I write the initial code, simply porting the existing Go code to a dynamic module would be fine as well.

On that note, the public configuration API package, currently named "extprocconfig", will be the "migration interface" accepted by both the extproc and dynamic-module versions, so I will rename it to something like "aifilterconfig". Edit: rename done in #92.

@mathetake
Member Author

cc @envoyproxy/ai-gateway-assignable @missBerg @wengyao04 @aabchoo

@mathetake
Member Author

Since Envoy won't be ready by the end of the month (nor in the next Envoy v1.33, to be released in a few days), unfortunately we have to use the extproc in the initial version.

mathetake added a commit that referenced this issue Jan 13, 2025
Per #90, we will migrate to dynamic modules in the second or third
version of AI Gateway, so the public configuration package must not be
coupled with extproc (see the description in the issue). This renames
`extprocconfig` to the more generic (and accurate) name `filterconfig`.

Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake
Member Author

Or, if it's OK to use main-branch Envoy in our initial release (with a possible delay of 1-3 weeks), then we could ship dynamic modules with the initial release. wdyt?

@zirain
Member

zirain commented Jan 13, 2025

I think we'd better delay the first release for better compatibility.

@mathetake
Member Author

makes sense

@mathetake
Member Author

mathetake commented Jan 13, 2025

ok so the plan would be

  • v0.1.0: only with extproc
  • v0.2.0: both extproc and dynamic module, in a swappable way, passing identical tests, defaulting to extproc
  • v0.3.0: both extproc and dynamic module, assuming there's no problem, defaulting to dynamic module
  • v0.4.0: drops extproc support

?

@Krishanx92
Contributor

Krishanx92 commented Jan 15, 2025

ok so the plan would be

  • v0.1.0: only with extproc
  • v0.2.0: both extproc and dynamic module, in a swappable way, passing identical tests, defaulting to extproc
  • v0.3.0: both extproc and dynamic module, assuming there's no problem, defaulting to dynamic module
  • v0.4.0: drops extproc support

?

So are you planning to implement the dynamic module in Rust by converting the existing extproc Go code in 0.2?

@mathetake
Member Author

Not sure yet whether I'll use Rust or Go, but yes, that's my current plan.

@missBerg
Contributor

I'd suggest using Go, as I believe many engineers involved in this project know Go.

@missBerg
Contributor

What would it take to make it Go-friendly, @mathetake? ❤

@mathetake
Member Author

yeah

@mathetake
Member Author

Nothing specific is needed from the Envoy side; we just need to write the code carefully on our side, as I noted in the issue description 👍

@yuzisun
Contributor

yuzisun commented Jan 16, 2025

@mathetake Is there any doc I can read on how dynamic modules are deployed? I just want to understand the benefits we are getting; proxy performance is usually not a concern for LLMs, as the bottleneck is always the inference time.

@mathetake
Member Author

mathetake commented Jan 16, 2025

The docs will be worked on in the next few months, but in summary: the module (a shared library) will be located inside the Envoy container and loaded into the Envoy process, so from Envoy's point of view it's simply native code.

For us, the only thing we need to do is publish an "Envoy + AI Dynamic Module" container instead of an ExtProc container, which can be built as:

FROM envoyproxy/envoy:v1.33.0
COPY path/to/built/ai-dynamic-module.so  /whatever/pre/configured/path/to/ai-dynamic-module.so
ENTRYPOINT ["envoy"]

Think of NGINX modules: this works exactly the same way as their module system.

@mathetake
Member Author

mathetake commented Jan 16, 2025

Then we won't need any extproc-related configuration such as a Deployment, Service, ConfigMap, etc. The equivalent of the ConfigMap will simply be handled at the Envoy xDS configuration layer.

@yuzisun
Contributor

yuzisun commented Jan 16, 2025

Thanks! Would the secret credentials mounted on the Envoy proxy be accessible by the dynamic module?

@mathetake
Member Author

yes, anything is possible

@wengyao04
Contributor

wengyao04 commented Jan 16, 2025

The docs will be worked on in the next few months, but in summary: the module (a shared library) will be located inside the Envoy container and loaded into the Envoy process, so from Envoy's point of view it's simply native code.

For us, the only thing we need to do is publish an "Envoy + AI Dynamic Module" container instead of an ExtProc container, which can be built as:

FROM envoyproxy/envoy:v1.33.0
COPY path/to/built/ai-dynamic-module.so /whatever/pre/configured/path/to/ai-dynamic-module.so
ENTRYPOINT ["envoy"]

Think of NGINX modules: this works exactly the same way as their module system.

If we build the .so into the Envoy image, then we have to restart the Envoy pod to load a new ai-dynamic-module.so. Our concern is that the development cycle will be very fast in order to keep up with AI requirements, so end users would need to restart their gateway in production frequently to pick up such changes.

Is it possible to adopt an approach similar to how Istio handles Wasm plugins? For example, the .so could be stored in an OCI image, and the ai-extpro-controller would pull the new OCI image and inject it into Envoy dynamically.

@mathetake
Member Author

Unfortunately, as long as Go is used, hot reloading is not technically possible, since the Go runtime forbids loading multiple Go shared libraries into one process (and once a Go shared library is loaded, it's impossible even to unload it lol). So the answer depends on which language we use (Go or Rust).
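
For context on what hot reloading would involve mechanically, here is a rough sketch (not project code) of a load/swap/unload cycle using dlopen/dlclose via the Rust libc crate; the module paths and draining step are placeholders assumed for illustration. With a Rust or C module the final dlclose can actually release the old library, whereas with a Go c-shared library the Go runtime stays resident and a second Go library cannot be loaded at all, which is exactly the limitation described above.

// Rough sketch of a hot-reload cycle with dlopen/dlclose (Rust, libc crate).
// Module paths are placeholders; no real Envoy API is shown here.
use std::ffi::CString;

unsafe fn load_module(path: &str) -> *mut libc::c_void {
    let c_path = CString::new(path).expect("path contains NUL byte");
    // RTLD_NOW resolves all symbols at load time so failures surface early.
    libc::dlopen(c_path.as_ptr(), libc::RTLD_NOW)
}

fn main() {
    unsafe {
        // 1. Load the currently deployed version of the module.
        let old = load_module("/etc/envoy/modules/ai-dynamic-module.so");
        assert!(!old.is_null(), "failed to load current module");

        // 2. Load the new version alongside it and start routing traffic to it.
        let new = load_module("/etc/envoy/modules/ai-dynamic-module-v2.so");
        assert!(!new.is_null(), "failed to load new module");

        // 3. Unload the old version once drained. This works for Rust/C
        //    modules; a Go c-shared library cannot be unloaded, and loading
        //    a second Go library into the same process is unsupported.
        libc::dlclose(old);
    }
}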

@wengyao04
Contributor

Unfortunately, as long as Go is used, hot reloading is not technically possible, since the Go runtime forbids loading multiple Go shared libraries into one process (and once a Go shared library is loaded, it's impossible even to unload it lol). So the answer depends on which language we use (Go or Rust).

Then could we keep both if we decide to write it in Go?

@mathetake
Member Author

See golang/go#11100 and golang/go#65050

@missBerg
Contributor

The more I've been learning about Rust, the more it seems like a very solid language 👀

@mathetake
Member Author

mathetake commented Jan 16, 2025

Yeah, solid in particular when used as a shared library, compared to Go. The Go runtime has global side effects on the loading process, e.g. installing its own signal handlers, which results in the aforementioned limitations like golang/go#11100 and golang/go#65050. That's the reason Envoy will not officially have a Go SDK (unlike Rust): it comes with limitations on the code that can be written, and with uncertainty.

So either works for me, and using Go initially would certainly be less work, but I'm not sure that's the best option in the long run.

@mathetake mathetake self-assigned this Jan 17, 2025
@mathetake
Member Author

Per the discussion yesterday, I think we are leaning a bit towards Rust, considering the limitations and bottlenecks.
