PR into #488 #675
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Informer {
    client,
    run_id,
    action_cache: Arc::new(RwLock::new(HashMap::new())),
# Extract flyteidl2 version from root pyproject.toml
ROOT_VER=$(grep 'flyteidl2==' pyproject.toml | head -1 | sed 's/.*flyteidl2==\([^"]*\).*/\1/')
echo "Root pyproject.toml: flyteidl2==$ROOT_VER"

# Extract flyteidl2 version from rs_controller/Cargo.toml
CARGO_VER=$(grep 'flyteidl2' rs_controller/Cargo.toml | grep -v '^#' | sed 's/.*"=\(.*\)".*/\1/')
echo "rs_controller/Cargo.toml: flyteidl2=$CARGO_VER"

# Extract flyteidl2 version from rs_controller/pyproject.toml
RS_VER=$(grep 'flyteidl2==' rs_controller/pyproject.toml | head -1 | sed 's/.*flyteidl2==\([^"]*\).*/\1/')
echo "rs_controller/pyproject.toml: flyteidl2==$RS_VER"

# Compare all three
if [ "$ROOT_VER" != "$CARGO_VER" ] || [ "$ROOT_VER" != "$RS_VER" ]; then
  echo "ERROR: flyteidl2 versions do not match!"
  echo " pyproject.toml: $ROOT_VER"
  echo " rs_controller/Cargo.toml: $CARGO_VER"
  echo " rs_controller/pyproject.toml: $RS_VER"
  exit 1
fi
echo "All flyteidl2 versions match: $ROOT_VER"
nit: I think this check can break if the format in pyproject.toml/Cargo.toml changes (e.g. from flyteidl2==1.0.0 to "flyteidl2==1.0.0")? Maybe it's better to parse with Python's tomllib?
yes probably, haven't really gone through these GH actions. i think we can merge this PR though - GH actions and devex ergonomics will take a bunch more additional work I think.
The changes here seem to be for local testing? Should we remove it here?
can delete yeah. or delete in the main PR since it might still be useful for testing.
pub async fn remove_action(&self, action_name: &str) -> Option<Action> {
    let mut cache = self.action_cache.write().await;
    let dropped_action = cache.remove(action_name);
    self.completion_events.write().await.remove(action_name);
    debug!("Removed action and completion event for {}", action_name);
    dropped_action
It would be better to release the lock before acquiring the other, like so:
pub async fn remove_action(&self, action_name: &str) -> Option<Action> {
    let dropped_action = {
        let mut cache = self.action_cache.write().await;
        cache.remove(action_name)
    };
    {
        let mut events = self.completion_events.write().await;
        events.remove(action_name);
    }
    debug!("Removed action and completion event for {}", action_name);
    dropped_action
}
yes, yes it would!
…protos Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: machichima <nary12321@gmail.com>
Added a few fixes to ensure the happy path works and to improve the user experience:
Things left for follow-ups; these changes will need to be ported from the Python controller to the Rust side:
Tested with the following cases:
With Rust controller
❯ _F_USE_RUST_CONTROLLER=1 python examples/basics/hello_v2.py
> Building 1 image...
> Building image flyte for environment hello_v2
i Image ghcr.io/flyteorg/flyte:6d40e1e2f731bdec86597ba82c7a5d0e already exists, skipping build
✓ Built image for environment hello_v2: ghcr.io/flyteorg/flyte:6d40e1e2f731bdec86597ba82c7a5d0e
> Bundling code...
✓ Code bundle: 1 files, 0.009765625 MB (compressed 0.0007238388061523438 MB)
> Uploading code bundle...
u26prlh7jr4bjnrtk77b
https://dogfood-gcp.cloud-staging.union.ai/v2/domain/development/project/flytesnacks/runs/u26prlh7jr4bjnrtk77b
Run 'u26prlh7jr4bjnrtk77b' completed successfully.
With Python controller
❯ python examples/basics/hello_v2.py
> Building 1 image...
> Building image flyte for environment hello_v2
i Image ghcr.io/flyteorg/flyte:0169cc9f783a34a9fdbe88a711239166 already exists, skipping build
✓ Built image for environment hello_v2: ghcr.io/flyteorg/flyte:0169cc9f783a34a9fdbe88a711239166
> Bundling code...
✓ Code bundle: 1 files, 0.009765625 MB (compressed 0.0007238388061523438 MB)
> Uploading code bundle...
ucgp6ht9xvzr562fv2jg
https://dogfood-gcp.cloud-staging.union.ai/v2/domain/development/project/flytesnacks/runs/ucgp6ht9xvzr562fv2jg
Run 'ucgp6ht9xvzr562fv2jg' completed successfully.
❯ python examples/basics/hello_v2.py
> Building 1 image...
> Building image flyte for environment hello_v2
i Image flyte:0169cc9f783a34a9fdbe88a711239166 was not found or has expired
> Image ghcr.io/flyteorg/flyte:0169cc9f783a34a9fdbe88a711239166 not found, building...
> Submitting a new build...
> Started build at: https://dogfood-gcp.cloud-staging.union.ai/v2/domain/development/project/flytesnacks/runs/ux75ffhvtmzpmsx2vhdx
> Waiting for build to finish
✓ Build completed in 0:00:33
✓ Built image for environment hello_v2: us-docker.pkg.dev/dogfood-gcp-dataplane/orgs/dogfood-gcp/flyte:0169cc9f783a34a9fdbe88a711239166
> Bundling code...
✓ Code bundle: 1 files, 0.009765625 MB (compressed 0.0007238388061523438 MB)
> Uploading code bundle...
uhhcn4gcd47xtkcxnd48
https://dogfood-gcp.cloud-staging.union.ai/v2/domain/development/project/flytesnacks/runs/uhhcn4gcd47xtkcxnd48
Run 'uhhcn4gcd47xtkcxnd48' completed successfully.
Script for testing, modified from
import asyncio

import flyte

env = flyte.TaskEnvironment(
    name="hello_v2",
)


@env.task()
async def hello_worker(id: int) -> str:
    import os
    import sys

    print(f"[CHECK] _F_USE_RUST_CONTROLLER = {os.environ.get('_F_USE_RUST_CONTROLLER')!r}", file=sys.stderr, flush=True)
    print(f"[CHECK] _U_USE_ACTIONS = {os.environ.get('_U_USE_ACTIONS')!r}", file=sys.stderr, flush=True)
    try:
        import flyte_controller_base
        print(f"[CHECK] flyte_controller_base: YES ({flyte_controller_base.__file__})", file=sys.stderr, flush=True)
    except ImportError as e:
        print(f"[CHECK] flyte_controller_base: NO ({e})", file=sys.stderr, flush=True)
    ctx = flyte.ctx()
    assert ctx is not None
    return f"hello, my id is: {id} and I am being run by Action: {ctx.action}"


@env.task()
async def hello_driver(ids: list[int] = [1, 2, 3]) -> list[str]:
    coros = []
    with flyte.group("fanout-group"):
        for id in ids:
            coros.append(hello_worker(id))
        vals = await asyncio.gather(*coros)
    return vals


if __name__ == "__main__":
    flyte.init_from_config()
    run = flyte.run(hello_driver)
    print(run.name)
    print(run.url)
    run.wait()
Failure cases without the Rust controller
Push an image that does not contain the Rust controller and reference it in the script, for example:
env = flyte.TaskEnvironment(
    name="hello_v2",
    image="ghcr.io/machichima/flyte:py3.12-v2.2.4.dev33-g8d2773300.d20260508",
)
❯ _F_USE_RUST_CONTROLLER=1 python examples/basics/hello_v2.py
> Bundling code...
✓ Code bundle: 1 files, 0.009765625 MB (compressed 0.000606536865234375 MB)
> Uploading code bundle...
uh9mdhxtm7mqnjmcfctj
https://dogfood-gcp.cloud-staging.union.ai/v2/domain/development/project/flytesnacks/runs/uh9mdhxtm7mqnjmcfctj
Run 'uh9mdhxtm7mqnjmcfctj' exited unsuccessfully in state ActionPhase.FAILED with error:
terminated with exit code (1). Reason [Error]. Message:
Flyte runtime started for action a0 with run name uh9mdhxtm7mqnjmcfctj
File "/opt/venv/bin/a0", line 10, in <module>
sys.exit(main())
^^^^^^
File "/opt/venv/lib/python3.12/site-packages/click/core.py", line 1514, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/click/core.py", line 1435, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/click/core.py", line 1298, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/click/core.py", line 853, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/flyte/_bin/runtime.py", line 135, in main
raise RuntimeError(
Filtered traceback (most recent call last):
RuntimeError: _F_USE_RUST_CONTROLLER=1 was set but `flyte_controller_base` is not installed. Install it with `pip
install flyte`. For development, run `make dev-rs-dist` from the repo root.
.
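To make the fail-fast behaviour in the traceback above concrete, here is a minimal sketch of such a guard. It is not the code in `flyte/_bin/runtime.py`; the function name `ensure_rust_controller` and its parameters are hypothetical, and only the env var name, the module name, and the gist of the error message come from the log.

```python
import importlib.util
import os


def ensure_rust_controller(env=os.environ, module="flyte_controller_base"):
    """Return True if the Rust controller is requested and importable.

    Raises RuntimeError when the env var asks for the Rust controller
    but the module cannot be found, so the task fails before doing work.
    """
    requested = env.get("_F_USE_RUST_CONTROLLER") == "1"
    if requested and importlib.util.find_spec(module) is None:
        raise RuntimeError(
            f"_F_USE_RUST_CONTROLLER=1 was set but `{module}` is not installed."
        )
    return requested
```

Checking with `importlib.util.find_spec` avoids importing the extension module just to see whether it exists; the real runtime may well do this differently.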
❯ _F_USE_RUST_CONTROLLER=1 python examples/basics/hello_v2.py
> Building 1 image...
> Building image flyte for environment hello_v2
> Image ghcr.io/flyteorg/flyte:51eb4be6859c7d6a1ed8e7d3252ab2e1 not found, building...
11:09:07.981750 WARNING docker_builder.py:723 - Temporary directory:
/var/folders/27/52rf9wx95tvfqzcw0t_x7qqc0000gn/T/tmp31i0l67h
Run command: docker buildx build --builder flytex --tag ghcr.io/flyteorg/flyte:51eb4be6859c7d6a1ed8e7d3252ab2e1 --platform linux/amd64,linux/arm64 --push /var/folders/27/52rf9wx95tvfqzcw0t_x7qqc0000gn/T/tmp31i0l67h
[+] Building 5.0s (30/35) docker-container:flytex
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.45kB 0.0s
=> resolve image config for docker-image://docker.io/docker/dockerfile:1.10 0.7s
=> CACHED docker-image://docker.io/docker/dockerfile:1.10@sha256:865e5dd094beca432e8c0a1d5e1c465db5f998dca4e439 0.0s
=> => resolve docker.io/docker/dockerfile:1.10@sha256:865e5dd094beca432e8c0a1d5e1c465db5f998dca4e439981029b3b81 0.0s
=> [linux/amd64 internal] load metadata for docker.io/library/python:3.12-slim-bookworm 0.3s
=> [linux/arm64 internal] load metadata for ghcr.io/astral-sh/uv:0.8.13 1.0s
=> [linux/amd64 internal] load metadata for ghcr.io/astral-sh/uv:0.8.13 1.0s
=> [linux/arm64 internal] load metadata for docker.io/library/python:3.12-slim-bookworm 0.6s
=> [auth] astral-sh/uv:pull token for ghcr.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [linux/amd64 uv 1/1] FROM ghcr.io/astral-sh/uv:0.8.13@sha256:4de5495181a281bc744845b9579acf7b221d6791f99bcc2 0.0s
=> => resolve ghcr.io/astral-sh/uv:0.8.13@sha256:4de5495181a281bc744845b9579acf7b221d6791f99bcc211b9ec13f417c28 0.0s
=> [linux/arm64 uv 1/1] FROM ghcr.io/astral-sh/uv:0.8.13@sha256:4de5495181a281bc744845b9579acf7b221d6791f99bcc2 0.0s
=> => resolve ghcr.io/astral-sh/uv:0.8.13@sha256:4de5495181a281bc744845b9579acf7b221d6791f99bcc211b9ec13f417c28 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 666.07kB 0.0s
=> [linux/amd64 stage-1 1/11] FROM docker.io/library/python:3.12-slim-bookworm@sha256:58525e1a8dada8e72d6f8a11 0.0s
=> => resolve docker.io/library/python:3.12-slim-bookworm@sha256:58525e1a8dada8e72d6f8a11a0ddff8d981fd888549108 0.0s
=> CACHED [internal] setting cache mount permissions 0.0s
=> [linux/arm64 stage-1 1/11] FROM docker.io/library/python:3.12-slim-bookworm@sha256:58525e1a8dada8e72d6f8a11 0.0s
=> => resolve docker.io/library/python:3.12-slim-bookworm@sha256:58525e1a8dada8e72d6f8a11a0ddff8d981fd888549108 0.0s
=> CACHED [linux/arm64 stage-1 2/11] COPY --from=uv /uv /usr/bin/uv 0.0s
=> CACHED [linux/arm64 stage-1 3/11] RUN if [ ! -f "/opt/venv/bin/python" ]; then uv venv /opt/venv --p 0.0s
=> CACHED [linux/arm64 stage-1 4/11] RUN if ! id -u flyte >/dev/null 2>&1; then useradd --create-home --shell 0.0s
=> CACHED [linux/arm64 stage-1 5/11] WORKDIR /home/flyte 0.0s
=> CACHED [linux/arm64 stage-1 6/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/var/cache/apt,id= 0.0s
=> CACHED [linux/amd64 stage-1 2/11] COPY --from=uv /uv /usr/bin/uv 0.0s
=> CACHED [linux/amd64 stage-1 3/11] RUN if [ ! -f "/opt/venv/bin/python" ]; then uv venv /opt/venv --p 0.0s
=> CACHED [linux/amd64 stage-1 4/11] RUN if ! id -u flyte >/dev/null 2>&1; then useradd --create-home --shell 0.0s
=> CACHED [linux/amd64 stage-1 5/11] WORKDIR /home/flyte 0.0s
=> CACHED [linux/amd64 stage-1 6/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/var/cache/apt,id= 0.0s
=> [linux/arm64 stage-1 7/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/root/.cache/uv,id=wheel 0.3s
=> [linux/amd64 stage-1 7/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/root/.cache/uv,id=wheel 1.0s
=> CANCELED [linux/arm64 stage-1 8/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/root/.cache/uv, 2.5s
=> [linux/amd64 stage-1 8/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/root/.cache/uv,id=wheel 1.5s
=> ERROR [linux/amd64 stage-1 9/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/root/.cache/uv,id= 0.3s
------
> [linux/amd64 stage-1 9/11] RUN --mount=type=cache,sharing=locked,mode=0777,target=/root/.cache/uv,id=wheel --mount=source=/dist,target=/dist,type=bind uv pip install --python /opt/venv/bin/python --find-links /dist --no-deps --no-index --reinstall flyte_controller_base:
0.246 Using Python 3.12.13 environment at: /opt/venv
0.292 × No solution found when resolving dependencies:
0.293 ╰─▶ Because flyte-controller-base was not found in the provided package
0.293 locations and you require flyte-controller-base, we can conclude that
0.293 your requirements are unsatisfiable.
------
ERROR: failed to build: failed to solve: process "/bin/sh -c uv pip install --python $UV_PYTHON --find-links /dist --no-deps --no-index --reinstall flyte_controller_base" did not complete successfully: exit code: 1
The first version of the Rust controller. Follow the README section "Developing the Rust controller" for the development guide.

To run examples with the Rust controller, set the `_F_USE_RUST_CONTROLLER=1` env var (e.g. `_F_USE_RUST_CONTROLLER=1 python examples/basics/hello_v2.py`).

More changes described in:
- #675

Tested the following cases:
- #675 (comment)
  - local / remote image builder "without" the Rust controller enabled
  - local image builder "with" the Rust controller enabled (remote image build is not supported for now)
- #675 (comment)
  - error raised when the Rust controller is enabled but its wheel is not in the image
- More TODOs listed in: #675 (comment)
- Note: the Rust controller (`flyte_controller_base`) is an optional dependency. `pip install flyte` does not install it automatically.
  - To use the Rust controller, install with the extra: `pip install flyte[rust-controller]`.
  - If `_F_USE_RUST_CONTROLLER=1` is set without the extra, the image build (default image) or the runtime (custom image) will fail fast with a clear error pointing to the install command (see #675 (comment)).

---------
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: Sergey Vilgelm <sergey@union.ai>
Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Co-authored-by: Sergey Vilgelm <523825+SVilgelm@users.noreply.github.com>
Co-authored-by: machichima <nary12321@gmail.com>
Bump flyteidl2 version to latest and get rid of the manually copied code
Fix the pem issue where we had to manually download the cert from amazon before creating the channel (with tonic connections you have to call `with_native_roots()` to pick up root certs).
Fix maturin setting so it'll build.
Pull in some changes that have been made to the main controller. (Fixes a potential subtle race condition in call-seq-generation #607, ActionAbortedError is raised #521, Introducing TUI: Flyte local runs #621)
Add in some CI as a separate PR - PR into #675 #676. But we need to play with this experience a bit more first. Need to go through the whole experience and see how things feel.
More changes described in PR into #488 #675 (comment)