diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index e06db538c7c..9d0e09cc1e9 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -19,9 +19,10 @@ Before contributing, ensure you have:
 - [Buf CLI](https://buf.build/docs/installation) installed
 - Go 1.24.6 or later
 - Node.js and npm (for TypeScript)
-- Python 3.9+ with `uv` package manager
+- Python 3.10+ with `uv` package manager
 - Rust toolchain (if working with Rust bindings)
 - Git configured with your name and email
+- Docker (for building and running the devbox image)
 
 ### Setting Up Your Environment
 
@@ -44,6 +45,87 @@ Before contributing, ensure you have:
 make gen
 ```
 
+## Running Flyte Locally
+
+The fastest way to run a full Flyte stack on your machine is the bundled **devbox**, a k3d-based Kubernetes cluster with all dependencies (TaskAction CRD, Knative, PostgreSQL, etc.) pre-installed, combined with a locally running `flyte-manager` binary.
+
+### Start the Flyte Devbox
+
+From the repo root:
+
+```bash
+# Build the devbox image (first time only, or after Dockerfile changes)
+make devbox-build
+
+# Start the devbox cluster in dev mode (required for running the manager locally)
+make devbox-run FLYTE_DEV=true
+
+# Stop the devbox when you're done
+make devbox-stop
+```
+
+`FLYTE_DEV=true` is required when you intend to run the manager locally: it disables the in-cluster manager so your local process can take over. `make devbox-run` writes the devbox cluster's kubeconfig into your global kubeconfig, so `kubectl` targets it automatically.
+
+### Build and Run the Manager
+
+With the devbox running, start the manager locally:
+
+```bash
+# From the repo root
+make -C manager run
+
+# Or from manager/
+make run
+
+# Or build and run the binary directly
+cd manager
+make build
+./bin/flyte-manager --config config.yaml
+```
+
+The manager will:
+1. Connect to PostgreSQL and run database migrations
+2. Start all services in parallel goroutines
+3. Connect to your Kubernetes cluster
+4. Begin reconciling TaskAction CRs
+
+### Configuration
+
+Edit `manager/config.yaml`:
+
+```yaml
+manager:
+  # Single server port hosting all Connect services (Runs, Actions, DataProxy, Events, Cache, Secret, App).
+  server:
+    host: "0.0.0.0"
+    port: 8090
+
+  executor:
+    healthProbePort: 8081
+
+  kubernetes:
+    namespace: "flyte"
+    # Optional: specify custom kubeconfig path
+    # kubeconfig: "/path/to/kubeconfig"
+
+runs:
+  storagePrefix: "s3://flyte-data"
+  database:
+    postgres:
+      host: "localhost"
+      port: 30001
+      dbname: "runs"
+      username: "postgres"
+      password: "postgres"
+      options: "sslmode=disable"
+
+logger:
+  level: 4  # Info level
+  show-source: true
+```
+
+See [`manager/README.md`](manager/README.md) for the full architecture, API endpoints, and troubleshooting tips.
+
 ## Development Workflow
 
 ### Creating a Feature Branch
diff --git a/manager/README.md b/manager/README.md
index 4a6bd1a2326..a702c5174c4 100644
--- a/manager/README.md
+++ b/manager/README.md
@@ -2,132 +2,39 @@
 
 The Flyte Manager is a unified binary that runs all Flyte services in a single process:
 
-- **Runs Service** (port 8090) - Manages workflow runs and action state
-- **Queue Service** (port 8089) - Creates and manages TaskAction CRs in Kubernetes
-- **Executor/Operator** (port 8081 health) - Reconciles TaskAction CRs and transitions them through states
+- **Runs Service** - Manages workflow runs and action state
+- **Executor/Operator** - Reconciles and transitions TaskAction CRs through their lifecycle
+- **Actions Service** - Serves action metadata and lifecycle APIs, including enqueueing TaskAction CRs
+- **DataProxy Service** - Proxies signed-URL and blob access for task I/O
+- **Events Service** - Ingests and fans out task/run events
+- **Cache Service** - Backs task output caching and lookups
+- **App Service** (+ internal proxy) - Hosts the Flyte UI/app and routes to internal services
+- **Secret Service** - Manages secret references used by tasks
 
 ## Features
 
 ✅ **Single Binary** - One process to deploy and manage
-✅ **Single SQLite Database** - All data in one file
+✅ **PostgreSQL Backend** - Shared database for all services
 ✅ **Auto Kubernetes Detection** - Uses current kubeconfig
 ✅ **Unified Configuration** - One config file for all services
 ✅ **HTTP/2 Support** - Buf Connect compatible
 
-## Quick Start
-
-### Prerequisites
-
-1. **Kubernetes cluster** (k3d, kind, minikube, or any cluster)
-2. **Go 1.24 or later**
-3. **TaskAction CRD** installed in the cluster
-4. **Kubeconfig** configured (or running in-cluster)
-
-### Install TaskAction CRD
-
-```bash
-kubectl apply -f ../executor/config/crd/bases/flyte.org_taskactions.yaml
-```
-
-### Build and Run
-
-```bash
-# Build the binary
-make build
-
-# Run the manager
-make run
-
-# Or run directly
-./bin/flyte-manager --config config.yaml
-```
-
-The manager will:
-1. Initialize a SQLite database (`flyte.db`)
-2. Run database migrations
-3. Start all three services in parallel goroutines
-4. Connect to your Kubernetes cluster
-5. Begin reconciling TaskAction CRs
-
-## Configuration
-
-Edit `config.yaml`:
-
-```yaml
-manager:
-  runsService:
-    host: "0.0.0.0"
-    port: 8090
-
-  queueService:
-    host: "0.0.0.0"
-    port: 8089
-
-  executor:
-    healthProbePort: 8081
-
-  kubernetes:
-    namespace: "flyte"
-    # Optional: specify custom kubeconfig path
-    # kubeconfig: "/path/to/kubeconfig"
-
-database:
-  type: "sqlite"
-  sqlite:
-    file: "flyte.db"
-
-logger:
-  level: 4  # Info level
-  show-source: true
-```
-
-## Architecture
-
-```
-┌─────────────────────────────────────────────────────────┐
-│                  Flyte Manager Process                  │
-├─────────────────────────────────────────────────────────┤
-│                                                         │
-│  ┌──────────────┐  ┌───────────────┐  ┌──────────────┐  │
-│  │ Runs Service │  │ Queue Service │  │   Executor   │  │
-│  │    :8090     │  │     :8089     │  │              │  │
-│  │              │  │               │  │  Reconciles  │  │
-│  │ - RunService │  │  Creates K8s  │  │ TaskActions  │  │
-│  │ - StateServ. │  │ TaskAction CRs│  │              │  │
-│  └──────┬───────┘  └───────┬───────┘  └──────┬───────┘  │
-│         │                  │                 │          │
-│         └──────────────────┴──────────────────┘         │
-│                            │                            │
-│                   ┌────────┴────────┐                   │
-│                   │    SQLite DB    │                   │
-│                   │    flyte.db     │                   │
-│                   └─────────────────┘                   │
-└─────────────────────────────────────────────────────────┘
-                             │
-                             ↓
-                    Kubernetes Cluster
-                     (TaskAction CRs)
-```
-
 ## API Endpoints
 
-### Runs Service (port 8090)
+### Manager (port 8090)
 
-- `POST /flyteidl2.workflow.RunService/CreateRun` - Create a new run
-- `POST /flyteidl2.workflow.RunService/GetRun` - Get run details
-- `POST /flyteidl2.workflow.RunService/ListRuns` - List runs
-- `POST /flyteidl2.workflow.RunService/AbortRun` - Abort a run
-- `POST /flyteidl2.workflow.StateService/Put` - Update action state
-- `POST /flyteidl2.workflow.StateService/Get` - Get action state
-- `POST /flyteidl2.workflow.StateService/Watch` - Watch state updates
-- `GET /healthz` - Health check
-- `GET /readyz` - Readiness check
-
-### Queue Service (port 8089)
+All Connect/gRPC services are mounted on a single port. Notable handlers:
 
-- `POST /flyteidl2.workflow.QueueService/EnqueueAction` - Create TaskAction CR
-- `POST /flyteidl2.workflow.QueueService/AbortQueuedRun` - Delete root TaskAction
-- `POST /flyteidl2.workflow.QueueService/AbortQueuedAction` - Delete specific TaskAction
+- `flyteidl2.workflow.RunService` - Create / Get / List / Abort runs
+- `flyteidl2.workflow.InternalRunService` - Internal run-control APIs used by the executor
+- `flyteidl2.workflow.TranslatorService` - Translates user task definitions
+- `flyteidl2.workflow.RunLogsService` - Stream logs for a run
+- `flyteidl2.actions.ActionsService` - Action lifecycle and metadata
+- `flyteidl2.task.TaskService` - Task registration and lookup
+- `flyteidl2.trigger.TriggerService` - Schedules and triggers
+- `flyteidl2.project.ProjectService` - Project management
+- `flyteidl2.auth.IdentityService` / `AuthMetadataService` - Identity and auth metadata
+- DataProxy, Events, Cache, Secret, and App services (see their respective packages)
 - `GET /healthz` - Health check
 - `GET /readyz` - Readiness check
 
@@ -138,26 +45,20 @@ logger:
 
 ## How It Works
 
-1. **CreateRun** → Runs Service persists run to SQLite DB
-2. **CreateRun** → Runs Service calls Queue Service to enqueue root action
-3. **EnqueueAction** → Queue Service creates TaskAction CR in Kubernetes
-4. **Executor** → Watches TaskAction CRs and reconciles them
-5. **Executor** → Transitions: Queued → Initializing → Running → Succeeded
-6. **Executor** → Calls State Service Put() on each transition
-7. **State Service** → Persists state updates to SQLite DB
-8. **State Service** → Notifies watchers of state changes
+1. **CreateRun** → Runs Service persists the run to PostgreSQL and calls `ActionsService.Enqueue(...)` to enqueue the root action
+2. **Actions Service / Executor** → That enqueue flow results in the root TaskAction CR being created in Kubernetes, which the Executor then watches and reconciles
+3. **Executor** → Transitions: Queued → Initializing → Running → Succeeded
+4. **Actions Service** → Watches TaskAction CRs via a shared informer and forwards status updates (phase, output URI, error state) to subscribers; the SDK controller consumes these updates through `WatchForUpdates` to drive the run forward
+5. **Runs Service** → Persists state changes to PostgreSQL and notifies its own watchers
 
 ## Testing
 
 ### Check Services
 
 ```bash
-# Runs Service health
+# Manager (Connect services) health
 curl http://localhost:8090/healthz
 
-# Queue Service health
-curl http://localhost:8089/healthz
-
 # Executor health
 curl http://localhost:8081/healthz
 ```
@@ -178,11 +79,11 @@ kubectl describe taskaction -n flyte
 ### Check Database
 
 ```bash
-# Open SQLite database
-sqlite3 flyte.db
+# Connect to the PostgreSQL backend (devbox defaults)
+psql -h localhost -p 30001 -U postgres -d runs
 
 # List tables
-.tables
+\dt
 
 # Query runs
 SELECT * FROM runs;
@@ -199,55 +100,6 @@ SELECT name, phase, state FROM actions;
 make run
 ```
 
-### Docker
-
-```dockerfile
-FROM golang:1.21 AS builder
-WORKDIR /app
-COPY . .
-RUN cd manager && make build
-
-FROM alpine:latest
-RUN apk --no-cache add ca-certificates sqlite
-WORKDIR /root/
-COPY --from=builder /app/manager/bin/flyte-manager .
-COPY --from=builder /app/manager/config.yaml .
-CMD ["./flyte-manager", "--config", "config.yaml"]
-```
-
-### Kubernetes
-
-Deploy as a single pod with access to the Kubernetes API:
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: flyte-manager
-  namespace: flyte
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: flyte-manager
-  template:
-    metadata:
-      labels:
-        app: flyte-manager
-    spec:
-      serviceAccountName: flyte-manager
-      containers:
-      - name: manager
-        image: flyte-manager:latest
-        ports:
-        - containerPort: 8090
-          name: runs
-        - containerPort: 8089
-          name: queue
-        - containerPort: 8081
-          name: health
-```
-
 ## Troubleshooting
 
 ### Connection Issues
@@ -280,12 +132,11 @@ manager:
 ```bash
 # Check what's using the ports
 lsof -i :8090
-lsof -i :8089
 lsof -i :8081
 
 # Change ports in config.yaml
 manager:
-  runsService:
+  server:
     port: 9090  # Changed from 8090
 ```
 
diff --git a/manager/config.yaml b/manager/config.yaml
index 640cfc52218..e49260391d4 100644
--- a/manager/config.yaml
+++ b/manager/config.yaml
@@ -65,7 +65,7 @@ runs:
     postgres:
       host: "localhost"
       port: 30001
-      dbName: "runs"
+      dbname: "runs"
       username: "postgres"
       password: "postgres"
       options: "sslmode=disable"