84 changes: 83 additions & 1 deletion CONTRIBUTING.md
@@ -19,9 +19,10 @@ Before contributing, ensure you have:
- [Buf CLI](https://buf.build/docs/installation) installed
- Go 1.24.6 or later
- Node.js and npm (for TypeScript)
- Python 3.9+ with `uv` package manager
- Python 3.10+ with `uv` package manager
- Rust toolchain (if working with Rust bindings)
- Git configured with your name and email
- Docker (for building and running the devbox image)

### Setting Up Your Environment

@@ -44,6 +45,87 @@ Before contributing, ensure you have:
make gen
```

## Running Flyte Locally

The fastest way to run a full Flyte stack on your machine is the bundled **devbox** — a k3d-based Kubernetes cluster with all dependencies (TaskAction CRD, Knative, PostgreSQL, etc.) pre-installed — combined with a locally-running `flyte-manager` binary.

### Start the Flyte Devbox

From the repo root:

```bash
# Build the devbox image (first time only, or after Dockerfile changes)
make devbox-build

# Start the devbox cluster in dev mode (required for running the manager locally)
make devbox-run FLYTE_DEV=true

# Stop the devbox when you're done
make devbox-stop
```

`FLYTE_DEV=true` is required when you intend to run the manager locally: it disables the in-cluster manager so your local process can take over. `make devbox-run` also adds the devbox cluster to your global kubeconfig, so `kubectl` targets it automatically.

### Build and Run the Manager

With the devbox running, start the manager locally:

```bash
# From the repo root
make -C manager run

# Or from manager/
make run

# Or build and run the binary directly
cd manager
make build
./bin/flyte-manager --config config.yaml
```

The manager will:
1. Connect to PostgreSQL and run database migrations
2. Start all services in parallel goroutines
3. Connect to your Kubernetes cluster
4. Begin reconciling TaskAction CRs

### Configuration

Edit `manager/config.yaml`:

```yaml
manager:
  # Single server port hosting all Connect services (Runs, Actions, DataProxy, Events, Cache, Secret, App).
  server:
    host: "0.0.0.0"
    port: 8090

  executor:
    healthProbePort: 8081

  kubernetes:
    namespace: "flyte"
    # Optional: specify custom kubeconfig path
    # kubeconfig: "/path/to/kubeconfig"

runs:
  storagePrefix: "s3://flyte-data"
  database:
    postgres:
      host: "localhost"
      port: 30001
      dbname: "runs"
      username: "postgres"
      password: "postgres"
      options: "sslmode=disable"

logger:
  level: 4 # Info level
  show-source: true
```

See [`manager/README.md`](manager/README.md) for the full architecture, API endpoints, and troubleshooting tips.
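
As a rough illustration of how the `runs.database.postgres` settings fit together, the sketch below assembles them into a libpq-style keyword/value connection string. The helper name `postgres_dsn` and the dictionary layout are assumptions made for this example; the PR does not show how the manager builds its connection string. The sketch reads the lowercase `dbname` key, matching the spelling used in `manager/config.yaml`.

```python
# Hypothetical helper for illustration only; not the manager's actual code.
def postgres_dsn(cfg: dict) -> str:
    """Build a libpq keyword/value connection string from a postgres config mapping."""
    parts = [
        f"host={cfg['host']}",
        f"port={cfg['port']}",
        f"dbname={cfg['dbname']}",  # lowercase key, as in manager/config.yaml
        f"user={cfg['username']}",
        f"password={cfg['password']}",
    ]
    # "options" already holds keyword=value pairs such as sslmode=disable.
    if cfg.get("options"):
        parts.append(cfg["options"])
    return " ".join(parts)


# Devbox defaults copied from the configuration shown in this section.
DEVBOX_POSTGRES = {
    "host": "localhost",
    "port": 30001,
    "dbname": "runs",
    "username": "postgres",
    "password": "postgres",
    "options": "sslmode=disable",
}

if __name__ == "__main__":
    print(postgres_dsn(DEVBOX_POSTGRES))
```

A mapping that still used the old `dbName` spelling would raise a `KeyError` in this sketch, which makes the key rename easy to spot.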

## Development Workflow

### Creating a Feature Branch
211 changes: 31 additions & 180 deletions manager/README.md
@@ -2,132 +2,39 @@

The Flyte Manager is a unified binary that runs all Flyte services in a single process:

- **Runs Service** (port 8090) - Manages workflow runs and action state
- **Queue Service** (port 8089) - Creates and manages TaskAction CRs in Kubernetes
- **Executor/Operator** (port 8081 health) - Reconciles TaskAction CRs and transitions them through states
- **Runs Service** - Manages workflow runs and action state
- **Executor/Operator** - Reconciles and transitions TaskAction CRs through their lifecycle
- **Actions Service** - Serves action metadata and lifecycle APIs, including enqueueing TaskAction CRs
- **DataProxy Service** - Proxies signed-URL and blob access for task I/O
- **Events Service** - Ingests and fans out task/run events
- **Cache Service** - Backs task output caching and lookups
- **App Service** (+ internal proxy) - Hosts the Flyte UI/app and routes to internal services
- **Secret Service** - Manages secret references used by tasks

## Features

✅ **Single Binary** - One process to deploy and manage
✅ **Single SQLite Database** - All data in one file
✅ **PostgreSQL Backend** - Shared database for all services
✅ **Auto Kubernetes Detection** - Uses current kubeconfig
✅ **Unified Configuration** - One config file for all services
✅ **HTTP/2 Support** - Buf Connect compatible

## Quick Start

### Prerequisites

1. **Kubernetes cluster** (k3d, kind, minikube, or any cluster)
2. **Go 1.24 or later**
3. **TaskAction CRD** installed in the cluster
4. **Kubeconfig** configured (or running in-cluster)

### Install TaskAction CRD

```bash
kubectl apply -f ../executor/config/crd/bases/flyte.org_taskactions.yaml
```

### Build and Run

```bash
# Build the binary
make build

# Run the manager
make run

# Or run directly
./bin/flyte-manager --config config.yaml
```

The manager will:
1. Initialize a SQLite database (`flyte.db`)
2. Run database migrations
3. Start all three services in parallel goroutines
4. Connect to your Kubernetes cluster
5. Begin reconciling TaskAction CRs

## Configuration

Edit `config.yaml`:

```yaml
manager:
  runsService:
    host: "0.0.0.0"
    port: 8090

  queueService:
    host: "0.0.0.0"
    port: 8089

  executor:
    healthProbePort: 8081

  kubernetes:
    namespace: "flyte"
    # Optional: specify custom kubeconfig path
    # kubeconfig: "/path/to/kubeconfig"

database:
  type: "sqlite"
  sqlite:
    file: "flyte.db"

logger:
  level: 4 # Info level
  show-source: true
```

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Flyte Manager Process │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌───────────────┐ ┌──────────────┐ │
│ │ Runs Service │ │ Queue Service │ │ Executor │ │
│ │ :8090 │ │ :8089 │ │ │ │
│ │ │ │ │ │ Reconciles │ │
│ │ - RunService │ │ Creates K8s │ │ TaskActions │ │
│ │ - StateServ. │ │ TaskAction CRs│ │ │ │
│ └──────┬───────┘ └───────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┴──────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ │ SQLite DB │ │
│ │ flyte.db │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────┘
Kubernetes Cluster
(TaskAction CRs)
```

## API Endpoints

### Runs Service (port 8090)
### Manager (port 8090)

- `POST /flyteidl2.workflow.RunService/CreateRun` - Create a new run
- `POST /flyteidl2.workflow.RunService/GetRun` - Get run details
- `POST /flyteidl2.workflow.RunService/ListRuns` - List runs
- `POST /flyteidl2.workflow.RunService/AbortRun` - Abort a run
- `POST /flyteidl2.workflow.StateService/Put` - Update action state
- `POST /flyteidl2.workflow.StateService/Get` - Get action state
- `POST /flyteidl2.workflow.StateService/Watch` - Watch state updates
- `GET /healthz` - Health check
- `GET /readyz` - Readiness check

### Queue Service (port 8089)
All Connect/gRPC services are mounted on a single port. Notable handlers:

- `POST /flyteidl2.workflow.QueueService/EnqueueAction` - Create TaskAction CR
- `POST /flyteidl2.workflow.QueueService/AbortQueuedRun` - Delete root TaskAction
- `POST /flyteidl2.workflow.QueueService/AbortQueuedAction` - Delete specific TaskAction
- `flyteidl2.workflow.RunService` - Create / Get / List / Abort runs
- `flyteidl2.workflow.InternalRunService` - Internal run-control APIs used by the executor
- `flyteidl2.workflow.TranslatorService` - Translates user task definitions
- `flyteidl2.workflow.RunLogsService` - Stream logs for a run
- `flyteidl2.actions.ActionsService` - Action lifecycle and metadata
- `flyteidl2.task.TaskService` - Task registration and lookup
- `flyteidl2.trigger.TriggerService` - Schedules and triggers
- `flyteidl2.project.ProjectService` - Project management
- `flyteidl2.auth.IdentityService` / `AuthMetadataService` - Identity and auth metadata
- DataProxy, Events, Cache, Secret, and App services (see their respective packages)
- `GET /healthz` - Health check
- `GET /readyz` - Readiness check
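
Because the handlers above are Connect services mounted on one port, a unary call is an HTTP `POST` to `/<service>/<method>`. The sketch below only builds such a request with the standard library and does not send it. The helper name `connect_request` is made up for this example, and the empty JSON body is a placeholder, since the request fields for `ListRuns` are not described here.

```python
import json
import urllib.request


def connect_request(base_url: str, service: str, method: str,
                    payload: dict) -> urllib.request.Request:
    """Build (but do not send) a Connect-style unary JSON request."""
    url = f"{base_url.rstrip('/')}/{service}/{method}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Connect-Protocol-Version": "1",
        },
        method="POST",
    )


# Placeholder payload: the real ListRuns request fields are an unknown here.
req = connect_request(
    "http://localhost:8090", "flyteidl2.workflow.RunService", "ListRuns", {}
)

# To actually send it against a running manager:
#   with urllib.request.urlopen(req, timeout=5) as resp:
#       print(resp.status, resp.read().decode())
```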

@@ -138,26 +45,20 @@ logger:

## How It Works

1. **CreateRun** → Runs Service persists run to SQLite DB
2. **CreateRun** → Runs Service calls Queue Service to enqueue root action
3. **EnqueueAction** → Queue Service creates TaskAction CR in Kubernetes
4. **Executor** → Watches TaskAction CRs and reconciles them
5. **Executor** → Transitions: Queued → Initializing → Running → Succeeded
6. **Executor** → Calls State Service Put() on each transition
7. **State Service** → Persists state updates to SQLite DB
8. **State Service** → Notifies watchers of state changes
1. **CreateRun** → Runs Service persists the run to PostgreSQL and calls `ActionsService.Enqueue(...)` to enqueue the root action
2. **Actions Service / Executor** → The enqueue call creates the root TaskAction CR in Kubernetes, which the Executor then watches and reconciles
3. **Executor** → Transitions: Queued → Initializing → Running → Succeeded
4. **Actions Service** → Watches TaskAction CRs via a shared informer and forwards status updates (phase, output URI, error state) to subscribers; the SDK controller consumes these updates through `WatchForUpdates` to drive the run forward
5. **Runs Service** → Persists state changes to PostgreSQL and notifies its own watchers
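
The phase progression in step 3 can be pictured as a small linear state machine. This is a minimal sketch covering only the four phases named above; failure and abort phases are not listed in this section and are left out, and the names `Phase`, `next_phase`, and `is_valid_transition` are illustrative rather than taken from the executor.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    QUEUED = "Queued"
    INITIALIZING = "Initializing"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"


# Happy-path order only; error and abort phases are intentionally omitted.
_HAPPY_PATH = [Phase.QUEUED, Phase.INITIALIZING, Phase.RUNNING, Phase.SUCCEEDED]


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None if it is terminal."""
    i = _HAPPY_PATH.index(current)
    return _HAPPY_PATH[i + 1] if i + 1 < len(_HAPPY_PATH) else None


def is_valid_transition(src: Phase, dst: Phase) -> bool:
    """A transition is valid only if it moves exactly one step forward."""
    return next_phase(src) is dst
```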

## Testing

### Check Services

```bash
# Runs Service health
# Manager (Connect services) health
curl http://localhost:8090/healthz

# Queue Service health
curl http://localhost:8089/healthz

# Executor health
curl http://localhost:8081/healthz
```
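
The same health checks can be scripted, for example to wait for the manager before running tests. The following is a sketch using only the Python standard library; the helper names `is_healthy` and `check_all` are assumptions for this example, and the URLs are the two `healthz` endpoints shown above.

```python
import urllib.error
import urllib.request

# Endpoints taken from the curl commands above.
ENDPOINTS = {
    "manager": "http://localhost:8090/healthz",
    "executor": "http://localhost:8081/healthz",
}


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses land here.
        return False


def check_all(endpoints: dict) -> dict:
    """Map each endpoint name to its health status."""
    return {name: is_healthy(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_all(ENDPOINTS).items():
        print(f"{name}: {'ok' if ok else 'unreachable'}")
```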
@@ -178,11 +79,11 @@ kubectl describe taskaction <name> -n flyte
### Check Database

```bash
# Open SQLite database
sqlite3 flyte.db
# Connect to the PostgreSQL backend (devbox defaults)
psql -h localhost -p 30001 -U postgres -d runs

# List tables
.tables
\dt

# Query runs
SELECT * FROM runs;
@@ -199,55 +100,6 @@ SELECT name, phase, state FROM actions;
make run
```

### Docker

```dockerfile
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN cd manager && make build

FROM alpine:latest
RUN apk --no-cache add ca-certificates sqlite
WORKDIR /root/
COPY --from=builder /app/manager/bin/flyte-manager .
COPY --from=builder /app/manager/config.yaml .
CMD ["./flyte-manager", "--config", "config.yaml"]
```

### Kubernetes

Deploy as a single pod with access to the Kubernetes API:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flyte-manager
  namespace: flyte
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flyte-manager
  template:
    metadata:
      labels:
        app: flyte-manager
    spec:
      serviceAccountName: flyte-manager
      containers:
      - name: manager
        image: flyte-manager:latest
        ports:
        - containerPort: 8090
          name: runs
        - containerPort: 8089
          name: queue
        - containerPort: 8081
          name: health
```

## Troubleshooting

### Connection Issues
@@ -280,12 +132,11 @@ manager:
```bash
# Check what's using the ports
lsof -i :8090
lsof -i :8089
lsof -i :8081

# Change ports in config.yaml
manager:
runsService:
server:
port: 9090 # Changed from 8090
```
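
If `lsof` is not available, a port conflict can also be detected by attempting a TCP connection. This is a small sketch with the standard `socket` module; the helper names `port_in_use` and `first_free_port` and the candidate ports are illustrative, not part of the manager.

```python
import socket
from typing import Iterable, Optional


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0


def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port with no listener, or None."""
    for port in candidates:
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    for p in (8090, 8081):
        print(p, "in use" if port_in_use(p) else "free")
    print("suggested server port:", first_free_port([8090, 9090, 9091]))
```

A port chosen this way can then be set as `manager.server.port` in `config.yaml`.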

2 changes: 1 addition & 1 deletion manager/config.yaml
@@ -65,7 +65,7 @@ runs:
postgres:
host: "localhost"
port: 30001
dbName: "runs"
dbname: "runs"
username: "postgres"
password: "postgres"
options: "sslmode=disable"