feat: multi-engine pods and node-role fix #778

Open
k-rister wants to merge 2 commits into master from kube-updates

@k-rister
Contributor

Summary

  • Multi-engine pods: Add support for grouping multiple client/server engines into a single pod as separate containers via a new optional pods key in the kube endpoint run-file config. Engines not listed in any pod group automatically get their own solo pod, preserving full backward compatibility.
  • Node-role detection fix: Correct inverted OCP label mapping (master/worker were swapped) and eliminate duplicate node detection when nodes match multiple label conditions.

Test plan

  • Validate schema changes with existing run-files (no pods key) to confirm backward compatibility
  • Test with a run-file using pods config to group engines into shared pods
  • Verify pod-level setting validation catches mismatches (e.g., different cpu-partitioning in same pod group)
  • Verify node discovery no longer reports duplicate nodes

🤖 Generated with Claude Code

k-rister and others added 2 commits March 26, 2026 16:50
Add an optional "pods" key to the kube endpoint run-file config that
allows grouping multiple client/server engines into a single pod as
separate containers. Each pod group can have a user-defined name or
an auto-generated one. Engines not listed in any pod group
automatically get their own solo pod, preserving backward
compatibility.
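
The grouping behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `build_pod_groups`, the `(role, id)` tuple representation, and the auto-generated name formats (`pod-N`, `role-id`) are assumptions. Only the `pods` key with an optional `name` and an `engines` list of role + ids comes from the commit message.

```python
def build_pod_groups(engines, pods=None):
    """Group engines into pods (hypothetical helper, not the PR's code).

    engines: iterable of (role, id) tuples, e.g. ("client", 1).
    pods:    optional list of user-supplied groups, each shaped like
             {"name": "optional", "engines": [{"role": "client", "ids": [1, 2]}]}.
    Returns a list of {"name": str, "engines": [(role, id), ...]} dicts.
    """
    engines = list(engines)
    known = set(engines)
    claimed = set()
    groups = []

    for index, pod in enumerate(pods or []):
        members = []
        for entry in pod["engines"]:
            for engine_id in entry["ids"]:
                engine = (entry["role"], engine_id)
                if engine not in known:
                    raise ValueError(f"pod group references unknown engine {engine}")
                if engine in claimed:
                    raise ValueError(f"engine {engine} is listed in more than one pod group")
                claimed.add(engine)
                members.append(engine)
        # Fall back to an auto-generated name when the user gave none (assumed format).
        name = pod.get("name") or f"pod-{index}"
        groups.append({"name": name, "engines": members})

    # Engines not claimed by any group each get a solo pod, so a run-file
    # without a "pods" key yields one pod per engine, as before.
    for role, engine_id in engines:
        if (role, engine_id) not in claimed:
            groups.append({"name": f"{role}-{engine_id}", "engines": [(role, engine_id)]})
    return groups
```

With no `pods` argument every engine ends up in its own single-engine group, which is the backward-compatible case the commit message describes.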

Key changes:
- Add "pods" schema to kube.json with name, engines (role + ids)
- Normalize and validate pod groups in normalize_endpoint_settings()
- Validate pod-level setting consistency (cpu-partitioning,
  nodeSelector, hostNetwork, runtimeClassName, annotations,
  securityContext.pod) across engines sharing a pod
- Refactor create_pod_crd() to accept a list of engines and build
  one container per engine with per-engine settings
- Refactor create_cs_pods() to iterate pod groups instead of
  individual engines
- Update verify_pods_running() to handle multi-engine pod details
- Add engine-to-pod aliases so downstream lookups by engine name
  (tool deployment, services, log collection) continue to work
- Add deduplication in engine_init() and kube_cleanup() to avoid
  processing aliased pod entries multiple times
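
As a rough illustration of the pod-level consistency check listed above, the sketch below compares the pod-scoped settings of every engine in a group against the first engine. The helper names (`pod_level_view`, `find_pod_setting_mismatches`) and the way settings are nested in a plain dict are assumptions; only the list of pod-level keys is taken from the commit message.

```python
# Settings that apply to the whole pod rather than to a single container
# (key names taken from the commit message; dict layout is assumed).
POD_LEVEL_KEYS = (
    "cpu-partitioning",
    "nodeSelector",
    "hostNetwork",
    "runtimeClassName",
    "annotations",
)


def pod_level_view(settings):
    """Extract only the pod-scoped part of one engine's settings."""
    view = {key: settings.get(key) for key in POD_LEVEL_KEYS}
    # securityContext has a pod part and a container part; only the pod
    # part has to agree across engines sharing a pod.
    view["securityContext.pod"] = (settings.get("securityContext") or {}).get("pod")
    return view


def find_pod_setting_mismatches(pod_name, engine_settings):
    """Return a list of error strings; an empty list means the group is consistent.

    engine_settings: dict mapping engine name -> settings dict.
    """
    errors = []
    items = list(engine_settings.items())
    if not items:
        return errors
    ref_engine, ref_settings = items[0]
    reference = pod_level_view(ref_settings)
    for engine, settings in items[1:]:
        current = pod_level_view(settings)
        for key, expected in reference.items():
            if current[key] != expected:
                errors.append(
                    f"pod {pod_name}: {key} differs between "
                    f"{ref_engine} ({expected!r}) and {engine} ({current[key]!r})"
                )
    return errors
```

Container-scoped values are deliberately ignored here, since each engine becomes its own container and may carry its own per-engine settings.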

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Fix two bugs in get_k8s_config() node discovery:

1. OCP label mapping was inverted — nodes with the "worker" label
   were added to the masters list and vice versa.

2. Nodes matching multiple label conditions (e.g., both
   "node-role.kubernetes.io/master" and
   "node-role.kubernetes.io/control-plane") were appended to the
   same list multiple times. Switch from lists with append() to
   sets with add(), then convert to sorted lists at the end.
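
The two fixes can be sketched as below. This is not the body of get_k8s_config(); the function name `classify_nodes`, the input shape (node items as returned by `kubectl get nodes -o json`), and the exact worker label key are assumptions made for illustration.

```python
# Label keys that mark a node as master/control-plane or worker.
# The master and control-plane keys appear in the commit message; the
# full worker key is assumed to follow the same pattern.
MASTER_LABELS = (
    "node-role.kubernetes.io/master",
    "node-role.kubernetes.io/control-plane",
)
WORKER_LABELS = ("node-role.kubernetes.io/worker",)


def classify_nodes(nodes):
    """Split node names into masters and workers without duplicates.

    nodes: iterable of dicts with metadata.name and metadata.labels.
    """
    masters, workers = set(), set()
    for node in nodes:
        name = node["metadata"]["name"]
        labels = node["metadata"].get("labels", {})
        # Master-type labels feed the masters set and worker labels feed
        # the workers set (the mapping that was previously inverted).
        for label in MASTER_LABELS:
            if label in labels:
                masters.add(name)  # add() is a no-op on a repeat match
        for label in WORKER_LABELS:
            if label in labels:
                workers.add(name)
    # Convert to sorted lists at the end for stable output.
    return {"masters": sorted(masters), "workers": sorted(workers)}
```

A node carrying both the master and control-plane labels is matched twice in the loop but stored once, which is the point of switching from list append() to set add().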

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Projects

Status: In Progress
