Replies: 4 comments 1 reply
-
Interesting setup. I've been running OpenClaw in a similar split-infra pattern (not K8s specifically, but agent + external APIs on separate hosts). A few thoughts:

**1. Direct K8s API vs Grafana**

Direct K8s API is better for your use case: the agent can query exactly what it needs, when it needs it. That said, don't give it cluster-admin. Create a ServiceAccount with a Role scoped to your test namespace.

**2. Context window management**

This is the real challenge. A busy namespace can produce megabytes of events and logs per hour, so filter hard before anything reaches the model: Warning-type events only, logs from unhealthy pods only, everything truncated to a fixed budget. With Nemotron-3-120B you have a decent context window, but token cost per run adds up fast if you're scanning every 5 minutes.

**3. Practical tip for the CronJob trigger**

Rather than a fixed interval, consider having your CronJob check for recent warning events first and only invoke the model when there is actually something to look at.

What namespace complexity are we talking about (number of pods, typical churn rate)? That would help narrow down the filtering strategy.
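For point 1, the namespace-scoped ServiceAccount could look roughly like this (a sketch; `test-apps` and the resource names are placeholders):

```yaml
# ServiceAccount the agent authenticates as
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nemoclaw-agent
  namespace: test-apps
---
# Read-only Role: the agent can look, not touch
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nemoclaw-readonly
  namespace: test-apps
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "events", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nemoclaw-readonly
  namespace: test-apps
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nemoclaw-readonly
subjects:
- kind: ServiceAccount
  name: nemoclaw-agent
  namespace: test-apps
```

You can then mint a short-lived token for it with `kubectl create token nemoclaw-agent -n test-apps`.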
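For point 3, the pre-check can be a few lines at the top of the CronJob's container, so the model only runs when something is wrong. A sketch (the namespace, the Nemoclaw host, and the `/scan` endpoint are all placeholders I made up):

```shell
# Only wake the model if the namespace has Warning events right now.
# Assumes kubectl is on PATH with the read-only ServiceAccount token.
warnings=$(kubectl get events -n test-apps \
  --field-selector type=Warning -o name 2>/dev/null | head -50)

if [ -n "$warnings" ]; then
  # hypothetical trigger endpoint on the Nemoclaw VM
  curl -fsS -X POST http://nemoclaw-vm:8080/scan
else
  echo "no warning events; skipping model run"
fi
```

This keeps quiet periods free: you pay for a cheap `kubectl` call every interval instead of a full model invocation.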
-
The test namespace doesn't exist yet. I'm currently creating it from scratch just to test nemoclaw. My plan is to start small: set up dummy test backend apps, test databases, etc. to validate the workflow and see how the LLM handles the context. Once I prove the concept works, my main goal is to deploy it to real applications. I will probably split my applications into different namespaces and assign different roles across various VMs. I will definitely apply your filtering and RBAC tips while building the system. I'll make sure to share my test results and findings here once the setup is up and running. Thanks again.
-
Were you able to get OpenClaw to access your K8s cluster without using an SSH tunnel? I tried running it directly with `kubectl get pods` (with roles and everything set up), but I'm getting this error: "Direct access to standard Kubernetes control-plane ports (like port 6443) is unconditionally hard-blocked." As far as I understand, it's blocked by OpenShell and cannot be changed. But maybe I misunderstood something? :)
-
Yeah, that's by design: OpenShell hard-blocks a handful of ports that could let the sandbox mess with the host's control plane. Port 6443 (the K8s API) is one of them, and you can't override it through the network policy file. What I'd do instead: (1) keep the SSH tunnel you mentioned, or (2) put a thin, read-only intermediary between the agent and the cluster and let the agent talk to that instead.

Option 2 is basically what @uzunenes would end up building anyway for the production setup: a controlled interface between the agent and the cluster, rather than handing it raw API access.

The hard-block exists for a good reason though. If the sandbox could talk to 6443, a compromised agent could potentially escalate privileges through the K8s API. The indirection layer forces you to decide exactly what the agent is allowed to see and do.
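The indirection layer can start as small as a wrapper on the agent's PATH that only forwards read verbs and refuses everything else. A rough sketch (the namespace and the allowed-verb list are my assumptions, nothing OpenClaw-specific):

```shell
# safe_kubectl: forwards read-only verbs to kubectl; everything else is refused
# before kubectl is ever invoked. The agent calls this instead of raw kubectl.
safe_kubectl() {
  case "$1" in
    get|describe|logs|top)
      kubectl -n test-apps "$@"
      ;;
    *)
      echo "safe_kubectl: verb '$1' is not allowed" >&2
      return 1
      ;;
  esac
}
```

So `safe_kubectl get pods` passes through, while `safe_kubectl delete pod foo` fails fast with an error the agent can read back.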
-
Hi everyone, looking for best practice advice on this architecture:
The Stack: nvidia/nemotron-3-super-120b-a12b (2x H200) + Nemoclaw on a separate VM (same DC, HTTP endpoint).
The Plan: Give Nemoclaw K8s API access (via token) to a test namespace. Trigger it via CronJob or Telegram to scan for issues and send SMTP alerts.
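Concretely, the token-based access I have in mind would be a kubeconfig along these lines (server address, namespace, and token are placeholders):

```yaml
apiVersion: v1
kind: Config
clusters:
- name: test-cluster
  cluster:
    server: https://<api-server-host>:6443
    certificate-authority-data: <base64-ca-cert>
users:
- name: nemoclaw
  user:
    token: <service-account-token>
contexts:
- name: nemoclaw@test-cluster
  context:
    cluster: test-cluster
    user: nemoclaw
    namespace: test-apps
current-context: nemoclaw@test-cluster
```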
Important context: I already use Grafana alerts for deterministic problems. This LLM setup is strictly for rapid detection of complex, non-deterministic edge cases.
My Questions:
1. Is querying the K8s API directly the best practice for Nemoclaw in this scenario?
2. Alternatively, should I ship all namespace logs/events to my Grafana stack first and have Nemoclaw analyze them from there instead of direct K8s access?
3. Any quick tips on filtering the data to avoid blowing up the context window?
Thanks!