Commit cd219ca

fix(bug): Ensure windows agent stability using hubble/legacy helm values (#1128)
# Description

This PR aims to fix the stability of the Retina Windows agent. Four causes were identified, and each commit resolves one of them.

1. Invalid rendering of the `namespace` Helm value (1st commit).

   ```
   matmerr@matmerr-cloud-dev: ~/go/src/github.com/Azure/telescope [06:56:29 PM][matmerr-aks-pktmon-11][matmerr/enable-ama]$ k logs -f retina-agent-win-7f7kb
   Starting Retina Agent
   starting Retina daemon with legacy control plane v0.0.17
   2024/12/02 18:56:22 metricsInterval is deprecated, please use metricsIntervalDuration instead
   init client-go
   KUBECONFIG set, using kubeconfig: C:\hpc\kubeconfig
   Error: starting daemon: creating controller-runtime manager: error loading config file "C:\hpc\kubeconfig": yaml: invalid map key: map[interface {}]interface {}{".Values.namespace":interface {}(nil)}
   ```

2. The default operator value is enabled and causes RBAC issues for the Windows agents (2nd commit).

   ```
   ts=2024-12-10T16:58:48.634Z level=info caller=hnsstats/hnsstats_windows.go:212 msg="Start hnsstats plugin..."
   W1210 16:58:49.990792 7108 reflector.go:547] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "metricsconfigurations" in API group "retina.sh" at the cluster scope
   ```

3. Enabled telemetry also causes the agent to panic if Application Insights is not defined. Users can change the ConfigMap as desired, but the default values should not cause the agent to crash (3rd commit).

4. The `kubeconfig` file cannot be found for the legacy chart values. Executing `setkubeconfigpath.ps1` was required for the container setup (4th commit).

   Update: It was later found that the missing `kubeconfig` error only occurs on redeploy if the initial Retina deployment predates this change (#1118). A later GH issue was created: #1138.

   ```
   beegii@bignamboi:~/src/retina$ k logs retina-agent-win-4tl7m -n kube-system
   Starting Retina Agent
   starting Retina daemon with legacy control plane v0.0.17
   2024/12/11 18:40:15 metricsInterval is deprecated, please use metricsIntervalDuration instead
   init client-go
   KUBECONFIG set, using kubeconfig: C:\hpc\kubeconfig
   Error: starting daemon: creating controller-runtime manager: CreateFile C:\hpc\kubeconfig: The system cannot find the file specified.
   ```

## Related Issue

#1122

## Checklist

- [x] I have read the [contributing documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

The image corresponding to each commit was built and tested on the cluster to confirm that each fix works.

![image](https://github.com/user-attachments/assets/dde7fe23-22ff-49bf-8c96-2c1a42c96f9d)

## Additional Notes

The first three problems were experienced when deploying Retina using the Hubble path; the last issue was experienced when deploying Retina using the legacy path.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project.
1 parent 29331f0 commit cd219ca

5 files changed: +12 -6 lines changed

deploy/hubble/manifests/controller/helm/retina/templates/agent/configmap.yaml

+1-1
@@ -132,7 +132,7 @@ data:
 metricsInterval: {{ .Values.metricsInterval }}
 metricsIntervalDuration: {{ .Values.metricsIntervalDuration }}
 enableTelemetry: {{ .Values.enableTelemetry }}
-enablePodLevel: {{ .Values.enablePodLevel }}
+enablePodLevel: false
 remoteContext: {{ .Values.remoteContext }}
 bypassLookupIPOfInterest: {{ .Values.bypassLookupIPOfInterest }}
 {{- end}}

deploy/hubble/manifests/controller/helm/retina/values.yaml

+1-1
@@ -90,7 +90,7 @@ logLevel: info
 enabledPlugin_linux: '["linuxutil","packetforward","packetparser","dns", "dropreason"]'
 enabledPlugin_win: '["hnsstats"]'

-enableTelemetry: true
+enableTelemetry: false

 # Interval, in duration, to scrape/publish metrics.
 metricsIntervalDuration: "10s"

deploy/legacy/manifests/controller/helm/retina/templates/daemonset.yaml

+7-1
@@ -203,7 +203,13 @@ spec:
 containerPort: {{ .Values.retinaPort }}
 workingDir: $env:CONTAINER_SANDBOX_MOUNT_POINT
 command:
-- controller.exe --config ./retina/config.yaml
+- powershell.exe
+- -command
+{{- if semverCompare ">=1.28" .Capabilities.KubeVersion.GitVersion }}
+- $env:CONTAINER_SANDBOX_MOUNT_POINT/controller.exe --config ./retina/config.yaml
+{{- else }}
+- .\setkubeconfigpath.ps1; ./controller.exe --config ./retina/config.yaml --kubeconfig ./kubeconfig
+{{- end }}
 env:
 - name: POD_NAME
 valueFrom:
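
The branch in this hunk picks one of two PowerShell command strings based on the cluster's Kubernetes version. As a rough illustration only, the selection can be sketched in Python. The helper names `parse_major_minor` and `windows_agent_command` are hypothetical and not part of the chart, and the comparison is deliberately simplified: Helm's `semverCompare` follows fuller semantic-versioning rules (including pre-release handling) that this sketch ignores.

```python
import re


def parse_major_minor(git_version):
    """Extract (major, minor) from a version string such as 'v1.28.3'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def windows_agent_command(git_version):
    """Return the container command list for the given cluster version.

    Simplified stand-in for the template's semverCompare ">=1.28" check.
    """
    if parse_major_minor(git_version) >= (1, 28):
        # Newer clusters: start the controller directly from the sandbox mount point.
        script = "$env:CONTAINER_SANDBOX_MOUNT_POINT/controller.exe --config ./retina/config.yaml"
    else:
        # Older clusters: run setkubeconfigpath.ps1 first, then pass --kubeconfig.
        script = r".\setkubeconfigpath.ps1; ./controller.exe --config ./retina/config.yaml --kubeconfig ./kubeconfig"
    return ["powershell.exe", "-command", script]


for version in ("v1.27.9", "v1.28.3"):
    print(version, "->", windows_agent_command(version)[2])
```

In both branches the first two list entries are `powershell.exe` and `-command`; only the script string passed to PowerShell differs.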

windows/kubeconfigtemplate.yaml

+1-1
@@ -9,7 +9,7 @@ contexts:
 - name: azure-retina-windows@kubernetes
 context:
 cluster: kubernetes
-namespace: {{ .Values.namespace }}
+namespace: kube-system
 user: azure-retina-windows
 current-context: azure-retina-windows@kubernetes
 users:
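
The `invalid map key` error quoted in the description is consistent with a Go template expression reaching a YAML parser unrendered: `{{ .Values.namespace }}` is valid YAML flow syntax for a mapping whose key is itself a mapping. A minimal sketch of a pre-deployment check for such leftovers is below. It assumes the file is applied without Helm rendering; the function name `find_unrendered` and the sample text (including its indentation) are illustrative, not taken from the repository.

```python
import re

# Matches Go/Helm template actions such as {{ .Values.namespace }} or {{- end }}.
TEMPLATE_ACTION = re.compile(r"\{\{-?\s*(.*?)\s*-?\}\}")


def find_unrendered(text):
    """Return (line_number, expression) pairs for each template action left in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TEMPLATE_ACTION.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Illustrative excerpt modeled on the kubeconfig template before the fix.
before = """contexts:
- name: azure-retina-windows@kubernetes
  context:
    cluster: kubernetes
    namespace: {{ .Values.namespace }}
"""
# The same excerpt with the hard-coded namespace used by this commit.
after = before.replace("{{ .Values.namespace }}", "kube-system")

print(find_unrendered(before))
print(find_unrendered(after))
```

The first call reports the placeholder on the `namespace` line, and the second returns an empty list once the value is hard-coded.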

windows/manifests/windows.yaml

+2-2
@@ -4,7 +4,7 @@ metadata:
 labels:
 app: retina
 name: retina-win
-namespace: {{ .Values.namespace }}
+namespace: kube-system
 annotations:
 prometheus.io/port: "10093"
 prometheus.io/scrape: "true"
@@ -62,7 +62,7 @@ apiVersion: v1
 kind: ConfigMap
 metadata:
 name: retina-config-win
-namespace: {{ .Values.namespace }}
+namespace: kube-system
 data:
 config.yaml: |-
 apiServer:

0 commit comments