Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions helm/blueapi/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ A Helm chart deploying a worker pod that runs Bluesky plans
|-----|------|---------|-------------|
| affinity | object | `{}` | May be required to run on specific nodes (e.g. the control machine) |
| debug.enabled | bool | `false` | If enabled, disables liveness and readiness probes, and does not start the service on startup This allows connecting to the pod and starting the service manually to allow debugging on the cluster |
| debug.volume.size | string | `"2Gi"` | |
| extraEnvVars | list | `[]` | Additional envVars to mount to the pod |
| fullnameOverride | string | `""` | |
| global | object | `{}` | Not used, but must be present for validation when using as a dependency of another chart |
Expand Down
54 changes: 52 additions & 2 deletions helm/blueapi/templates/statefulset.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,11 @@ spec:
sources:
- configMap:
name: {{ include "blueapi.fullname" . }}-config
{{- if .Values.debug.enabled }}
- name: {{ include "blueapi.fullname" . }}-develop
persistentVolumeClaim:
claimName: {{ include "blueapi.fullname" . }}-develop
{{- end }}
Comment on lines +48 to +52
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug.enabled is doing a lot of things on a single boolean- it changes the container that is being used to one that does not start the service by default (allows debugging service startup); it starts a sidecar and mounts config (to allow VSCode to attach); and it would now mount an additional PVC and initContainer in addition to one that may already be configured for scratch purposes, overwriting code changes, so long as the container follows a convention for placement and use of submodule dependencies that is not documented.

We must better decompose those behaviours, as well as documenting the requirements.

I propose we keep the existing scratch configuration/PVC for the copying of code from the container, but make it more in line with what this change is suggesting, while documenting the requirements for mutability:

For the repository plan-bluesky

  • pyproject.toml depends on blueapi, dodal directly
  • Dockerfile places the source for plan-bluesky and dodal in a predictable location /workspaces/$REPO
  • Dockerfile may also place the source for other dependencies in /workspaces/ if they may need to be editable

For the blueapi repository:

  • Dockerfile places the source for blueapi in a predictable location /workspaces/$REPO
  • Dockerfile has a -debug image tag that does not start the service by default to allow debugging container startup by execing into the pod

For both:

  • Dockerfile runs the blueapi CLI
  • Helm chart from blueapi repository can be used, -debug image can be specified in the image: field
  • Helm chart reads scratch configuration from values.yaml and attempts to copy source from the container, falling back to fetching the latest release (later: allow configuration of specific versions), copying either to scratch on /dls_sw/ or a volume as currently configured (now: add field to switch ephemeral/persisted for version/persisted constantly)
  • When using a PVC for the scratch area, it is chmod -R o+wrX'd, and the sidecar/mounting to allow fedids to be picked up is enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting we get rid of the ability to include arbitrary external repos in the scratch area (i.e. no more git cloning at all)?

Copy link
Contributor

@DiamondJoseph DiamondJoseph Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

falling back to fetching the latest release (later: allow configuration of specific versions)

falling back to fetching from the default branch of the remote_url from the ScratchRepository config (later: allow configuration of specific refs)*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And with this current setup, the pvc is deleted when you disable debug mode, are you suggesting getting rid of that too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting in add field to switch ephemeral/persisted for version/persisted constantly that that becomes configurable behaviour.

initContainer:
  persistentVolume:
    enabled: false
    # New field, current behaviour is Version, removed when initContainer is disabled is Session?
    lifetime: Version | Persistent | Session

{{- with .Values.volumes }}
{{- toYaml . | nindent 6 }}
{{- end }}
Expand Down Expand Up @@ -76,8 +81,10 @@ spec:
emptyDir:
sizeLimit: 5Mi
{{- end }}
{{- if .Values.initContainer.enabled }}
{{- if or .Values.initContainer.enabled .Values.debug.enabled }}
initContainers:
{{- end}}
{{- if .Values.initContainer.enabled }}
- name: setup-scratch
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}{{ ternary "-debug" "" .Values.debug.enabled }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
Expand Down Expand Up @@ -106,6 +113,37 @@ spec:
mountPropagation: HostToContainer
{{- end }}
{{- end }}
{{- if .Values.debug.enabled }}
# add in an init container to copy out virtual environment and workspaces
- name: {{ include "blueapi.fullname" . }}-init-debug
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}-debug"
imagePullPolicy: IfNotPresent
resources:
{{- .Values.initResources | default .Values.resources | toYaml | nindent 12 }}
{{- with .Values.securityContext }}
securityContext:
{{- toYaml . | nindent 12 }}
{{- end }}
command:
- /bin/bash
- -c
- |
echo "Running as account"; id
if [ -d /dest/venv ]; then
echo "Virtual environment already exists, skipping copy"
else
echo "Copying virtual env to the debugging volume"
cp -r /venv /dest/venv
echo "Copying workspaces to the debugging volume"
cp -r /workspaces /dest/workspaces
echo "Setting permissions on the debugging volume"
chmod -R o+rwX /dest/venv /dest/workspaces
fi
echo "Init container completed successfully"
volumeMounts:
- name: {{ include "blueapi.fullname" . }}-develop
mountPath: /dest
{{- end }}
containers:
- name: {{ .Chart.Name }}
{{- with .Values.securityContext }}
Expand Down Expand Up @@ -147,7 +185,19 @@ spec:
- name: venv
mountPath: /venv
{{- end }}
{{- if or .Values.debug.enabled (and .Values.initContainer.enabled .Values.initContainer.persistentVolume.enabled) }}
{{- if or .Values.debug.enabled }}
- mountPath: /home
name: home
- mountPath: /var/run/nslcd
name: nslcd
Comment on lines +189 to +192
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these, and /venv would cause issues if debug and initContainer.persistentVolume were enabled simultaneously, as duplicating the entry?

- name: {{ include "blueapi.fullname" . }}-develop
mountPath: /workspaces
subPath: workspaces
- name: {{ include "blueapi.fullname" . }}-develop
mountPath: /venv
subPath: venv
{{- end }}
{{- if and .Values.initContainer.enabled .Values.initContainer.persistentVolume.enabled }}
- mountPath: /home
name: home
- mountPath: /var/run/nslcd
Expand Down
16 changes: 16 additions & 0 deletions helm/blueapi/templates/volumes.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,19 @@ spec:
requests:
storage: 1Gi
{{- end }}
---
# PVC for debugging volume
{{- if .Values.debug.enabled }}
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: {{ include "blueapi.fullname" . }}-develop
labels:
app: {{ include "blueapi.fullname" . }}
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: {{ .Values.debug.volume.size }}
{{- end }}
8 changes: 8 additions & 0 deletions helm/blueapi/values.schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,14 @@
"enabled": {
"description": "If enabled, disables liveness and readiness probes, and does not start the service on startup This allows connecting to the pod and starting the service manually to allow debugging on the cluster",
"type": "boolean"
},
"volume": {
"type": "object",
"properties": {
"size": {
"type": "string"
}
}
}
}
},
Expand Down
3 changes: 2 additions & 1 deletion helm/blueapi/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,6 @@ resources:
requests:
cpu: 200m
memory: 400Mi


# -- Override resources for init container. By default copies resources of main container.
initResources: {}
Expand Down Expand Up @@ -226,6 +225,8 @@ debug:
# -- If enabled, disables liveness and readiness probes, and does not start the service on startup
# This allows connecting to the pod and starting the service manually to allow debugging on the cluster
enabled: false
volume:
size: 2Gi

# -- Not used, but must be present for validation when using as a dependency of another chart
global: {}
108 changes: 86 additions & 22 deletions tests/unit_tests/test_helm_chart.py
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,23 @@ def test_init_container_spec_generated():
assert len(init_containers) == 1


def test_init_container_spec_generated_in_debug_mode_even_if_disabled():
manifests = render_chart(
values={
"initContainer": {
"enabled": False,
},
"debug": {
"enabled": True,
},
},
)
init_containers = manifests["StatefulSet"]["blueapi"]["spec"]["template"]["spec"][
"initContainers"
]
assert len(init_containers) == 1


def test_init_container_spec_disablable():
manifests = render_chart(
values={
Expand Down Expand Up @@ -577,10 +594,10 @@ def test_init_container_venv_volume_mount(
@pytest.mark.parametrize("existingClaimName", [None, "foo"])
@pytest.mark.parametrize("debug_enabled", [True, False])
def test_persistent_volume_claim_exists(
initContainer_enabled,
persistentVolume_enabled,
existingClaimName,
debug_enabled,
initContainer_enabled: bool,
persistentVolume_enabled: bool,
existingClaimName: str | None,
debug_enabled: bool,
):
manifests = render_persistent_volume_chart(
initContainer_enabled,
Expand All @@ -589,7 +606,7 @@ def test_persistent_volume_claim_exists(
debug_enabled,
)

persistent_volume_claim = {
scratch_claim = {
"scratch-0.1.0": {
"apiVersion": "v1",
"kind": "PersistentVolumeClaim",
Expand All @@ -603,9 +620,26 @@ def test_persistent_volume_claim_exists(
},
}
}
debug_claim = {
"blueapi-develop": {
"apiVersion": "v1",
"kind": "PersistentVolumeClaim",
"metadata": {"labels": {"app": "blueapi"}, "name": "blueapi-develop"},
"spec": {
"accessModes": ["ReadWriteMany"],
"resources": {"requests": {"storage": "2Gi"}},
},
}
}

if persistentVolume_enabled and not existingClaimName:
assert persistent_volume_claim == manifests["PersistentVolumeClaim"]
is_existing_claim = existingClaimName is not None

if (persistentVolume_enabled and not is_existing_claim) and not debug_enabled:
assert scratch_claim == manifests["PersistentVolumeClaim"]
elif not (persistentVolume_enabled and not is_existing_claim) and debug_enabled:
assert debug_claim == manifests["PersistentVolumeClaim"]
elif persistentVolume_enabled and not is_existing_claim and debug_enabled:
assert {**scratch_claim, **debug_claim} == manifests["PersistentVolumeClaim"]
else:
assert "PersistentVolumeClaim" not in manifests

Expand Down Expand Up @@ -750,12 +784,12 @@ def test_main_container_venv_volume_mount(
@pytest.mark.parametrize("existingClaimName", [None, "foo"])
@pytest.mark.parametrize("debug_enabled", [True, False])
def test_main_container_home_and_nslcd_volume_mounts(
initContainer_enabled,
persistentVolume_enabled,
existingClaimName,
debug_enabled,
home_volume_mount,
nslcd_volume_mount,
initContainer_enabled: bool,
persistentVolume_enabled: bool,
existingClaimName: str | None,
debug_enabled: bool,
home_volume_mount: dict,
nslcd_volume_mount: dict,
):
manifests = render_persistent_volume_chart(
initContainer_enabled,
Expand Down Expand Up @@ -815,8 +849,8 @@ def test_main_container_args(
@pytest.mark.parametrize("existingClaimName", [None, "foo"])
@pytest.mark.parametrize("debug_enabled", [True, False])
def test_scratch_volume_uses_correct_claimName(
existingClaimName,
debug_enabled,
existingClaimName: str | None,
debug_enabled: bool,
):
manifests = render_persistent_volume_chart(
True,
Expand All @@ -825,16 +859,46 @@ def test_scratch_volume_uses_correct_claimName(
debug_enabled,
)

claim_name = manifests["StatefulSet"]["blueapi"]["spec"]["template"]["spec"][
"volumes"
][3]["persistentVolumeClaim"]["claimName"]
volumes = manifests["StatefulSet"]["blueapi"]["spec"]["template"]["spec"]["volumes"]
claim_names = [
volume["persistentVolumeClaim"]["claimName"]
for volume in volumes
if "persistentVolumeClaim" in volume
]

if existingClaimName:
assert claim_name == existingClaimName
assert "PersistentVolumeClaim" not in manifests
assert existingClaimName in claim_names
else:
assert "scratch-0.1.0" in claim_names


@pytest.mark.parametrize("initContainer_enabled", [True, False])
@pytest.mark.parametrize("persistentVolume_enabled", [True, False])
@pytest.mark.parametrize("existingClaimName", [None, "foo"])
@pytest.mark.parametrize("debug_enabled", [True, False])
def test_debug_volume_exists(
initContainer_enabled: bool,
persistentVolume_enabled: bool,
existingClaimName: str | None,
debug_enabled: bool,
):
manifests = render_persistent_volume_chart(
initContainer_enabled,
persistentVolume_enabled,
existingClaimName,
debug_enabled,
)

expected_volume = {
"name": "blueapi-develop",
"persistentVolumeClaim": {"claimName": "blueapi-develop"},
}

volumes = manifests["StatefulSet"]["blueapi"]["spec"]["template"]["spec"]["volumes"]
if debug_enabled:
assert expected_volume in volumes
else:
assert claim_name == "scratch-0.1.0"
assert claim_name in manifests["PersistentVolumeClaim"]
assert expected_volume not in volumes


@pytest.fixture
Expand Down