
Commit af88093

sushy_emulator: healthcheck + restart on failure
In adoption jobs, the sushy_emulator stops working partway through the job run. The python libvirt library raises errors such as:

- `libvirt: XML-RPC error : Cannot write data: Broken pipe`
- `libvirt: XML-RPC error : internal error: client socket is closed`

This change adds a healthcheck to the podman container. The check performs an authenticated GET on the Redfish resource of the first emulated system every 30 seconds, and the container is restarted after two consecutive failed probes.

Jira: OSPRH-15686
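The restart policy described above can be illustrated with a small simulation. This is a simplified model, assuming podman counts consecutive failed probes, marks the container unhealthy once `healthcheck_retries` of them have accumulated, and then runs the on-failure action (here, a restart). It ignores start periods and probe timing. `restart_points` is a hypothetical helper, not part of podman or this role.

```python
from typing import Iterable, List


def restart_points(results: Iterable[bool], retries: int = 2) -> List[int]:
    """Return the probe indices at which a restart would be triggered.

    Simplified model (assumption): each False is a failed probe; `retries`
    consecutive failures mark the container unhealthy, which triggers the
    on-failure action (restart). A successful probe resets the counter.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    points: List[int] = []
    consecutive = 0
    for i, ok in enumerate(results):
        if ok:
            # A healthy probe clears any earlier failures.
            consecutive = 0
            continue
        consecutive += 1
        if consecutive >= retries:
            points.append(i)
            # Assume the restarted container starts with a clean failure count.
            consecutive = 0
    return points
```

For example, `restart_points([True, False, True, False, False, True], retries=2)` yields `[4]`: the single failure at index 1 is forgiven by the success that follows, while the two back-to-back failures at indices 3 and 4 trigger a restart. Under this model, with a 30-second interval and two retries, a wedged emulator is restarted roughly two probe intervals (about a minute) after it stops answering.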
1 parent 83c519f commit af88093


1 file changed (+9, -0 lines)


roles/sushy_emulator/tasks/create_container.yml

Lines changed: 9 additions & 0 deletions
```diff
@@ -41,3 +41,12 @@
       - "{{ dest_dir }}/known_hosts:/root/.ssh/known_hosts:ro,Z"
       - "{{ cifmw_sushy_emulator_sshkey_path }}:/root/.ssh/id_rsa:ro,Z"
       - "{{ cifmw_sushy_emulator_sshkey_path }}.pub:/root/.ssh/id_rsa.pub:ro,Z"
+    healthcheck: >-
+      python3 -c "import urllib.request, base64;
+      req = urllib.request.Request('http://localhost:8000/redfish/v1/Systems/{{ _cifmw_sushy_emulator_instances[0] }}');
+      req.add_header('Authorization', 'Basic ' + base64.b64encode(b'{{ cifmw_sushy_emulator_redfish_username }}:{{ cifmw_sushy_emulator_redfish_password }}').decode());
+      urllib.request.urlopen(req, timeout=5).read()"
+    healthcheck_interval: "30s"
+    healthcheck_timeout: "30s"
+    healthcheck_retries: 2
+    healthcheck_failure_action: restart
```
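For readability, the folded one-line healthcheck command can be unrolled into standalone functions. This is a sketch, assuming the emulator listens on `http://localhost:8000` as in the diff; the function names `build_probe_request` and `probe` are hypothetical, not part of the role. As in the one-liner, a connection error, timeout, or non-2xx HTTP response counts as a failure. The one-liner signals failure by exiting non-zero through an uncaught exception, whereas this version returns `False`.

```python
import base64
import http.client
import urllib.request


def build_probe_request(base_url: str, system_id: str,
                        username: str, password: str) -> urllib.request.Request:
    """Build the GET request the healthcheck sends to one Redfish System resource."""
    # Same endpoint shape as the healthcheck: /redfish/v1/Systems/<instance>
    url = f"{base_url.rstrip('/')}/redfish/v1/Systems/{system_id}"
    req = urllib.request.Request(url)
    # HTTP Basic auth: "Basic " + base64("username:password")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req


def probe(req: urllib.request.Request, timeout: float = 5.0) -> bool:
    """Return True if the emulator answered successfully, False otherwise."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # drain the body, like the original one-liner
            return True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx) and socket timeouts are OSError
        # subclasses; HTTPException covers malformed or truncated responses.
        return False
```

To use it as a podman health command, a script could call `sys.exit(0 if probe(build_probe_request("http://localhost:8000", "<instance>", "<user>", "<password>")) else 1)`, where the instance name and credentials are placeholders for the values the role templates in. Keeping the request construction separate from the network call makes the URL and `Authorization` header easy to check without a running emulator.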
