Description
What happened?
We recently upgraded from version 0.7.1 to 1.2.1 and switched how we build our Docker images from a modified version of the old template to `py_image_layer`. Overall it has been great except for one thing: we deploy our containers to K8s and run health/status checks against them. The issue is that the health checks use the same binary that the image normally runs, just in a different mode. Concretely, we run Dagster: the container runs `dagster api grpc`, and the health checks run `dagster api grpc-health-check`.
What we found is that because the same binary target is invoked in two separate ways inside the same container, the venv backing the Python script was re-created on every health check run. This caused the base process to temporarily lose the packages in its venv, which made it unhealthy and caused it to fail.
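To make the suspected failure mode concrete, here is a minimal, simplified model of it. It assumes (this is an assumption, not a description of how rules_py actually builds the venv) that the launcher rebuilds `site-packages` in place by emptying it and then repopulating it package by package. `visible_packages`, `rebuild_in_place` and `simulate` are hypothetical helpers, and `observe` stands in for the long-running `dagster api grpc` process looking at the venv between steps.

```python
import os
import tempfile


def visible_packages(site_packages):
    """Return the package directories a concurrent importer would see right now."""
    if not os.path.isdir(site_packages):
        return set()
    return set(os.listdir(site_packages))


def rebuild_in_place(site_packages, packages, observe):
    """Hypothetical non-atomic rebuild (simplified model, not rules_py's code):
    empty the directory, then repopulate it one package at a time, calling
    `observe` with what another process would see after every step."""
    for name in sorted(os.listdir(site_packages)):
        os.rmdir(os.path.join(site_packages, name))
        observe(visible_packages(site_packages))
    for name in packages:
        os.mkdir(os.path.join(site_packages, name))
        observe(visible_packages(site_packages))


def simulate(packages=("dagster", "datasets", "dsp")):
    """Start from a complete venv, rebuild it once, return every observed snapshot."""
    with tempfile.TemporaryDirectory() as root:
        site = os.path.join(root, "site-packages")
        os.mkdir(site)
        for name in packages:
            os.mkdir(os.path.join(site, name))
        snapshots = []
        rebuild_in_place(site, packages, snapshots.append)
        return snapshots
```

Calling `simulate()` yields six snapshots, and three of them lack `datasets`; any import that happens to run during one of those windows fails with `ModuleNotFoundError` even though the final state is complete.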
It seems like #522 would fix this, since the venv would then be stable, but in the meantime, is there a temporary workaround?
Version
Development (host) and target OS/architectures: aarch64 Darwin -> aarch64 Darwin, Linux x86_64 -> Linux x86_64
Output of `bazel --version`: 8.0.0
Version of the Aspect rules, or other relevant rules from your `WORKSPACE` or `MODULE.bazel` file: 1.2.1
Language(s) and/or frameworks involved: Python 3.11, Docker/rules_oci 1.7.4
How to reproduce
This is hard to reproduce reliably because it is a race condition.
One way would be to have a Python binary that continually tries to import a package, and to build an OCI image from it using `py_image_layer`. Run the image, then `exec` the same binary again from another window. One of the two should eventually error out, though it may take several iterations.
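A minimal sketch of such a binary, assuming it is wired up as the `py_binary` entry point (the names `try_import` and `main`, the default module `datasets`, and the polling interval are illustrative choices, not anything rules_py provides). Each attempt drops the cached module and calls `importlib.invalidate_caches()` so the import is re-resolved against the venv's `site-packages` on disk rather than served from memory.

```python
import importlib
import sys
import time


def try_import(module_name):
    """Import module_name afresh; return None on success or the error on failure."""
    # Forget any previously imported copy so the finders look at disk again.
    sys.modules.pop(module_name, None)
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        return exc
    return None


def main(module_name="datasets", interval_s=0.1, max_attempts=None):
    """Keep importing until an attempt fails (exit code 1) or max_attempts is hit.

    max_attempts=None loops forever, which is what you want inside the image.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        err = try_import(module_name)
        if err is not None:
            print(f"attempt {attempts}: import failed: {err}", file=sys.stderr)
            return 1
        attempts += 1
        time.sleep(interval_s)
    return 0
```

Calling `sys.exit(main())` from the binary's entry point and running it both as the container command and via `exec` should, if the venv is being rebuilt underneath the first process, eventually make one of them exit with status 1.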
Any other information?
We were getting errors that looked like:
```
ERROR 2025-02-05T17:31:44.082500645Z [resource.labels.containerName: dagster-user-deployments] File "/data/pomelo/dagster.runfiles/.dagster.venv/lib/python3.11/site-packages/dsp/modules/__init__.py", line 22, in <module>
ERROR 2025-02-05T17:31:44.082966540Z [resource.labels.containerName: dagster-user-deployments] from .pyserini import *
ERROR 2025-02-05T17:31:44.082988358Z [resource.labels.containerName: dagster-user-deployments] File "/data/pomelo/dagster.runfiles/.dagster.venv/lib/python3.11/site-packages/dsp/modules/pyserini.py", line 4, in <module>
ERROR 2025-02-05T17:31:44.083429066Z [resource.labels.containerName: dagster-user-deployments] from datasets import Dataset
ERROR 2025-02-05T17:31:44.083460Z [resource.labels.containerName: dagster-user-deployments] ModuleNotFoundError: No module named 'datasets'
```
This happened despite `datasets` being included in the binary. After turning off our health checks, the error went away. I also SSH'd into the pod and inspected the packages in the generated venv: it would repeatedly contain only a subset of the expected packages, and then shortly afterwards contain all of them.
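A small script along these lines could make that inspection repeatable: run it inside the pod while the health checks are firing and it records every poll where an expected top-level package directory was absent. The default path comes from the traceback above; `missing_packages`, `watch`, the package list and the polling parameters are illustrative assumptions.

```python
import os
import time

# Path taken from the traceback above; adjust for your own image layout.
SITE_PACKAGES = "/data/pomelo/dagster.runfiles/.dagster.venv/lib/python3.11/site-packages"


def missing_packages(site_packages, expected):
    """Return the expected top-level package names not currently on disk."""
    try:
        present = set(os.listdir(site_packages))
    except FileNotFoundError:
        # The whole directory can vanish mid-rebuild; treat that as "all missing".
        present = set()
    return sorted(set(expected) - present)


def watch(site_packages, expected, interval_s=0.05, max_polls=200):
    """Poll site-packages and collect (poll_index, missing_names) for bad polls."""
    incidents = []
    for poll in range(max_polls):
        missing = missing_packages(site_packages, expected)
        if missing:
            incidents.append((poll, missing))
        time.sleep(interval_s)
    return incidents
```

For example, `watch(SITE_PACKAGES, ["dagster", "datasets", "dsp"])` returns an empty list if the venv stays stable for the whole window, and a list of timestamps-by-poll with the missing names otherwise.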