Found while auditing the `website` branch (OpenCL convolution; also in PR #106). The three critical correctness bugs are fixed in #122; this issue tracks the remaining non-critical items in `__opencl__.py` / `image_convolution.py`.
MEDIUM: per-call context + kernel recompile. `convolve1D_opencl` creates a fresh `cl.Context` and `cl.CommandQueue`, re-reads the `.cl` file, and re-runs `cl.Program(...).build()` on every call, i.e. 3 context creations + 3 JIT compiles per 3D convolution, looped over every frame in `generate_frames_volume_convolution`. Buffers are never explicitly released. This negates the performance benefit the GPU path is meant to provide. Cache the context/queue/compiled program (module- or device-scoped) and reuse them.
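One possible shape for the caching suggested above, as a minimal sketch. The names `get_cl_state` and `_CL_STATE` are hypothetical and not taken from the repo, and the sketch assumes the caller already holds the selected `pyopencl` device and the path to the `.cl` file.

```python
# Sketch only: get_cl_state and _CL_STATE are hypothetical names, not from the repo.
_CL_STATE = {}  # (device.int_ptr, kernel_path) -> (context, queue, program)


def get_cl_state(device, kernel_path):
    """Return a cached (context, queue, program) triple for a device and kernel file."""
    import pyopencl as cl  # imported lazily so this module still loads without pyopencl

    key = (device.int_ptr, str(kernel_path))
    if key not in _CL_STATE:
        ctx = cl.Context(devices=[device])
        queue = cl.CommandQueue(ctx)
        with open(kernel_path) as f:
            # The JIT compile runs once per (device, kernel file) instead of once per call.
            program = cl.Program(ctx, f.read()).build()
        _CL_STATE[key] = (ctx, queue, program)
    return _CL_STATE[key]
```

With something like this, `convolve1D_opencl` could fetch the triple at the top of each call and only allocate buffers itself. Those buffers can be freed explicitly with `release()` once the result has been copied back to the host.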
LOW: dead fp64 check. In `__opencl__.py` the `cl64`/fp64-extension conditional sets `cl_dp = False` in both branches, so double precision is never enabled and `_get_cl_code` always rewrites `double`→`float`. Harmless (the kernel only uses `float`) but misleading dead code.
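If the check is kept rather than deleted, a working version might look roughly like the sketch below. Both helper names are hypothetical, and it assumes the extension list arrives as the space-separated string that `device.extensions` returns in pyopencl; how `_get_cl_code` actually performs its rewrite is not shown in this issue.

```python
import re


def supports_fp64(extensions):
    """True if the space-separated extension string lists cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()


def adapt_kernel_source(source, use_double):
    """Rewrite the `double` type name to `float` only when fp64 is unavailable."""
    if use_double:
        return source
    # Word boundaries avoid touching identifiers that merely contain "double".
    return re.sub(r"\bdouble\b", "float", source)
```

Since the kernel currently only uses `float`, simply removing the conditional and the `cl_dp` flag would be the smaller change.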
LOW: single-device path skips the GPU-type filter. When exactly one platform+device exists, it is selected as `_fastest_device` without the `"GPU" in device_type` check applied in the multi-device branch. A CPU-only OpenCL runtime (e.g. PoCL) would then drive the GPU path on a CPU device.
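One way to close the gap is to run the same filter regardless of how many devices were found. The sketch below is illustrative: `filter_gpu_devices` and `pick_device` are hypothetical names, and `type_name` is an injected callable that returns a device's type string (with pyopencl this could plausibly be `lambda d: cl.device_type.to_string(d.type)`, which is an assumption about how the module derives `device_type`).

```python
def filter_gpu_devices(devices, type_name):
    """Keep only devices whose type string contains "GPU"."""
    return [d for d in devices if "GPU" in type_name(d)]


def pick_device(devices, type_name):
    """Apply the GPU filter first, for one device or many, then choose.

    Taking the first survivor stands in for whatever "fastest" ranking the
    module really uses; None signals that the CPU fallback should run.
    """
    gpus = filter_gpu_devices(devices, type_name)
    return gpus[0] if gpus else None
```

Returning `None` for a lone CPU device keeps a PoCL-only machine on the regular CPU code path.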
LOW: debug print + redundant except tuple. In `except (ImportError, OSError, Exception)`, `Exception` subsumes the other two, and `print("This exception is what's causing cl equals None:", e)` prints on every import on machines without pyopencl. Reduce to `except Exception` and either drop the print or downgrade it to `warnings.warn`/`logging`.
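A minimal sketch of the quieter import guard, using `logging` at debug level so nothing is printed by default. The helper name `import_optional` is hypothetical; the module could equally keep an inline `try`/`except` with the same body.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def import_optional(module_name):
    """Import a module by name, returning None (and logging why) if it fails."""
    try:
        return importlib.import_module(module_name)
    except Exception as exc:  # already covers ImportError and OSError
        logger.debug("optional dependency %s unavailable: %s", module_name, exc)
        return None


# Intended use in the OpenCL module (left commented out in this sketch):
# cl = import_optional("pyopencl")
```

Anyone debugging a missing GPU path can still surface the reason by enabling debug logging for the module.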