feat: per-message worker gating #76
Adds an optional gate_cmd field to Task. When set, each worker runs the gate command before the task; non-zero exit causes the worker to return the message to the queue (with a short visibility delay) without consuming a retry, so a different worker can pick it up. Gate results are cached per-worker (LRU keyed on sha256(gate_cmd)) so subsequent messages with the same gate are resolved without spawning another subprocess.

Public API additions:
* ai4s.jobq.GateNotPassedError
* JobQ.push(..., gate_cmd=...)
* batch_enqueue(..., gate_cmd=...)
* ai4s-jobq push --gate-cmd
* Envelope.requeue(visibility_timeout=...)
* New module ai4s.jobq.gating

Backward compatibility: tasks without a gate keep schema v1 and stable deterministic IDs. Tasks with a gate use schema v2.

Storage Queue honors the requested visibility delay; the Service Bus REST backend logs a one-time warning that the delay degrades to immediate redelivery (no native scheduled-redelivery primitive).

GateNotPassedError is treated by launch_workers like LockLostError: not counted toward --max-consecutive-failures.

Tunable via JOBQ_GATE_CACHE_SIZE, JOBQ_GATE_TIMEOUT_S, and JOBQ_GATE_REQUEUE_DELAY_S.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
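The worker-side flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `FakeEnvelope`, `gate_passes`, `process`, the locally defined `GateNotPassedError`, the default values (30 s and 5 s), and the choice to treat a gate timeout as "not passed" are all hypothetical. Only the exception name, `requeue(visibility_timeout=...)`, and the two environment variable names come from the description.

```python
import os
import subprocess


class GateNotPassedError(Exception):
    """Local stand-in for ai4s.jobq.GateNotPassedError (hypothetical definition)."""


class FakeEnvelope:
    """Hypothetical stand-in for a queue envelope; it only records requeue calls."""

    def __init__(self, gate_cmd):
        self.gate_cmd = gate_cmd
        self.requeued_with = None

    def requeue(self, visibility_timeout):
        self.requeued_with = visibility_timeout


def gate_passes(gate_cmd, timeout_s):
    """Run the gate in a shell; a non-zero exit or a timeout means 'not passed'."""
    try:
        return subprocess.run(gate_cmd, shell=True, timeout=timeout_s).returncode == 0
    except subprocess.TimeoutExpired:
        return False  # assumption: a hung gate is treated like a failed gate


def process(envelope, task):
    """Run the gate (if any) before the task; requeue the message when it fails."""
    timeout_s = float(os.environ.get("JOBQ_GATE_TIMEOUT_S", "30"))  # assumed default
    delay_s = float(os.environ.get("JOBQ_GATE_REQUEUE_DELAY_S", "5"))  # assumed default
    if envelope.gate_cmd and not gate_passes(envelope.gate_cmd, timeout_s):
        # Hand the message back so a different worker can try it later.
        envelope.requeue(visibility_timeout=delay_s)
        raise GateNotPassedError(envelope.gate_cmd)
    return task()
```

In this sketch the caller is expected to catch `GateNotPassedError` and skip its failure counter, mirroring how the description says `launch_workers` treats it like `LockLostError`.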
@copilot , tests are failing: make sure they pass. Then:
Agent-Logs-Url: https://github.com/microsoft/ai4s-jobq/sessions/406554ad-6f7b-435d-a423-703697fe326a Co-authored-by: temporaer <27792+temporaer@users.noreply.github.com>
…avior Agent-Logs-Url: https://github.com/microsoft/ai4s-jobq/sessions/406554ad-6f7b-435d-a423-703697fe326a Co-authored-by: temporaer <27792+temporaer@users.noreply.github.com>
Implemented in
No UI changes were made, so there is no screenshot to provide.
max_gate_not_qualified = DEFAULT_MAX_NOT_QUALIFIED
raw_max_gate_not_qualified = os.environ.get("JOBQ_GATE_MAX_NOT_QUALIFIED")
if raw_max_gate_not_qualified:
Extract this whole validation piece into a helper, and see if you can use it in other places. Something like
max_gate_not_qualified = validate_envvar("JOBQ_GATE_MAX_NOT_QUALIFIED", int, DEFAULT_MAX_NOT_QUALIFIED)
I see you already created _env_int in another file.
Done in c1e21e1 (main changes in d8fb7ec). I extracted env-var parsing/validation into validate_envvar(...) in ai4s/jobq/gating.py and now launch_workers uses it for JOBQ_GATE_MAX_NOT_QUALIFIED exactly as requested. No UI changes were made, so there is no screenshot.
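For readers following along, a helper with the call shape the reviewer suggested might look roughly like this. It is a minimal sketch, not the code from the referenced commits: the fallback-to-default behavior on unset, empty, or unparsable values and the warning log are assumptions.

```python
import logging
import os

log = logging.getLogger(__name__)


def validate_envvar(name, cast, default):
    """Return cast(os.environ[name]), or default when unset, empty, or invalid."""
    raw = os.environ.get(name)
    if not raw:
        return default
    try:
        return cast(raw)
    except (TypeError, ValueError):
        # Assumption: a bad value is logged and ignored rather than raised.
        log.warning("Ignoring invalid %s=%r; using default %r", name, raw, default)
        return default
```

A call site would then reduce to one line, e.g. `validate_envvar("JOBQ_GATE_MAX_NOT_QUALIFIED", int, DEFAULT_MAX_NOT_QUALIFIED)`.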
return sha256(gate_cmd.encode("utf-8")).hexdigest()

class GateCache:
we already depend on cachetools, no need to reinvent the wheel!
Done in c1e21e1 (cache change in d8fb7ec). GateCache now uses cachetools.LRUCache instead of a custom OrderedDict LRU implementation. No UI changes were made, so there is no screenshot.
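As an illustration of the reviewer's point, a gate cache built on `cachetools.LRUCache` could be as small as the sketch below. The method names `get` and `put`, the `key` helper, and the default size of 128 are assumptions and may not match the class in the PR; the sha256 key and the `JOBQ_GATE_CACHE_SIZE` variable are taken from the description.

```python
import os
from hashlib import sha256

from cachetools import LRUCache


class GateCache:
    """Per-worker cache of gate results, keyed on sha256(gate_cmd)."""

    def __init__(self, maxsize=None):
        if maxsize is None:
            # Assumed default; the PR only names the environment variable.
            maxsize = int(os.environ.get("JOBQ_GATE_CACHE_SIZE", "128"))
        self._results = LRUCache(maxsize=maxsize)

    @staticmethod
    def key(gate_cmd):
        return sha256(gate_cmd.encode("utf-8")).hexdigest()

    def get(self, gate_cmd):
        """Return the cached result, or None if this gate has not been seen."""
        return self._results.get(self.key(gate_cmd))

    def put(self, gate_cmd, passed):
        self._results[self.key(gate_cmd)] = passed
```

Eviction order and size bookkeeping are left entirely to `LRUCache`, which is what removes the need for a hand-rolled `OrderedDict` variant.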
## How it works

1. When pushing a task, the producer sets `gate_cmd` to any shell command.
2. When a worker dequeues the message, it computes
I think users would know how a cache works, just mention it in passing.
Done in c1e21e1 (docs update in d8fb7ec). I simplified the “How it works” section to mention caching briefly without deep cache internals. No UI changes were made, so there is no screenshot.
Agent-Logs-Url: https://github.com/microsoft/ai4s-jobq/sessions/fa6d3beb-5f43-4cca-9a60-c631ad721c16 Co-authored-by: temporaer <27792+temporaer@users.noreply.github.com>
Addressed in