Replies: 1 comment
✦ My Analysis of the CodeRabbit Finding

Valid concern - this is a real tradeoff. The CodeRabbit bot correctly identifies that num_gpus=0 removes Ray's GPU resource accounting. However, there are mitigating factors:

The root problem:

Recommended approach (defer to Issue #314): This finding is essentially Issue #314, which I already created. The solution should be:

For now:

✦ Do you want me to:
server/app/workers/ray_actors.py
Comment on lines +30 to +33
```python
# No GPU requirement: Allow scheduling on CPU-only clusters.
# Plugins can detect and use GPU at runtime via CUDA availability.
# This fixes Issue #301: "No available node types can fulfill resource request {'GPU': 1.0}"
@ray.remote(num_gpus=0)
```
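The runtime-detection claim in this excerpt is easy to picture with a minimal sketch. Assuming the plugins use PyTorch (the review's scripts below grep for torch.cuda), device selection inside a plugin might look like the following; pick_device is a hypothetical helper, not code from the repository:

```python
import torch

def pick_device() -> str:
    # Hypothetical helper: fall back to CPU when the worker Ray
    # scheduled us onto has no visible CUDA device.
    return "cuda" if torch.cuda.is_available() else "cpu"
```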
⚠️ Potential issue | 🟠 Major
coderabbitai bot commented 25 minutes ago
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "== Ray decorators with GPU accounting =="
rg -n --type=py '@ray\.remote\(num_gpus=' server/app
echo
echo "== Spawn sites for StreamingToolActor and execute_pipeline_remote =="
rg -n -C2 --type=py 'StreamingToolActor\.remote\(|execute_pipeline_remote\.remote\(' server/app
echo
echo "== Plugin validate methods =="
rg -n -C3 --type=py 'def validate\(' server/app
echo
echo "== CUDA-specific model placement hints =="
rg -n -C3 --type=py 'torch\.cuda|\.to\("cuda"|device\s*=\s*"cuda"|torch\.device\("cuda"' server/app
```
Repository: rogermt/forgesyte
Length of output: 5913
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "== Plugin implementations: Does validate() load models? =="
find server/app/plugins -name "*.py" -type f | head -20 | while read -r f; do
  if rg -q 'def validate' "$f"; then
    echo "File: $f"
    rg -A20 'def validate' "$f" | head -25
    echo "---"
  fi
done
echo
echo "== Concurrency controls: semaphore, lock, max_tasks =="
rg -n --type=py 'semaphore|asyncio\.Lock|Semaphore|max_task' server/app/workers server/app/services | head -30
echo
echo "== Ray cluster config or resource setup =="
fd --type f '(ray_config|cluster|resources)' server/app
echo
echo "== validate() call sites and context =="
rg -n -B3 -A5 '\.validate\(' server/app/workers server/app/services | head -50
```
Repository: rogermt/forgesyte
Length of output: 1629
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "== Concrete plugin subclasses and validate() overrides =="
find server/app/plugins -name "*.py" -type f | while read -r f; do
  if grep -Eq 'class.*BasePlugin|class.*Plugin' "$f" 2>/dev/null; then
    echo "=== $f ==="
    rg -A30 'def validate\(' "$f"
  fi
done
echo
echo "== Ray actor/task options and resource configuration =="
rg -n -B5 -A10 '@ray\.remote' server/app/workers/ray_actors.py server/app/ray_tasks.py
echo
echo "== Actor/task execution: check for concurrent limits =="
rg -n -C5 'active_actors|max_concurrent|num_tasks' server/app/services/vision_analysis.py
echo
echo "== Full context of execute_pipeline_remote and StreamingToolActor creation =="
rg -n -B10 -A10 'def execute_pipeline_remote' server/app/ray_tasks.py | head -50
```
Repository: rogermt/forgesyte
Length of output: 565
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "== All plugin files with classes =="
find server/app/plugins -name "*.py" -type f -exec grep -l 'class.*Plugin' {} \;
echo
echo "== Plugin directory structure =="
find server/app/plugins -type f -name "*.py" | head -20
echo
echo "== Full execute_pipeline_remote definition =="
sed -n '90,130p' server/app/ray_tasks.py
echo
echo "== Full StreamingToolActor definition =="
sed -n '30,60p' server/app/workers/ray_actors.py
echo
echo "== vision_analysis.py actor creation context =="
sed -n '100,125p' server/app/services/vision_analysis.py
echo
echo "== Check for Ray task/actor options: num_cpus, max_concurrency, resources =="
rg -n 'num_cpus|max_concurrency|resources|scheduling_strategy' server/app/workers server/app/ray_tasks.py
```
Repository: rogermt/forgesyte
Length of output: 5178
Restore GPU resource accounting to prevent concurrent model-loading OOM.
Setting num_gpus=0 fixes CPU-only scheduling but removes Ray's only constraint on GPU memory. Since plugin.validate() can load models into VRAM, and StreamingToolActor is created per client/tool (vision_analysis.py:114) whilst execute_pipeline_remote is dispatched per job (worker.py:518), multiple actors and tasks will schedule concurrently and all attempt GPU preload without any admission control—increasing OOM risk significantly under load.
To maintain CPU-friendly defaults, add a plugin-specific option (e.g. .options(num_gpus=...) for GPU-heavy plugins), a custom Ray resource type, or an application-level semaphore around model preloading.
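As a rough illustration of the first option, the spawn site could pass GPU requirements through Ray's .options() API. This is a hedged sketch, not the repository's code: the requires_gpu flag and spawn_streaming_actor helper are hypothetical stand-ins for whatever plugin metadata Issue #314 settles on.

```python
import ray

@ray.remote  # no fixed num_gpus: resources are decided per call via .options()
class StreamingToolActor:
    def __init__(self, tool_name: str):
        self.tool_name = tool_name

def spawn_streaming_actor(tool_name: str, requires_gpu: bool):
    # Hypothetical helper: reserve a GPU in Ray's scheduler only for
    # GPU-heavy plugins, so CPU-only clusters can still schedule the
    # num_gpus=0 path while GPU loads get admission control.
    num_gpus = 1 if requires_gpu else 0
    return StreamingToolActor.options(num_gpus=num_gpus).remote(tool_name)
```

Because .options() applies per call, the same actor class serves both CPU-only and GPU deployments without changing the decorator.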
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In server/app/workers/ray_actors.py around lines 30-33, the current @ray.remote(num_gpus=0) decorator removes GPU accounting and allows concurrent GPU model loads; revert/remove the fixed num_gpus=0 and instead enforce GPU admission per-plugin or per-actor: update actor/task creation sites (e.g., where StreamingToolActor is instantiated in vision_analysis.py and where execute_pipeline_remote is dispatched in worker.py) to call .options(num_gpus=...) for GPU-heavy plugins based on plugin.validate() or plugin metadata, or register/require a custom Ray resource (e.g., "gpu_model") and allocate it via .options(resources={...}); alternatively implement an application-level semaphore around plugin.validate()/model preload to serialize GPU loads. Ensure references to the ray.remote decorated actor/class are preserved and that plugin metadata drives the chosen num_gpus/resource allocation.
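The application-level alternative named in the prompt can be sketched as a process-wide semaphore that serializes model preloads. Assuming an async call path, one possible shape follows; the semaphore and wrapper are hypothetical, and plugin.validate() (assumed synchronous here) is the method named in the finding:

```python
import asyncio

# Hypothetical admission control: allow at most one GPU model preload
# at a time across all concurrently scheduled actors and tasks.
_gpu_load_semaphore = asyncio.Semaphore(1)

async def validate_with_admission(plugin) -> None:
    # Serialize VRAM-heavy preloads so concurrent StreamingToolActor
    # instances and pipeline tasks cannot all load models at once.
    async with _gpu_load_semaphore:
        # Run the (assumed blocking) validate() off the event loop.
        await asyncio.to_thread(plugin.validate)
```

This trades throughput for safety: GPU loads queue behind one another, while CPU-only plugins can bypass the wrapper entirely.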