v0.16.8: Decouple server from dictating summary structure.

- Worker now uses plugin-provided summary when available
- Falls back to derive_video_summary() for legacy plugins
- API route reads pre-computed summary from job.summary
- API no longer parses results JSON, just checks file existence
- Tests updated to set job.summary for completed jobs

This allows plugins to define their own summary metrics without server changes.
Walkthrough: The PR transitions video summary generation from on-demand computation to plugin-provided, pre-computed summaries stored in the database. The API endpoint now retrieves pre-computed summaries from the stored `job.summary` field.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
server/app/workers/worker.py (1)
994-999: Extract summary selection into a shared helper. This contract now exists in both `_finalize_job()` and `_execute_pipeline()`. Any future tweak to fallback rules or summary validation has to be patched twice, which is easy to miss.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/app/workers/worker.py` around lines 994 - 999, Extract the duplicated summary-selection logic into a single helper (e.g., get_summary_dict or select_summary) and replace the inline code in both _finalize_job() and _execute_pipeline() with a call to that helper; the helper should accept output_data, return the summary dict (use output_data.get("summary") and fallback to derive_video_summary(output_data) exactly as before), and be placed at module scope so both _finalize_job and _execute_pipeline can import/use it to ensure a single source of truth for summary selection and future changes.
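A minimal sketch of the suggested helper. The stand-in `derive_video_summary()` body and the field names here are illustrative only; the real function lives in the codebase:

```python
import json

def derive_video_summary(output_data):
    # Stand-in for the real server-side fallback; the field below
    # is illustrative, not the actual summary schema.
    return {"frame_count": len(output_data.get("frames", []))}

def select_summary(output_data):
    """Single source of truth for summary selection: prefer the
    plugin-provided summary, fall back to the derived one."""
    return output_data.get("summary") or derive_video_summary(output_data)

# Both _finalize_job() and _execute_pipeline() would then reduce to:
summary_json = json.dumps(select_summary({"summary": {"objects": 3}}))
print(summary_json)
```

With this in place, any future change to fallback rules happens in one function rather than two call sites.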
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/app/api_routes/routes/jobs.py`:
- Around line 235-242: The code calls storage.load_file(job.output_path) solely
to verify the artifact exists, which with the S3 backend downloads the full
object; replace this with a lightweight existence/HEAD check (e.g.,
storage.exists or storage.head) or remove the existence call and rely on
storage.get_signed_url(job.output_path) to surface errors, and update the
exception handling to raise HTTPException(status_code=404, ...) when the
lightweight check indicates missing artifact (referencing storage.load_file,
storage.get_signed_url, and job.output_path to locate the change).
- Around line 243-244: The detail endpoint currently does an unguarded
json.loads(job.summary) which raises on malformed JSON; wrap that call (where
summary = json.loads(job.summary) if job.summary else None) in a try/except
matching the behavior of list_jobs(): catch json.JSONDecodeError (and optionally
TypeError if needed) and set summary = None on error so a bad row does not raise
a 500.
In `@server/app/workers/worker.py`:
- Around line 679-684: The current logic calls derive_video_summary() for every
job when output_data lacks a summary; change it to only run the video-specific
fallback for video jobs by guarding the call with a job-type check (e.g., if
job.type == "video" or job.kind == "video" per your model), so: read
summary_dict = output_data.get("summary"); if not summary_dict and job indicates
video then set summary_dict = derive_video_summary(output_data); otherwise leave
summary_dict as-is (including empty dict) and then set job.summary =
json.dumps(summary_dict); ensure you reference the existing variables/functions
summary_dict, output_data, derive_video_summary, and job.summary when making
this change.
---
Nitpick comments:
In `@server/app/workers/worker.py`:
- Around line 994-999: Extract the duplicated summary-selection logic into a
single helper (e.g., get_summary_dict or select_summary) and replace the inline
code in both _finalize_job() and _execute_pipeline() with a call to that helper;
the helper should accept output_data, return the summary dict (use
output_data.get("summary") and fallback to derive_video_summary(output_data)
exactly as before), and be placed at module scope so both _finalize_job and
_execute_pipeline can import/use it to ensure a single source of truth for
summary selection and future changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 35d07b29-557c-46ce-b19a-6d70c6f6bd6e
📒 Files selected for processing (4)
- server/app/api_routes/routes/jobs.py
- server/app/workers/worker.py
- server/tests/api/routes/test_jobs_unified.py
- server/tests/app/workers/test_worker.py
```diff
 # Verify results file exists
 try:
-    results_path = job.output_path
-    file_path = storage.load_file(results_path)
-    with open(file_path, "r") as f:
-        results = json.load(f)
+    storage.load_file(job.output_path)
 except FileNotFoundError as err:
     raise HTTPException(status_code=404, detail="Results file not found") from err
-except json.JSONDecodeError as err:
-    raise HTTPException(status_code=500, detail="Invalid results file") from err

 # Return result_url and summary for all completed jobs
 result_url = storage.get_signed_url(job.output_path)
```
Avoid load_file() just to prove the artefact exists.
With the S3 backend, load_file() downloads the whole object to a temp file. Here that path is discarded, so every GET /v1/jobs/{job_id} can now trigger a full download and leak a temp file on the polling path. Please use a lightweight exists/HEAD-style storage call instead, or make the signed-URL step the only storage round-trip here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/app/api_routes/routes/jobs.py` around lines 235 - 242, The code calls
storage.load_file(job.output_path) solely to verify the artifact exists, which
with the S3 backend downloads the full object; replace this with a lightweight
existence/HEAD check (e.g., storage.exists or storage.head) or remove the
existence call and rely on storage.get_signed_url(job.output_path) to surface
errors, and update the exception handling to raise
HTTPException(status_code=404, ...) when the lightweight check indicates missing
artifact (referencing storage.load_file, storage.get_signed_url, and
job.output_path to locate the change).
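One way to sketch the suggested change, with a stub storage backend. The `exists()` method name is an assumption about the storage interface, not verified against the codebase:

```python
class StubStorage:
    """Stand-in for the real storage backend; exists() models a
    cheap HEAD-style check instead of a full object download."""
    def __init__(self, keys):
        self._keys = set(keys)

    def exists(self, path):
        # HEAD-style check: no object body is transferred.
        return path in self._keys

    def get_signed_url(self, path):
        # Illustrative URL shape only.
        return f"https://example.invalid/{path}?signed=1"

def result_url_or_404(storage, output_path):
    # Raising here would map to HTTPException(status_code=404) in the route.
    if not storage.exists(output_path):
        raise FileNotFoundError("Results file not found")
    return storage.get_signed_url(output_path)

storage = StubStorage(["jobs/42/results.json"])
print(result_url_or_404(storage, "jobs/42/results.json"))
```

The point of the sketch: the polling path does at most one cheap metadata round-trip plus the signed-URL call, and never downloads the artifact body.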
```python
# v0.16.8: Use pre-computed summary from job.summary (set by worker)
summary = json.loads(job.summary) if job.summary else None
```
Handle malformed job.summary the same way as list_jobs().
This json.loads() is unguarded, so a bad row now turns the detail endpoint into a 500 even though the list endpoint already falls back to summary=None on JSONDecodeError.
🛡️ Small hardening patch

```diff
-summary = json.loads(job.summary) if job.summary else None
+try:
+    summary = json.loads(job.summary) if job.summary else None
+except json.JSONDecodeError:
+    summary = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/app/api_routes/routes/jobs.py` around lines 243 - 244, The detail
endpoint currently does an unguarded json.loads(job.summary) which raises on
malformed JSON; wrap that call (where summary = json.loads(job.summary) if
job.summary else None) in a try/except matching the behavior of list_jobs():
catch json.JSONDecodeError (and optionally TypeError if needed) and set summary
= None on error so a bad row does not raise a 500.
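The guarded parse can be sketched as a small helper (the helper name is illustrative, not from the codebase):

```python
import json

def parse_summary(raw):
    """Mirror list_jobs() behavior: a malformed row yields None
    instead of propagating a JSONDecodeError into a 500."""
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None

print(parse_summary('{"frame_count": 10}'))  # valid row
print(parse_summary("{not json"))            # malformed row degrades to None
```

Factoring this out would also let the list and detail endpoints share one parsing path.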
```python
# v0.16.8: Use plugin-provided summary if available (decoupled from server)
summary_dict = output_data.get("summary")
if not summary_dict:
    # Fallback for plugins that don't provide summary
    summary_dict = derive_video_summary(output_data)
job.summary = json.dumps(summary_dict)
```
Only derive a fallback summary for video jobs.
derive_video_summary() is video-specific, but this block now runs for every completed job. For a legacy image plugin that does not emit summary, or one that deliberately returns {}, we'll persist a bogus {"frame_count": 0, ...} payload into job.summary, and /v1/jobs/{job_id} will expose that as if it were real image metadata.
🔧 Minimal fix

```diff
 # v0.16.8: Use plugin-provided summary if available (decoupled from server)
 summary_dict = output_data.get("summary")
-if not summary_dict:
-    # Fallback for plugins that don't provide summary
-    summary_dict = derive_video_summary(output_data)
-job.summary = json.dumps(summary_dict)
+if summary_dict is None and job.job_type in ("video", "video_multi"):
+    # Fallback for legacy video plugins only
+    summary_dict = derive_video_summary(output_data)
+job.summary = json.dumps(summary_dict) if summary_dict is not None else None
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
# v0.16.8: Use plugin-provided summary if available (decoupled from server)
summary_dict = output_data.get("summary")
if summary_dict is None and job.job_type in ("video", "video_multi"):
    # Fallback for legacy video plugins only
    summary_dict = derive_video_summary(output_data)
job.summary = json.dumps(summary_dict) if summary_dict is not None else None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/app/workers/worker.py` around lines 679 - 684, The current logic calls
derive_video_summary() for every job when output_data lacks a summary; change it
to only run the video-specific fallback for video jobs by guarding the call with
a job-type check (e.g., if job.type == "video" or job.kind == "video" per your
model), so: read summary_dict = output_data.get("summary"); if not summary_dict
and job indicates video then set summary_dict =
derive_video_summary(output_data); otherwise leave summary_dict as-is (including
empty dict) and then set job.summary = json.dumps(summary_dict); ensure you
reference the existing variables/functions summary_dict, output_data,
derive_video_summary, and job.summary when making this change.
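The behavior of the proposed guard, demonstrated with stand-ins. The `job_type` values and the `derive_video_summary()` body are assumptions taken from the suggestion above, not verified against the codebase:

```python
import json

def derive_video_summary(output_data):
    # Illustrative fallback only; not the real schema.
    return {"frame_count": len(output_data.get("frames", []))}

def persist_summary(job_type, output_data):
    """What job.summary would be set to under the guarded fallback."""
    summary_dict = output_data.get("summary")
    if summary_dict is None and job_type in ("video", "video_multi"):
        # Fallback for legacy video plugins only
        summary_dict = derive_video_summary(output_data)
    return json.dumps(summary_dict) if summary_dict is not None else None

# Legacy video plugin without a summary: derived fallback applies.
print(persist_summary("video", {"frames": [1, 2, 3]}))
# Image plugin without a summary: no bogus video payload is persisted.
print(persist_summary("image", {}))
# Plugin that deliberately returns {}: kept as-is, not overwritten.
print(persist_summary("video", {"summary": {}}))
```

Note the `is None` check, which distinguishes a missing summary from a deliberately empty one; the original `if not summary_dict` conflates the two.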
Summary
v0.16.8: Decouple server from dictating summary structure.
This allows plugins to define their own summary metrics without server changes.
TEST-CHANGE Justification
The test changes included in this PR are required to validate the summary refactor and ensure no regressions.