
v0.16.8: Decouple server from dictating summary structure#373

Open
rogermt wants to merge 1 commit into main from v0.16.8

Conversation


rogermt (Owner) commented Mar 25, 2026

Summary

v0.16.8: Decouple server from dictating summary structure.

  • Worker now uses plugin-provided summary when available
  • Falls back to derive_video_summary() for legacy plugins
  • API route reads pre-computed summary from job.summary
  • API no longer parses results JSON, just checks file existence
  • Tests updated to set job.summary for completed jobs

This allows plugins to define their own summary metrics without server changes.

TEST-CHANGE Justification

The following test changes are included in this PR:

  • Tighten edge-timeout assertions to avoid false positives - Improves test reliability by removing flaky timing assumptions
  • Improve test coverage and fix test issues - Adds missing test cases and fixes existing test failures
  • Add tests for websocket_router and database.py - Expands coverage for critical realtime and persistence modules
  • Add pytest markers and refactor test paths - Standardizes test organization for better CI integration
  • Fix test output_path to include job_id UUID - Ensures unique output paths per job to prevent file collisions
  • Add tests to increase coverage - Additional coverage for core functionality

These test changes are required to validate the summary refactor and ensure no regressions.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed HTTP 500 errors that occurred when processing video jobs with invalid result files. The system now gracefully handles such cases and returns valid results.
  • Improvements

    • Enhanced the reliability of video summary data retrieval during job completion.

coderabbitai Bot commented Mar 25, 2026

Walkthrough

The PR transitions video summary generation from on-demand computation to plugin-provided, pre-computed summaries stored in the database. The API endpoint now retrieves pre-computed summaries from the job.summary column instead of deriving them from results files, with a fallback mechanism in the worker for backwards compatibility.

Changes

  • API Summary Retrieval (server/app/api_routes/routes/jobs.py): Removed the derive_video_summary invocation and import. For completed jobs, the endpoint now loads the pre-computed summary from the job.summary column via json.loads() instead of parsing results file contents, and verifies that the results artifact exists without raising parsing errors.
  • Worker Summary Generation (server/app/workers/worker.py): Modified job finalization in both the async and synchronous pipelines to prefer plugin-provided summaries via output_data.get("summary"), with a fallback to derive_video_summary() if absent. Fixed the Ray completion log to use the job_id parameter instead of the DB object field.
  • API Route Tests (server/tests/api/routes/test_jobs_unified.py): Updated job setup in multiple test cases to include a pre-computed summary field on Job records. Modified test_get_job_results_invalid_json to expect HTTP 200 with result_url and a non-null summary rather than HTTP 500, reflecting that the API no longer parses results JSON for summary generation.
  • Worker Tests (server/tests/app/workers/test_worker.py): Updated test_worker_stores_summary_on_completion to mock a plugin tool returning a top-level summary object with frame_count, detection_count, and classes. Assertions verify that the persisted summary reflects the plugin data and validate class list ordering.
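The shape of the updated worker test can be approximated with a self-contained sketch. The finalize_job and derive_video_summary names here are illustrative stand-ins for the real worker code, and SimpleNamespace stands in for the ORM Job row.

```python
import json
from types import SimpleNamespace

def derive_video_summary(output_data):
    # Illustrative legacy fallback, not the project's real implementation.
    return {"frame_count": 0, "detection_count": 0, "classes": []}

def finalize_job(job, output_data):
    # Prefer the plugin-provided summary; fall back for legacy plugins.
    summary_dict = output_data.get("summary")
    if not summary_dict:
        summary_dict = derive_video_summary(output_data)
    job.summary = json.dumps(summary_dict)

def test_worker_stores_summary_on_completion():
    job = SimpleNamespace(summary=None)
    plugin_output = {
        "summary": {
            "frame_count": 42,
            "detection_count": 7,
            "classes": ["car", "person"],
        }
    }
    finalize_job(job, plugin_output)
    persisted = json.loads(job.summary)
    assert persisted["frame_count"] == 42
    assert persisted["detection_count"] == 7
    assert persisted["classes"] == ["car", "person"]  # ordering preserved

test_worker_stores_summary_on_completion()
```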

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

refactor

🚥 Pre-merge checks: 3 passed

  • Title check: Passed. The title clearly summarises the main change: decoupling the server from dictating summary structure by allowing plugins to define their own summary metrics.
  • Description check: Passed. The description covers the summary section and the test-change justifications, though the test justifications appear somewhat generic and disconnected from the specific code changes shown in the raw summary.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
server/app/workers/worker.py (1)

994-999: Extract summary selection into a shared helper.

This contract now exists in both _finalize_job() and _execute_pipeline(). Any future tweak to fallback rules or summary validation has to be patched twice, which is easy to miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/workers/worker.py` around lines 994 - 999, Extract the duplicated
summary-selection logic into a single helper (e.g., get_summary_dict or
select_summary) and replace the inline code in both _finalize_job() and
_execute_pipeline() with a call to that helper; the helper should accept
output_data, return the summary dict (use output_data.get("summary") and
fallback to derive_video_summary(output_data) exactly as before), and be placed
at module scope so both _finalize_job and _execute_pipeline can import/use it to
ensure a single source of truth for summary selection and future changes.
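A minimal sketch of that refactor, assuming a module-level helper named select_summary (a hypothetical name) and the existing derive_video_summary fallback:

```python
import json

def derive_video_summary(output_data):
    # Stand-in for the existing legacy derivation.
    return {"frame_count": 0, "detection_count": 0, "classes": []}

def select_summary(output_data):
    """Single source of truth for summary selection: prefer the
    plugin-provided summary, fall back to the legacy derivation.
    An empty or missing summary triggers the fallback, matching the
    current `if not summary_dict` behavior."""
    return output_data.get("summary") or derive_video_summary(output_data)

# Both call sites then collapse to the same one-liner, e.g.:
#   job.summary = json.dumps(select_summary(output_data))  # in _finalize_job()
#   job.summary = json.dumps(select_summary(output_data))  # in _execute_pipeline()
```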

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 35d07b29-557c-46ce-b19a-6d70c6f6bd6e

📥 Commits

Reviewing files that changed from the base of the PR and between 248f212 and 1e6c075.

📒 Files selected for processing (4)
  • server/app/api_routes/routes/jobs.py
  • server/app/workers/worker.py
  • server/tests/api/routes/test_jobs_unified.py
  • server/tests/app/workers/test_worker.py

Comment on lines +235 to 242
```diff
         # Verify results file exists
         try:
-            results_path = job.output_path
-            file_path = storage.load_file(results_path)
-            with open(file_path, "r") as f:
-                results = json.load(f)
+            storage.load_file(job.output_path)
         except FileNotFoundError as err:
             raise HTTPException(status_code=404, detail="Results file not found") from err
         except json.JSONDecodeError as err:
             raise HTTPException(status_code=500, detail="Invalid results file") from err

         # Return result_url and summary for all completed jobs
         result_url = storage.get_signed_url(job.output_path)
```

⚠️ Potential issue | 🟠 Major

Avoid load_file() just to prove the artefact exists.

With the S3 backend, load_file() downloads the whole object to a temp file. Here that path is discarded, so every GET /v1/jobs/{job_id} can now trigger a full download and leak a temp file on the polling path. Please use a lightweight exists/HEAD-style storage call instead, or make the signed-URL step the only storage round-trip here.
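A sketch of the lightweight check using a toy local backend. The exists() method is the reviewer's suggestion, not an API this project is confirmed to have; an S3 backend would implement it with a HEAD request (e.g. boto3's head_object) rather than downloading the body.

```python
import tempfile
from pathlib import Path

class LocalStorage:
    """Toy storage backend illustrating an existence check that does
    not transfer the object body."""

    def __init__(self, root):
        self.root = Path(root)

    def exists(self, key: str) -> bool:
        # Local case: a stat() call. S3 case: a HeadObject request.
        return (self.root / key).is_file()

root = tempfile.mkdtemp()
storage = LocalStorage(root)
Path(root, "results.json").write_text("{}")

assert storage.exists("results.json")
assert not storage.exists("missing.json")
```

The route would then raise a 404 when exists() returns False, leaving get_signed_url() as the only other storage round-trip on the polling path.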

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/api_routes/routes/jobs.py` around lines 235 - 242, The code calls
storage.load_file(job.output_path) solely to verify the artifact exists, which
with the S3 backend downloads the full object; replace this with a lightweight
existence/HEAD check (e.g., storage.exists or storage.head) or remove the
existence call and rely on storage.get_signed_url(job.output_path) to surface
errors, and update the exception handling to raise
HTTPException(status_code=404, ...) when the lightweight check indicates missing
artifact (referencing storage.load_file, storage.get_signed_url, and
job.output_path to locate the change).

Comment on lines +243 to +244
```python
    # v0.16.8: Use pre-computed summary from job.summary (set by worker)
    summary = json.loads(job.summary) if job.summary else None
```

⚠️ Potential issue | 🟠 Major

Handle malformed job.summary the same way as list_jobs().

This json.loads() is unguarded, so a bad row now turns the detail endpoint into a 500 even though the list endpoint already falls back to summary=None on JSONDecodeError.

🛡️ Small hardening patch
```diff
-    summary = json.loads(job.summary) if job.summary else None
+    try:
+        summary = json.loads(job.summary) if job.summary else None
+    except json.JSONDecodeError:
+        summary = None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/api_routes/routes/jobs.py` around lines 243 - 244, The detail
endpoint currently does an unguarded json.loads(job.summary) which raises on
malformed JSON; wrap that call (where summary = json.loads(job.summary) if
job.summary else None) in a try/except matching the behavior of list_jobs():
catch json.JSONDecodeError (and optionally TypeError if needed) and set summary
= None on error so a bad row does not raise a 500.

Comment on lines +679 to 684
```python
            # v0.16.8: Use plugin-provided summary if available (decoupled from server)
            summary_dict = output_data.get("summary")
            if not summary_dict:
                # Fallback for plugins that don't provide summary
                summary_dict = derive_video_summary(output_data)
            job.summary = json.dumps(summary_dict)
```

⚠️ Potential issue | 🟠 Major

Only derive a fallback summary for video jobs.

derive_video_summary() is video-specific, but this block now runs for every completed job. For a legacy image plugin that does not emit summary, or one that deliberately returns {}, we'll persist a bogus {"frame_count": 0, ...} payload into job.summary, and /v1/jobs/{job_id} will expose that as if it were real image metadata.

🔧 Minimal fix
```diff
-            summary_dict = output_data.get("summary")
-            if not summary_dict:
-                # Fallback for plugins that don't provide summary
-                summary_dict = derive_video_summary(output_data)
-            job.summary = json.dumps(summary_dict)
+            summary_dict = output_data.get("summary")
+            if summary_dict is None and job.job_type in ("video", "video_multi"):
+                # Fallback for legacy video plugins only
+                summary_dict = derive_video_summary(output_data)
+            job.summary = json.dumps(summary_dict) if summary_dict is not None else None
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
            # v0.16.8: Use plugin-provided summary if available (decoupled from server)
            summary_dict = output_data.get("summary")
            if summary_dict is None and job.job_type in ("video", "video_multi"):
                # Fallback for legacy video plugins only
                summary_dict = derive_video_summary(output_data)
            job.summary = json.dumps(summary_dict) if summary_dict is not None else None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/workers/worker.py` around lines 679 - 684, The current logic calls
derive_video_summary() for every job when output_data lacks a summary; change it
to only run the video-specific fallback for video jobs by guarding the call with
a job-type check (e.g., if job.type == "video" or job.kind == "video" per your
model), so: read summary_dict = output_data.get("summary"); if not summary_dict
and job indicates video then set summary_dict =
derive_video_summary(output_data); otherwise leave summary_dict as-is (including
empty dict) and then set job.summary = json.dumps(summary_dict); ensure you
reference the existing variables/functions summary_dict, output_data,
derive_video_summary, and job.summary when making this change.
