
v0.16.8: Decouple server from dictating summary structure#373

Open
rogermt wants to merge 1 commit into main from v0.16.8

Conversation


rogermt (Owner) commented Mar 25, 2026

Summary

v0.16.8: Decouple server from dictating summary structure.

  • Worker now uses plugin-provided summary when available
  • Falls back to derive_video_summary() for legacy plugins
  • API route reads pre-computed summary from job.summary
  • API no longer parses results JSON, just checks file existence
  • Tests updated to set job.summary for completed jobs

This allows plugins to define their own summary metrics without server changes.

TEST-CHANGE Justification

The following test changes are included in this PR:

  • Tighten edge-timeout assertions to avoid false positives - Improves test reliability by removing flaky timing assumptions
  • Improve test coverage and fix test issues - Adds missing test cases and fixes existing test failures
  • Add tests for websocket_router and database.py - Expands coverage for critical realtime and persistence modules
  • Add pytest markers and refactor test paths - Standardizes test organization for better CI integration
  • Fix test output_path to include job_id UUID - Ensures unique output paths per job to prevent file collisions
  • Add tests to increase coverage - Additional coverage for core functionality

These test changes are required to validate the summary refactor and ensure no regressions.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed HTTP 500 errors that occurred when processing video jobs with invalid result files. The system now gracefully handles such cases and returns valid results.
  • Improvements

    • Enhanced the reliability of video summary data retrieval during job completion.

coderabbitai Bot commented Mar 25, 2026

Walkthrough

The PR transitions video summary generation from on-demand computation to plugin-provided, pre-computed summaries stored in the database. The API endpoint now retrieves pre-computed summaries from the job.summary column instead of deriving them from results files, with a fallback mechanism in the worker for backwards compatibility.

Changes

  • API Summary Retrieval (server/app/api_routes/routes/jobs.py): Removed the derive_video_summary invocation and import. For completed jobs, the endpoint now loads the pre-computed summary from the job.summary column via json.loads() instead of parsing results file contents, and verifies that the results artifact exists without raising parsing errors.
  • Worker Summary Generation (server/app/workers/worker.py): Modified job finalization in both the async and synchronous pipelines to prefer plugin-provided summaries via output_data.get("summary"), with a fallback to derive_video_summary() if absent. Fixed the Ray completion log to use the job_id parameter instead of the DB object field.
  • API Route Tests (server/tests/api/routes/test_jobs_unified.py): Updated job setup in multiple test cases to include a pre-computed summary field on Job records. Modified test_get_job_results_invalid_json to expect HTTP 200 with result_url and a non-null summary rather than HTTP 500, reflecting that the API no longer parses results JSON for summary generation.
  • Worker Tests (server/tests/app/workers/test_worker.py): Updated test_worker_stores_summary_on_completion to mock a plugin tool returning a top-level summary object with frame_count, detection_count, and classes. Assertions verify that the persisted summary reflects the plugin data and validate class list ordering.
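The shape of the updated worker test can be approximated with a self-contained sketch. The finalize_job and derive_video_summary names here are illustrative stand-ins for the real worker code, and SimpleNamespace stands in for the ORM Job row.

```python
import json
from types import SimpleNamespace

def derive_video_summary(output_data):
    # Illustrative legacy fallback, not the project's real implementation.
    return {"frame_count": 0, "detection_count": 0, "classes": []}

def finalize_job(job, output_data):
    # Prefer the plugin-provided summary; fall back for legacy plugins.
    summary_dict = output_data.get("summary")
    if not summary_dict:
        summary_dict = derive_video_summary(output_data)
    job.summary = json.dumps(summary_dict)

def test_worker_stores_summary_on_completion():
    job = SimpleNamespace(summary=None)
    plugin_output = {
        "summary": {
            "frame_count": 42,
            "detection_count": 7,
            "classes": ["car", "person"],
        }
    }
    finalize_job(job, plugin_output)
    persisted = json.loads(job.summary)
    assert persisted["frame_count"] == 42
    assert persisted["detection_count"] == 7
    assert persisted["classes"] == ["car", "person"]  # ordering preserved

test_worker_stores_summary_on_completion()
```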

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

refactor

🚥 Pre-merge checks: 3 passed

  • Title check: Passed. The title clearly summarises the main change: decoupling the server from dictating summary structure by allowing plugins to define their own summary metrics.
  • Description check: Passed. The description covers the summary section and the test-change justifications, though the test justifications appear somewhat generic and disconnected from the specific code changes shown in the raw summary.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
server/app/workers/worker.py (1)

994-999: Extract summary selection into a shared helper.

This contract now exists in both _finalize_job() and _execute_pipeline(). Any future tweak to fallback rules or summary validation has to be patched twice, which is easy to miss.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/workers/worker.py` around lines 994 - 999, Extract the duplicated
summary-selection logic into a single helper (e.g., get_summary_dict or
select_summary) and replace the inline code in both _finalize_job() and
_execute_pipeline() with a call to that helper; the helper should accept
output_data, return the summary dict (use output_data.get("summary") and
fallback to derive_video_summary(output_data) exactly as before), and be placed
at module scope so both _finalize_job and _execute_pipeline can import/use it to
ensure a single source of truth for summary selection and future changes.
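A minimal sketch of that refactor, assuming a module-level helper named select_summary (a hypothetical name) and the existing derive_video_summary fallback:

```python
import json

def derive_video_summary(output_data):
    # Stand-in for the existing legacy derivation.
    return {"frame_count": 0, "detection_count": 0, "classes": []}

def select_summary(output_data):
    """Single source of truth for summary selection: prefer the
    plugin-provided summary, fall back to the legacy derivation.
    An empty or missing summary triggers the fallback, matching the
    current `if not summary_dict` behavior."""
    return output_data.get("summary") or derive_video_summary(output_data)

# Both call sites then collapse to the same one-liner, e.g.:
#   job.summary = json.dumps(select_summary(output_data))  # in _finalize_job()
#   job.summary = json.dumps(select_summary(output_data))  # in _execute_pipeline()
```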

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 35d07b29-557c-46ce-b19a-6d70c6f6bd6e

📥 Commits

Reviewing files that changed from the base of the PR and between 248f212 and 1e6c075.

📒 Files selected for processing (4)
  • server/app/api_routes/routes/jobs.py
  • server/app/workers/worker.py
  • server/tests/api/routes/test_jobs_unified.py
  • server/tests/app/workers/test_worker.py

Comment on lines +235 to 242
```diff
         # Verify results file exists
         try:
-            results_path = job.output_path
-            file_path = storage.load_file(results_path)
-            with open(file_path, "r") as f:
-                results = json.load(f)
+            storage.load_file(job.output_path)
         except FileNotFoundError as err:
             raise HTTPException(status_code=404, detail="Results file not found") from err
         except json.JSONDecodeError as err:
             raise HTTPException(status_code=500, detail="Invalid results file") from err

         # Return result_url and summary for all completed jobs
         result_url = storage.get_signed_url(job.output_path)
```

⚠️ Potential issue | 🟠 Major

Avoid load_file() just to prove the artefact exists.

With the S3 backend, load_file() downloads the whole object to a temp file. Here that path is discarded, so every GET /v1/jobs/{job_id} can now trigger a full download and leak a temp file on the polling path. Please use a lightweight exists/HEAD-style storage call instead, or make the signed-URL step the only storage round-trip here.
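A sketch of the lightweight check using a toy local backend. The exists() method is the reviewer's suggestion, not an API this project is confirmed to have; an S3 backend would implement it with a HEAD request (e.g. boto3's head_object) rather than downloading the body.

```python
import tempfile
from pathlib import Path

class LocalStorage:
    """Toy storage backend illustrating an existence check that does
    not transfer the object body."""

    def __init__(self, root):
        self.root = Path(root)

    def exists(self, key: str) -> bool:
        # Local case: a stat() call. S3 case: a HeadObject request.
        return (self.root / key).is_file()

root = tempfile.mkdtemp()
storage = LocalStorage(root)
Path(root, "results.json").write_text("{}")

assert storage.exists("results.json")
assert not storage.exists("missing.json")
```

The route would then raise a 404 when exists() returns False, leaving get_signed_url() as the only other storage round-trip on the polling path.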

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/api_routes/routes/jobs.py` around lines 235 - 242, The code calls
storage.load_file(job.output_path) solely to verify the artifact exists, which
with the S3 backend downloads the full object; replace this with a lightweight
existence/HEAD check (e.g., storage.exists or storage.head) or remove the
existence call and rely on storage.get_signed_url(job.output_path) to surface
errors, and update the exception handling to raise
HTTPException(status_code=404, ...) when the lightweight check indicates missing
artifact (referencing storage.load_file, storage.get_signed_url, and
job.output_path to locate the change).

Comment on lines +243 to +244
```python
    # v0.16.8: Use pre-computed summary from job.summary (set by worker)
    summary = json.loads(job.summary) if job.summary else None
```

⚠️ Potential issue | 🟠 Major

Handle malformed job.summary the same way as list_jobs().

This json.loads() is unguarded, so a bad row now turns the detail endpoint into a 500 even though the list endpoint already falls back to summary=None on JSONDecodeError.

🛡️ Small hardening patch
```diff
-    summary = json.loads(job.summary) if job.summary else None
+    try:
+        summary = json.loads(job.summary) if job.summary else None
+    except json.JSONDecodeError:
+        summary = None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/api_routes/routes/jobs.py` around lines 243 - 244, The detail
endpoint currently does an unguarded json.loads(job.summary) which raises on
malformed JSON; wrap that call (where summary = json.loads(job.summary) if
job.summary else None) in a try/except matching the behavior of list_jobs():
catch json.JSONDecodeError (and optionally TypeError if needed) and set summary
= None on error so a bad row does not raise a 500.

Comment on lines +679 to 684
```python
            # v0.16.8: Use plugin-provided summary if available (decoupled from server)
            summary_dict = output_data.get("summary")
            if not summary_dict:
                # Fallback for plugins that don't provide summary
                summary_dict = derive_video_summary(output_data)
            job.summary = json.dumps(summary_dict)
```

⚠️ Potential issue | 🟠 Major

Only derive a fallback summary for video jobs.

derive_video_summary() is video-specific, but this block now runs for every completed job. For a legacy image plugin that does not emit summary, or one that deliberately returns {}, we'll persist a bogus {"frame_count": 0, ...} payload into job.summary, and /v1/jobs/{job_id} will expose that as if it were real image metadata.

🔧 Minimal fix
```diff
-            summary_dict = output_data.get("summary")
-            if not summary_dict:
-                # Fallback for plugins that don't provide summary
-                summary_dict = derive_video_summary(output_data)
-            job.summary = json.dumps(summary_dict)
+            summary_dict = output_data.get("summary")
+            if summary_dict is None and job.job_type in ("video", "video_multi"):
+                # Fallback for legacy video plugins only
+                summary_dict = derive_video_summary(output_data)
+            job.summary = json.dumps(summary_dict) if summary_dict is not None else None
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
            # v0.16.8: Use plugin-provided summary if available (decoupled from server)
            summary_dict = output_data.get("summary")
            if summary_dict is None and job.job_type in ("video", "video_multi"):
                # Fallback for legacy video plugins only
                summary_dict = derive_video_summary(output_data)
            job.summary = json.dumps(summary_dict) if summary_dict is not None else None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/app/workers/worker.py` around lines 679 - 684, The current logic calls
derive_video_summary() for every job when output_data lacks a summary; change it
to only run the video-specific fallback for video jobs by guarding the call with
a job-type check (e.g., if job.type == "video" or job.kind == "video" per your
model), so: read summary_dict = output_data.get("summary"); if not summary_dict
and job indicates video then set summary_dict =
derive_video_summary(output_data); otherwise leave summary_dict as-is (including
empty dict) and then set job.summary = json.dumps(summary_dict); ensure you
reference the existing variables/functions summary_dict, output_data,
derive_video_summary, and job.summary when making this change.
