fix(models): reject false-complete downloads that wrote no data (#1548) #1562

Merged
jaylfc merged 3 commits into dev from fix/download-false-complete on Jul 2, 2026

Conversation

@jaylfc

@jaylfc jaylfc commented Jul 2, 2026

Owner

Backend half of #1548 (confirmed on beta.17: the Models dialog fills the progress bar to 100 percent instantly and shows the model installed, but nothing is downloaded and nothing persists).

Root cause in download_manager.py: two paths marked a download complete without checking the file exists and is non-empty.

  • Torrent path (_download_with_fallback): set status=complete on any non-exception return from torrent.download(), with no dest validation.
  • HTTP path (_download): when no expected_sha256 was pinned (the common case), any response, including a 0-byte body or a 200 error page, fell through to status=complete.

Fix: one shared _validate_download (file exists and non-empty, size matches known total_bytes when set, SHA when provided; unlink the bad file and set status=error otherwise), run before marking complete on BOTH paths. Behaviour for the existing SHA-mismatch case is unchanged.
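
A minimal sketch of the shared check described above, not the merged code. The `DownloadTask` dataclass, the error strings, and the `finish` wrapper are hypothetical stand-ins; only the order of checks (exists, non-empty, size, SHA) and the unlink-then-error behaviour come from the description.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DownloadTask:
    """Hypothetical stand-in for the real task object."""
    dest: Path
    total_bytes: Optional[int] = None  # assumption: None means "size unknown"
    status: str = "downloading"
    error: Optional[str] = None


def validate_download(task, expected_sha256=None, computed_sha256=None):
    """Return an error string for a bad download, or None if it looks valid."""
    if not task.dest.exists():
        return "download produced no file"
    size = task.dest.stat().st_size
    if size == 0:
        return "download produced an empty file"
    if task.total_bytes is not None and size != task.total_bytes:
        return f"size mismatch: expected {task.total_bytes}, got {size}"
    if expected_sha256:
        # Reuse a digest computed while streaming; otherwise hash the file.
        digest = computed_sha256 or hashlib.sha256(task.dest.read_bytes()).hexdigest()
        if digest.lower() != expected_sha256.lower():
            return "SHA256 mismatch"
    return None


def finish(task, expected_sha256=None):
    """Mark the task complete only if validation passes; else clean up."""
    error = validate_download(task, expected_sha256)
    if error is None:
        task.status = "complete"
    else:
        task.dest.unlink(missing_ok=True)  # remove the bad file, if any
        task.status = "error"
        task.error = error
    return task
```

Both download paths would call something like `finish` instead of setting `status=complete` directly, so a clean return from the transport layer is no longer enough to count as success.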

Tests: 0-byte HTTP body errors instead of completing; torrent success that wrote nothing / an empty file errors; a genuine non-empty download still completes. 34 pass, no regressions in the torrent/download suites.

Note: fixes the false-complete half of #1548 only; the Providers-error-state and empty-log-viewer halves are separate (the Logs app work covers the latter). Built by a subagent to an orchestrator spec; diff audited and re-verified in the main repo.

Summary by CodeRabbit

  • Bug Fixes
    • Improved download validation to prevent marking downloads as successful when the destination file is missing, empty, incomplete, or corrupted.
    • Downloads now verify expected size and/or checksum when available, and will remove invalid output instead of leaving a false success state.
  • Tests
    • Expanded Download Manager coverage for HTTP and torrent edge cases, including zero-byte and “no data written” scenarios, to ensure errors are correctly reported and files aren’t created.

…lete

Both the torrent and HTTP paths could mark a download complete without
checking that anything was actually written to dest. The torrent path
trusted torrent.download() returning cleanly; the HTTP path only
checked SHA256 when one was supplied, so an empty or truncated body
still passed. Add a shared validation step (file exists, size > 0, size
matches the known total when available, SHA256 matches when provided)
and run it before marking either path complete. On failure the task is
marked error and the bad file is removed, matching the existing
SHA-mismatch behavior.

Fixes the backend half of #1548.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Review details
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: be5d55d7-bf5a-478f-8f35-f553e1db134c

📥 Commits

Reviewing files that changed from the base of the PR and between c15c219 and fe5edd5.

📒 Files selected for processing (1)
  • tinyagentos/download_manager.py
📝 Walkthrough

Adds shared finished-download validation to HTTP and torrent download paths, with task errors and file cleanup when a download produces no data or fails SHA/size checks. Tests cover empty-response and false-complete torrent cases.

Changes

Download validation

| Layer / File(s) | Summary |
| --- | --- |
| Validation helper: tinyagentos/download_manager.py | Adds _validate_download to check destination existence, non-empty size, optional byte count, and optional SHA-256 match. |
| HTTP download path integration: tinyagentos/download_manager.py, tests/test_download_manager.py | Uses _validate_download in the HTTP flow, adjusts total_bytes handling for encoded responses, and adds an empty-response regression test. |
| Torrent download path integration: tinyagentos/download_manager.py, tests/test_download_manager.py | Validates torrent downloads after completion, deletes invalid outputs, updates the success-path mock to write bytes, and adds no-data and empty-file regression tests. |

Estimated code review effort: 2 (Simple) | ~15 minutes

Possibly related PRs

  • jaylfc/taOS#1241: Both PRs modify DownloadManager download validation logic and add related unit tests for _download and fallback scenarios.

Poem

A rabbit checked each byte with care,
No empty file left lurking there.
SHA and size, both matched with pride,
Torrent or HTTP, none can hide.
Hop, hop — validated, error-free! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main fix: preventing downloads from being marked complete when they wrote no data. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Comment thread tinyagentos/download_manager.py Outdated
    if expected_sha256:
        digest = computed_sha256
        if digest is None:
            digest = hashlib.sha256(task.dest.read_bytes()).hexdigest()

WARNING: Synchronous task.dest.read_bytes() inside an async-context helper blocks the event loop while it loads and hashes a (potentially multi-GB) model file. The HTTP path already streams the body into sha.update(chunk); only this fallback performs a second full read, and it runs after a long HTTP/torrent transfer. Consider moving both the read and the hash off the loop, e.g. await asyncio.to_thread(lambda: hashlib.sha256(task.dest.read_bytes()).hexdigest()), so other tasks aren't starved.
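
One way to address this warning, sketched under assumptions: the helper names `sha256_file` and `sha256_file_async` and the 1 MiB chunk size are illustrative and do not appear in the PR. The sketch hashes the file in chunks inside a worker thread, so the event loop stays free and the file is never held in memory whole.

```python
import asyncio
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Blocking helper: hash a file in fixed-size chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


async def sha256_file_async(path: Path) -> str:
    """Run the blocking read+hash in a worker thread (Python 3.9+)."""
    return await asyncio.to_thread(sha256_file, path)
```

Passing the function and its argument to `asyncio.to_thread` matters here: an expression such as `path.read_bytes()` placed in the argument list would be evaluated on the event loop before the thread starts.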

Comment thread tinyagentos/download_manager.py Outdated
        expected_sha256=expected_sha256,
        progress_cb=_progress,
    )
    error = self._validate_download(task, expected_sha256)

SUGGESTION: The HTTP path passes computed_sha256=sha.hexdigest() to skip the re-read, but the torrent path calls self._validate_download(task, expected_sha256) with no computed_sha256. When the torrent downloader already validated the SHA internally on a multi-GB payload, this re-hashes the entire file from disk in the validator's fallback branch, doubling I/O for the mismatch case. Either plumb the in-progress digest through torrent.download (same pattern as sha) or document that the torrent client must validate SHA itself so this fallback is unreachable.

Suggested change
- error = self._validate_download(task, expected_sha256)
+ error = self._validate_download(task, expected_sha256, computed_sha256=getattr(task, "_computed_sha256", None))

@kilo-code-bot

kilo-code-bot Bot commented Jul 2, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 0 |
| SUGGESTION | 0 |

Incremental Review Notes

This commit (fe5edd5) directly addresses the WARNING flagged on the previous incremental review (sync read_bytes() in the async-context _validate_download SHA fallback at tinyagentos/download_manager.py:120). The change correctly:

  • Converts _validate_download from sync to async (tinyagentos/download_manager.py:100).
  • Wraps the blocking hashlib.sha256(task.dest.read_bytes()).hexdigest() call in asyncio.to_thread(...) so a multi-GB model read no longer stalls the event loop (tinyagentos/download_manager.py:123-125).
  • Adds the matching await at both call sites (torrent: tinyagentos/download_manager.py:172; HTTP: tinyagentos/download_manager.py:228).

The remaining sync paths in the same function (task.dest.exists(), task.dest.stat().st_size) are cheap filesystem stats and do not warrant offloading. The previously-flagged WARNING is fully resolved; the earlier CR/SUGGESTION defects remain addressed by c15c219.

No new issues found on the changed lines in this incremental commit.

Files Reviewed (1 file)
  • tinyagentos/download_manager.py - 0 new issues (previously-flagged WARNING resolved)
Previous Review Summaries (2 snapshots, latest commit c15c219)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit c15c219)

Status: 1 Issue Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| tinyagentos/download_manager.py | 120 | Sync read_bytes() of a potentially multi-GB model inside an async helper blocks the event loop during the SHA fallback. |

Files Reviewed (2 files)
  • tinyagentos/download_manager.py - 1 issue (unchanged from previous review)
  • tests/test_download_manager.py - 0 issues

Incremental Review Notes

This commit (c15c219) addresses the three previously-flagged findings on the changed lines:

  • Content-Length / Content-Encoding mismatch (was CodeRabbit line 116, now lines 213-217): The HTTP path now sets task.total_bytes only when Content-Length is present AND Content-Encoding is absent, so a gzip/br/deflate/zstd response no longer triggers a false "size mismatch" deletion. Valid fix.
  • Case-sensitive SHA comparison (was CodeRabbit line 124, now line 123): Both digest and expected_sha256 are lowercased before comparison. Valid fix.
  • Redundant full-file re-hash on the torrent path (was kilo SUGGESTION at line 161, now line 167): The torrent path now calls _validate_download(task) without expected_sha256, relying on torrent.download()'s internal SHA verification (confirmed at tinyagentos/torrent_downloader.py line ~245). Documented in the new comment. Acceptable resolution; SHA defense for the torrent path now lives entirely in the torrent module.

The still-active WARNING (sync read_bytes() at line 120) was not introduced by this commit and remains a latent issue in the _validate_download SHA-fallback branch.

Previous review (commit b87ba27)

Status: 2 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 1 |

The fix correctly closes the false-complete hole on both the torrent and HTTP paths via a shared validator, and the new tests cover 0-byte body, no-file torrent, and empty-file torrent variants. Two follow-ups worth considering:

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| tinyagentos/download_manager.py | 120 | Sync read_bytes() of a potentially multi-GB model inside an async helper blocks the event loop during the SHA fallback. |

SUGGESTION

| File | Line | Issue |
| --- | --- | --- |
| tinyagentos/download_manager.py | 161 | Torrent path passes no computed_sha256, so a torrent SHA mismatch triggers a second full-file read+hash on disk. |

Files Reviewed (2 files)
  • tinyagentos/download_manager.py - 2 issues
  • tests/test_download_manager.py - 0 issues


Reviewed by minimax-m3 · Input: 26.8K · Output: 2.3K · Cached: 167.4K

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/test_download_manager.py (1)

267-279: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Consider adding regression coverage for the gaps flagged in download_manager.py.

The current SHA and content-length tests only exercise lowercase hex digests and unencoded bodies, so they wouldn't catch a mixed-case expected_sha256 false-mismatch or a Content-Encoding: gzip response tripping the new size-mismatch check. Adding a test with an uppercase expected_sha256 and one with a content-encoding header (mocked decoded body shorter/longer than content-length) would pin down the correct behavior once those are fixed.

Also applies to: 338-347

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_download_manager.py` around lines 267 - 279, Add regression
coverage in test_download_manager for the two gaps in DownloadTask/_download:
one test should use an uppercase or mixed-case expected_sha256 and assert it
still matches the downloaded body instead of failing with a false SHA256
mismatch, and another should mock a response with a Content-Encoding header such
as gzip where the decoded body size differs from content-length to verify the
size check uses the correct post-decompression behavior. Keep the existing test
style around _make_async_context_manager_mock and _make_mock_client so the new
cases clearly exercise the fixed logic in _download.
tinyagentos/download_manager.py (1)

119-120: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Full-file read_bytes() re-hash duplicates work and loads the whole file into memory.

When computed_sha256 isn't supplied (torrent path), the digest is recomputed by loading the entire file into memory via read_bytes(). TorrentDownloader.download() already performs an equivalent full-file read internally for its own SHA check, so for large model downloads this doubles memory pressure and I/O for no added benefit.

♻️ Proposed fix: hash in chunks instead of loading the whole file
-            digest = computed_sha256
-            if digest is None:
-                digest = hashlib.sha256(task.dest.read_bytes()).hexdigest()
+            digest = computed_sha256
+            if digest is None:
+                hasher = hashlib.sha256()
+                with open(task.dest, "rb") as f:
+                    for chunk in iter(lambda: f.read(65536), b""):
+                        hasher.update(chunk)
+                digest = hasher.hexdigest()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tinyagentos/download_manager.py` around lines 119 - 120, In
DownloadManager.download, the fallback SHA256 path currently uses
task.dest.read_bytes(), which duplicates full-file I/O and memory use after
TorrentDownloader.download already validates the file. Replace that digest
computation with a chunked streaming hash using the existing task.dest path so
the file is never fully loaded into memory, and keep the logic inside the digest
None branch unchanged otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tinyagentos/download_manager.py`:
- Around line 113-116: The size check in the download validation logic is
comparing decoded content against Content-Length, which can falsely fail for
compressed responses. Update the download path in the method that validates
`task.dest` and `task.total_bytes` so that encoded responses either clear
`task.total_bytes` before verification or use the raw response stream instead of
`resp.aiter_bytes()`. Keep the existing empty-file and mismatch checks, but
ensure `size mismatch` is only triggered when the expected byte count is valid
for the actual bytes written.
- Around line 117-122: The SHA256 validation in download_manager.py is comparing
digests case-sensitively, so mixed-case variant["sha256"] values can be rejected
even when valid. Update the digest comparison in the expected_sha256 check to
normalize both the computed digest and the expected value to the same case, and
apply the same normalization in the later validation path around the follow-up
SHA256 check so both comparisons behave consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 64594d19-abf5-4d02-901f-1927b71486fc

📥 Commits

Reviewing files that changed from the base of the PR and between 2b5659f and b87ba27.

📒 Files selected for processing (2)
  • tests/test_download_manager.py
  • tinyagentos/download_manager.py

Comment thread tinyagentos/download_manager.py
Comment thread tinyagentos/download_manager.py
jaylfc added 2 commits July 2, 2026 23:12
- MAJOR: do not treat Content-Length as the expected on-disk size for
  content-encoded (gzip/br/deflate/zstd) responses. httpx auto-decompresses,
  so the written file is larger than Content-Length and the size check would
  delete a valid download. Leave total_bytes unknown and rely on the SHA.
- normalize SHA256 comparison to lowercase so an uppercase expected hash is
  not a false mismatch
- torrent path: torrent.download() already SHA-verifies internally, so skip
  the redundant full-file re-hash (avoids a multi-GB blocking read)
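
The first two points of this commit message can be sketched as two small helpers. This is an illustration under assumptions, not the merged code: the function names are hypothetical, and headers are taken as a plain mapping rather than an httpx object.

```python
from typing import Mapping, Optional


def expected_size_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Return the expected on-disk size, or None when it cannot be trusted.

    Content-Length describes the encoded body. When the client decompresses
    automatically, the written file differs in size, so any Content-Encoding
    makes the length unusable for validation.
    """
    normalized = {k.lower(): v for k, v in headers.items()}
    if normalized.get("content-encoding"):
        return None
    length = normalized.get("content-length")
    if length is None or not length.strip().isdigit():
        return None
    return int(length)


def sha256_matches(expected: str, actual: str) -> bool:
    """Compare hex digests case-insensitively."""
    return expected.strip().lower() == actual.strip().lower()
```

With the size left unknown for encoded responses, the SHA check (when a hash is pinned) is what remains to catch a corrupted body.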
… block the loop

The computed_sha256-is-None fallback re-reads the whole file to hash it;
for a multi-GB model that would stall the event loop. Move _validate_download
to async and run the read+hash via asyncio.to_thread.
@jaylfc jaylfc merged commit 006e980 into dev Jul 2, 2026
9 checks passed
@jaylfc jaylfc deleted the fix/download-false-complete branch July 2, 2026 22:41
@github-project-automation github-project-automation Bot moved this from Todo to Done in TinyAgentOS Roadmap Jul 2, 2026