fix(models): reject false-complete downloads that wrote no data (#1548)#1562
Conversation
…lete Both the torrent and HTTP paths could mark a download complete without checking that anything was actually written to dest. The torrent path trusted torrent.download() returning cleanly; the HTTP path only checked SHA256 when one was supplied, so an empty or truncated body still passed. Add a shared validation step (file exists, size > 0, size matches the known total when available, SHA256 matches when provided) and run it before marking either path complete. On failure the task is marked error and the bad file is removed, matching the existing SHA-mismatch behavior. Fixes the backend half of #1548.
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
Next review available in: 49 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds shared finished-download validation to HTTP and torrent download paths, with task errors and file cleanup when a download produces no data or fails SHA/size checks. Tests cover empty-response and false-complete torrent cases. ChangesDownload validation
Estimated code review effort: 2 (Simple) | ~15 minutes Possibly related PRs
PoemA rabbit checked each byte with care, 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| if expected_sha256: | ||
| digest = computed_sha256 | ||
| if digest is None: | ||
| digest = hashlib.sha256(task.dest.read_bytes()).hexdigest() |
There was a problem hiding this comment.
WARNING: Synchronous task.dest.read_bytes() inside an async-context helper will block the event loop while loading + hashing a (potentially multi-GB) model file. For the SHA fallback this runs after a long HTTP/torrent stream has already held the loop; the SHA256 path on the HTTP side correctly streams into sha.update(chunk) and only this fallback performs a second full read. Consider asyncio.to_thread(hashlib.sha256, task.dest.read_bytes()) (after reading bytes via aiofiles or a thread) so other tasks aren't starved.
| expected_sha256=expected_sha256, | ||
| progress_cb=_progress, | ||
| ) | ||
| error = self._validate_download(task, expected_sha256) |
There was a problem hiding this comment.
SUGGESTION: The HTTP path passes computed_sha256=sha.hexdigest() to skip the re-read, but the torrent path calls self._validate_download(task, expected_sha256) with no computed_sha256. When the torrent downloader already validated the SHA internally on a multi-GB payload, this re-hashes the entire file from disk in the validator's fallback branch, doubling I/O for the mismatch case. Either plumb the in-progress digest through torrent.download (same pattern as sha) or document that the torrent client must validate SHA itself so this fallback is unreachable.
| error = self._validate_download(task, expected_sha256) | |
| error = self._validate_download(task, expected_sha256, computed_sha256=getattr(task, "_computed_sha256", None)) |
Code Review SummaryStatus: No Issues Found | Recommendation: Merge Overview
Incremental Review NotesThis commit (fe5edd5) directly addresses the WARNING flagged on the previous incremental review (sync
The remaining sync paths in the same function ( No new issues found on the changed lines in this incremental commit. Files Reviewed (1 file)
Previous Review Summaries (2 snapshots, latest commit c15c219)Current summary above is authoritative. Previous snapshots are kept for context only. Previous review (commit c15c219)Status: 1 Issue Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Files Reviewed (2 files)
Incremental Review NotesThis commit (c15c219) addresses the three previously-flagged findings on the changed lines:
The still-active WARNING (sync Fix these issues in Kilo Cloud Previous review (commit b87ba27)Status: 2 Issues Found | Recommendation: Address before merge Overview
The fix correctly closes the false-complete hole on both the torrent and HTTP paths via a shared validator, and the new tests cover 0-byte body, no-file torrent, and empty-file torrent variants. Two follow-ups worth considering: Issue Details (click to expand)WARNING
SUGGESTION
Files Reviewed (2 files)
Reviewed by minimax-m3 · Input: 26.8K · Output: 2.3K · Cached: 167.4K |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/test_download_manager.py (1)
267-279: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winConsider adding regression coverage for the gaps flagged in
download_manager.py.The current SHA and content-length tests only exercise lowercase hex digests and unencoded bodies, so they wouldn't catch a mixed-case
expected_sha256false-mismatch or aContent-Encoding: gzipresponse tripping the new size-mismatch check. Adding a test with an uppercaseexpected_sha256and one with acontent-encodingheader (mocked decoded body shorter/longer thancontent-length) would pin down the correct behavior once those are fixed.Also applies to: 338-347
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_download_manager.py` around lines 267 - 279, Add regression coverage in test_download_manager for the two gaps in DownloadTask/_download: one test should use an uppercase or mixed-case expected_sha256 and assert it still matches the downloaded body instead of failing with a false SHA256 mismatch, and another should mock a response with a Content-Encoding header such as gzip where the decoded body size differs from content-length to verify the size check uses the correct post-decompression behavior. Keep the existing test style around _make_async_context_manager_mock and _make_mock_client so the new cases clearly exercise the fixed logic in _download.tinyagentos/download_manager.py (1)
119-120: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winFull-file
read_bytes()re-hash duplicates work and loads the whole file into memory.When
computed_sha256isn't supplied (torrent path), the digest is recomputed by loading the entire file into memory viaread_bytes().TorrentDownloader.download()already performs an equivalent full-file read internally for its own SHA check, so for large model downloads this doubles memory pressure and I/O for no added benefit.♻️ Proposed fix: hash in chunks instead of loading the whole file
- digest = computed_sha256 - if digest is None: - digest = hashlib.sha256(task.dest.read_bytes()).hexdigest() + digest = computed_sha256 + if digest is None: + hasher = hashlib.sha256() + with open(task.dest, "rb") as f: + for chunk in iter(lambda: f.read(65536), b""): + hasher.update(chunk) + digest = hasher.hexdigest()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tinyagentos/download_manager.py` around lines 119 - 120, In DownloadManager.download, the fallback SHA256 path currently uses task.dest.read_bytes(), which duplicates full-file I/O and memory use after TorrentDownloader.download already validates the file. Replace that digest computation with a chunked streaming hash using the existing task.dest path so the file is never fully loaded into memory, and keep the logic inside the digest None branch unchanged otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tinyagentos/download_manager.py`:
- Around line 113-116: The size check in the download validation logic is
comparing decoded content against Content-Length, which can falsely fail for
compressed responses. Update the download path in the method that validates
`task.dest` and `task.total_bytes` so that encoded responses either clear
`task.total_bytes` before verification or use the raw response stream instead of
`resp.aiter_bytes()`. Keep the existing empty-file and mismatch checks, but
ensure `size mismatch` is only triggered when the expected byte count is valid
for the actual bytes written.
- Around line 117-122: The SHA256 validation in download_manager.py is comparing
digests case-sensitively, so mixed-case variant["sha256"] values can be rejected
even when valid. Update the digest comparison in the expected_sha256 check to
normalize both the computed digest and the expected value to the same case, and
apply the same normalization in the later validation path around the follow-up
SHA256 check so both comparisons behave consistently.
---
Nitpick comments:
In `@tests/test_download_manager.py`:
- Around line 267-279: Add regression coverage in test_download_manager for the
two gaps in DownloadTask/_download: one test should use an uppercase or
mixed-case expected_sha256 and assert it still matches the downloaded body
instead of failing with a false SHA256 mismatch, and another should mock a
response with a Content-Encoding header such as gzip where the decoded body size
differs from content-length to verify the size check uses the correct
post-decompression behavior. Keep the existing test style around
_make_async_context_manager_mock and _make_mock_client so the new cases clearly
exercise the fixed logic in _download.
In `@tinyagentos/download_manager.py`:
- Around line 119-120: In DownloadManager.download, the fallback SHA256 path
currently uses task.dest.read_bytes(), which duplicates full-file I/O and memory
use after TorrentDownloader.download already validates the file. Replace that
digest computation with a chunked streaming hash using the existing task.dest
path so the file is never fully loaded into memory, and keep the logic inside
the digest None branch unchanged otherwise.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 64594d19-abf5-4d02-901f-1927b71486fc
📒 Files selected for processing (2)
tests/test_download_manager.pytinyagentos/download_manager.py
- MAJOR: do not treat Content-Length as the expected on-disk size for content-encoded (gzip/br/deflate/zstd) responses. httpx auto-decompresses, so the written file is larger than Content-Length and the size check would delete a valid download. Leave total_bytes unknown and rely on the SHA. - normalize SHA256 comparison to lowercase so an uppercase expected hash is not a false mismatch - torrent path: torrent.download() already SHA-verifies internally, so skip the redundant full-file re-hash (avoids a multi-GB blocking read)
… block the loop The computed_sha256-is-None fallback re-reads the whole file to hash it; for a multi-GB model that would stall the event loop. Move _validate_download to async and run the read+hash via asyncio.to_thread.
Backend half of #1548 (confirmed on beta.17: the Models dialog fills the progress bar to 100 percent instantly and shows the model installed, but nothing is downloaded and nothing persists).
Root cause in download_manager.py: two paths marked a download complete without checking the file exists and is non-empty.
Fix: one shared _validate_download (file exists and non-empty, size matches known total_bytes when set, SHA when provided; unlink the bad file and set status=error otherwise), run before marking complete on BOTH paths. Behaviour for the existing SHA-mismatch case is unchanged.
Tests: 0-byte HTTP body errors instead of completing; torrent success that wrote nothing / an empty file errors; a genuine non-empty download still completes. 34 pass, no regressions in the torrent/download suites.
Note: fixes the false-complete half of #1548 only; the Providers-error-state and empty-log-viewer halves are separate (the Logs app work covers the latter). Built by a subagent to an orchestrator spec; diff audited and re-verified in the main repo.
Summary by CodeRabbit