feat: enable recording episodes into existent dataset#14
feat: enable recording episodes into existent dataset#14JeronimoMendes wants to merge 2 commits into
Conversation
`_CONTAINER_OUTPUT_DIR = "/tmp/lelab/train"` is a fixed path inside the remote HF Jobs container (already documented above the line), not a host-local temp dir, so B108 is a false positive here. The pre-commit workflow runs bandit with `--all-files` on every PR, so this latent violation — merged straight to main in 68d7b45, where pre-commit doesn't run — was failing unrelated PRs (#13, #14). Add an inline `# nosec B108` with justification to clear it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
nicolas-rabault
left a comment
There was a problem hiding this comment.
Very nice PR @JeronimoMendes
A few simple modifications :
- Dead code
The resume feature is driven entirely from EditDataset.tsx, which renders RecordingSettingsFields inline and calls rec.startRecording directly. It never uses RecordingModal. Consequences:
useRecording.ts:48 openForResume(...) is exported but never called anywhere.
The entire isResuming branch of RecordingModal.tsx is unreachable: the modal is only mounted in Landing.tsx, which always opens it via openForNew → resumeRepoId is always null there. So "Record More Episodes" / "Resume Recording" / the resume-repo display block / the "Append new episodes…" description can never render.
So the modal carries a resumeRepoId prop + branch and the hook carries an openForResume path that nothing exercises. Either wire the modal up for resume, or strip the resume branch from RecordingModal and drop openForResume. Given EditDataset already inlines the fields, the latter (delete) is the simpler, consistent choice.
- Sync-status mtime comparison is fragile (medium)
handle_dataset_sync_status uses local_root.stat().st_mtime (top-level dataset dir) vs Hub lastModified. Two failure modes:
A directory's mtime only changes when a direct child entry is added/removed/renamed, not when files deep in data/chunk-*/ or videos/ are written, nor when an existing top-level file is rewritten in place. New episodes can land without the top-level mtime advancing, so needs_sync can be falsely False. The worst direction (UI says "in sync" when it isn't).
After a fresh download/clone from the Hub, local mtime = download time > hub_mtime, so it reports falsely needs_sync: True.
A more reliable signal would be a max mtime walk over the dataset tree, or reading the dataset's own episode count/revision metadata.
9eb6c58 to
f53933d
Compare
|
Thanks a lot for the quick review @nicolas-rabault .
|
This PR:
Demo
Disclaimer
AI tools such as Pi and Claude were used, specially on the frontend part of the PR, although I reviewed all of the code generated.
Closes #11