SAM 2 Update 12/11/2024 -- full model compilation for a major VOS speedup and a new SAM2VideoPredictor to better handle multi-object tracking #486
Merged
Commits:

- …ng accuracy regressions
- speed optimizations cleanup
- switch to a new implementation of the class `SAM2VideoPredictor` for per-object inference (sam2/sam2_video_predictor.py)

  In this PR, we switch to a new implementation of the `SAM2VideoPredictor` class in sam2/sam2_video_predictor.py, which allows for independent per-object inference. Specifically, the new `SAM2VideoPredictor`:
  * handles the inference of each object separately, as if we were opening a separate session for each object;
  * relaxes the assumptions on prompting:
    * previously, if a frame received clicks for only a subset of objects, the remaining (non-prompted) objects were assumed to be non-existent in that frame;
    * now, if a frame receives clicks for only a subset of objects, we make no assumptions about the remaining (non-prompted) objects;
  * allows adding new objects after tracking starts (see the sketch after this commit list).

  (The previous implementation is backed up as `SAM2VideoPredictor` in sam2/sam2_video_predictor_legacy.py.)

  Also fixes a small typo in the doc: `APP_URL` => `API_URL`.

  Test plan: tested with the predictor notebook `notebooks/video_predictor_example.ipynb` and the VOS script `tools/vos_inference.py`; also tested with the demo.
- …with `torch.compile`; update README.md
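To make the new per-object behavior concrete, here is a minimal sketch of prompting a second object after tracking has already started, based on the notebook-style API (`init_state`, `add_new_points_or_box`, `propagate_in_video`); the paths, frame indices, and click coordinates are placeholders:

```python
import numpy as np
import torch

from sam2.build_sam import build_sam2_video_predictor

# Placeholder config/checkpoint/video paths -- substitute your own.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "./checkpoints/sam2.1_hiera_large.pt"
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path="./videos/example.mp4")

    # Prompt object 1 on frame 0 with a single positive click.
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Track object 1 through the video.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        pass  # consume per-object masks as needed

    # With per-object inference, a new object can be prompted *after*
    # tracking has started -- the legacy predictor disallowed this.
    predictor.add_new_points_or_box(
        state, frame_idx=30, obj_id=2,
        points=np.array([[120, 80]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Re-propagate; object 2 is tracked without affecting object 1.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        pass
```

Note that the new prompt for object 2 does not disturb object 1's results, since each object is now handled as an independent session over shared backbone features.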
chayryali approved these changes on Dec 11, 2024.
raedle added a commit that referenced this pull request on Dec 25, 2024:
Summary: PR #486 changed propagation to track each object independently, which allows adding objects after the initial propagation. The frontend had a constraint that prevented users from adding new objects after the initial "tracking". Now that the model supports tracking objects independently, we can remove this constraint from the UI.

Test plan: `cd demo/frontend`, then `yarn lint`:

```
(base) ➜ demo/frontend $ yarn lint ⎇ remotes/origin/HEAD*
yarn run v1.18.0
$ eslint . --ext ts,tsx --report-unused-disable-directives --max-warnings 0
✨ Done in 16.98s.
```
mahmoudzaouali pushed a commit to mahmoudzaouali/sam2 that referenced this pull request on Jul 31, 2025:
SAM 2 Update 12/11/2024 -- full model compilation for a major VOS speedup and a new SAM2VideoPredictor to better handle multi-object tracking (facebookresearch#486)

This PR provides new features and updates for SAM 2:

- We now support `torch.compile` of the entire SAM 2 model on videos, which can be turned on by setting `vos_optimized=True` in `build_sam2_video_predictor` (it uses the new `SAM2VideoPredictorVOS` predictor class in `sam2/sam2_video_predictor.py`).
  * Compared to the previous setting (which only compiles the image encoder backbone), the new full model compilation gives a major speedup in inference FPS.
  * In the VOS prediction script `tools/vos_inference.py`, you can specify this option via the `--use_vos_optimized_video_predictor` flag.
  * Note that turning on this flag might introduce a small variance in the predictions due to numerical differences caused by `torch.compile` of the full model.
  * **PyTorch 2.5.1 is the minimum version for full support of this feature.** (Earlier PyTorch versions might run into compilation errors in some cases.) Therefore, we have updated the minimum PyTorch version to 2.5.1 accordingly in the installation scripts.
- We also update the implementation of the `SAM2VideoPredictor` class for SAM 2 video prediction in `sam2/sam2_video_predictor.py`, which allows for independent per-object inference. Specifically, in the new `SAM2VideoPredictor`:
  * Now **we handle the inference of each object independently** (as if we were opening a separate session for each object) while sharing their backbone features.
  * This change allows us to relax the assumption of prompting for multi-object tracking. Previously (due to the batching behavior in inference), if a video frame received clicks for only a subset of objects, the rest of the (non-prompted) objects were assumed to be non-existent in this frame (i.e., in such frames, the user was telling SAM 2 that the rest of the objects don't appear). Now, if a frame receives clicks for only a subset of objects, we do not make any assumptions about the remaining (non-prompted) objects (i.e., each object is handled independently and is not affected by how other objects are prompted). As a result, **we allow adding new objects after tracking starts**, which was previously a restriction on usage.
  * We believe that the new version is a more natural inference behavior and have therefore switched to it as the default. The previous implementation of `SAM2VideoPredictor` is backed up in `sam2/sam2_video_predictor_legacy.py`. All the VOS inference results using `tools/vos_inference.py` should remain the same after this change to the `SAM2VideoPredictor` class.
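As a usage illustration, the following minimal sketch shows how the optimized predictor might be constructed; the config, checkpoint, and video paths are placeholders:

```python
import torch

from sam2.build_sam import build_sam2_video_predictor

# vos_optimized=True selects the new SAM2VideoPredictorVOS class, which
# runs torch.compile over the full model (requires PyTorch >= 2.5.1).
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",    # placeholder config path
    "./checkpoints/sam2.1_hiera_large.pt",   # placeholder checkpoint path
    vos_optimized=True,
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path="./videos/example.mp4")
    # Prompt and propagate as usual; expect the first iterations to be slow
    # while compilation warms up, and small numerical differences vs. the
    # uncompiled model.
```

The same code path is exercised from the command line by passing `--use_vos_optimized_video_predictor` to `tools/vos_inference.py`.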