This repository extends Meta AI’s SAM2 foundation model to address identity fragmentation in video segmentation under:
- Occlusion
- Similar-looking object interaction
- Object splitting and merging
- Cluttered dynamic scenes
This project was completed for CIS 6800 – Advanced Machine Perception (University of Pennsylvania).
Team Members:
- Maanasa Rajeshwer
- Marika Nishi
- Prakriti Prasad
Spatial re-identification and tracklet extensions were developed collaboratively as part of the course final project.
Original SAM2 repository:
https://github.com/facebookresearch/sam2
Foundation video segmentation models excel at mask propagation but often fail to preserve object identity over time when:
- Objects become partially or fully occluded
- Multiple identical objects interact
- Viewpoints change significantly
- Objects split or merge
These failures are critical in embodied AI systems where identity consistency is required for:
- Tracking
- Manipulation
- Multi-object reasoning
- Perception-to-control pipelines
This project introduces a training-free spatial re-identification pipeline that augments SAM2 with temporal reasoning and proximity-aware matching.
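To make "temporal reasoning" a little more concrete, here is a minimal sketch of a short-term identity memory, assuming it can be modeled as a sliding-window majority vote over per-frame ID predictions. The class name `IdentityMemory` and the `window` parameter are hypothetical and are not taken from this repository's code.

```python
from collections import Counter, deque


class IdentityMemory:
    """Hypothetical short-term memory: keeps the last `window` raw ID votes
    for one object and reports the most frequent one."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest vote automatically
        self.votes = deque(maxlen=window)

    def update(self, raw_id):
        """Record the ID predicted for the current frame and return the smoothed ID."""
        self.votes.append(raw_id)
        # most_common(1) returns a one-element list of (id, count)
        return Counter(self.votes).most_common(1)[0][0]


memory = IdentityMemory(window=5)
raw_ids = [1, 1, 1, 2, 1, 1]  # a single-frame flicker to ID 2
smoothed = [memory.update(r) for r in raw_ids]
```

With a window of five frames, a one-frame flicker is outvoted, while a sustained change of ID eventually wins the vote. The window length trades responsiveness against stability.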
This repository adds the following extensions to the original SAM2 framework:
- Maintains short-term identity memory
- Aggregates mask predictions across frames
- Enables temporal smoothing
- Uses geometric proximity constraints
- Reduces ID swaps between similar objects
- Improves identity stability in cluttered scenes
- Integrates RAFT optical flow for motion consistency
- Aligns masks temporally
- Improves re-identification under fast motion
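As an illustration of the geometric proximity idea in the list above, the following is a minimal sketch of distance-gated matching between existing track identities and new detections. It assumes each mask is summarized by its centroid and uses a greedy nearest-first assignment. The function name `match_by_proximity` and the `max_dist` threshold are hypothetical, and the repository's actual matching logic may differ.

```python
import math


def centroid(mask_pixels):
    """Mean (row, col) of a mask given as an iterable of (row, col) pixels."""
    pts = list(mask_pixels)
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def match_by_proximity(track_centroids, det_centroids, max_dist=50.0):
    """Greedily assign detections to track IDs by centroid distance.

    track_centroids: dict mapping track_id -> (row, col)
    det_centroids:   list of (row, col), one per new detection
    Returns a dict mapping detection index -> track_id.
    Detections farther than max_dist from every track stay unmatched.
    """
    candidates = []
    for tid, tc in track_centroids.items():
        for j, dc in enumerate(det_centroids):
            d = math.dist(tc, dc)
            if d <= max_dist:  # proximity gate
                candidates.append((d, tid, j))
    candidates.sort(key=lambda c: c[0])  # closest pairs first

    used_tracks, used_dets, assignment = set(), set(), {}
    for d, tid, j in candidates:
        if tid in used_tracks or j in used_dets:
            continue  # each track and each detection is used at most once
        assignment[j] = tid
        used_tracks.add(tid)
        used_dets.add(j)
    return assignment


tracks = {1: (10.0, 10.0), 2: (10.0, 100.0)}
detections = [(12.0, 98.0), (11.0, 12.0), (300.0, 300.0)]
matches = match_by_proximity(tracks, detections, max_dist=20.0)
```

Here detection 0 inherits track 2, detection 1 inherits track 1, and detection 2 is left unmatched because it falls outside the gate. Greedy assignment is simple but not globally optimal; an optimal assignment solver could be swapped in for crowded scenes.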
Tested on:
- Cup shuffling sequences
- Similar-looking object tracking
- Sticky note tracking
- Cluttered paper splitting scenarios
These experiments focus on identity fragmentation failure modes.
Multi-Frame Video
↓
SAM2 Masklet Prediction
↓
Optical Flow Motion Alignment (RAFT)
↓
Tracklet Formation
↓
Spatial Re-Identification
↓
Identity-Consistent Mask Propagation
This design preserves SAM2’s foundation capabilities while introducing temporal reasoning without retraining the model.
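The motion-alignment stage of the pipeline above can be pictured with a small sketch: a binary mask from one frame is forward-warped by a dense flow field so it can be compared with the mask in the next frame. This assumes the flow is available as a per-pixel `(dx, dy)` displacement in pixels; the helper names `warp_mask` and `mask_iou` are hypothetical, and the sketch does not call RAFT itself.

```python
def warp_mask(mask, flow):
    """Forward-warp a binary mask by a dense flow field.

    mask: list of rows containing 0/1
    flow: same shape as mask; each entry is an assumed (dx, dy) displacement
    Pixels that move outside the image are dropped.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dx, dy = flow[y][x]
                nx, ny = x + round(dx), y + round(dy)
                if 0 <= nx < w and 0 <= ny < h:
                    out[ny][nx] = 1
    return out


def mask_iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Uniform motion: one pixel right, two pixels down
flow = [[(1.0, 2.0)] * 4 for _ in range(4)]
warped = warp_mask(mask, flow)
```

Comparing the warped mask against the next frame's candidate masks with `mask_iou` gives a motion-aware overlap score, which is one plausible way to keep identities attached during fast motion.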
Follow the original SAM2 installation instructions:
```bash
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e .
```
## Installation Assumptions
This fork assumes:
- Python >= 3.10
- PyTorch >= 2.5
- CUDA-enabled GPU
Optional (for notebooks):
```bash
pip install -e ".[notebooks]"
```
Example:
```bash
python sam2/tracklets/tracklets_demo3.py
```
Or explore the provided notebooks:
- `notebooks/tracklets_demo.ipynb`
- `notebooks/ap2.ipynb` (cup shuffling)
- `notebooks/ap3.ipynb` (similar-looking object tracking)
We evaluate identity preservation under challenging conditions:
| Scenario | Baseline SAM2 | SAM2 + Spatial Re-ID |
|---|---|---|
| Similar object crossing | ID swaps | Reduced swaps |
| Occlusion | Identity loss | Preserved |
| Object splitting | Fragmentation | Improved consistency |
Qualitative improvements are observed in clutter-heavy and ambiguous scenes.
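The comparison above is qualitative. As a hedged sketch of how ID swaps could be counted if per-frame annotations were available, the function below tallies how often the predicted ID attached to a ground-truth object changes. The function name `count_id_switches` and the input format are assumptions for illustration, not part of this repository's evaluation.

```python
def count_id_switches(frames):
    """Count identity switches across a sequence.

    frames: list of dicts mapping ground-truth object name -> predicted ID,
            with None when the object has no prediction in that frame.
    A switch is counted whenever an object's predicted ID differs from the
    last ID it was seen with; frames where it is missing are skipped.
    """
    last_seen = {}
    switches = 0
    for frame in frames:
        for obj, pred_id in frame.items():
            if pred_id is None:
                continue
            if obj in last_seen and last_seen[obj] != pred_id:
                switches += 1
            last_seen[obj] = pred_id
    return switches


sequence = [
    {"cupA": 1, "cupB": 2},
    {"cupA": 1, "cupB": 2},
    {"cupA": 2, "cupB": 1},     # the two cups cross and swap IDs
    {"cupA": 2, "cupB": None},  # cupB is occluded
    {"cupA": 2, "cupB": 1},
]
n_switches = count_id_switches(sequence)
```

In this toy sequence both cups change ID once at the crossing, so two switches are counted, and the occluded frame does not add another.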
Limitations:
- Still dependent on mask quality from SAM2
- Proximity-based heuristics may fail in dense scenes
- No retraining performed; the method is purely an inference-time augmentation
Future work:
- Learned re-identification embeddings
- Multi-camera identity consistency
- Integration with embodied control policies
This repository is based on:
Ravi et al., “SAM 2: Segment Anything in Images and Videos,” 2024
https://github.com/facebookresearch/sam2
All original SAM2 code remains under the Apache 2.0 License.
All spatial re-identification and tracklet extensions were developed for academic research.
If referencing this extension, please cite both:
- SAM2 (Meta AI)
- This spatial re-identification extension (CIS 6800 project)

