MaanaRajesh/sam2-spatial-reid

SAM2 + Spatial Re-Identification for Identity-Preserving Video Segmentation

This repository extends Meta AI’s SAM2 foundation model to address identity fragmentation in video segmentation under:

  • Occlusion
  • Similar-looking object interaction
  • Object splitting and merging
  • Cluttered dynamic scenes

Collaborators

This project was completed for CIS 6800 – Advanced Machine Perception (University of Pennsylvania).

Team Members:

  • Maanasa Rajeshwer
  • Marika Nishi
  • Prakriti Prasad

Spatial re-identification and tracklet extensions were developed collaboratively as part of the course final project.

Original SAM2 repository:
https://github.com/facebookresearch/sam2


Motivation

Foundation video segmentation models excel at mask propagation but often fail to preserve object identity over time when:

  • Objects become partially or fully occluded
  • Multiple identical objects interact
  • Viewpoints change significantly
  • Objects split or merge

These failures are critical in embodied AI systems where identity consistency is required for:

  • Tracking
  • Manipulation
  • Multi-object reasoning
  • Perception-to-control pipelines

This project introduces a training-free spatial re-identification pipeline that augments SAM2 with temporal reasoning and proximity-aware matching.


Key Contributions in This Fork

This repository adds the following extensions to the original SAM2 framework:

1. Tracklet-Based Temporal Memory

  • Maintains short-term identity memory
  • Aggregates mask predictions across frames
  • Enables temporal smoothing
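As an illustration of the tracklet idea above, here is a minimal sketch. It is not the code in this repository. It assumes each mask is a set of (row, col) pixels and that smoothing is a majority vote over the last few frames; the `Tracklet` class, `max_len`, and the voting rule are hypothetical choices made for the example.

```python
from collections import Counter, deque


class Tracklet:
    """Short-term memory of recent masks for one identity (hypothetical)."""

    def __init__(self, track_id, max_len=5):
        self.track_id = track_id
        # Each mask is a set of (row, col) pixels; the deque drops the oldest.
        self.masks = deque(maxlen=max_len)

    def update(self, mask):
        self.masks.append(frozenset(mask))

    def smoothed_mask(self):
        """Keep pixels present in more than half of the stored masks."""
        votes = Counter(p for mask in self.masks for p in mask)
        return {p for p, n in votes.items() if 2 * n > len(self.masks)}


t = Tracklet(track_id=1, max_len=3)
t.update({(0, 0), (0, 1)})
t.update({(0, 0), (0, 1), (5, 5)})  # (5, 5) is a one-frame flicker
t.update({(0, 0), (0, 1)})
print(sorted(t.smoothed_mask()))  # [(0, 0), (0, 1)]
```

A bounded deque keeps the memory short-term: old frames fall out automatically, so a stale appearance cannot dominate the vote.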

2. Spatial Proximity-Aware Re-Identification

  • Uses geometric proximity constraints
  • Reduces ID swaps between similar objects
  • Improves identity stability in cluttered scenes
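One way a geometric proximity constraint can be expressed is a gated nearest-centroid assignment. The sketch below is an assumption for illustration, not the matching rule used in this fork: it greedily pairs the closest (identity, detection) centroids first and refuses any pair farther apart than a hypothetical `max_dist` threshold.

```python
import math


def centroid(mask):
    """Mean (row, col) of a non-empty mask given as a set of pixels."""
    n = len(mask)
    return (sum(r for r, _ in mask) / n, sum(c for _, c in mask) / n)


def match_by_proximity(last_positions, detections, max_dist=50.0):
    """Greedily assign detections to known IDs, closest pairs first.

    last_positions: {track_id: (row, col)} last known centroid per identity.
    detections: list of (row, col) centroids in the current frame.
    Returns {detection_index: track_id}. Pairs farther apart than max_dist
    are never matched; that gate is the proximity constraint.
    """
    pairs = sorted(
        (math.dist(pos, det), tid, i)
        for tid, pos in last_positions.items()
        for i, det in enumerate(detections)
    )
    assigned, used_ids = {}, set()
    for d, tid, i in pairs:
        if d > max_dist:
            break  # all remaining pairs are even farther apart
        if i not in assigned and tid not in used_ids:
            assigned[i] = tid
            used_ids.add(tid)
    return assigned


last_positions = {1: (10, 10), 2: (10, 40)}
detections = [centroid({(12, 37), (12, 39)}), (11, 12), (200, 200)]
matches = match_by_proximity(last_positions, detections)
print(matches)  # {1: 1, 0: 2}
```

Detection 2 stays unmatched because it is outside the gate, so it would be treated as a new or unknown object rather than stealing an existing identity. A globally optimal assignment (for example the Hungarian algorithm) could replace the greedy loop.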

3. Optical Flow Integration

  • Integrates RAFT optical flow for motion consistency
  • Aligns masks temporally
  • Improves re-identification under fast motion
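To make the motion-alignment step concrete, here is a small sketch under stated assumptions. It does not call RAFT; it assumes a dense flow field has already been computed and is available as a dictionary from pixel to (dx, dy). The helper names `warp_mask` and `iou` are hypothetical.

```python
def warp_mask(mask, flow):
    """Forward-warp a mask from frame t to frame t+1.

    mask: set of (row, col) pixels in frame t.
    flow: {(row, col): (dx, dy)} displacement in pixels, assumed to come
          from an optical flow model such as RAFT. Missing pixels do not move.
    """
    warped = set()
    for r, c in mask:
        dx, dy = flow.get((r, c), (0.0, 0.0))
        warped.add((round(r + dy), round(c + dx)))
    return warped


def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


prev_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
curr_mask = {(0, 3), (0, 4), (1, 3), (1, 4)}
flow = {p: (3.0, 0.0) for p in prev_mask}  # object moved 3 px to the right

print(iou(prev_mask, curr_mask))                   # 0.0
print(iou(warp_mask(prev_mask, flow), curr_mask))  # 1.0
```

Without alignment the two masks do not overlap at all, which is the fast-motion case where overlap-based matching breaks down. Forward warping can leave holes for non-rigid motion; this sketch ignores that.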

4. Evaluation on Challenging Scenarios

Tested on:

  • Cup shuffling sequences
  • Similar-looking object tracking
  • Sticky note tracking
  • Cluttered paper splitting scenarios

These experiments focus on identity fragmentation failure modes.


Architecture Overview

Multi-Frame Video
↓
SAM2 Masklet Prediction
↓
Optical Flow Motion Alignment (RAFT)
↓
Tracklet Formation
↓
Spatial Re-Identification
↓
Identity-Consistent Mask Propagation

This design preserves SAM2’s foundation capabilities while introducing temporal reasoning without retraining the model.
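The stage ordering above can be read as a per-frame loop. The skeleton below is only a sketch: `run_pipeline` and its three callables are hypothetical placeholders, not functions from SAM2, RAFT, or this repository, and tracklet formation and re-identification are folded into a single `associate` step for brevity.

```python
def run_pipeline(frames, segment, estimate_flow, associate):
    """Run the stages in order for every frame.

    segment(frame) -> list of masks                 (SAM2 stage)
    estimate_flow(prev_frame, frame) -> flow        (RAFT stage)
    associate(tracks, masks, flow) -> {id: mask}    (tracklets + re-ID)
    All three callables are injected, so no model is retrained here.
    """
    tracks, history, prev = {}, [], None
    for frame in frames:
        masks = segment(frame)
        # The first frame has no predecessor, so there is no flow yet.
        flow = estimate_flow(prev, frame) if prev is not None else None
        tracks = associate(tracks, masks, flow)
        history.append(dict(tracks))
        prev = frame
    return history


# Toy stand-ins so the skeleton runs; none of these are real model calls.
frames = ["f0", "f1"]
history = run_pipeline(
    frames,
    segment=lambda frame: [{(0, 0)}],
    estimate_flow=lambda prev, frame: {},
    associate=lambda tracks, masks, flow: {i + 1: m for i, m in enumerate(masks)},
)
print(len(history))  # 2
```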


Installation

Follow the original SAM2 installation instructions:

```bash
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e .
```

Installation Assumptions

This fork assumes:

  • Python >= 3.10
  • PyTorch >= 2.5
  • CUDA-enabled GPU

Optional (for notebooks):

```bash
pip install -e ".[notebooks]"
```


Running the Spatial Re-ID Extensions

Example:

```bash
python sam2/tracklets/tracklets_demo3.py
```

Or explore the provided notebooks:

  • notebooks/tracklets_demo.ipynb
  • notebooks/ap2.ipynb (cup shuffling)
  • notebooks/ap3.ipynb (similar-looking object tracking)

Results

We evaluate identity preservation under challenging conditions:

| Scenario | Baseline SAM2 | SAM2 + Spatial Re-ID |
|---|---|---|
| Similar object crossing | ID swaps | Reduced swaps |
| Occlusion | Identity loss | Preserved |
| Object splitting | Fragmentation | Improved consistency |

Qualitative improvements are observed in clutter-heavy and ambiguous scenes.


Limitations

  • Still dependent on mask quality from SAM2
  • Proximity-based heuristics may fail in dense scenes
  • No retraining performed — purely inference-time augmentation

Future Directions

  • Learned re-identification embeddings
  • Multi-camera identity consistency
  • Integration with embodied control policies

Attribution

This repository is based on:

Ravi et al., “SAM 2: Segment Anything in Images and Videos,” 2024
https://github.com/facebookresearch/sam2

All original SAM2 code remains under the Apache 2.0 License.

All spatial re-identification and tracklet extensions were developed for academic research.


Citation

If referencing this extension, please cite both:

  1. SAM2 (Meta AI)
  2. This spatial re-identification extension (CIS 6800 project)
