Discussion about Refining Zero-Shot Object Segmentation by Combining Vision Foundation Models #29549
Dear Daan Krol, Klaas Dijkstra, and Samet Akcay,
Hello, my name is Doeon Kim, and I am a Master’s student at the Graduate School of AI at Soongsil University in Korea.
I recently conducted research on cross-attention over heterogeneous feature maps, specifically fusing features from FPN-based backbones and a segmentation model to improve object detection performance. I have also worked with various vision transformers (ViT, SegFormer, HRFormer, Swin Transformer). That’s why the GSoC 2025 project titled “Visual Prompting-based Segmentation Refinement” stood out to me!
From the project description, I understand that the core goal is to improve the segmentation quality of Visual Prompting pipelines by refining masks generated by SAM, especially in cases where they are noisy or incomplete.
💡 While I’m still learning and have much to improve, I’ve been thinking about a few ideas that might be helpful:
❓ I also had a few quick questions:
Thank you for your time and consideration. I'm genuinely excited about this project and hope to contribute both implementation work and new ideas. I'm also eager to learn from your expertise throughout the project!
Best regards,
@saadkhi @kimdoeon
Dear Daan Krol, Klaas Dijkstra, and Samet Akcay,
I hope you're doing well. My name is Saad Ather Ali, and I am excited about the Refining Zero-Shot Object Segmentation project for Google Summer of Code. With my background in computer vision, AI, and deep learning, I believe I can contribute effectively to this research-driven initiative.
My Interest in This Project
The concept of visual prompting with foundation models like DINOv2 and the Segment Anything Model (SAM) is fascinating, as it removes the dependency on labeled data while still enabling efficient object segmentation. However, as mentioned in the project description, false positives and incomplete segmentations pose significant challenges. I am particularly interested in exploring novel filtering and merging techniques to make segmentation more robust and to improve generalization across datasets.
How Can I Contribute?
I would love to understand how I can contribute to this project.
I have hands-on experience with Python, OpenCV, TensorFlow, and PyTorch, and I am eager to deepen my understanding of vision foundation models like DINOv2 and CLIP while working on this project.
Looking forward to your guidance and insights on how I can get started!
Best regards,
Saad Ather Ali
GitHub | LinkedIn