Stars
Code for the paper "ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions" published at CVPR 2025
Official repository for our work on micro-budget training of large-scale diffusion models.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A concise but complete full-attention transformer with a set of promising experimental features from various papers
High-resolution models for human tasks.
[ECCV 2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning".
CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!
Code for the paper "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos" published at CVPR 2024
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
The Most Faithful Implementation of Segment Anything (SAM) in 3D
Official inference repo for FLUX.1 models
A feature-rich command-line audio/video downloader
Command-line program to download videos from YouTube.com and other video sites
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
🎥 Python and OpenCV-based scene cut/transition detection program & library.
Learn LeetCode and prepare for coding interviews with free resources.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
[CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'
A curated collection of papers related to speech, audio and music in CVPR 2024.
[NeurIPS'23] Emergent Correspondence from Image Diffusion
Official repository for "AM-RADIO: Reduce All Domains Into One"
OpenEQA: Embodied Question Answering in the Era of Foundation Models
Texas DPS/DMV Automatic Scheduler