The goal of this project is to provide a set of tools to set up, train, and deploy a (mostly cheap) alternative to high-end motion capture, intended for personal use.
! The repository contains some legacy code, mainly in the `core` folder
- Basic Pipeline setup
  - Test FastAPI
  - Test Redis
  - Test ZMQ
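
A minimal single-process round-trip can serve as the ZMQ smoke test; a sketch, where the `inproc` address and ping payload are arbitrary placeholders:

```python
import zmq

def zmq_roundtrip(payload: bytes = b"ping") -> bytes:
    ctx = zmq.Context.instance()
    rep = ctx.socket(zmq.REP)
    rep.bind("inproc://pipeline-test")  # in-process transport, no networking needed
    req = ctx.socket(zmq.REQ)
    req.connect("inproc://pipeline-test")
    req.send(payload)        # REQ queues the request
    rep.send(rep.recv())     # REP echoes it back
    reply = req.recv()
    req.close()
    rep.close()
    return reply

if __name__ == "__main__":
    assert zmq_roundtrip() == b"ping"
    print("ZMQ round-trip OK")
```
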
- Data
  - Prepare dataloader for WIDER Face
  - Prepare dataloader for {todo: another face detection dataset}
  - Prepare dataloader for {todo: one image understanding dataset}
  - Camera Calibration (only intrinsic calibration required)
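
Since only intrinsic parameters are needed here, OpenCV's standard chessboard routine should suffice. A sketch, assuming a 9×6 inner-corner board and PNG captures under `calibration/` (both assumptions):

```python
import glob
import cv2
import numpy as np

BOARD = (9, 6)      # inner corners per row/column (assumed board geometry)
SQUARE_MM = 24.0    # physical square size; only matters for metric extrinsics

# 3D points of the board corners in the board plane (z = 0)
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_MM

obj_pts, img_pts = [], []
for path in glob.glob("calibration/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# K is the intrinsic matrix, dist the lens distortion coefficients
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```
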
- Reading
  - Read Baidu's RT-DETR: A Vision Transformer-Based Real-Time Object Detector
  - Read Towards Accurate Facial Landmark Detection via Cascaded Transformers
  - Read Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning
  - Read RePFormer: Refinement Pyramid Transformer for Robust Facial Landmark Detection
  - Read Revisiting Quantization Error in Face Alignment
  - Read Learnable Triangulation of Human Pose
  - Read Shape Preserving Facial Landmarks with Graph Attention Networks
  - Read Vision Transformer with Deformable Attention
- Detached Inference from Third-Party Models (deployment)
  - Implement detached inference with the FACER toolkit (FaRL models)
  - Implement detached inference with SPIGA (Shape Preserving Facial Landmarks with Graph Attention Networks)
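
The idea behind detached inference is to keep heavyweight third-party toolkits (FACER, SPIGA) in their own process and query them over the ZMQ leg tested above. A minimal sketch of such a worker; `run_model` is a hypothetical stand-in for the actual toolkit call:

```python
import pickle
import numpy as np
import zmq

def run_model(frame: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: call FACER / SPIGA here and return landmarks."""
    return np.zeros((98, 2), dtype=np.float32)  # placeholder output shape

def serve(addr: str = "tcp://*:5556") -> None:
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    sock.bind(addr)
    while True:
        frame = pickle.loads(sock.recv())           # pickled frame from the client
        sock.send(pickle.dumps(run_model(frame)))   # landmarks back to the client

if __name__ == "__main__":
    serve()  # pickle is only safe on trusted, local sockets
```
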
- Face Tracking Module
  - Construct a basic vision-attention based module with alterations (Backbone, Head); a rough skeleton follows this list
  - Write the training experiment for pretraining the Backbone
  - Write the training experiment for training the neck/head on face recognition
  - Train and validate different model sizes, adjusting the model structure if needed
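
A rough skeleton of the backbone/head split sketched in the items above; all dimensions and the per-token box head are illustrative assumptions, not the final architecture:

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Small ViT-style encoder; sizes are placeholders, not the final design."""
    def __init__(self, dim=192, depth=6, heads=3, patch=16):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):  # (B, 3, H, W) -> (B, N, dim) patch tokens
        tokens = self.embed(x).flatten(2).transpose(1, 2)
        return self.encoder(tokens)

class FaceHead(nn.Module):
    """Toy head: one normalized box + objectness score per token."""
    def __init__(self, dim=192):
        super().__init__()
        self.box = nn.Linear(dim, 4)   # (cx, cy, w, h) in [0, 1]
        self.obj = nn.Linear(dim, 1)

    def forward(self, feats):
        return self.box(feats).sigmoid(), self.obj(feats)

backbone, head = Backbone(), FaceHead()
boxes, scores = head(backbone(torch.randn(1, 3, 224, 224)))
```
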
- Face Keypoint Data
  - Prepare dataloader for WFLW
  - Prepare dataloader for COFW
  - Prepare dataloader for FDDB
  - Annotate self-recorded videos with the help of the teacher networks
  - Format the face keypoint data relative to the face recognition bounding boxes (+ some padding)
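
The bounding-box-relative formatting step could look like the following sketch, where the 10% padding factor is an assumption:

```python
import numpy as np

def to_box_relative(landmarks: np.ndarray, box_xyxy: np.ndarray,
                    pad: float = 0.10) -> tuple[np.ndarray, np.ndarray]:
    """Map (N, 2) absolute landmarks into [0, 1] coords of a padded box."""
    x1, y1, x2, y2 = box_xyxy.astype(np.float64)
    pw, ph = (x2 - x1) * pad, (y2 - y1) * pad      # symmetric padding
    x1, y1, x2, y2 = x1 - pw, y1 - ph, x2 + pw, y2 + ph
    rel = (landmarks - [x1, y1]) / [x2 - x1, y2 - y1]
    return rel, np.array([x1, y1, x2, y2])         # also return the padded box

# points outside the padded box fall outside [0, 1] and can be masked out
```
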
- Face Keypoint Estimator
  - Write the training experiment for training a neck/head on keypoint estimation
  - Answer the question: in what way is my self-recorded dataset semantically constrained such that finetuning on it improves results, and to what extent?
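
For validating the estimator against the teacher networks, the usual face-alignment metric is the inter-ocular NME. A sketch, assuming the common WFLW 98-point convention for the outer eye-corner indices (60 and 72); verify these against the dataset docs:

```python
import numpy as np

def nme(pred: np.ndarray, gt: np.ndarray,
        left_eye: int = 60, right_eye: int = 72) -> float:
    """pred, gt: (B, 98, 2) landmark arrays in pixels; returns mean NME."""
    inter_ocular = np.linalg.norm(gt[:, left_eye] - gt[:, right_eye], axis=-1)
    per_point = np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)
    return float((per_point / inter_ocular).mean())
```
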
- Hand tracking
- Facial emotion recognition
- Discrete hand pose estimation (if there even is a dataset for this)
- Human instance segmentation
- Face segmentation
- Human body keypoint estimation
- Multi-camera triangulation (see the DLT sketch below)
- Human body part segmentation
- Object detection & 6DOF tracking
- Human object interactions
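
For the multi-camera triangulation item above, the classical baseline (the one Learnable Triangulation of Human Pose builds on) is a direct linear transform over calibrated projection matrices; a minimal sketch:

```python
import numpy as np

def triangulate(points_2d: np.ndarray, projections: np.ndarray) -> np.ndarray:
    """points_2d: (C, 2) pixel observations of one point across C cameras,
    projections: (C, 3, 4) camera matrices. Returns the 3D point (3,)."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])  # each view contributes two linear equations
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                        # null-space direction of the system
    return X[:3] / X[3]               # de-homogenize
```
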
Credits:
- Several research projects, all directly referenced by name, with papers and links provided
- A few code snippets used from ultralytics/YOLOv8
- ICP from procrustes/ICP.py