I am 栗子昂, an MTS @ humans& ai working on the full stack of LLM performance engineering. I graduated from the University of Michigan, Ann Arbor (BS), after I transferred and spent 2 years at the Chinese University of Hong Kong, Shenzhen.
Some of the places I previously worked/interned at:
- NVIDIA GPU architecture simulation team
- NVIDIA DevTech Compute team
- Google Gemini GPU performance team
- Samsung OpenCL compute team
I used to enjoy extreme, cycle-level GPU kernel performance optimization, but I no longer consider it an important problem. My work has shifted more toward low-precision numerics and model co-design.

