I build AI products end-to-end: from the business need, through the agentic flows that run them, down to the infrastructure underneath (inference, proxies, data). When the economics demand it, I go all the way down to the metal; that's where the inference work below comes from.
Previously CTO at Miranda Media; before that I founded and ran Hosting90 (Czech hosting/cloud) for 18 years, exiting in 2020. That background in running infrastructure at scale informs how I think about LLM serving: reliability, latency, and resource efficiency over benchmark headlines. Open to fractional-CTO and advisory work.
- vllm-mlx — Core contributor to vllm-mlx (80+ PRs; second external contributor by join order), an OpenAI-compatible inference server built on MLX for Apple Silicon. Specific contributions:
  - MTP speculative decoding for Qwen3-Next (PR #82, merged) — Multi-Token Prediction with an always-advance strategy and rejection sampling; 1.43x verified / 1.76x optimistic speedup on M3 Ultra
  - Draft-model speculative decoding (PR #45, merged) — HybridEngine sharing a single model instance across speculative and batched modes; 1.2–1.4x throughput improvement
  - Prefix caching, KV cache quantization, Anthropic Messages API integration, MoE model support
  - Assigned collaborator on the MTP roadmap for the Qwen3-Next / MiMo / Qwen3.5 family, alongside the repo owner
- vllm-mlx-dashboard — Real-time monitoring dashboard for local LLM inference servers (llama.cpp + vllm-mlx), built with Next.js.
- Data & access infrastructure — self-built residential LTE proxy pool (9 modems, CGNAT rotation) feeding production scraping for AI products. Writeup: https://hilgard.cz/writing/lte-proxy-pool
- M3 Ultra (256 GB) benchmarking — sustained throughput, quantization tradeoffs, batch size effects at the edge of consumer hardware.
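The rejection-sampling step in the speculative decoding work above decides which drafted tokens survive, whether the draft comes from MTP or from a separate draft model. Below is a minimal Python sketch of the standard speculative-sampling rule. It is a generic illustration, not the vllm-mlx code: the helper name `speculative_accept` and its list-of-probabilities inputs are hypothetical, and it does not model the always-advance strategy, batching, or MLX arrays. Each drafted token x is kept with probability min(1, p(x)/q(x)), where p is the target model's distribution and q is the drafter's. On the first rejection, a replacement is drawn from the residual max(0, p - q), which keeps the output distributed exactly as the target model would sample.

```python
import random
from typing import List, Sequence, Tuple


def speculative_accept(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],
    target_probs: Sequence[Sequence[float]],
    rng: random.Random,
) -> Tuple[List[int], int]:
    """Verify k drafted tokens against the target model (hypothetical helper).

    draft_probs[i]  : drafter distribution q(. | prefix + draft[:i]), i < k
    target_probs[i] : target distribution p(. | prefix + draft[:i]), i <= k
                      (one extra row for the bonus token after a full accept)
    Returns (tokens_to_append, number_of_draft_tokens_accepted).
    """
    out: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p_tok = target_probs[i][tok]
        q_tok = draft_probs[i][tok]
        # Keep the draft token with probability min(1, p/q).
        if q_tok > 0 and rng.random() < min(1.0, p_tok / q_tok):
            out.append(tok)
            continue
        # First rejection: sample a correction from max(0, p - q).
        # random.choices() normalises the weights, so no explicit division is needed.
        residual = [max(0.0, p - q) for p, q in zip(target_probs[i], draft_probs[i])]
        weights = residual if sum(residual) > 0 else list(target_probs[i])
        out.append(rng.choices(range(len(weights)), weights=weights, k=1)[0])
        return out, i
    # Every draft token was accepted: sample one bonus token from the target.
    bonus = target_probs[len(draft_tokens)]
    out.append(rng.choices(range(len(bonus)), weights=bonus, k=1)[0])
    return out, len(draft_tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    q, p = [0.9, 0.1], [0.5, 0.5]  # toy drafter and target over a 2-token vocabulary
    trials, hits = 20_000, 0
    for _ in range(trials):
        tok = rng.choices([0, 1], weights=q, k=1)[0]  # drafter proposes one token
        out, _ = speculative_accept([tok], [q], [p, p], rng)
        hits += out[0] == 0
    print(f"P(first token = 0) ~ {hits / trials:.3f} (target p = 0.5)")
```

The toy run at the bottom illustrates the key property: even though the drafter strongly prefers token 0, the emitted token follows the target's 50/50 split. Accepted draft tokens are where the speedup comes from, because the target model scores all k drafted positions in a single forward pass.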
- Now: independent AI infrastructure builder; open to fractional-CTO and advisory work
- 2026: CTO, Miranda Media (built their AI products)
- 2002–2020: Founder & CEO, Hosting90 (Czech hosting/cloud, exited)
- Focus areas: inference optimization, agentic systems, local LLM deployment, data/access infrastructure
Python, TypeScript, MLX, vLLM, Next.js, Apple Silicon, llama.cpp, KV cache quantization