Jan Hilgard janhilgard

Jan Hilgard — end-to-end AI builder (need → infra), ex-founder (Hosting90 exit)

I build AI products end-to-end — from the business need, to the agentic flows that run them, down to the infrastructure underneath: inference, proxies, data. When the economics demand it I go all the way down to the metal — that's where the inference work below comes from.

Previously CTO at Miranda Media; before that I founded and ran Hosting90 (Czech hosting/cloud) for 18 years, exiting in 2020. That background in running infrastructure at scale informs how I think about LLM serving: reliability, latency, and resource efficiency over benchmark headlines. Open to fractional-CTO and advisory work.

What I'm working on

- vllm-mlx: core contributor (80+ PRs; the second external contributor to join the project) to vllm-mlx, an OpenAI-compatible inference server built on MLX for Apple Silicon. Specific contributions:
  - MTP speculative decoding for Qwen3-Next (PR #82, merged): Multi-Token Prediction with an always-advance strategy and rejection sampling; 1.43x verified / 1.76x optimistic on M3 Ultra
  - Draft-model speculative decoding (PR #45, merged): a HybridEngine that shares a single model instance across speculative and batched modes; 1.2–1.4x throughput improvement
  - Prefix caching, KV cache quantization, Anthropic Messages API integration, MoE model support
  - Assigned collaborator, alongside the repo owner, on the MTP roadmap for the Qwen3-Next / MiMo / Qwen3.5 family
- vllm-mlx-dashboard: real-time monitoring dashboard for local LLM inference servers (llama.cpp + vllm-mlx), built with Next.js.
- Data & access infrastructure: a self-built residential LTE proxy pool (9 modems, CGNAT rotation) feeding production scraping for AI products. Writeup: https://hilgard.cz/writing/lte-proxy-pool
- M3 Ultra (256 GB) benchmarking: sustained throughput, quantization tradeoffs, and batch-size effects at the edge of consumer hardware.
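For readers unfamiliar with the "rejection sampling" part of speculative decoding, here is a minimal sketch of the textbook accept/reject rule. It is not the code merged in vllm-mlx PR #82; the function names and the dict-of-probabilities representation are hypothetical, chosen only for illustration. A draft model (or an MTP head) proposes tokens, and the target model keeps each one with probability min(1, p/q). On the first rejection it resamples from the normalized residual max(0, p - q). If every draft token is accepted, it emits one bonus token from the target's next-position distribution. This rule keeps the output distributed exactly as the target model alone would produce it.

```python
import random
from typing import Callable, Dict, List

Dist = Dict[int, float]  # hypothetical representation: token id -> probability


def _sample(dist: Dist, rng: Callable[[], float]) -> int:
    """Sample a token from an (unnormalized) distribution."""
    total = sum(dist.values())
    r = rng() * total
    acc = 0.0
    for tok, w in sorted(dist.items()):  # sorted for deterministic iteration
        acc += w
        if r < acc:
            return tok
    return max(dist, key=dist.get)  # guard against float round-off


def speculative_accept(
    draft_tokens: List[int],
    p_target: List[Dist],  # len(draft_tokens) + 1 entries; the last one is for the bonus token
    q_draft: List[Dist],   # len(draft_tokens) entries
    rng: Callable[[], float] = random.random,
) -> List[int]:
    """Accept a prefix of the draft, then append one corrected or bonus token."""
    out: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p = p_target[i].get(tok, 0.0)
        q = q_draft[i].get(tok, 0.0)
        # Accept the drafted token with probability min(1, p / q).
        if q > 0 and rng() < min(1.0, p / q):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q), then stop.
        support = set(p_target[i]) | set(q_draft[i])
        residual = {t: max(0.0, p_target[i].get(t, 0.0) - q_draft[i].get(t, 0.0)) for t in support}
        out.append(_sample(residual, rng))
        return out
    # Every draft token was accepted: take a free token from the target model.
    out.append(_sample(p_target[len(draft_tokens)], rng))
    return out
```

A verification step that accepts k of the drafted tokens yields k + 1 tokens from one target forward pass. That is where the speedup comes from, and the acceptance rate determines how large it is.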

Background

Now: independent — AI infrastructure builder, open to fractional-CTO & advisory
2026: CTO, Miranda Media (built their AI products)
2002–2020: Founder & CEO, Hosting90 (Czech hosting/cloud, exited)
Focus areas: inference optimization, agentic systems, local LLM deployment, data/access infrastructure

Tech I work with

Python, TypeScript, MLX, vLLM, Next.js, Apple Silicon, llama.cpp, KV cache quantization



