ModelService declaratively provisions and maintains the Kubernetes resources needed to serve a base model for inference.
A ModelService custom resource encapsulates the desired state of workloads and routing associated with a single base model. It automates the management of Kubernetes resources, including:
- Prefill and decode deployments
- InferencePool and InferenceModel resources defined by the Gateway API Inference Extension
- Endpoint picker (EPP) deployment and service
- Relevant RBAC permissions
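To make this concrete, a ModelService manifest might look roughly like the sketch below. The apiVersion and field names (baseConfigMapRef, routing, prefill, decode) are illustrative assumptions, not the authoritative CRD schema; consult the installed CRD for the exact spec.

```yaml
# Illustrative sketch only -- the apiVersion and field names are assumptions,
# not the authoritative ModelService CRD schema.
apiVersion: llm-d.ai/v1alpha1
kind: ModelService
metadata:
  name: llama-3-1-8b-instruct
spec:
  # Reference a platform-managed BaseConfig preset (a ConfigMap).
  baseConfigMapRef:
    name: basic-gpu-preset
  routing:
    # Logical model name exposed via the Gateway API Inference Extension.
    modelName: meta-llama/Llama-3.1-8B-Instruct
  # Disaggregated serving: prefill and decode deployments scale independently.
  prefill:
    replicas: 1
  decode:
    replicas: 2
```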
A ModelService may optionally reference a BaseConfig — a Kubernetes ConfigMap that defines reusable, platform-managed presets for shared behavior across multiple base models.
Typically, platform operators define a small set of BaseConfig presets, and base model owners reference them in their respective ModelService resources.
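As an illustration of what such a preset could contain, the ConfigMap below holds partial manifests that the controller merges into any ModelService that references it. The data keys and their contents are assumptions chosen for illustration; presets shipped with the llm-d deployer define their own keys.

```yaml
# Hypothetical BaseConfig preset; data keys and contents are illustrative assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: basic-gpu-preset
data:
  # Partial decode deployment merged into every referencing ModelService.
  decodeDeployment: |
    spec:
      template:
        spec:
          containers:
            - name: vllm
              resources:
                limits:
                  nvidia.com/gpu: "1"
  # Partial endpoint picker (EPP) deployment.
  eppDeployment: |
    spec:
      replicas: 1
```

A ModelService then points at the preset by name (for example via a baseConfigMapRef field, as sketched earlier) and overrides only the values it needs to change.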
The ModelService controller reconciles the cluster state to align with the configuration declared in the ModelService custom resource. This custom resource is the source of truth for resources it owns.
⚠️ Important: Do not manually modify resources owned by a ModelService. If your use case is not yet supported, please file an issue in the ModelService repository.
✅ Supports disaggregated prefill and decode workloads
🌐 Integrates with Gateway API Inference Extension for request routing
📈 Enables auto-scaling via HPA or custom controllers
🔧 Allows independent scaling and node affinity for prefill and decode deployments
📦 Supports model loading from (see the sketch after this list):
- HuggingFace (public or private)
- Kubernetes PVCs
- OCI images
🧩 Supports value templating in both BaseConfig and ModelService resources
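As a rough sketch of the model loading sources listed above, the fragments below show how the three source types might be expressed as artifact URIs. The modelArtifacts field name and the hf://, pvc://, and oci:// schemes are assumptions used for illustration.

```yaml
# Illustrative fragments; the modelArtifacts field and URI schemes are assumptions.
# HuggingFace repository (pair with an auth secret for gated or private models):
modelArtifacts:
  uri: hf://meta-llama/Llama-3.1-8B-Instruct
---
# Model weights pre-populated on a PersistentVolumeClaim:
modelArtifacts:
  uri: pvc://model-cache-pvc/llama-3.1-8b
---
# OCI image that packages the model weights:
modelArtifacts:
  uri: oci://registry.example.com/models/llama-3.1-8b:latest
```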
When a ModelService resource is reconciled:
- Templating: template variables in BaseConfig and ModelService are interpolated based on the ModelService spec.
- Merging: a semantic merge overlays ModelService values on top of the selected BaseConfig.
- Orchestration: the controller creates or updates the following resources:
  - Inference workloads (prefill and decode deployments)
  - Routing resources (e.g., EPP deployment)
  - RBAC permissions
The result is a fully managed inference stack for the base model.
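To illustrate the templating and merging steps, a BaseConfig fragment might embed template variables that the controller fills in from the ModelService spec before overlaying the ModelService's own values. The variable names below are assumptions used for illustration.

```yaml
# Hypothetical BaseConfig fragment before interpolation; variable names are assumptions.
decodeDeployment: |
  spec:
    template:
      spec:
        containers:
          - name: vllm
            args:
              - "--model"
              - "{{ .ModelName }}"          # filled in from the ModelService spec
              - "--served-model-name"
              - "{{ .ModelServiceName }}"   # e.g., the ModelService's metadata.name
```

After interpolation, the semantic merge overlays ModelService values on top of the result, so any field set in both the preset and the ModelService takes its value from the ModelService.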
- Use BaseConfig to capture platform-level defaults and shared configurations across multiple base models.
- Use ModelService to define behavior specific to a given base model, and override BaseConfig values only when necessary.
- Platform teams should install BaseConfig presets using the llm-d deployer.
- Base model owners should prefer these presets to streamline onboarding of base models, rather than creating their own BaseConfigs.
ModelService roadmap features, in no specific order:
- Multiple base models: create HTTPRoute and related routing configuration
- LoRA adapters: create a LoRA controller that integrates with ModelService
- Routing weights: allow a logical model to expose multiple model versions via routing weights
- In-cluster model caching: download model artifacts once into the cluster and reuse them
- Node-level model caching: pre-load model artifacts onto nodes for fast model loading
- BaseConfig CRD: migrate BaseConfig resources from ConfigMaps to a dedicated CRD
- Prometheus metrics exporter: emit controller metrics
- Multi-node inferencing: for instance, using LWS (LeaderWorkerSet) integration