USHIFT-5287: RHOAI Model Serving On MicroShift #1737
---
title: rhoai-model-serving-on-microshift
authors:
  - pmtk
reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
  - DanielFroehlich, MicroShift PM
  - jerpeter1, MicroShift Staff Eng, Architect
  - TBD, RHOAI Expert
approvers: # A single approver is preferred, the role of the approver is to raise important questions, help ensure the enhancement receives reviews from all applicable areas/SMEs, and determine when consensus is achieved such that the EP can move forward to implementation. Having multiple approvers makes it difficult to determine who is responsible for the actual approval.
  - jerpeter1
api-approvers:
  - None
creation-date: 2025-01-17
last-updated: 2025-01-17
tracking-link:
  - https://issues.redhat.com/browse/OCPSTRAT-1721
# see-also:
#   - "/enhancements/this-other-neat-thing.md"
# replaces:
#   - "/enhancements/that-less-than-great-idea.md"
# superseded-by:
#   - "/enhancements/our-past-effort.md"
---
# RHOAI Model Serving on MicroShift

## Summary

This enhancement describes the process of enabling AI model serving on
MicroShift, based on Red Hat OpenShift AI (RHOAI).

## Motivation

Enabling users to use MicroShift for AI model serving means they will be able to
train models in the cloud or datacenter on OpenShift, and serve those models at the
edge using MicroShift.

### User Stories

* As a MicroShift user, I want to serve AI models at the edge in a lightweight manner.

### Goals

- Prepare RHOAI-based kserve manifests that fit MicroShift's use cases and environments.
- Provide RHOAI-supported ServingRuntime CRs so that users can use them.
  - [List of supported model-serving runtimes](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/serving_models/serving-large-models_serving-large-models#supported-model-serving-runtimes_serving-large-models)
    (not all might be suitable for MicroShift - e.g. some are intended for multi-model serving)
- Document how to use kserve on MicroShift.
  - Including a reference to ["Tested and verified model-serving runtimes" that are not supported by Red Hat](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/serving_models/serving-large-models_serving-large-models#tested-verified-runtimes_serving-large-models)

### Non-Goals

- Deploying full RHOAI on MicroShift.

  > Review comment: Does it mean "bare KServe" without the RHOAI operator at all?
  > Reply: Yes. MicroShift is intended for edge deployments, so we have neither the resources nor the need for the whole suite of tools. Just serving the models.

- Providing extras such as the RHOAI Dashboard. Using this solution will be more similar to upstream kserve.

## Proposal

Extract kserve manifests from the RHOAI Operator image and adjust them for MicroShift:
- Make sure that cert-manager is not required; instead, leverage OpenShift's service-ca.
  - This might require adding some extra annotations to resources so that service-ca injects the certificates.
- Drop the requirement for Istio as an Ingress Controller and use the OpenShift Router instead.
  - Done by changing kserve's settings ConfigMap to use another ingress controller.
- Use 'RawDeployment' mode, so that neither Service Mesh nor Serverless is required,
  to minimize the footprint and make the solution suitable for edge devices.
  - Also done in the settings ConfigMap (see the sketch below).
- Package the manifests as an RPM, `microshift-kserve`.
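
A minimal sketch of what the adjusted kserve settings ConfigMap could look like. The key names follow upstream kserve's `inferenceservice-config`; the concrete values for MicroShift (ingress class, domain) are illustrative assumptions, not the final manifest:

```yaml
# Sketch only: illustrative overrides for kserve's settings ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  deploy: |
    {
      "defaultDeploymentMode": "RawDeployment"
    }
  ingress: |
    {
      "disableIstioVirtualHost": true,
      "ingressClassName": "openshift-default",
      "ingressDomain": "example.com"
    }
```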

Provide users with ServingRuntime definitions derived from RHOAI, so they
are not forced to use upstream manifests.
The decision on how to do this is pending; see the open questions.

### Workflow Description

**User** is a human administering and using the MicroShift cluster/device.

(RPM vs ostree vs bootc is skipped because it doesn't differ from any other MicroShift RPM.)

1. User installs the `microshift-kserve` RPM and restarts the MicroShift service.
1. Kserve manifests are deployed.
1. User configures the hardware, the OS, and additional Kubernetes components to
   make use of their accelerators.
1. ServingRuntimes are delivered with the kserve RPM or deployed by the user.
1. User creates an InferenceService CR which references the ServingRuntime of their choice
   and a reference to the model (see the example below).
1. Kserve creates a Deployment, Ingress, and other resources.
1. Resources from the previous step become ready and the user can make HTTP/GRPC calls
   to the model server.
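
A minimal sketch of step 5, assuming a ServingRuntime named `kserve-ovms` and a hypothetical ModelCar image reference; the names, model format, and registry path are illustrative assumptions:

```yaml
# Sketch only: InferenceService referencing a ServingRuntime and a ModelCar image.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    model:
      modelFormat:
        name: openvino_ir          # must match a format listed in the ServingRuntime
      runtime: kserve-ovms         # assumed ServingRuntime name
      storageUri: oci://registry.example.com/models/example-model:latest
```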

### API Extensions

The `microshift-kserve` RPM will bring the following CRDs, however they're not becoming
part of the core MicroShift deployment:

- InferenceServices
- TrainedModels
- ServingRuntimes
- InferenceGraphs
- ClusterStorageContainers
- ClusterLocalModels
- LocalModelNodeGroups

> Review comment: @pmtk, is ClusterServingRuntimes missing from this list?
> Reply: Not decided yet, see the open questions.

Contents of these CRDs can be viewed at https://github.com/red-hat-data-services/kserve/tree/master/config/crd/full.

### Topology Considerations

#### Hypershift / Hosted Control Planes

Enhancement is MicroShift specific.

#### Standalone Clusters

Enhancement is MicroShift specific.

#### Single-node Deployments or MicroShift

Enhancement is MicroShift specific.

### Implementation Details/Notes/Constraints

N/A

### Risks and Mitigations

At the time of writing this document, kserve's Raw Deployment mode is not fully
supported by RHOAI. For this reason, this feature will start as Tech Preview
and only advance to GA when RHOAI starts supporting Raw Deployment mode.

> Review comment: This is going to change in about one release, so I think we can already assume here that it will be fully supported.

### Drawbacks

At the time of writing this document, the ARM architecture is not supported.
If that changes in the future, the rebase process and RPM building will need to be
reworked (depending on how RHOAI's manifests will look).

## Open Questions [optional]

### Tweaking kserve settings: ingress domain

Kserve settings are delivered in the form of a ConfigMap:
- [Upstream example](https://github.com/red-hat-data-services/kserve/blob/master/config/configmap/inferenceservice.yaml)
- [RHOAI's overrides](https://github.com/red-hat-data-services/kserve/blob/master/config/overlays/odh/inferenceservice-config-patch.yaml)

While we can recommend that users create a manifest overriding the
ConfigMap to their liking, there's one setting that we could handle better:
`ingress.ingressDomain`. In the example it has a value of `example.com`, and it
might be poor UX to require every customer to create a new ConfigMap just to change this.

A possible solution is to change our manifest handling logic, so that MicroShift
uses kustomize's Go package to first render the manifest, then template it,
and finally apply it. For this particular `ingress.ingressDomain` setting we could reuse the value of
`dns.baseDomain` from MicroShift's config.yaml, as sketched below.
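
A rough sketch of the idea. The placeholder syntax and the rendered fragment are assumptions made for illustration only; the templating mechanism does not exist in MicroShift today and is exactly what this open question is about:

```yaml
# MicroShift's /etc/microshift/config.yaml (existing setting)
dns:
  baseDomain: microshift.example.com
---
# Hypothetical templated fragment of kserve's settings ConfigMap, rendered by
# MicroShift before being applied. "{{ .dns.baseDomain }}" is an assumed
# placeholder syntax, not an existing MicroShift feature.
data:
  ingress: |
    {
      "ingressDomain": "{{ .dns.baseDomain }}"
    }
```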

### How to deliver ServingRuntime CRs

From the [kserve documentation](https://kserve.github.io/website/master/modelserving/servingruntimes/):

> KServe makes use of two CRDs for defining model serving environments:
>
> ServingRuntimes and ClusterServingRuntimes
>
> The only difference between the two is that one is namespace-scoped and the other is cluster-scoped.
>
> A ServingRuntime defines the templates for Pods that can serve one or more
> particular model formats. Each ServingRuntime defines key information such as
> the container image of the runtime and a list of the model formats that the
> runtime supports. Other configuration settings for the runtime can be conveyed
> through environment variables in the container specification.

RHOAI's approach to ServingRuntimes:
- ClusterServingRuntimes are not supported (the CRD is not created).
- Each usable ServingRuntime is wrapped in a Template and resides in RHOAI's namespace.
- When a user uses the RHOAI Dashboard to serve a model, they must select a runtime
  from a list (which is constructed from the Templates holding ServingRuntimes)
  and provide information about the model.
- When the user submits the form, the Dashboard creates a ServingRuntime from the
  Template in the user's Data Science Project (effectively a Kubernetes namespace),
  and assembles the InferenceService CR.

Problem: how to provide users with ServingRuntimes without creating unnecessary obstacles.

In any of the following solutions, we need to drop the `Template` container to
get only the `ServingRuntime` CR part, for example as sketched below.
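
A minimal sketch of that extraction, using a hypothetical OpenVINO Model Server runtime; the runtime name, image reference, and supported format are illustrative assumptions:

```yaml
# RHOAI ships something like this: a ServingRuntime wrapped in a Template...
apiVersion: template.openshift.io/v1
kind: Template
metadata:
  name: kserve-ovms-template
objects:
  - apiVersion: serving.kserve.io/v1alpha1
    kind: ServingRuntime
    metadata:
      name: kserve-ovms
    spec:
      supportedModelFormats:
        - name: openvino_ir
          autoSelect: true
      containers:
        - name: kserve-container
          image: registry.example.com/openvino-model-server:latest  # assumed image
---
# ...while MicroShift would deliver only the unwrapped ServingRuntime:
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: kserve-ovms
spec:
  supportedModelFormats:
    - name: openvino_ir
      autoSelect: true
  containers:
    - name: kserve-container
      image: registry.example.com/openvino-model-server:latest  # assumed image
```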

Potential solutions so far:
- Include ServingRuntimes in an RPM as a kustomization manifest (might be
  `microshift-kserve`, `microshift-rhoai-runtimes`, or something else) using a
  specific predefined namespace.
  - This will force users to either use that namespace for serving, or they can
    copy the SRs to their namespace (either at runtime, or by including them in
    their manifests).

  > Review comment: I think this is the easiest way to start with.

- Change ServingRuntimes to ClusterServingRuntimes and include them in an RPM,
  so they're accessible from any namespace (MicroShift is intended for
  single-user operation anyway).

  > Review comment: From a MicroShift user perspective this is probably the best option, and we are considering adopting ClusterServingRuntime in the future, but not sure when.
  > Reply: Yup, I also think it'd be the most convenient.

- Don't include SRs in any of the RPMs. Instead, include them in the documentation
  for users to copy and include in their manifests.
  - This might be prone to getting outdated very easily, as documentation is not
    part of the MicroShift rebase procedure.

### No MicroShift support for GPU Operator, Node Feature Discovery, or other hardware-enabling Operators

[RHOAI's how-to on using Raw Deployment](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/serving_models/serving-large-models_serving-large-models#deploying-models-on-single-node-openshift-using-kserve-raw-deployment-mode_serving-large-models) lists some requirements such as:
> - If you want to use graphics processing units (GPUs) with your model server, you have enabled GPU support in OpenShift AI.
> - To use the vLLM runtime, you have enabled GPU support in OpenShift AI and have installed and configured the Node Feature Discovery operator on your cluster

Neither the GPU Operator, NFD, nor the Intel Gaudi AI Accelerator Operator is supported on MicroShift.

> Review comment: I thought we had enabled specific GPU support for MicroShift, e.g. https://docs.nvidia.com/datacenter/cloud-native/edge/latest/nvidia-gpu-with-device-edge.html
> Reply: Well, that should help then. At least with NVIDIA.

Should users be instructed to use upstream releases of these components and configure them on their own?

> Review comment: Rather not - I would prefer everything being fully supported. I hate it if we point to upstream community stuff. I would rather work with partners to get them to support MicroShift (like we do with NVIDIA).
> Reply: The link in your previous comment suggests we should be good to go with NVIDIA, so I don't think it's a problem (not sure about Intel).

## Versioning

While it might be best to version the RPM according to the RHOAI version, RHOAI does not
follow OpenShift's release schedule and the RPM will live in the MicroShift
repository, so the RPM will be versioned together with MicroShift.

> Review comment: What is the release frequency of MicroShift?
> Reply: MicroShift releases together with OpenShift, so I think around every 4 months.

This means that a certain version of RHOAI's kserve will be bound to a MicroShift
minor version.

## Test Plan

Because RHOAI is not supported on the ARM architecture, testing will not happen on NVIDIA Jetson Orin devices.

First, a smoke test in MicroShift's test harness:
- Stand up a MicroShift device/cluster
- Install kserve
- Create an InferenceService using some basic model that can be easily handled by a CPU

  > Review comment: OK, if this test case is CPU only, I would cover only OpenVINO.
  > Reply: Is there something lighter just for a smoke test?

- Model shall be delivered using the ModelCar approach (i.e. model inside an OCI container)
- Make an HTTP call to the InferenceService to verify it is operational
- Make an HTTP call to the InferenceService's `/metrics` endpoint
  (possibly before and after an inference request)

This simple scenario will assert basic integration of the features:
that everything deploys and starts correctly - a sort of quick and easy sanity test.
See [upstream kserve's "First InferenceService" guide](https://kserve.github.io/website/master/get_started/first_isvc/)
which represents a similar verification.

Another test that should be implemented uses hardware with an NVIDIA GPU,
for example an AWS EC2 instance. The goal is to assert that all the elements in the
stack will work together on MicroShift. The test itself should not be much
different from the sanity test (set up everything, make a call to the
InferenceService), but it should leverage a serving runtime and a model that
require a GPU (see the sketch below).
The implementation of this test will reveal any additional dependencies such as
device plugins, drivers, etc.
This kind of test can run periodically - once or twice a week initially,
with the frequency adjusted later if needed.
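
A rough sketch of what the GPU-bound part of this test could deploy, assuming a vLLM-style ServingRuntime named `vllm-runtime` and an illustrative ModelCar image; all names, the model format, and resource values are assumptions:

```yaml
# Sketch only: InferenceService that forces scheduling onto a GPU.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: gpu-smoke-test
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM                 # assumed format name exposed by the runtime
      runtime: vllm-runtime        # assumed ServingRuntime name
      storageUri: oci://registry.example.com/models/small-llm:latest
      resources:
        limits:
          nvidia.com/gpu: "1"      # requires the NVIDIA device plugin on the node
        requests:
          nvidia.com/gpu: "1"
```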

EC2 instance type candidates (ordered - cheapest first in us-west-2):
- g4dn.xlarge (4 vCores 2nd gen Intel Xeon, 16 GiB, NVIDIA T4 GPU 16 GiB)
- g6.xlarge (4 vCores 3rd gen AMD EPYC, 16 GiB, NVIDIA L4 GPU 24 GiB)
- g5.xlarge (4 vCores 2nd gen AMD EPYC, 16 GiB, NVIDIA A10G GPU 24 GiB)
- g6e.xlarge (4 vCores 3rd gen AMD EPYC, 16 GiB, NVIDIA L40S GPU 48 GiB)

## Graduation Criteria

### Dev Preview -> Tech Preview

RHOAI's kserve on MicroShift will begin as Tech Preview.

### Tech Preview -> GA

Advancement to GA depends on RHOAI's support for Raw Deployments.

### Removing a deprecated feature

N/A

## Upgrade / Downgrade Strategy

Because RHOAI's kserve is part of the MicroShift spec file, they will share a version,
so it's expected that when MicroShift is upgraded, kserve will also be upgraded.

The MicroShift team might need to monitor (or simply test upgrades across) version changes
of the kserve CRDs, so there's always an upgrade path.

## Version Skew Strategy

MicroShift and kserve RPMs built from the same .spec file should not introduce
a version skew.

## Operational Aspects of API Extensions

N/A

## Support Procedures

For the most part, RHOAI's and/or kserve's support procedures are to be followed.

There might still be cases where debugging MicroShift is required.
One example is Ingress and Routes, as this is the element shipped as part of
MicroShift that kserve integrates with the most.
In that case, the procedures for debugging the OCP router are to be followed.

> Review comment: What about endpoint security?
> Reply: Good point! I think that needs to be configurable by the user, and I can imagine several ways. It could be that the client is just another pod in another namespace, running "locally". But it could also be "external" clients. Does that make sense?

Other cases might involve hardware integration failures; those might depend
on individual components.

## Alternatives

N/A

## Infrastructure Needed [optional]

N/A

> Review comment (regarding the test plan): I would start with vLLM-cuda (there is an image per accelerator) to cover LLM, and OpenVINO as a second priority.
> Reply: Would you be able to share testing procedures for these in RHOAI? Steps and model.
> Reply: Overall we use this test suite to run many different scenarios in RHOAI as integration tests:
> https://github.com/red-hat-data-services/ods-ci/tree/master/ods_ci/tests/Tests/1000__model_serving
> From a MicroShift PoV I would say: OpenVINO Tests, vLLM Tests.