USHIFT-5287: RHOAI Model Serving On MicroShift #1737
base: master
Conversation
@pmtk: This pull request references USHIFT-5287 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/uncc @jhjaggars @LalatenduMohanty
> - If you want to use graphics processing units (GPUs) with your model server, you have enabled GPU support in OpenShift AI.
> - To use the vLLM runtime, you have enabled GPU support in OpenShift AI and have installed and configured the Node Feature Discovery operator on your cluster.

Neither the GPU Operator, NFD, nor the Intel Gaudi AI Accelerator Operator is supported on MicroShift.
I thought we had enabled specific GPU support for MicroShift, e.g. https://docs.nvidia.com/datacenter/cloud-native/edge/latest/nvidia-gpu-with-device-edge.html
Well, that should help then. At least with NVIDIA.
So one piece would be part of the procedure you linked; the other part would be using NVIDIA Triton.
Neither the GPU Operator, NFD, nor the Intel Gaudi AI Accelerator Operator is supported on MicroShift.

Should users be instructed to use upstream releases of these components and configure them on their own?
Rather not; I would prefer everything to be fully supported. I hate it when we point to upstream community stuff. I would rather work with partners to get them to support MicroShift (like we do with NVIDIA).
The link in your previous comment suggests we should be good to go with NVIDIA, so I don't think it's a problem (not sure about Intel).
/cc @lburgazzoli
@pmtk: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Thanks for the document, very well detailed.
I have some comments to clarify the scope, but I don't see any particular issue with the proposal.
- Provide RHOAI-supported ServingRuntimes CRs so that users can use them.
- [List of supported model-serving runtimes](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/serving_models/serving-large-models_serving-large-models#supported-model-serving-runtimes_serving-large-models)
  (not all might be suitable for MicroShift, e.g. some are intended for multi-model serving)
I would start with vLLM-CUDA (there is an image per accelerator) to cover LLMs, with OpenVINO as a second priority.
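For concreteness, here is a minimal sketch of what an RPM-shipped ServingRuntime CR for OpenVINO Model Server could look like. The image reference, arguments, and ports are illustrative assumptions, not the values RHOAI actually ships:

```yaml
# Sketch of a ServingRuntime CR (KServe v1alpha1 API); image and args are
# illustrative placeholders, not the RHOAI-shipped values.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: ovms-runtime
spec:
  supportedModelFormats:
    - name: openvino_ir
      version: opset13
      autoSelect: true
  containers:
    - name: kserve-container
      image: quay.io/example/openvino-model-server:latest  # placeholder image
      args:
        - --model_name={{.Name}}
        - --model_path=/mnt/models
        - --rest_port=8888
      ports:
        - containerPort: 8888
          protocol: TCP
```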
### Non-Goals

- Deploying full RHOAI on MicroShift.
Does it mean "bare KServe" without RHOAI operator at all?
Yes. MicroShift is intended for edge deployments, so we have neither the resources nor the need for the whole suite of tools. Just serving the models.
At the time of writing this document, kserve's Raw Deployment mode is not fully supported by RHOAI. For that reason, this feature will start as Tech Preview and only advance to GA when RHOAI starts supporting Raw Deployment mode.
This is going to change in about one release, so I think we can already assume here that it will be fully supported.
we shall work with partners to achieve that. We want to avoid directing users to generic upstream information without any support.

### Do we need ODH Model Controller?
It is something to double-check: for example, this controller has some logic for converting an "ODH Connection" (an opinionated secret) into model storage credentials, among many other features (see the different reconcilers as a reference).
- Include ServingRuntimes in an RPM as a kustomization manifest (might be `microshift-kserve`, `microshift-rhoai-runtimes`, or something else) using a specific predefined namespace.
- This will force users to either use that namespace for serving, or to copy the SRs to their namespace (either at runtime, or by including them in their manifests).
I think this is the easiest way to start with.
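As an illustration of the copy option, a user could re-publish the shipped runtime into their own namespace with a small kustomization; the namespace and file names here are assumptions:

```yaml
# kustomization.yaml in the user's manifests (names are hypothetical):
# re-creates a copy of the RPM-shipped ServingRuntime in the serving namespace.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: my-models
resources:
  - servingruntime-ovms.yaml   # local copy of the shipped ServingRuntime CR
```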
- Change ServingRuntimes to ClusterServingRuntimes and include them in an RPM, so they're accessible from any namespace (MicroShift is intended for single-user operation anyway).
From a MicroShift user perspective this is probably the best option, and we are considering adopting ClusterServingRuntime in the future, but not sure when.
Yup, I also think it'd be the most convenient.
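For comparison, the cluster-scoped variant would only differ in kind and the lack of a namespace; this sketch assumes the upstream KServe ClusterServingRuntime API, which carries the same spec:

```yaml
# Sketch of the same runtime as a cluster-scoped resource; spec fields
# mirror the namespaced ServingRuntime sketch above.
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: ovms-runtime   # no metadata.namespace: visible from any namespace
spec:
  supportedModelFormats:
    - name: openvino_ir
      autoSelect: true
  containers:
    - name: kserve-container
      image: quay.io/example/openvino-model-server:latest  # placeholder
```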
While it might be best to version the RPM with the RHOAI version, RHOAI does not follow OpenShift's release schedule and the RPM will live in the MicroShift repository, so it will be versioned together with MicroShift.
What is the release frequency of MicroShift?
MicroShift releases together with OpenShift, so I think around every 4 months.
First, a smoke test in MicroShift's test harness:
- Stand up a MicroShift device/cluster
- Install kserve
- Create an InferenceService using some basic model that can be easily handled by a CPU
OK, if this test case is CPU-only, I would cover only OpenVINO.
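A CPU-only smoke test along those lines might create an InferenceService like the following; the runtime name, model format, and storage URI are placeholders carried over from the sketches above:

```yaml
# Hypothetical smoke-test InferenceService; RawDeployment mode is assumed
# since MicroShift has no Serverless/Service Mesh stack.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: smoke-test-model
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: openvino_ir
      runtime: ovms-runtime                      # assumed runtime name
      storageUri: oci://quay.io/example/smoke-test-model:latest  # placeholder
```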
For the most part, RHOAI's and/or kserve's support procedures are to be followed.

There might be some cases, though, where debugging MicroShift is required. One example is Ingress and Routes, as this is the element that kserve will
What about endpoint security?
In RHOAI we use Service Mesh + Authorino in the Serverless configuration, while for RawDeployment we expect to use oauth-proxy (it will be replaced by something else that has not been defined yet).