USHIFT-5287: RHOAI Model Serving On MicroShift #1737

Open · wants to merge 7 commits into master
Conversation

@pmtk (Member) commented Jan 17, 2025

No description provided.

openshift-ci-robot commented Jan 17, 2025

@pmtk: This pull request references USHIFT-5287 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 17, 2025
openshift-ci bot (Contributor) commented Jan 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign syed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pmtk (Member Author) commented Jan 20, 2025

/uncc @jhjaggars @LalatenduMohanty

> - If you want to use graphics processing units (GPUs) with your model server, you have enabled GPU support in OpenShift AI.
> - To use the vLLM runtime, you have enabled GPU support in OpenShift AI and have installed and configured the Node Feature Discovery operator on your cluster

Neither the GPU Operator, NFD, nor the Intel Gaudi AI Accelerator Operator is supported on MicroShift.


I thought we had enabled specific GPU support for MicroShift, e.g. https://docs.nvidia.com/datacenter/cloud-native/edge/latest/nvidia-gpu-with-device-edge.html

pmtk (Member Author) replied:

Well, that should help then, at least with NVIDIA.
So one piece would be part of the procedure you linked; the other part would be using NVIDIA Triton.


> Neither the GPU Operator, NFD, nor the Intel Gaudi AI Accelerator Operator is supported on MicroShift.

Should users be instructed to use upstream releases of these components and configure them on their own?


Rather not; I would prefer everything to be fully supported. I hate it when we point to upstream community stuff. I would rather work with partners to get them to support MicroShift (as we do with NVIDIA).

pmtk (Member Author) replied:

The link in your previous comment suggests we should be good to go with NVIDIA, so I don't think it's a problem (not sure about Intel).

@pmtk (Member Author) commented Jan 24, 2025

/cc @lburgazzoli

@openshift-ci openshift-ci bot requested a review from lburgazzoli January 24, 2025 11:56
openshift-ci bot (Contributor) commented Jan 29, 2025

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@danielezonca left a comment

Thanks for the document, very well detailed.

I have some comments to clarify the scope, but I don't see any particular issue with the proposal.

Comment on lines +45 to +47
- Provide RHOAI supported ServingRuntimes CRs so that users can use them.
- [List of supported model-serving runtimes](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16/html/serving_models/serving-large-models_serving-large-models#supported-model-serving-runtimes_serving-large-models)
(not all might be suitable for MicroShift - e.g. intended for multi model serving)


I would start with vLLM-CUDA (there is an image per accelerator) to cover LLMs, with OpenVINO as a second priority.
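
For context, a minimal sketch of what one of those shipped ServingRuntime CRs could look like. The name, image, and port below are illustrative placeholders, not the actual RHOAI-provided values:

```yaml
# Hypothetical ServingRuntime CR; image and port are placeholders,
# not the values RHOAI actually ships.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-runtime            # illustrative name
spec:
  supportedModelFormats:
    - name: vLLM
      autoSelect: true
  containers:
    - name: kserve-container
      image: example.registry/vllm-openai:latest  # placeholder image
      ports:
        - containerPort: 8080
          protocol: TCP
```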


### Non-Goals

- Deploying full RHOAI on MicroShift.


Does it mean "bare KServe" without RHOAI operator at all?

pmtk (Member Author) replied:

Yes. MicroShift is intended for edge deployments, so we have neither the resources nor the need for the whole suite of tools. Just serving the models.

Comment on lines +139 to +141
At the time of writing this document, kserve's Raw Deployment mode is not fully
supported by RHOAI. For that reason, this feature will start as Tech Preview
and only advance to GA when RHOAI starts supporting Raw Deployment mode.


This is going to change in about one release, so I think we can already assume here that it will be fully supported.

we shall work with partners to achieve that. We want to avoid directing users
to generic upstream information without any support.

### Do we need ODH Model Controller?


It is probably something to double-check: for example, this controller has some logic for "ODH Connection" conversion (an "opinionated secret") to configure model storage credentials, among many other features (see the different reconcilers for reference).

Comment on lines +200 to +205
- Include ServingRuntimes in an RPM as a kustomization manifest (might be
  `microshift-kserve`, `microshift-rhoai-runtimes`, or something else) using a
  specific predefined namespace.
  - This will force users to either use that namespace for serving, or to copy
    the SRs to their namespace (either at runtime, or by including them in
    their manifests).


I think this is the easiest way to start.
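
As a rough sketch of that packaging approach (the RPM name, file names, and namespace below are assumptions, not the final contents), the RPM could drop a kustomization like this into MicroShift's manifest directory:

```yaml
# Hypothetical kustomization.yaml that an RPM such as microshift-rhoai-runtimes
# could install under /usr/lib/microshift/manifests.d/; the namespace and
# resource file names are assumed placeholders.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: rhoai-serving          # the predefined namespace (assumption)
resources:
  - vllm-servingruntime.yaml
  - openvino-servingruntime.yaml
```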

Comment on lines +206 to +208
- Change ServingRuntimes to ClusterServingRuntimes and include them in an RPM,
so they're accessible from any namespace (MicroShift is intended for
single-user operation anyway).


From a MicroShift user's perspective this is probably the best option, and we are considering adopting ClusterServingRuntime in the future, but we are not sure when.

pmtk (Member Author) replied:

Yup, I also think it'd be the most convenient.
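
For comparison, a sketch of the cluster-scoped variant discussed in this thread; it differs from the earlier ServingRuntime sketch only in kind and scope (same placeholder values):

```yaml
# Hypothetical ClusterServingRuntime; being cluster-scoped it carries no
# namespace and is usable from any namespace. Values are placeholders.
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: vllm-runtime            # illustrative name
spec:
  supportedModelFormats:
    - name: vLLM
      autoSelect: true
  containers:
    - name: kserve-container
      image: example.registry/vllm-openai:latest  # placeholder image
```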


While it might be best to have the RPM versioned with RHOAI, because RHOAI does
not follow OpenShift's release schedule and the RPM will live in the MicroShift
repository, it will be versioned together with MicroShift.


What is the release frequency of MicroShift?

pmtk (Member Author) replied:

MicroShift releases together with OpenShift, so I think around every four months.

First, a smoke test in MicroShift's test harness:
- Stand up MicroShift device/cluster
- Install kserve
- Create an InferenceService using some basic model that can be easily handled by CPU


OK, if this test case is CPU-only, I would cover only OpenVINO.
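
For illustration, the kind of InferenceService such a CPU-only smoke test might create. The runtime name, model format, and storage URI are placeholders, not values from the enhancement:

```yaml
# Hypothetical smoke-test InferenceService for a small CPU-friendly model;
# runtime and storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: smoke-test-model
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment  # the mode discussed above
spec:
  predictor:
    model:
      modelFormat:
        name: openvino_ir           # assumed OpenVINO IR format name
      runtime: ovms-runtime         # assumed OpenVINO ServingRuntime name
      storageUri: oci://example.registry/models/small-model:latest  # placeholder
```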

For the most part, RHOAI's and/or kserve's support procedures are to be followed.

Although there might be some cases where debugging MicroShift might be required.
One example is Ingress and Routes, as this is the element that kserve will


What about endpoint security?
In RHOAI we use ServiceMesh + Authorino in the Serverless configuration, while for RawDeployment we expect to use oauth-proxy (it will be replaced by something else that has not been defined yet).
