[WIP] Add support for kserve #877
base: main
Conversation
Reviewer's Guide by Sourcery

This pull request adds support for generating KServe YAML definitions, enabling users to deploy AI models as KServe services within a Kubernetes environment. It introduces a new `kserve` generation option.

Sequence diagram for generating KServe configurations:

```mermaid
sequenceDiagram
    participant CLI
    participant Model
    participant Kserve
    CLI->>Model: serve(args)
    Model->>Model: execute_command(model_path, exec_args, args)
    alt args.generate == "kserve"
        Model->>Model: kserve(model_path, args, exec_args)
        Model->>Kserve: __init__(model_path, image, args, exec_args)
        Kserve->>Kserve: generate()
        Kserve-->>Model: True
        Model-->>CLI: None
    else
        Model-->>CLI: None
    end
```
Hey @rhatdan - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider generating the KServe runtime file only when necessary, instead of every time.
- The KServe code duplicates some logic from other modules; consider refactoring to share code.
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
```python
env_var_string += f"Environment={k}={v}\n"

_gpu = ""
if os.getenv("CUDA_VISIBLE_DEVICES") != "":
```
issue (bug_risk): GPU env var check may be flawed.

The condition `os.getenv("CUDA_VISIBLE_DEVICES") != ""` will return True even if the variable is not set (i.e. returns None). Consider using a check such as `if os.getenv("CUDA_VISIBLE_DEVICES")` to better capture whether the variable is defined and non-empty.
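For illustration, a minimal sketch of the suggested check, assuming the same surrounding generation code:

```python
import os

# os.getenv(...) != "" is True even when the variable is unset, because
# None != "". Plain truthiness covers both "unset" and "set but empty":
if os.getenv("CUDA_VISIBLE_DEVICES"):
    ...  # GPU-specific configuration goes here
```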
```yaml
    resources:
      limits:
        cpu: "6"
        memory: 24Gi{gpu}
```
issue (bug_risk): Potential undefined variable 'gpu'.
If neither CUDA_VISIBLE_DEVICES nor HIP_VISIBLE_DEVICES is set, the variable 'gpu' will not be defined before it's used in the f-string. Initializing 'gpu' to an empty string by default would prevent a potential NameError.
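A minimal sketch of the suggested fix, assuming `gpu` is interpolated into the resource limits shown above; the specific GPU resource keys are illustrative, not taken from this PR:

```python
import os

gpu = ""  # default, so the f-string below can never raise NameError
if os.getenv("CUDA_VISIBLE_DEVICES"):
    gpu = '\n    nvidia.com/gpu: "1"'  # hypothetical NVIDIA resource line
elif os.getenv("HIP_VISIBLE_DEVICES"):
    gpu = '\n    amd.com/gpu: "1"'  # hypothetical AMD resource line

resources = f"""\
resources:
  limits:
    cpu: "6"
    memory: 24Gi{gpu}"""
```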
```diff
@@ -60,8 +60,9 @@ Generate specified configuration format for running the AI Model as a service

 | Key          | Description                                                               |
 | ------------ | ------------------------------------------------------------------------- |
 | quadlet      | Podman supported container definition for running AI Model under systemd |
+| kserve       | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |
```
issue (typo): Typo: "Kserve" should be "KServe".
```diff
-| kserve       | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |
+| kserve       | KServe YAML definition for running the AI Model as a KServe service in Kubernetes |
```
```python
outfile = self.name + "-kserve-runtime.yaml"
outfile = outfile.replace(":", "-")
print(f"Generating kserve runtime file: {outfile}")
with open(outfile, 'w') as c:
```
issue (complexity): Consider using a templating engine like Jinja2 to generate the YAML files, which will reduce code duplication and improve readability.
Consider abstracting YAML creation into a dedicated templating helper to reduce the inline formatting repetition. For example, you might use Jinja2 templates (or PyYAML with dictionaries) to consolidate and reuse the YAML structure. Here’s a concise example using Jinja2:
```python
from jinja2 import Template


def create_yaml(template_str, **params):
    return Template(template_str).render(**params)


# Define your runtime YAML template once.
KSERVE_RUNTIME_TMPL = """\
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: {{ runtime }}-runtime
  annotations:
    openshift.io/display-name: "KServe ServingRuntime for {{ model }}"
    opendatahub.io/recommended-accelerators: '["{{ gpu }}"]'
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  annotations:
    prometheus.io/port: '{{ port }}'
    prometheus.io/path: '/metrics'
  multiModel: false
  supportedModelFormats:
    - autoSelect: true
      name: vLLM
  containers:
    - name: kserve-container
      image: {{ image }}
      command: ["python", "-m", "vllm.entrypoints.openai.api_server"]
      args: ["--port={{ port }}", "--model=/mnt/models", "--served-model-name={{ name }}"]
      env:
        - name: HF_HOME
          value: /tmp/hf_home
      ports:
        - containerPort: {{ port }}
          protocol: TCP
"""

# In your generate() method:
yaml_content = create_yaml(
    KSERVE_RUNTIME_TMPL,
    runtime=self.runtime,
    model=self.model,
    gpu=_gpu if _gpu else "",
    port=self.args.port,
    image=self.image,
    name=self.name,
)
# Parenthesize the concatenation so replace() applies to the whole
# filename rather than only the string literal.
outfile = (self.name + "-kserve-runtime.yaml").replace(":", "-")
with open(outfile, 'w') as c:
    c.write(yaml_content)
```
Repeat a similar approach for the second YAML. This not only reduces repetition but also improves readability and maintainability.
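The comment above also mentions PyYAML with dictionaries; a sketch of that alternative, using the same placeholder fields as the template, which avoids string templating and quoting issues entirely:

```python
import yaml


def build_runtime(runtime, image, port, name):
    # Build the ServingRuntime document as plain Python data structures.
    doc = {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "ServingRuntime",
        "metadata": {"name": f"{runtime}-runtime"},
        "spec": {
            "annotations": {
                "prometheus.io/port": str(port),
                "prometheus.io/path": "/metrics",
            },
            "multiModel": False,
            "supportedModelFormats": [{"autoSelect": True, "name": "vLLM"}],
            "containers": [{
                "name": "kserve-container",
                "image": image,
                "command": ["python", "-m", "vllm.entrypoints.openai.api_server"],
                "args": [f"--port={port}", "--model=/mnt/models",
                         f"--served-model-name={name}"],
                "ports": [{"containerPort": port, "protocol": "TCP"}],
            }],
        },
    }
    # sort_keys=False preserves the insertion order defined above.
    return yaml.safe_dump(doc, sort_keys=False)
```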
Signed-off-by: Daniel J Walsh <[email protected]>
```yaml
  annotations:
    openshift.io/display-name: KServe ServingRuntime for {self.model}
    opendatahub.io/recommended-accelerators: '["{_gpu}"]'
  labels:
    opendatahub.io/dashboard: 'true'
```
```diff
-  annotations:
-    openshift.io/display-name: KServe ServingRuntime for {self.model}
-    opendatahub.io/recommended-accelerators: '["{_gpu}"]'
-  labels:
-    opendatahub.io/dashboard: 'true'
```
Let's remove them for now; they are OpenShift/OpenShift AI specific. Sorry, I should have excluded them from the example I shared.
```yaml
      name: vLLM
  containers:
    - name: kserve-container
      image: {self.image}
```
This code looks wrong; as far as I understand from checking the example code, it will return ramalama as the image, not vLLM.

Moreover, vLLM is specific per accelerator, so the same if/else condition used to produce the GPU requirements should be applied here to select the right image.
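A hypothetical sketch of what this suggests, reusing the accelerator check to pick the image; the image references below are placeholders, not real published tags:

```python
import os


def select_vllm_image(default_image):
    """Pick a vLLM image matching the detected accelerator.

    The image names here are illustrative placeholders only.
    """
    if os.getenv("CUDA_VISIBLE_DEVICES"):
        return "example.io/vllm-cuda:latest"  # hypothetical CUDA build
    if os.getenv("HIP_VISIBLE_DEVICES"):
        return "example.io/vllm-rocm:latest"  # hypothetical ROCm build
    return default_image
```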
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: {self.runtime}-runtime
```
I think this is misleading: it produces a `llama.cpp-runtime` value, but this is supposed to be vLLM.
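One possible fix, sketched under the assumption that the serving runtime should always be named for vLLM rather than for the configured inference runtime:

```python
# Deriving the name from self.runtime yields "llama.cpp-runtime" by
# default; hard-code the vLLM runtime name instead (assumed fix):
runtime_name = "vllm-runtime"
```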
```diff
@@ -60,8 +60,9 @@ Generate specified configuration format for running the AI Model as a service

 | Key          | Description                                                               |
 | ------------ | ------------------------------------------------------------------------- |
 | quadlet      | Podman supported container definition for running AI Model under systemd |
+| kserve       | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |
```
```diff
-| kserve       | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |
+| kserve       | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes using vLLM |
```
Summary by Sourcery

This pull request introduces support for generating KServe configurations, allowing users to deploy AI models on Kubernetes using the KServe framework. It adds a new `kserve` option to the `ramalama serve --generate` command, which generates the necessary YAML files for deploying a model as a KServe service.

Enhancements:
- The `ramalama serve` command now accepts a `--generate kserve` option to generate KServe YAML files.