
[WIP] Add support for kserve #877

Open · wants to merge 1 commit into main
Conversation

@rhatdan (Member) commented on Feb 24, 2025

Summary by Sourcery

This pull request introduces support for generating KServe configurations, allowing users to deploy AI models on Kubernetes using the KServe framework. It adds a new kserve option to the ramalama serve --generate command, which generates the necessary YAML files for deploying a model as a KServe service.

New Features:

  • Adds support for generating KServe YAML definitions for running AI models as a service in Kubernetes, enabling deployment and management of models using the KServe framework.

Enhancements:

  • The ramalama serve command now accepts a --generate kserve option to generate KServe YAML files.
  • The generated KServe YAML files include resource requests and limits for CPU, memory, and GPU (if available).
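
For example, a hypothetical invocation of the new option (the model reference is a placeholder; the generated file names depend on the model name):

    ramalama serve --generate kserve <model>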

sourcery-ai bot (Contributor) commented Feb 24, 2025

Reviewer's Guide by Sourcery

This pull request adds support for generating KServe YAML definitions, enabling users to deploy AI models as KServe services within a Kubernetes environment. It introduces a new Kserve class responsible for generating the necessary YAML files and integrates it into the existing ramalama serve command-line interface.

Sequence diagram for generating KServe configuration

sequenceDiagram
    participant CLI
    participant Model
    participant Kserve

    CLI->>Model: serve(args)
    Model->>Model: execute_command(model_path, exec_args, args)
    alt args.generate == "kserve"
        Model->>Model: kserve(model_path, args, exec_args)
        Model->>Kserve: __init__(model_path, image, args, exec_args)
        Kserve->>Kserve: generate()
        Kserve-->>Model: True
        Model-->>CLI: None
    else
        Model-->>CLI: None
    end
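
To make the diagrammed flow concrete, here is a minimal sketch of the dispatch it describes. Names follow the diagram; the actual structure in ramalama/model.py may differ, and the image attribute is an assumption:

def execute_command(self, model_path, exec_args, args):
    if args.generate == "kserve":
        # Generate KServe YAML instead of starting the service.
        return self.kserve(model_path, args, exec_args)
    # ... otherwise run the model service as before

def kserve(self, model_path, args, exec_args):
    ks = Kserve(model_path, self.image, args, exec_args)  # image attribute assumed
    return ks.generate()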

File-Level Changes

Change: Introduces support for generating KServe YAML definitions for deploying AI models as KServe services in Kubernetes.

Details:
  • Adds 'kserve' as a choice to the --generate argument in the serve command.
  • Creates a new Kserve class to handle the generation of KServe YAML files.
  • Implements the generate method in the Kserve class to create the necessary YAML files for deploying a model with KServe.
  • Adds logic to the generate_container_config method to call the new KServe functionality.
  • Adds documentation for generating a KServe service.

Files:
  • docs/ramalama-serve.1.md
  • ramalama/model.py
  • ramalama/cli.py
  • ramalama/kserve.py


@sourcery-ai bot left a comment

Hey @rhatdan - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider generating the KServe runtime file only when necessary, instead of every time.
  • The KServe code duplicates some logic from other modules; consider refactoring to share code.

Here's what I looked at during the review:
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


env_var_string += f"Environment={k}={v}\n"

_gpu = ""
if os.getenv("CUDA_VISIBLE_DEVICES") != "":

issue (bug_risk): GPU env var check may be flawed.

The condition os.getenv("CUDA_VISIBLE_DEVICES") != "" will return True even if the variable is not set (i.e. returns None). Consider using a check such as if os.getenv("CUDA_VISIBLE_DEVICES") to better capture whether the variable is defined and non-empty.
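
A minimal sketch of the suggested check (the accelerator value is illustrative, not necessarily what the PR writes):

import os

# Truthy only when CUDA_VISIBLE_DEVICES is set AND non-empty;
# os.getenv() returns None when the variable is not set at all.
if os.getenv("CUDA_VISIBLE_DEVICES"):
    _gpu = "nvidia.com/gpu"  # illustrative value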

resources:
  limits:
    cpu: "6"
    memory: 24Gi{gpu}

issue (bug_risk): Potential undefined variable 'gpu'.

If neither CUDA_VISIBLE_DEVICES nor HIP_VISIBLE_DEVICES is set, the variable 'gpu' will not be defined before it's used in the f-string. Initializing 'gpu' to an empty string by default would prevent a potential NameError.
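
A minimal sketch of that fix (resource names and indentation are illustrative):

gpu = ""  # default, so the f-string below never raises NameError
if os.getenv("CUDA_VISIBLE_DEVICES"):
    gpu = '\n        nvidia.com/gpu: "1"'
elif os.getenv("HIP_VISIBLE_DEVICES"):
    gpu = '\n        amd.com/gpu: "1"'
# ... later interpolated, e.g. f"memory: 24Gi{gpu}"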

@@ -60,8 +60,9 @@ Generate specified configuration format for running the AI Model as a service

| Key | Description |
| ------------ | -------------------------------------------------------------------------|
| quadlet | Podman supported container definition for running AI Model under systemd |
| kserve | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |

issue (typo): Typo: "Kserve" should be "KServe".

Suggested change
| kserve | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |
| kserve | KServe YAML definition for running the AI Model as a KServe service in Kubernetes |

outfile = self.name + "-kserve-runtime.yaml"
outfile = outfile.replace(":", "-")
print(f"Generating kserve runtime file: {outfile}")
with open(outfile, 'w') as c:

issue (complexity): Consider using a templating engine like Jinja2 to generate the YAML files, which will reduce code duplication and improve readability.

Consider abstracting YAML creation into a dedicated templating helper to reduce the inline formatting repetition. For example, you might use Jinja2 templates (or PyYAML with dictionaries) to consolidate and reuse the YAML structure. Here’s a concise example using Jinja2:

from jinja2 import Template

def create_yaml(template_str, **params):
    return Template(template_str).render(**params)

# Define your runtime YAML template once.
KSERVE_RUNTIME_TMPL = """
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: {{ runtime }}-runtime
  annotations:
    openshift.io/display-name: "KServe ServingRuntime for {{ model }}"
    opendatahub.io/recommended-accelerators: '["{{ gpu }}"]'
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  annotations:
    prometheus.io/port: '{{ port }}'
    prometheus.io/path: '/metrics'
  multiModel: false
  supportedModelFormats:
    - autoSelect: true
      name: vLLM
  containers:
    - name: kserve-container
      image: {{ image }}
      command: ["python", "-m", "vllm.entrypoints.openai.api_server"]
      args: ["--port={{ port }}", "--model=/mnt/models", "--served-model-name={{ name }}"]
      env:
        - name: HF_HOME
          value: /tmp/hf_home
      ports:
        - containerPort: {{ port }}
          protocol: TCP
"""

# In your generate() method:
yaml_content = create_yaml(KSERVE_RUNTIME_TMPL,
                           runtime=self.runtime,
                           model=self.model,
                           gpu=_gpu if _gpu else "",
                           port=self.args.port,
                           image=self.image,
                           name=self.name)
with open((self.name + "-kserve-runtime.yaml").replace(":", "-"), 'w') as c:
    c.write(yaml_content)

Repeat a similar approach for the second YAML. This not only reduces repetition but also improves readability and maintainability.
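
The PyYAML alternative mentioned above would look roughly like this; a sketch only, with the document structure copied from the template and yaml.safe_dump doing the formatting:

import yaml

def write_runtime_yaml(outfile, name, image, port):
    doc = {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "ServingRuntime",
        "metadata": {"name": f"{name}-runtime"},
        "spec": {
            "multiModel": False,
            "supportedModelFormats": [{"autoSelect": True, "name": "vLLM"}],
            "containers": [{
                "name": "kserve-container",
                "image": image,
                "command": ["python", "-m", "vllm.entrypoints.openai.api_server"],
                "args": [f"--port={port}", "--model=/mnt/models", f"--served-model-name={name}"],
                "ports": [{"containerPort": port, "protocol": "TCP"}],
            }],
        },
    }
    with open(outfile, "w") as f:
        yaml.safe_dump(doc, f, sort_keys=False)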

Signed-off-by: Daniel J Walsh <[email protected]>
Comment on lines +46 to +50
  annotations:
    openshift.io/display-name: KServe ServingRuntime for {self.model}
    opendatahub.io/recommended-accelerators: '["{_gpu}"]'
  labels:
    opendatahub.io/dashboard: 'true'


Suggested change
  annotations:
    openshift.io/display-name: KServe ServingRuntime for {self.model}
    opendatahub.io/recommended-accelerators: '["{_gpu}"]'
  labels:
    opendatahub.io/dashboard: 'true'

Let's remove them for now; they are OpenShift/OpenShift AI specific.
Sorry, I should have excluded them from the example I shared.

      name: vLLM
  containers:
    - name: kserve-container
      image: {self.image}


This code looks wrong: as far as I understand from the example code, it will use the ramalama image rather than a vLLM image.
Moreover, the vLLM image is accelerator-specific, so the same if/else logic used to produce the GPU requirements should be applied here to select the right image.
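
A hedged sketch of that suggestion (all image references below are placeholders, not images the PR actually ships):

import os

# Select a vLLM image matching the detected accelerator; fall back to a CPU build.
if os.getenv("CUDA_VISIBLE_DEVICES"):
    image = "<vllm-cuda-image>"
elif os.getenv("HIP_VISIBLE_DEVICES"):
    image = "<vllm-rocm-image>"
else:
    image = "<vllm-cpu-image>"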

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: {self.runtime}-runtime


I think this is misleading: it produces a llama.cpp-runtime value, but this runtime is supposed to be vLLM.

@@ -60,8 +60,9 @@ Generate specified configuration format for running the AI Model as a service

| Key | Description |
| ------------ | -------------------------------------------------------------------------|
| quadlet | Podman supported container definition for running AI Model under systemd |
| kserve | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |


Suggested change
| kserve | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes |
| kserve | Kserve YAML definition for running the AI Model as a kserve service in Kubernetes using vLLM |
