
backend; LM Deploy #142

Merged
thushan merged 9 commits into main from feature/backend/lmdeploy
Apr 27, 2026

Conversation

@thushan
Owner

@thushan thushan commented Apr 27, 2026

Adds native support for LMDeploy, finally.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added native support for LMDeploy as an inference backend, including OpenAI-compatible chat/completions endpoints alongside LMDeploy-specific features (token encoding, reward pooling).
  • Documentation

    • Added comprehensive integration guide, API reference, and backend configuration documentation for LMDeploy.

@coderabbitai

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@thushan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 45 minutes and 6 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d6c4ea85-4709-43f1-82a7-f1faea847978

📥 Commits

Reviewing files that changed from the base of the PR and between bbdd5f0 and 19b712b.

📒 Files selected for processing (4)
  • config/profiles/lmdeploy.yaml
  • docs/content/api-reference/lmdeploy.md
  • docs/content/index.md
  • readme.md

Walkthrough

This pull request adds comprehensive support for LMDeploy as a new LLM backend integration to Olla. The change includes a configuration profile defining API endpoints and metrics extraction, converter and parser implementations for model response handling, factory and route registration, provider constants, and extensive documentation covering integration, API reference, and usage examples.

Changes

Cohort / File(s) Summary
LMDeploy Configuration Profile
config/profiles/lmdeploy.yaml
New profile declaring routing signals, endpoint paths (health, models, chat/completions, encode, generate, pooling), request/response parsing behavior, model heuristics, and metrics extraction rules.
Core Constants and Domain
internal/core/constants/providers.go, internal/core/domain/profile.go, internal/version/version.go
Added LMDeploy provider identifier constants, profile domain constant, and registered lmdeploy in supported backends list.
Converter Implementation
internal/adapter/converter/lmdeploy_converter.go, internal/adapter/converter/lmdeploy_converter_test.go
New converter producing OpenAI-like model list responses; resolves model IDs from provider aliases, sets ownership/creation metadata, and generates permission objects with capability flags.
Converter Registration
internal/adapter/converter/factory.go, internal/adapter/converter/factory_test.go
Registered new LMDeploy converter in factory and updated factory tests to recognise and validate the new format.
Profile Parser Implementation
internal/adapter/registry/profile/lmdeploy.go, internal/adapter/registry/profile/lmdeploy_parser.go, internal/adapter/registry/profile/lmdeploy_parser_test.go
Added response types modelling LMDeploy model hierarchy and permissions; implemented parser converting LMDeploy JSON to unified ModelInfo with conditional detail mapping (created timestamp, publisher, parent model).
Parser Registration
internal/adapter/registry/profile/parsers.go
Registered LMDeploy parser in format dispatcher to enable response parsing.
Route and Handler Configuration
internal/app/handlers/handler_common.go, internal/app/handlers/server_routes.go
Added LMDeploy to static provider validation; registered OpenAI-compatible /v1/models handler and proxy routing for all LMDeploy requests.
API Reference Documentation
docs/content/api-reference/lmdeploy.md, docs/content/api-reference/overview.md, docs/content/index.md
New API reference page documenting LMDeploy routes, streaming behavior, unsupported endpoints, response headers, and curl examples; updated overview sections to include lmdeploy backend type.
Integration Documentation
docs/content/integrations/backend/lmdeploy.md, docs/content/integrations/overview.md
New integration guide covering registration, configuration, auth, supported/unsupported endpoints, examples, startup commands, sleep/wake behaviour, and OpenAI SDK usage; added LMDeploy row to backend table.
General Documentation Updates
docs/content/concepts/overview.md, docs/content/faq.md, readme.md, docs/mkdocs.yml
Updated documentation lists and examples to include LMDeploy; changed integration badge from "OpenAI Compatible" to "Native Support"; added MkDocs navigation entries.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #44: Adds a parallel VLLMBackend integration following the same factory/registry/parser/constants registration pattern, making similar multi-file changes across converter and route handler layers.
  • PR #70: Introduces another new provider backend integration with equivalent code-level changes (converter factory registration, provider/profile constants, parser/registry types, route handler updates, SupportedBackends entry).
  • PR #33: Establishes the foundational provider-profile routing and provider-specific handler registration machinery that this LMDeploy integration extends and builds upon.

Suggested labels

enhancement, llm-backend, documentation

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'backend; LM Deploy' directly relates to the changeset, which adds native support for LMDeploy as a backend provider across the codebase and documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/backend/lmdeploy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (5)
docs/content/api-reference/lmdeploy.md (1)

257-257: Sample response is not valid JSON.

[0.0123, -0.0456, 0.0789, ...] will fail any JSON validator readers paste it into. Either drop the trailing ... or annotate it as a non-parseable excerpt (e.g., // truncated). Same pattern is used elsewhere in the docs (LM Studio embeddings) so it's not a hard requirement, but worth tightening for new docs.

📝 Suggested edit
-      "embedding": [0.0123, -0.0456, 0.0789, ...]
+      "embedding": [0.0123, -0.0456, 0.0789]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/api-reference/lmdeploy.md` at line 257, The sample JSON under
the "embedding" field is invalid because it includes a trailing ellipsis; update
the example so it's valid JSON by either removing the "..." (e.g., "embedding":
[0.0123, -0.0456, 0.0789]) or marking it as a truncated excerpt using a non-JSON
comment/annotation (e.g., add a note like "// truncated" outside the JSON block
or replace array with a clearly labeled excerpt), and apply the same fix to
other LM Studio embedding examples to ensure all samples validate.
internal/adapter/registry/profile/lmdeploy_parser.go (1)

32-46: Optional: replace hard-coded "lmdeploy" literals with constants.ProviderTypeLMDeploy.

The provider name appears twice as a string literal (Lines 32 and 46). Using the existing constants.ProviderTypeLMDeploy would prevent silent drift if the constant is ever renamed. Note that the sibling vllm_parser.go uses literals as well, so this is a consistency-vs-DRY trade-off — feel free to defer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/registry/profile/lmdeploy_parser.go` around lines 32 - 46,
Replace the hard-coded provider string "lmdeploy" with the shared constant
constants.ProviderTypeLMDeploy in the lmdeploy parser: set Type to
constants.ProviderTypeLMDeploy where currently Type: "lmdeploy" (in the created
profile object) and change the ownership check to compare model.OwnedBy !=
constants.ProviderTypeLMDeploy instead of the literal; this keeps the symbol
names centralized and prevents drift (mirror this change only in
lmdeploy_parser.go; vllm_parser.go can remain as-is if you prefer consistency
choice).
internal/adapter/registry/profile/lmdeploy_parser_test.go (1)

168-194: Optional: this test does not actually exercise the "if it somehow appears" case.

The comment on Line 172 promises tolerance "if it somehow appears", but the JSON payload omits max_model_len entirely. To actually validate the claim, include the field in the payload (e.g. "max_model_len": 32768) and assert that parsing still succeeds and MaxContextLength remains nil/zero. As-is, this case overlaps with the earlier "details nil when no metadata beyond default owned_by" test.

🧪 Suggested payload tweak
 		response := `{
 			"object": "list",
 			"data": [
 				{
 					"id": "qwen/Qwen2-7B-Instruct",
 					"object": "model",
 					"created": 1754535990,
-					"owned_by": "lmdeploy"
+					"owned_by": "lmdeploy",
+					"max_model_len": 32768
 				}
 			]
 		}`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/registry/profile/lmdeploy_parser_test.go` around lines 168 -
194, Update the test "no max_model_len field — LMDeploy differs from vLLM here"
to actually exercise the "if it somehow appears" case by adding a
"max_model_len" field (e.g. "max_model_len": 32768) to the JSON response string,
then call parser.Parse as before and assert parsing still succeeds
(require.NoError) and that the parsed model's Details.MaxContextLength remains
nil (or zero/unpopulated) — reference the existing parser.Parse call and
model.Details.MaxContextLength assertion to locate where to add the new payload
and additional assertion.
internal/adapter/converter/lmdeploy_converter.go (2)

52-91: OwnedBy derivation uses the resolved modelID, which can be an upstream alias.

When findLMDeployNativeName returns "" and model.Aliases[0].Name is e.g. "alias-from-ollama" (no slash), OwnedBy becomes "lmdeploy" — which then gets suppressed by the parser on the way back in (Line 46 of lmdeploy_parser.go). That's coherent. However, if the unified model.ID carries a more accurate org/... prefix than the alias, that information is lost. Consider deriving OwnedBy from model.ID (or a publisher field on the unified model) rather than from the chosen wire ID.

♻️ Possible refactor
-	m := &profile.LMDeployModel{
-		ID:      modelID,
-		Object:  "model",
-		Created: now,
-		OwnedBy: c.determineOwner(modelID),
-	}
+	owner := c.determineOwner(modelID)
+	if owner == constants.ProviderTypeLMDeploy {
+		// Fall back to the unified ID's org segment when the wire ID lacks one.
+		owner = c.determineOwner(model.ID)
+	}
+	m := &profile.LMDeployModel{
+		ID:      modelID,
+		Object:  "model",
+		Created: now,
+		OwnedBy: owner,
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/converter/lmdeploy_converter.go` around lines 52 - 91, In
convertModel (LMDeployConverter.convertModel) the OwnedBy is computed from the
resolved wire modelID (after findLMDeployNativeName / alias fallback) which can
strip org prefix; instead derive OwnedBy from the unified model's original
identifier or publisher field (e.g., use model.ID or model.Publisher if
available) and pass that into determineOwner while still using the resolved
modelID for the LMDeploy wire ID; update the code so you capture originalID :=
model.ID (or model.Publisher) before you replace modelID and call
c.determineOwner(originalID) for the OwnedBy field, leaving modelID assignment
and Permission generation unchanged.

13-15: Remove unused type aliases.

The type aliases LMDeployModelResponse and LMDeployModelData are not referenced anywhere in the codebase outside their definitions. The comment stating they exist "for backward compatibility with tests" is inaccurate—the test file directly uses profile.LMDeployResponse and profile.LMDeployModel. These aliases unnecessarily broaden the public surface and can be safely removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/adapter/converter/lmdeploy_converter.go` around lines 13 - 15,
Remove the two unused backward-compatibility type aliases LMDeployModelResponse
and LMDeployModelData from the converter file: delete the alias declarations
"type LMDeployModelResponse = profile.LMDeployResponse" and "type
LMDeployModelData = profile.LMDeployModel" and any accompanying misleading
comment, ensuring no other code references these symbols (tests already use
profile.LMDeployResponse/profile.LMDeployModel); run tests and remove any
now-unused imports if the file becomes free of profile usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/profiles/lmdeploy.yaml`:
- Around line 167-174: The calculations block uses generation_time_ms in
tokens_per_second but lmdeploy.yaml's metrics.extraction.paths lacks extraction
for generation_time_ms; update the metrics.extraction.paths section to include
generation_time_ms with the same extraction key used in other profiles (e.g.,
generation_time_ms: "$.metrics.generation_time_ms") so the tokens_per_second
calculation in calculations.tokens_per_second can compute correctly.
- Around line 27-35: The config lists an LMDeploy-only probe endpoint
`/is_sleeping` which upstream does not expose; remove `/is_sleeping` from
api.paths and any detection.path_indicators, or verify and replace it with a
real LMDeploy probe if you confirm one exists. Specifically, update the entries
that mention `/is_sleeping` in the profile (look for api.paths and
detection.path_indicators) so detection relies only on verified endpoints like
`/generate` and `/pooling`, avoiding a probe that will 404.

In `@docs/content/api-reference/lmdeploy.md`:
- Around line 123-131: The streaming SSE fenced code block showing the "data:
{...}" chunks should include a language identifier to satisfy markdownlint
MD040; update the triple-backtick fence around the SSE example in lmdeploy.md so
the opening fence is ```text (rather than just ```), keeping the block contents
unchanged; locate the fenced block that contains the sample lines starting with
"data: {\"id\":\"chatcmpl-lmdeploy-abc123\"" and add the language tag to the
opening fence.

In `@readme.md`:
- Line 13: Update the companion badge in docs/content/index.md by replacing the
old image/alt text "LM Deploy-openai-lightblue" and alt="LM Deploy: OpenAI
Compatible" with the new badge used in the README: image name "LM
Deploy-openai-lightgreen" and alt="LM Deploy: Native Support", and ensure the
anchor/img markup and href match the README's badge markup so the docs and
README display the same badge.
- Line 13: Update the shields.io image URL so the badge message shows "native"
instead of "openai": in the README anchor/img element where the src is
"https://img.shields.io/badge/LM Deploy-openai-lightgreen.svg" (and the
corresponding alt currently "LM Deploy: Native Support"), change the "openai"
segment to "native" (i.e., use "LM Deploy-native-lightgreen.svg") so the
rendered badge text matches the alt text and color convention used by other
native badges.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 28bdd97d-e339-47cd-b972-1c1ba3e1a41e

📥 Commits

Reviewing files that changed from the base of the PR and between a500060 and bbdd5f0.

📒 Files selected for processing (23)
  • config/profiles/lmdeploy.yaml
  • docs/content/api-reference/lmdeploy.md
  • docs/content/api-reference/overview.md
  • docs/content/concepts/overview.md
  • docs/content/faq.md
  • docs/content/index.md
  • docs/content/integrations/backend/lmdeploy.md
  • docs/content/integrations/overview.md
  • docs/mkdocs.yml
  • internal/adapter/converter/factory.go
  • internal/adapter/converter/factory_test.go
  • internal/adapter/converter/lmdeploy_converter.go
  • internal/adapter/converter/lmdeploy_converter_test.go
  • internal/adapter/registry/profile/lmdeploy.go
  • internal/adapter/registry/profile/lmdeploy_parser.go
  • internal/adapter/registry/profile/lmdeploy_parser_test.go
  • internal/adapter/registry/profile/parsers.go
  • internal/app/handlers/handler_common.go
  • internal/app/handlers/server_routes.go
  • internal/core/constants/providers.go
  • internal/core/domain/profile.go
  • internal/version/version.go
  • readme.md

Comment on lines +27 to +35
# LMDeploy-specific endpoints
- /v1/encode # 4: token encoding (LMDeploy-specific)
- /generate # 5: native generation endpoint
- /pooling # 6: reward/score pooling (the real embeddings path)
# NOTE: /v1/embeddings is intentionally excluded — LMDeploy returns HTTP 400
# unconditionally for all backends on that path.

# Sleep state probe (LMDeploy-specific, used for auto-detection)
- /is_sleeping # 7: sleep state probe


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What are the exact URL paths exposed by lmdeploy serve api_server for /generate, /pooling, and /is_sleeping endpoints? Are they prefixed with /v1?

💡 Result:

The lmdeploy serve api_server exposes the following endpoints for the specified paths, based on source code and documentation:

  • /generate: LMDeploy-specific endpoint for raw generation requests (e.g., POST /generate with prompt, input_ids, etc.). It is NOT prefixed with /v1.
  • /pooling: Endpoint for reward model pooling (e.g., POST /pooling with model and input). It is NOT prefixed with /v1. Available when using the PyTorch backend with reward models like internlm2-1_8b-reward.
  • /is_sleeping: No evidence of this exact endpoint in api_server. There is a "sleep engine" feature in the PyTorch backend for resource management (e.g., a PR fixing sleep for dp=1), but no exposed HTTP endpoint named /is_sleeping. Likely internal or not publicly exposed; not prefixed with /v1 if it exists.

Primary endpoints are OpenAI-compatible under /v1 (e.g., /v1/chat/completions, /v1/models, /v1/completions), plus custom ones like /generate and /pooling at root level. The full list is available via /openapi.json after starting the server.



Remove or verify the /is_sleeping endpoint with LMDeploy upstream.

The web search against LMDeploy's source code confirms that /generate and /pooling are correctly listed without the /v1 prefix—they are custom root-level endpoints, not OpenAI-compatible ones. However, there is no evidence that /is_sleeping is an actual exposed HTTP endpoint in lmdeploy serve api_server. LMDeploy does have an internal sleep engine feature, but it does not appear to expose an HTTP endpoint for it. Including /is_sleeping in the config's api.paths and detection.path_indicators may cause detection to fail or requests to 404.
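If the probe is removed as suggested, the endpoint list keeps only the verified root-level paths. A sketch of the resulting shape (comments mine, not the exact file contents):

```yaml
# LMDeploy-specific endpoints (verified against upstream)
- /v1/encode   # token encoding (LMDeploy-specific)
- /generate    # native generation endpoint (root level, no /v1 prefix)
- /pooling    # reward/score pooling (root level, no /v1 prefix)
# /is_sleeping dropped: no evidence upstream exposes it over HTTP
```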


Comment thread config/profiles/lmdeploy.yaml Outdated
Comment on lines +167 to +174
model: "$.model"
finish_reason: "$.choices[0].finish_reason"
input_tokens: "$.usage.prompt_tokens"
output_tokens: "$.usage.completion_tokens"
total_tokens: "$.usage.total_tokens"
calculations:
is_complete: 'len(finish_reason) > 0'
tokens_per_second: "generation_time_ms > 0 ? (output_tokens * 1000.0) / generation_time_ms : 0"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether generation_time_ms is supplied by Olla's metrics layer
# (rather than being parsed from the response body).
rg -nP --type=go -C3 '\bgeneration_time_ms\b'
rg -nP --type=yaml -C2 '\bgeneration_time_ms\b' config/profiles

Repository: thushan/olla

Length of output: 2560


🏁 Script executed:

cat -n config/profiles/lmdeploy.yaml | sed -n '160,180p'

Repository: thushan/olla

Length of output: 673


generation_time_ms is missing from paths but used in calculations.

The tokens_per_second calculation references generation_time_ms, which is not extracted in the metrics.extraction.paths section of this profile. Other profiles (vllm, vllm-mlx, sglang) define generation_time_ms: "$.metrics.generation_time_ms" in their paths sections. Add the missing extraction to lmdeploy.yaml, otherwise the calculation will always evaluate to 0.
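Concretely, the fix would add one extraction line, mirroring what the reviewer says the vllm/sglang profiles use. A sketch, assuming the same metrics.extraction nesting as the snippet quoted above:

```yaml
metrics:
  extraction:
    paths:
      model: "$.model"
      finish_reason: "$.choices[0].finish_reason"
      input_tokens: "$.usage.prompt_tokens"
      output_tokens: "$.usage.completion_tokens"
      total_tokens: "$.usage.total_tokens"
      generation_time_ms: "$.metrics.generation_time_ms"  # previously missing
    calculations:
      is_complete: 'len(finish_reason) > 0'
      tokens_per_second: "generation_time_ms > 0 ? (output_tokens * 1000.0) / generation_time_ms : 0"
```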


Comment thread docs/content/api-reference/lmdeploy.md Outdated
Comment thread readme.md Outdated
@thushan thushan merged commit 6d6ac4d into main Apr 27, 2026
8 checks passed
@thushan thushan deleted the feature/backend/lmdeploy branch April 27, 2026 10:59

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant