@achimnol achimnol commented Sep 27, 2025

resolves #5926 (BA-2404)

Core changes:

  • Rewrite SlotName as a UserString subclass that lazily parses the slot name format:
    • {device_name}.{major_type}[:{minor_type}]
  • Update the gpu_allocated calculation to sum all allocated accelerators using the SlotName.is_accelerator() method.
    • This centralizes the logic that guesses whether a slot name refers to an accelerator, for future use.
  • Update ResourceSlot to use UserDict with proper generic type arguments.
    • Since typeshed and the stdlib documentation indicate that initialization should accept a single dict instance rather than kwargs, update all such usages.
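
For readers who have not opened the diff, the three changes above can be pictured with the minimal sketch below. It is an illustration written under assumptions, not the code in this PR: the attribute names (`device_name`, `major_type`, `minor_type`), the `INTRINSIC_SLOTS` set used as the accelerator heuristic, the `sum_accelerators()` helper, and the sample slot names are all hypothetical.

```python
from __future__ import annotations

from collections import UserDict, UserString
from decimal import Decimal
from functools import cached_property
from typing import Mapping

# Assumption: slots that are never accelerators. The real heuristic may differ.
INTRINSIC_SLOTS = frozenset({"cpu", "mem"})


class SlotName(UserString):
    """Slot name in the form {device_name}.{major_type}[:{minor_type}]."""

    @cached_property
    def _parts(self) -> tuple[str, str | None, str | None]:
        # Parsed only on first access, then cached on the instance.
        device_name, _, rest = self.data.partition(".")
        major_type, _, minor_type = rest.partition(":")
        return device_name, major_type or None, minor_type or None

    @property
    def device_name(self) -> str:
        return self._parts[0]

    @property
    def major_type(self) -> str | None:
        return self._parts[1]

    @property
    def minor_type(self) -> str | None:
        return self._parts[2]

    def is_accelerator(self) -> bool:
        # Single place for the "is this an accelerator?" guess.
        return self.device_name not in INTRINSIC_SLOTS


class ResourceSlot(UserDict[SlotName, Decimal]):
    """Mapping of slot names to amounts, built from a single dict."""

    def __init__(self, data: Mapping[str, str | int | Decimal] | None = None) -> None:
        super().__init__(data)

    def __setitem__(self, key, value) -> None:
        # Normalize keys and values on every write, including the initial update.
        super().__setitem__(SlotName(key), Decimal(value))


def sum_accelerators(slots: ResourceSlot) -> Decimal:
    """Hypothetical helper: add up every slot that looks like an accelerator."""
    return sum(
        (amount for name, amount in slots.items() if name.is_accelerator()),
        Decimal(0),
    )


# Sample values for illustration only.
slots = ResourceSlot({
    "cpu": "4",
    "mem": "8589934592",
    "cuda.shares": "0.5",
    "cuda.device:1g.5gb": "2",
})
total = sum_accelerators(slots)
```

Because UserString hashes and compares like the underlying string, a plain `str` key such as `slots["cuda.shares"]` still finds the entry stored under a `SlotName` key in this sketch.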

Warning

When multiple different accelerators are allocated in a single container or installed on a single agent node, this simple sum for gpu_allocated may yield a nonsensical value, because it mixes different units and fraction scales. We need some design discussion here.
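
To make the concern concrete, here is a small self-contained sketch. The slot names, the amounts, the `INTRINSIC` set, and both helper functions are hypothetical; the per-slot breakdown is shown only as a contrast to the single sum, not as the design this PR adopts.

```python
from collections import defaultdict
from decimal import Decimal

# Sample allocation mixing a fractional unit with a whole-device unit.
allocated = {
    "cpu": Decimal("4"),
    "cuda.shares": Decimal("0.5"),  # a fraction of one GPU
    "tpu.device": Decimal("2"),     # two whole devices
}

# Assumption: slots that are not accelerators.
INTRINSIC = {"cpu", "mem"}


def naive_total(alloc: dict[str, Decimal]) -> Decimal:
    """Add every accelerator amount together, regardless of unit."""
    return sum(
        (v for k, v in alloc.items() if k.partition(".")[0] not in INTRINSIC),
        Decimal(0),
    )


def per_slot_totals(alloc: dict[str, Decimal]) -> dict[str, Decimal]:
    """Keep one total per {device_name}.{major_type} so units are not mixed."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for key, amount in alloc.items():
        device, _, rest = key.partition(".")
        if device in INTRINSIC:
            continue
        major = rest.partition(":")[0]
        totals[f"{device}.{major}"] += amount
    return dict(totals)


mixed = naive_total(allocated)        # 0.5 shares + 2 devices in one number
separate = per_slot_totals(allocated)  # one entry per unit
```

Here `mixed` is 2.5, a figure that corresponds to neither 2.5 GPUs nor 2.5 shares, which is the kind of value the warning above describes.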

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation

@github-actions github-actions bot added size:XL 500~ LoC comp:manager Related to Manager component comp:agent Related to Agent component comp:common Related to Common component labels Sep 27, 2025
@github-actions github-actions bot added size:L 100~500 LoC and removed size:XL 500~ LoC labels Sep 29, 2025
        return core_schema.no_info_after_validator_function(
            cls._validate,
            core_schema.dict_schema(
                core_schema.any_schema(),  # TODO: make SlotName also compatible with pydantic schema

Check notice

Code scanning / devskim

A "TODO" or similar was left in source code, possibly indicating incomplete functionality Note

Suspicious comment
@github-actions github-actions bot added the area:docs Documentations label Sep 29, 2025
Labels

area:docs Documentations comp:agent Related to Agent component comp:common Related to Common component comp:manager Related to Manager component size:L 100~500 LoC

Development

Successfully merging this pull request may close these issues.

gpu_allocated field should include all accelerators
