

@kyujin-cho (Member) commented Nov 18, 2025

resolves #6798 (BA-3071).

This pull request refactors how device allocation and unified device management are handled in the agent resource specification and kernel lifecycle. The main improvements are the introduction of the DeviceView dataclass and the device_list property, which provide a unified and clearer way to access all devices (including unified devices) attached to a kernel. This change simplifies device iteration logic throughout the codebase and ensures unified devices are consistently included in resource specifications and related operations.

Device allocation and unified device management:

  • Added the DeviceView dataclass and the device_list property to KernelResourceSpec, enabling a unified view of all devices (including unified devices) attached to a kernel. Unified devices are now tracked via a new unified_devices attribute, and the resource spec serialization/deserialization logic is updated to include this information. (src/ai/backend/agent/resources.py)
  • Introduced the SlotTypes.UNIFIED enum value to distinguish unified device slots, supporting the new unified device logic. (src/ai/backend/common/types.py)
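
To make the relationship between allocations, unified devices, and device_list concrete, here is a minimal Python sketch. It is not the code from this PR: ResourceSpecSketch, its field layout, the DeviceView fields shown, and the "spark"/"gb10-0" names are hypothetical stand-ins chosen for illustration. The property yields each device with a nonzero allocation, then appends unified devices that have no allocation of their own, skipping duplicates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class DeviceView:
    # Hypothetical fields: the owning device plugin, the device ID, and
    # whether the device was attached implicitly as a unified device.
    device_name: str
    device_id: str
    unified: bool = False


@dataclass
class ResourceSpecSketch:
    # Assumed layout: device_name -> slot_name -> device_id -> allocated amount.
    allocations: dict[str, dict[str, dict[str, Decimal]]] = field(default_factory=dict)
    # Assumed layout: device_name -> device IDs attached to every kernel.
    unified_devices: dict[str, list[str]] = field(default_factory=dict)

    @property
    def device_list(self) -> list[DeviceView]:
        views: list[DeviceView] = []
        seen: set[tuple[str, str]] = set()
        # Explicitly allocated devices: keep only those with a nonzero share.
        for device_name, per_slot in self.allocations.items():
            for per_device in per_slot.values():
                for device_id, amount in per_device.items():
                    key = (device_name, device_id)
                    if amount > 0 and key not in seen:
                        seen.add(key)
                        views.append(DeviceView(device_name, device_id))
        # Unified devices are always included, even without a slot allocation.
        for device_name, device_ids in self.unified_devices.items():
            for device_id in device_ids:
                key = (device_name, device_id)
                if key not in seen:
                    seen.add(key)
                    views.append(DeviceView(device_name, device_id, unified=True))
        return views


spec = ResourceSpecSketch(
    allocations={"cuda": {"cuda.device": {"0": Decimal("1"), "1": Decimal("0")}}},
    unified_devices={"spark": ["gb10-0"]},
)
for view in spec.device_list:
    print(view.device_name, view.device_id, view.unified)
# cuda 0 False
# spark gb10-0 True
```

In this sketch, keeping the nonzero check and deduplication inside the property means consumers can loop over one list without re-walking the nested allocation mapping.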

API and method refactoring:

  • Added the generate_resource_spec async method to the agent, which builds a complete resource spec including unified devices, replacing direct calls to prepare_resource_spec throughout the codebase. (src/ai/backend/agent/agent.py)
  • Updated device iteration logic in methods responsible for mounting static binaries, preparing hook mounts, and container hook selection to use the new device_list property instead of iterating over allocations and manually checking for nonzero allocations. This improves clarity and ensures unified devices are handled correctly. (src/ai/backend/agent/agent.py, src/ai/backend/agent/stage/kernel_lifecycle/docker/environ.py, src/ai/backend/agent/stage/kernel_lifecycle/docker/mount/krunner.py)
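
The generate_resource_spec change can be pictured as a thin async wrapper around the existing spec builder. The sketch below assumes that shape; AgentSketch, SpecSketch, the inventory mapping, and the device names are hypothetical, and the real method may gather unified devices differently.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Mapping

# Assumed layout: device_name -> slot_name -> device_id -> allocated amount.
Allocations = Mapping[str, Mapping[str, Mapping[str, Decimal]]]


@dataclass
class SpecSketch:
    # Hypothetical minimal spec; field names are illustrative only.
    allocations: Allocations
    unified_devices: dict[str, list[str]] = field(default_factory=dict)


class AgentSketch:
    def __init__(self, unified_inventory: Mapping[str, list[str]]) -> None:
        # Unified devices discovered on this host (assumed structure).
        self._unified_inventory = unified_inventory

    async def prepare_resource_spec(self, allocations: Allocations) -> SpecSketch:
        # Stand-in for the backend-specific builder; it knows nothing
        # about unified devices.
        return SpecSketch(allocations=allocations)

    async def generate_resource_spec(self, allocations: Allocations) -> SpecSketch:
        # Single entry point: build the base spec, then attach every unified
        # device so that individual call sites cannot forget to.
        spec = await self.prepare_resource_spec(allocations)
        spec.unified_devices = {
            name: list(ids) for name, ids in self._unified_inventory.items()
        }
        return spec


async def main() -> None:
    agent = AgentSketch({"spark": ["gb10-0"]})
    allocs = {"cpu": {"cpu": {"root": Decimal("2")}}}
    base = await agent.prepare_resource_spec(allocs)
    full = await agent.generate_resource_spec(allocs)
    print(base.unified_devices)  # {}
    print(full.unified_devices)  # {'spark': ['gb10-0']}


asyncio.run(main())
```

Routing every call site through one wrapper, as assumed here, is what lets the mount and hook code trust that unified devices are already present in the spec.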

Documentation and typing improvements:

  • Added docstrings and type hints to new and updated methods, clarifying the purpose and usage of resource spec generation and device aggregation. (src/ai/backend/agent/agent.py, src/ai/backend/agent/resources.py)

These changes collectively improve device management consistency, reduce code duplication, and make the agent's resource handling logic easier to maintain and extend.

Copilot AI review requested due to automatic review settings November 18, 2025 06:14
@github-actions github-actions bot added size:S 10~30 LoC comp:agent Related to Agent component comp:common Related to Common component labels Nov 18, 2025
Copilot finished reviewing on behalf of kyujin-cho November 18, 2025 06:15
@kyujin-cho kyujin-cho marked this pull request as draft November 18, 2025 06:17
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for DGX Spark devices by introducing a new "unified" slot type that allows implicit attachment of certain accelerator devices to all created kernels.

Key changes:

  • Added UNIFIED slot type enum value to distinguish unified accelerators
  • Modified kernel creation logic to automatically attach unified-type accelerators to every kernel
  • Removed x86_64 platform-specific wheel build configuration from build script
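
The implicit attachment described above amounts to filtering the host's slot inventory by slot type. This is a hedged sketch: SlotInfo, unified_attachments, and the plugin and device names are hypothetical, the string value of UNIFIED is assumed, and the other SlotTypes members are shown only for context and may not match the real enum exactly.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass


class SlotTypes(str, enum.Enum):
    # Existing kinds shown for context; the "unified" value is an assumption.
    COUNT = "count"
    BYTES = "bytes"
    UNIQUE = "unique"
    UNIFIED = "unified"


@dataclass(frozen=True)
class SlotInfo:
    # Hypothetical record describing one slot reported by a device plugin.
    device_name: str
    slot_name: str
    slot_type: SlotTypes
    device_ids: tuple[str, ...]


def unified_attachments(slots: list[SlotInfo]) -> dict[str, list[str]]:
    """Collect the devices every new kernel should receive, keyed by plugin name."""
    attached: dict[str, list[str]] = {}
    for slot in slots:
        # Only UNIFIED slots are attached implicitly; other slot types still
        # go through the normal allocation path.
        if slot.slot_type is SlotTypes.UNIFIED:
            attached.setdefault(slot.device_name, []).extend(slot.device_ids)
    return attached


inventory = [
    SlotInfo("cuda", "cuda.device", SlotTypes.COUNT, ("0", "1")),
    SlotInfo("spark", "spark.device", SlotTypes.UNIFIED, ("gb10-0",)),
]
print(unified_attachments(inventory))  # {'spark': ['gb10-0']}
```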

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
src/ai/backend/common/types.py | Added UNIFIED enum value to SlotTypes for unified accelerator devices
src/ai/backend/agent/agent.py | Added logic in create_kernel to automatically attach unified accelerators to all kernels
scripts/build-wheels.sh | Removed x86_64 platform-specific wheel build configuration


@github-actions github-actions bot added size:M 30~100 LoC and removed size:S 10~30 LoC labels Nov 18, 2025
@github-actions github-actions bot added size:L 100~500 LoC and removed size:M 30~100 LoC labels Nov 20, 2025
Copilot finished reviewing on behalf of kyujin-cho November 20, 2025 10:03
@kyujin-cho kyujin-cho changed the title feature (BA-3071): add support for DGX spark devices feat (BA-3071): add support for DGX spark devices Nov 20, 2025
@kyujin-cho kyujin-cho marked this pull request as ready for review November 20, 2025 10:05
@kyujin-cho kyujin-cho changed the title feat (BA-3071): add support for DGX spark devices feat(BA-3071): add support for DGX spark devices Nov 20, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



@kyujin-cho
Member Author

@copilot Update the PR description to summarize the changes introduced in this PR.


Copilot AI commented Nov 20, 2025

@kyujin-cho I've opened a new pull request, #6850, to work on those changes. Once the pull request is ready, I'll request review from you.

@HyeockJinKim HyeockJinKim force-pushed the main branch 4 times, most recently from 1a10632 to 2d8c9ea November 23, 2025 14:45

Labels

comp:agent Related to Agent component comp:common Related to Common component size:L 100~500 LoC


Development

Successfully merging this pull request may close these issues.

Unified Device Type

2 participants