Changes from 7 commits
30 changes: 20 additions & 10 deletions kubernetes/README-ZH.md
@@ -25,22 +25,29 @@ The BatchSandbox custom resource allows you to create and manage multiple identical sandbox environments

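### Resource Pooling
The Pool custom resource maintains a pool of pre-warmed compute resources for fast sandbox provisioning:
- Configurable buffer sizes (minimum and maximum) to balance resource availability and cost
- Pool capacity limits to control overall resource consumption
- Automatic, demand-based resource allocation and release
- Real-time status monitoring showing total, allocated, and available resources
- **Configurable buffer sizes**: Set minimum and maximum buffers to ensure resource availability while controlling cost.
- **Pool capacity limits**: Control overall resource consumption through pool-wide minimum and maximum limits.
- **Recycle Policies**: Supports different Pod recycle policies:
  - **Delete (default)**: The Pod is deleted when it returns to the pool and recreated from the template, ensuring a completely clean environment.
  - **Restart**: Gracefully terminates processes by sending SIGTERM to PID 1 of every container and relies on the Kubernetes `restartPolicy` to trigger a restart. This is faster than `Delete` but requires `restartPolicy` in the `PodTemplateSpec` to be set to `Always`.
- **Automatic scaling**: Dynamic resource allocation and release based on current demand and buffer settings.
- **Real-time status monitoring**: Shows the number of total, allocated, available, and restarting Pods.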

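### Task Orchestration
An integrated task management system that executes custom workloads inside sandboxes:
- **Optional execution**: Task scheduling is entirely optional - sandboxes can be created without tasks
- **Process-based tasks**: Supports executing process-based tasks in the sandbox environment
- **Heterogeneous task distribution**: Use shardTaskPatches to customize an individual task for each sandbox in a batch
- **Optional execution**: Task scheduling is entirely optional - sandboxes can be created without tasks.
- **Process-based tasks**: Supports executing process-based tasks in the sandbox environment.
- **Heterogeneous task distribution**: Use `shardTaskPatches` to customize an individual task for each sandbox in a batch.
- **Resource release policy**: Use `taskResourcePolicyWhenCompleted` to control when resources return to the pool after a task completes:
  - **Retain (default)**: Keeps the sandbox resources until the `BatchSandbox` is deleted or expires.
  - **Release**: Automatically releases the sandbox back to the resource pool as soon as the task reaches a terminal state (SUCCEEDED or FAILED).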

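### Advanced Scheduling
Intelligent resource management features:
- Minimum and maximum buffer settings to ensure resource availability while controlling cost
- Pool-wide capacity limits to prevent resource exhaustion
- Demand-based automatic scaling
- **Demand-based automatic scaling**: Automatically grows and shrinks the number of Pods in the pool based on real-time sandbox allocation requests.
- **Buffer management**: Balance instant availability against resource overhead with the `bufferMin` and `bufferMax` settings.
- **Pool constraints**: Use `poolMin` and `poolMax` to set hard bounds on resource usage.
- **Rolling updates**: Automatically updates the pool and rotates Pods when the `PodTemplateSpec` is modified.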

## Runtime API Support Notes

@@ -390,6 +397,7 @@ spec:
    bufferMin: 2
    poolMax: 20
    poolMin: 5
  podRecyclePolicy: Delete
```

Apply the pool configuration:
@@ -442,6 +450,7 @@ spec:
    bufferMin: 2
    poolMax: 20
    poolMin: 5
  podRecyclePolicy: Delete
```

Create a batch of sandboxes with process-based heterogeneous tasks using the pool we just created:
@@ -454,6 +463,7 @@ metadata:
spec:
  replicas: 2
  poolRef: task-example-pool
  taskResourcePolicyWhenCompleted: Release
  taskTemplate:
    spec:
      process:
30 changes: 20 additions & 10 deletions kubernetes/README.md
@@ -25,22 +25,29 @@ The BatchSandbox custom resource allows you to create and manage multiple identi

### Resource Pooling
The Pool custom resource maintains a pool of pre-warmed compute resources to enable rapid sandbox provisioning:
- Configurable buffer sizes (minimum and maximum) to balance resource availability and cost
- Pool capacity limits to control overall resource consumption
- Automatic resource allocation and deallocation based on demand
- Real-time status monitoring showing total, allocated, and available resources
- **Configurable Buffer Sizes**: Minimum and maximum buffer settings to ensure resource availability while controlling costs.
- **Pool Capacity Limits**: Overall resource consumption control with pool-wide minimum and maximum limits.
- **Recycle Policies**: Support for different pod recycling strategies:
  - **Delete (Default)**: Pods are deleted and recreated from the template when returned to the pool, ensuring a completely clean environment.
  - **Restart**: PID 1 in all containers is gracefully terminated (SIGTERM), and the Kubernetes `restartPolicy` triggers a restart. This is faster than `Delete` but requires the `restartPolicy` in `PodTemplateSpec` to be set to `Always`.
- **Automatic Scaling**: Dynamic resource allocation and deallocation based on current demand and buffer settings.
- **Real-time Status**: Monitoring of total, allocated, available, and restarting pods.
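
The choice between the two recycle policies can be pictured as a small decision function. The following is only a sketch under stated assumptions, not the controller code from this PR: `RecycleAction`, `chooseRecycleAction`, and the decision to reject `Restart` when `restartPolicy` is not `Always` are all hypothetical; the README only says that `Restart` requires `Always`.

```go
package main

import (
	"errors"
	"fmt"
)

// RecycleAction is a hypothetical name for what the pool does with a returned Pod.
type RecycleAction string

const (
	ActionDelete  RecycleAction = "delete-and-recreate"
	ActionRestart RecycleAction = "restart-in-place"
)

// chooseRecycleAction maps a podRecyclePolicy value to an action.
// Assumptions: an empty policy means Delete (the documented default), and
// Restart is rejected unless the Pod template's restartPolicy is "Always".
func chooseRecycleAction(policy, restartPolicy string) (RecycleAction, error) {
	switch policy {
	case "", "Delete":
		return ActionDelete, nil
	case "Restart":
		if restartPolicy != "Always" {
			return "", errors.New("podRecyclePolicy Restart requires restartPolicy Always")
		}
		return ActionRestart, nil
	default:
		return "", fmt.Errorf("unknown podRecyclePolicy %q", policy)
	}
}

func main() {
	cases := [][2]string{{"Delete", "Always"}, {"Restart", "Always"}, {"Restart", "Never"}}
	for _, tc := range cases {
		action, err := chooseRecycleAction(tc[0], tc[1])
		fmt.Printf("policy=%s restartPolicy=%s -> action=%q err=%v\n", tc[0], tc[1], action, err)
	}
}
```

How the real controller reacts to a `Restart` pool whose template does not use `restartPolicy: Always` is not stated in this diff; returning an error here is just one plausible choice.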

### Task Orchestration
Integrated task management system that executes custom workloads within sandboxes:
- **Optional Execution**: Task scheduling is completely optional - sandboxes can be created without tasks
- **Process-Based Tasks**: Support for process-based tasks that execute within the sandbox environment
- **Heterogeneous Task Distribution**: Customize individual tasks for each sandbox in a batch using shardTaskPatches
- **Optional Execution**: Task scheduling is completely optional - sandboxes can be created without tasks.
- **Process-Based Tasks**: Support for process-based tasks that execute within the sandbox environment.
- **Heterogeneous Task Distribution**: Customize individual tasks for each sandbox in a batch using `shardTaskPatches`.
- **Resource Release Policy**: Control when resources are returned to the pool after task completion via `taskResourcePolicyWhenCompleted`:
  - **Retain (Default)**: Keeps the sandbox resources until the `BatchSandbox` is deleted or expires.
  - **Release**: Automatically releases the sandbox back to the pool immediately after the task reaches a terminal state (SUCCEEDED or FAILED).
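
As a rough illustration of the release rule above, the check reduces to "policy is `Release` and the task is in a terminal state". This is a minimal sketch, not the PR's implementation: `shouldRelease` is a hypothetical helper, the state strings are taken from the README wording and may not match the API's literal values, and `RUNNING` is an assumed non-terminal state.

```go
package main

import "fmt"

// shouldRelease reports whether a sandbox should be handed back to the pool.
// Assumption: any policy other than "Release" (including unset) behaves like Retain.
func shouldRelease(policy, taskState string) bool {
	if policy != "Release" {
		return false // Retain keeps the sandbox until the BatchSandbox is deleted or expires
	}
	// Only terminal task states trigger a release.
	return taskState == "SUCCEEDED" || taskState == "FAILED"
}

func main() {
	for _, state := range []string{"RUNNING", "SUCCEEDED", "FAILED"} {
		fmt.Printf("Release/%s -> %v, Retain/%s -> %v\n",
			state, shouldRelease("Release", state),
			state, shouldRelease("Retain", state))
	}
}
```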

### Advanced Scheduling
Intelligent resource management features:
- Minimum and maximum buffer settings to ensure resource availability while controlling costs
- Pool-wide capacity limits to prevent resource exhaustion
- Automatic scaling based on demand
- **Demand-based Scaling**: Automatically scales the number of pods in the pool based on real-time sandbox allocation requests.
- **Buffer Management**: `bufferMin` and `bufferMax` settings to balance instant availability with resource overhead.
- **Pool Constraints**: `poolMin` and `poolMax` to set hard boundaries on resource usage.
- **Rolling Updates**: Automatic pool update and pod rotation when the `PodTemplateSpec` is modified.
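
To make the interaction between the buffer and pool settings concrete, here is one way a target pool size could be derived. It is a sketch under assumptions, not the scaling logic used by the controller: `Capacity`, `clamp`, and `desiredTotal` are hypothetical names, the rule "keep idle Pods within the buffer range, then clamp the total to the pool range" is an assumed interpretation, and `BufferMax: 10` in the usage is an illustrative value.

```go
package main

import "fmt"

// Capacity mirrors the four README settings; the struct itself is hypothetical.
type Capacity struct {
	BufferMin, BufferMax int
	PoolMin, PoolMax     int
}

// clamp limits v to the closed range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// desiredTotal returns a target Pod count for the pool.
// Assumed rule: idle Pods (total - allocated) are kept within
// [BufferMin, BufferMax], and the resulting total within [PoolMin, PoolMax].
func desiredTotal(total, allocated int, c Capacity) int {
	idle := clamp(total-allocated, c.BufferMin, c.BufferMax)
	return clamp(allocated+idle, c.PoolMin, c.PoolMax)
}

func main() {
	c := Capacity{BufferMin: 2, BufferMax: 10, PoolMin: 5, PoolMax: 20}
	fmt.Println(desiredTotal(5, 4, c))   // one idle Pod is below BufferMin, so grow
	fmt.Println(desiredTotal(15, 0, c))  // fifteen idle Pods exceed BufferMax, so shrink
	fmt.Println(desiredTotal(20, 19, c)) // growth is capped by PoolMax
}
```

Under this rule the pool limits win over the buffer limits when the two conflict, which matches the README's description of `poolMin` and `poolMax` as hard boundaries.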

## Runtime API Support Notes

@@ -389,6 +396,7 @@ spec:
    bufferMin: 2
    poolMax: 20
    poolMin: 5
  podRecyclePolicy: Delete
```

Apply the pool configuration:
@@ -441,6 +449,7 @@ spec:
    bufferMin: 2
    poolMax: 20
    poolMin: 5
  podRecyclePolicy: Delete
```

Create a batch of sandboxes with process-based heterogeneous tasks using the pool we just created:
@@ -453,6 +462,7 @@ metadata:
spec:
  replicas: 2
  poolRef: task-example-pool
  taskResourcePolicyWhenCompleted: Release
  taskTemplate:
    spec:
      process:
18 changes: 18 additions & 0 deletions kubernetes/apis/sandbox/v1alpha1/pool_types.go
@@ -32,8 +32,23 @@ type PoolSpec struct {
	// CapacitySpec controls the size of the resource pool.
	// +kubebuilder:validation:Required
	CapacitySpec CapacitySpec `json:"capacitySpec"`
	// PodRecyclePolicy controls the recycle policy for Pods released from BatchSandbox.
	// +optional
	// +kubebuilder:default=Delete
	PodRecyclePolicy PodRecyclePolicy `json:"podRecyclePolicy,omitempty"`
}

// PodRecyclePolicy defines the recycle policy for Pods released from BatchSandbox.
// +kubebuilder:validation:Enum=Delete;Restart
type PodRecyclePolicy string

const (
	// PodRecyclePolicyDelete deletes the Pod directly when released from BatchSandbox.
	PodRecyclePolicyDelete PodRecyclePolicy = "Delete"
	// PodRecyclePolicyRestart restarts containers before reusing the Pod.
	PodRecyclePolicyRestart PodRecyclePolicy = "Restart"
)

type CapacitySpec struct {
	// BufferMax is the maximum number of nodes kept in the warm buffer.
	// +kubebuilder:validation:Minimum=0
@@ -66,6 +81,9 @@ type PoolStatus struct {
	Allocated int32 `json:"allocated"`
	// Available is the number of nodes currently available in the pool.
	Available int32 `json:"available"`
	// Restarting is the number of Pods that are being restarted for recycle.
	// +optional
	Restarting int32 `json:"restarting,omitempty"`
}

// +genclient
1 change: 1 addition & 0 deletions kubernetes/charts/opensandbox-controller/README.md
@@ -186,6 +186,7 @@ spec:
    bufferMin: 2
    poolMax: 20
    poolMin: 5
  podRecyclePolicy: Delete
```

### Create a Batch Sandbox
Original file line number Diff line number Diff line change
@@ -73,6 +73,12 @@ rules:
  - get
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
- apiGroups:
  - sandbox.opensandbox.io
  resources:
17 changes: 13 additions & 4 deletions kubernetes/cmd/controller/main.go
@@ -19,13 +19,15 @@ import (
	"flag"
	"os"
	"path/filepath"
	"time"

	// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
	// to ensure that exec-entrypoint and run can make use of them.
	_ "k8s.io/client-go/plugin/pkg/client/auth"

	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/kubernetes"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/certwatcher"
@@ -77,6 +79,9 @@ func main() {
	var kubeClientQPS float64
	var kubeClientBurst int

	// Restart timeout configuration
	var restartTimeout time.Duration

	flag.StringVar(&metricsAddr, "metrics-bind-address", "0", "The address the metrics endpoint binds to. "+
		"Use :8443 for HTTPS or :8080 for HTTP, or leave as 0 to disable the metrics service.")
	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
@@ -104,6 +109,7 @@ func main() {
	flag.BoolVar(&logCompress, "log-compress", true, "Compress determines if the rotated log files should be compressed using gzip")
	flag.Float64Var(&kubeClientQPS, "kube-client-qps", 100, "QPS for Kubernetes client rate limiter.")
	flag.IntVar(&kubeClientBurst, "kube-client-burst", 200, "Burst for Kubernetes client rate limiter.")
	flag.DurationVar(&restartTimeout, "restart-timeout", 90*time.Second, "Timeout for Pod restart operations. If a Pod fails to restart within this duration, it will be deleted.")

	opts := zap.Options{}
	opts.BindFlags(flag.CommandLine)
@@ -259,11 +265,14 @@ func main() {
		setupLog.Error(err, "unable to create controller", "controller", "BatchSandbox")
		os.Exit(1)
	}
	kubeClient := kubernetes.NewForConfigOrDie(mgr.GetConfig())
	restartTracker := controller.NewRestartTracker(mgr.GetClient(), kubeClient, mgr.GetConfig(), restartTimeout)
	if err := (&controller.PoolReconciler{
		Client:    mgr.GetClient(),
		Scheme:    mgr.GetScheme(),
		Recorder:  mgr.GetEventRecorderFor("pool-controller"),
		Allocator: controller.NewDefaultAllocator(mgr.GetClient()),
		Client:         mgr.GetClient(),
		Scheme:         mgr.GetScheme(),
		Recorder:       mgr.GetEventRecorderFor("pool-controller"),
		Allocator:      controller.NewDefaultAllocator(mgr.GetClient()),
		RestartTracker: restartTracker,
	}).SetupWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "Pool")
		os.Exit(1)
13 changes: 13 additions & 0 deletions kubernetes/config/crd/bases/sandbox.opensandbox.io_pools.yaml
@@ -84,6 +84,14 @@ spec:
- poolMax
- poolMin
type: object
podRecyclePolicy:
default: Delete
description: PodRecyclePolicy controls the recycle policy for Pods
released from BatchSandbox.
enum:
- Delete
- Restart
type: string
template:
description: Pod Template used to create pre-warmed nodes in the pool.
x-kubernetes-preserve-unknown-fields: true
@@ -109,6 +117,11 @@ spec:
BatchSandbox's generation, which is updated on mutation by the API Server.
format: int64
type: integer
restarting:
description: Restarting is the number of Pods that are being restarted
for recycle.
format: int32
type: integer
revision:
description: Revision is the latest version of pool
type: string
6 changes: 6 additions & 0 deletions kubernetes/config/rbac/role.yaml
@@ -17,6 +17,12 @@ rules:
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
- apiGroups:
  - ""
  resources:
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@ spec:
- -f
- /dev/null
expireTime: "2025-12-03T12:55:41Z"
taskResourcePolicyWhenCompleted: Release
taskTemplate:
spec:
process:
1 change: 1 addition & 0 deletions kubernetes/config/samples/sandbox_v1alpha1_pool.yaml
@@ -71,3 +71,4 @@ spec:
    bufferMin: 1
    poolMax: 5
    poolMin: 0
  podRecyclePolicy: Delete
6 changes: 6 additions & 0 deletions kubernetes/go.mod
@@ -15,6 +15,12 @@ require (
	sigs.k8s.io/controller-runtime v0.21.0
)

require (
	github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674 // indirect
	github.com/moby/spdystream v0.5.0 // indirect
	github.com/mxk/go-flowrate v0.0.0-20140419014527-cca7078d478f // indirect
)

require (
	cel.dev/expr v0.19.1 // indirect
	github.com/antlr4-go/antlr/v4 v4.13.0 // indirect
8 changes: 8 additions & 0 deletions kubernetes/go.sum
@@ -2,6 +2,8 @@ cel.dev/expr v0.19.1 h1:NciYrtDRIR0lNCnH1LFJegdjspNx9fI59O7TWcua/W4=
cel.dev/expr v0.19.1/go.mod h1:MrpN08Q+lEBs+bGYdLxxHkZoUSsCp0nSKTs0nTymJgw=
github.com/antlr4-go/antlr/v4 v4.13.0 h1:lxCg3LAv+EUK6t1i0y1V6/SLeUi0eKEKdhQAlS8TVTI=
github.com/antlr4-go/antlr/v4 v4.13.0/go.mod h1:pfChB/xh/Unjila75QW7+VU4TSnWnnk9UTnmpPaOR2g=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/blang/semver/v4 v4.0.0 h1:1PFHFE6yCCTv8C1TeyNNarDzntLi7wMI5i/pzqYIsAM=
@@ -66,6 +68,8 @@ github.com/google/pprof v0.0.0-20241029153458-d1b30febd7db h1:097atOisP2aRj7vFgY
github.com/google/pprof v0.0.0-20241029153458-d1b30febd7db/go.mod h1:vavhavw2zAxS5dIdcRluK6cSGGPlZynqzFM8NdvU144=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674 h1:JeSE6pjso5THxAzdVpqr6/geYxZytqFMBCOtn/ujyeo=
github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674/go.mod h1:r4w70xmWCQKmi1ONH4KIaBptdivuRPyosB9RmPlGEwA=
github.com/grpc-ecosystem/grpc-gateway/v2 v2.24.0 h1:TmHmbvxPmaegwhDubVz0lICL0J5Ka2vwTzhoePEXsGE=
github.com/grpc-ecosystem/grpc-gateway/v2 v2.24.0/go.mod h1:qztMSjm835F2bXf+5HKAPIS5qsmQDqZna/PgVt4rWtI=
github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
@@ -89,13 +93,17 @@ github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0=
github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc=
github.com/moby/spdystream v0.5.0 h1:7r0J1Si3QO/kjRitvSLVVFUjxMEb/YLj6S9FF62JBCU=
github.com/moby/spdystream v0.5.0/go.mod h1:xBAYlnt/ay+11ShkdFKNAG7LsyK/tmNBVvVOwrfMgdI=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/mxk/go-flowrate v0.0.0-20140419014527-cca7078d478f h1:y5//uYreIhSUg3J1GEMiLbxo1LJaP8RfCpH6pymGZus=
github.com/mxk/go-flowrate v0.0.0-20140419014527-cca7078d478f/go.mod h1:ZdcZmHo+o7JKHSa8/e818NopupXU1YMK5fe1lsApnBw=
github.com/onsi/ginkgo/v2 v2.22.0 h1:Yed107/8DjTr0lKCNt7Dn8yQ6ybuDRQoMGrNFKzMfHg=
github.com/onsi/ginkgo/v2 v2.22.0/go.mod h1:7Du3c42kxCUegi0IImZ1wUQzMBVecgIHjR1C+NkhLQo=
github.com/onsi/gomega v1.36.1 h1:bJDPBO7ibjxcbHMgSCoo4Yj18UWbKDlLwX1x9sybDcw=