Commit 34299f5

scheduler: feasibility check that memory_max fits in total
The `resources.memory_max` field is intended to allow memory oversubscription, so we don't check it in the `AllocsFit` method, where we total the requested memory for all allocs on a node. But we never check that the value can even fit in the total amount of memory on the node, which can result in nonsensical placements.

When iterating over nodes in the feasibility check phase, check that the `memory_max` field doesn't exceed the total amount of memory on the node. Note that this deliberately ignores "reserved memory", as the feature is intended to allow oversubscription.

Fixes: #26360
1 parent 501608c commit 34299f5

3 files changed: +28 -3 lines changed

scheduler/feasible/rank.go

Lines changed: 7 additions & 0 deletions
@@ -383,6 +383,13 @@ NEXTNODE:
 			if iter.memoryOversubscription {
 				taskResources.Memory.MemoryMaxMB = safemath.Add(
 					int64(task.Resources.MemoryMaxMB), int64(task.Resources.SecretsMB))
+
+				if taskResources.Memory.MemoryMaxMB > option.Node.NodeResources.Memory.MemoryMB {
+					iter.ctx.Metrics().FilterNode(option.Node,
+						"task memory_max exceeds maximum available memory")
+					netIdx.Release()
+					continue NEXTNODE
+				}
 			}

 			// Check if we need a network resource

scheduler/feasible/rank_test.go

Lines changed: 15 additions & 2 deletions
@@ -110,6 +110,18 @@ func TestBinPackIterator_NoExistingAlloc(t *testing.T) {
 				},
 			},
 		},
+		{
+			Node: &structs.Node{
+				// Empty but memory_max won't fit
+				NodeResources: &structs.NodeResources{
+					Processors: processorResources4096,
+					Cpu:        legacyCpuResources4096,
+					Memory: structs.NodeMemoryResources{
+						MemoryMB: 1024,
+					},
+				},
+			},
+		},
 	}
 	static := NewStaticRankIterator(ctx, nodes)

@@ -119,8 +131,9 @@ func TestBinPackIterator_NoExistingAlloc(t *testing.T) {
 				{
 					Name: "web",
 					Resources: &structs.Resources{
-						CPU:      1024,
-						MemoryMB: 1024,
+						CPU:         1024,
+						MemoryMB:    1024,
+						MemoryMaxMB: 2048,
 					},
 				},
 			},

website/content/docs/upgrade/upgrade-specific.mdx

Lines changed: 6 additions & 1 deletion
@@ -18,7 +18,12 @@ used to document those details separately from the standard upgrade flow.

 In Nomad 1.11.0, submitting a sysbatch job with a `reschedule` block returns
 an error instead of being silently ignored, as it was in previous versions. The
-same behavior applies to system jobs.
+same behavior applies to system jobs.
+
+#### Memory oversubscription checked against total memory on node
+
+In Nomad 1.11.0, the scheduler checks that the `resources.memory_max` of a
+task doesn't exceed the total memory on a given node.

 ## Nomad 1.10.2
