Avoid assigning meaning to `arch_target_map` keys #294

casparvl · 2025-01-31T20:18:56Z

Currently, the arch_target_map keys have meaning: the bot/build.sh from software-layer interprets them as the OS/SUBDIR (but not the accelerator). This is problematic: if I have a system with e.g. zen4 CPU nodes and zen4+H100 GPU nodes, those would normally both be encoded as linux/x86_64/amd/zen4 in the architecture map, and I cannot do that because keys have to be unique.

It would be better if the keys were meaningless, and if the bot/build.sh got its information elsewhere. For example:

arch_target_map = {
    'virtual_partition_1': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'slurm_params': '-p genoa <etc>',
    },
    'virtual_partition_2': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'accel': 'nvidia/cc90',
        'slurm_params': '-p gpu_h100 <etc>',
    },
}

would then configure the cpu-only zen4 partition and zen4+H100 partition respectively.

This would require changes in two places:

The bot code, because the bot currently assumes that the value of the arch_target_map is the slurm parameters to be used for submission. That should change, and it should extract one level deeper, i.e. arch_target_map['some_partition']['slurm_params'] instead of arch_target_map['some_os_subdir'].
The bot/build.sh should extract the relevant information from a more deeply nested dict.
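As a rough sketch of what "one level deeper" could look like on the bot side: the helper names `slurm_params_for` and `build_target_for` below are hypothetical and do not exist in the bot; only `arch_target_map` and its fields come from the proposal above.

```python
# Hypothetical sketch of the nested lookup; not the bot's actual code.
arch_target_map = {
    'virtual_partition_1': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'slurm_params': '-p genoa <etc>',
    },
    'virtual_partition_2': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'accel': 'nvidia/cc90',
        'slurm_params': '-p gpu_h100 <etc>',
    },
}


def slurm_params_for(target_map, partition_name):
    """Return the Slurm parameters used to submit to the named partition."""
    # Old behaviour would have been target_map[key], with the key encoding OS/SUBDIR.
    return target_map[partition_name]['slurm_params']


def build_target_for(target_map, partition_name):
    """Return (os, subdir, accel) for the build script; accel is None if CPU-only."""
    entry = target_map[partition_name]
    return entry['os'], entry['subdir'], entry.get('accel')


print(slurm_params_for(arch_target_map, 'virtual_partition_2'))
print(build_target_for(arch_target_map, 'virtual_partition_1'))
```

With this layout, the two zen4 partitions can coexist because the key no longer has to encode the OS/SUBDIR.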


casparvl · 2025-04-09T12:10:23Z

Changes will be needed at least around line 603 of eessi-bot-software-layer/tasks/build.py (commit d4b4811):

for arch, slurm_opt in arch_map.items():
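One possible shape for that loop under the nested layout is sketched below. The generator name `iter_build_targets` and the tuple it yields are assumptions made for illustration; the real change in tasks/build.py may look different.

```python
# Hypothetical rewrite of the loop over arch_map; names are illustrative only.
def iter_build_targets(arch_map):
    """Yield (partition_name, os_subdir, accel, slurm_opt) for each map entry."""
    for partition_name, partition in arch_map.items():
        # The Slurm options now live one level deeper than before.
        slurm_opt = partition['slurm_params']
        # Rebuild the value the old keys used to carry, e.g. linux/x86_64/amd/zen4.
        os_subdir = f"{partition['os']}/{partition['subdir']}"
        # 'accel' is optional; CPU-only partitions simply omit it.
        accel = partition.get('accel')
        yield partition_name, os_subdir, accel, slurm_opt


arch_map = {
    'virtual_partition_1': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'slurm_params': '-p genoa <etc>',
    },
    'virtual_partition_2': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'accel': 'nvidia/cc90',
        'slurm_params': '-p gpu_h100 <etc>',
    },
}

for name, os_subdir, accel, slurm_opt in iter_build_targets(arch_map):
    print(name, os_subdir, accel, slurm_opt)
```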

GOAL

Suppose we have

arch_target_map = {
    # zen4 CPU nodes
    'virtual_partition_1': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'slurm_params': '-p genoa <etc>',
    },
    # Zen4 CPU + H100 GPU nodes
    'virtual_partition_2': {
        'os': 'linux',
        'subdir': 'x86_64/amd/zen4',
        'accel': 'nvidia/cc90',
        'slurm_params': '-p gpu_h100 <etc>',
    },
    # Icelake + A100 GPU nodes
    'virtual_partition_3': {
        'os': 'linux',
        'subdir': 'x86_64/intel/icelake',
        'accel': 'nvidia/cc80',
        'slurm_params': '-p gpu_a100 <etc>',
    },
    # Icelake + A100 GPU nodes, pretending to be Icelake CPU only
    'virtual_partition_4': {
        'os': 'linux',
        'subdir': 'x86_64/intel/icelake',
        'slurm_params': '-p gpu_a100 <etc>',
    },
}

Then we have the following four scenarios:

Building for zen4 CPU:

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4

There is no accel defined. Thus, the bot should match this against the arch_target_map and figure out that this has to be built on virtual_partition_1, i.e. using the slurm_params as defined there. The build prefix is inferred from the bot build command, and will thus be x86_64/amd/zen4, as intended.

Building for zen4+CC90:

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

Bot matches this to virtual_partition_2 and uses those slurm_params upon submission. The prefix again will be determined based on the build command, and thus be x86_64/amd/zen4/nvidia/cc90 as intended.

Building for icelake+CC80:

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:icelake accel:nvidia/cc80

Bot matches this to virtual_partition_3 and uses those slurm_params upon submission. The prefix again will be determined based on the build command, and thus be x86_64/intel/icelake/nvidia/cc80 as intended.

Building for icelake:

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:icelake

Bot matches this to virtual_partition_4 and uses those slurm_params upon submission. The prefix again will be determined based on the build command, and thus be x86_64/intel/icelake as intended.
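The matching described in these scenarios could be sketched as follows. This is only an illustration under stated assumptions: `match_partition` and `build_prefix` are hypothetical names, the `arch:` value is assumed to match the last component of `subdir` (or the full `subdir`), and a missing `accel` is assumed to match only entries without an `accel` field.

```python
# Hypothetical matching logic; not the bot's actual implementation.
arch_target_map = {
    'virtual_partition_1': {'os': 'linux', 'subdir': 'x86_64/amd/zen4',
                            'slurm_params': '-p genoa <etc>'},
    'virtual_partition_2': {'os': 'linux', 'subdir': 'x86_64/amd/zen4',
                            'accel': 'nvidia/cc90',
                            'slurm_params': '-p gpu_h100 <etc>'},
    'virtual_partition_3': {'os': 'linux', 'subdir': 'x86_64/intel/icelake',
                            'accel': 'nvidia/cc80',
                            'slurm_params': '-p gpu_a100 <etc>'},
    'virtual_partition_4': {'os': 'linux', 'subdir': 'x86_64/intel/icelake',
                            'slurm_params': '-p gpu_a100 <etc>'},
}


def match_partition(target_map, arch, accel=None):
    """Return the name of the single partition matching arch and accel."""
    matches = [
        name for name, entry in target_map.items()
        if (entry['subdir'] == arch or entry['subdir'].endswith('/' + arch))
        and entry.get('accel') == accel
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for arch={arch!r}, "
                         f"accel={accel!r}, found {matches}")
    return matches[0]


def build_prefix(target_map, partition_name, accel=None):
    """Join the matched subdir with the accel requested in the build command."""
    subdir = target_map[partition_name]['subdir']
    return f"{subdir}/{accel}" if accel else subdir


# The four scenarios: (arch, accel) as given in the bot build command.
for arch, accel in [('zen4', None), ('zen4', 'nvidia/cc90'),
                    ('icelake', 'nvidia/cc80'), ('icelake', None)]:
    name = match_partition(arch_target_map, arch, accel)
    print(name, arch_target_map[name]['slurm_params'],
          build_prefix(arch_target_map, name, accel))
```

Raising on zero or multiple matches is a design choice in this sketch: it forces the map to stay unambiguous for every arch/accel combination the bot is asked to build.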

casparvl mentioned this issue Feb 17, 2025

{2023.06}[foss/2023a] waLBerla 6.1 w/ CUDA 12.1.1 EESSI/software-layer#780

Open

casparvl mentioned this issue Apr 17, 2025

Build CUDA + OSU-Micro-Benchmarks GPU software for supported combinations of CPU and CUDA compute capability 70 EESSI/software-layer#1030

Merged
