Build CUDA + OSU-Micro-Benchmarks GPU software for supported combinations of CPU and CUDA compute capability 70 #1030

@TopRichard (Collaborator) commented Apr 15, 2025

  •  x86_64_generic
  •  cascadelake
  •  haswell
  •  icelake
  •  sapphirerapids
  •  skylake
  •  zen2
  •  zen3
  •  zen4
  •  aarch64_generic
  •  neoverse_n1
  •  neoverse_v1
  •  nvidia/grace (built on GPU node)

First attempt, using the rebuild procedure:

  • For the builds with accelerator:nvidia/cc70, building CUDA-Samples-12.1-GCC-12.3.0 fails on both GPU and non-GPU nodes with the error:
make[1]: Entering directory '/tmp/.../easybuild/build/CUDASamples/12.1/GCC-12.3.0-CUDA-12.1.1/cuda-samples-12.1/Samples/3_CUDA_Features/immaTensorCoreGemm'
/../software/CUDA/12.1.1/bin/nvcc -ccbin g++ -I../../../Common -m64 -maxrregcount=255 --threads 0 --std=c++11 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o immaTensorCoreGemm.o -c immaTensorCoreGemm.cu
immaTensorCoreGemm.cu(260): error: incomplete type is not allowed
      wmma::fragment<wmma::accumulator, 16, 16, 16, int> c[2]
                                                         ^

immaTensorCoreGemm.cu(271): error: no instance of overloaded function "nvcuda::wmma::load_matrix_sync" matches the argument list
            argument types are: (<error-type>, const int *, int, nvcuda::wmma::layout_t)
          wmma::load_matrix_sync(c[i][j], tile_ptr, (16 * (4 * 2)), wmma::mem_row_major);
          ^

immaTensorCoreGemm.cu(336): error: incomplete type is not allowed
              a[2];

16 errors detected in the compilation of "immaTensorCoreGemm.cu".

Skipping CUDA-Samples-12.1-GCC-12.3.0 results in a successful build.
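
A plausible explanation, not confirmed in this thread: the CUDA documentation lists the integer WMMA fragments used by immaTensorCoreGemm as requiring compute capability 7.2 or higher, so the fragment types would be incomplete when compiling for sm_70. A minimal sketch of a check that decides whether such a sample could be built for a given accelerator target; the function names and the `IMMA_MIN_CC` constant are hypothetical and not part of EESSI or EasyBuild:

```python
# Minimum compute capability assumed for integer WMMA fragments (7.2 per CUDA docs).
IMMA_MIN_CC = (7, 2)


def parse_cc(target: str) -> tuple[int, int]:
    """Turn an accelerator target such as 'nvidia/cc70' into (7, 0)."""
    digits = target.rsplit("cc", 1)[1]
    # The last digit is the minor version, everything before it the major version.
    return int(digits[:-1]), int(digits[-1])


def supports_imma(target: str) -> bool:
    """Return True when the target meets the assumed integer-WMMA minimum."""
    return parse_cc(target) >= IMMA_MIN_CC
```

Under this assumption `supports_imma("nvidia/cc70")` is false, which would match the decision to skip CUDA-Samples for cc70.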

Second attempt: setting the cc70 yml file in accel/nvidia, so no rebuild is required.

eessi-bot bot commented Apr 15, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

eessi-bot bot commented Apr 15, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat

@eessi-bot-trz42

Instance trz42-GH200-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software
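
The configuration comments above determine which instance can pick up a build command. A minimal sketch of that matching, assuming an omitted filter matches everything; the `jobs_for` helper is hypothetical and is not the bot's actual code, and the table only reflects the configuration reported on Apr 15, 2025:

```python
# Instance -> architectures, copied from the configuration comments of Apr 15, 2025.
BOT_CONFIG = {
    "eessi-bot-mc-aws": [
        "x86_64/generic", "x86_64/intel/haswell", "x86_64/intel/sapphirerapids",
        "x86_64/intel/skylake_avx512", "x86_64/amd/zen2", "x86_64/amd/zen3",
        "aarch64/generic", "aarch64/neoverse_n1", "aarch64/neoverse_v1",
    ],
    "eessi-bot-deucalion": ["aarch64/a64fx"],
    "eessi-bot-mc-azure": ["x86_64/amd/zen4"],
    "eessi-bot-surf": ["x86_64/amd/zen4", "x86_64/amd/zen2"],
    "trz42-GH200-jr": ["aarch64/nvidia/grace"],
    "rt-Grace-jr": ["aarch64/nvidia/grace"],
}


def jobs_for(instance=None, architecture=None):
    """Return the (instance, architecture) pairs matching the given filters.

    Assumed behaviour: a filter left as None matches every value.
    """
    return [
        (inst, arch)
        for inst, archs in BOT_CONFIG.items()
        if instance in (None, inst)
        for arch in archs
        if architecture in (None, arch)
    ]
```

For example, `jobs_for(instance="eessi-bot-surf")` yields one pair each for zen4 and zen2, while a request for `aarch64/nvidia/grace` on `eessi-bot-mc-aws` yields an empty list.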

@TopRichard marked this pull request as draft April 15, 2025 11:32
@TopRichard added the 2023.06-software.eessi.io and accel:nvidia labels Apr 15, 2025
@TopRichard
Collaborator Author

bot: build inst:rt-Grace-jr arch:aarch64/nvidia/grace repo:eessi.io-2023.06-software accelerator:nvidia/cc70

eessi-bot bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build inst:rt-Grace-jr arch:aarch64/nvidia/grace repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted
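
The bot replies show short argument names being expanded (inst to instance, arch to architecture, repo to repository, accel to accelerator). A minimal sketch of that expansion, using only the mappings visible in this thread; `expand_command` and `SHORT_FORMS` are hypothetical names, not the bot's implementation:

```python
# Short argument names and their long forms, as seen in the bot replies.
SHORT_FORMS = {
    "inst": "instance",
    "arch": "architecture",
    "repo": "repository",
    "accel": "accelerator",
}


def expand_command(line: str) -> str:
    """Expand 'bot: build inst:X ...' into the long form echoed by the bot."""
    prefix, _, rest = line.partition(":")
    if prefix.strip() != "bot":
        raise ValueError("not a bot command")
    command, *args = rest.split()
    expanded = []
    for arg in args:
        key, sep, value = arg.partition(":")
        # Unknown keys (including already-long forms) are kept unchanged.
        expanded.append(f"{SHORT_FORMS.get(key, key)}{sep}{value}")
    return " ".join([command, *expanded])
```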

eessi-bot bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build inst:rt-Grace-jr arch:aarch64/nvidia/grace repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build inst:rt-Grace-jr arch:aarch64/nvidia/grace repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-surf

eessi-bot-surf bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build inst:rt-Grace-jr arch:aarch64/nvidia/grace repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:rt-Grace-jr architecture:aarch64/nvidia/grace repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

eessi-bot-toprichard bot commented Apr 15, 2025

Updates by the bot instance rt-Grace-jr (click for details)

@eessi-bot-toprichard

eessi-bot-toprichard bot commented Apr 15, 2025

New job on instance rt-Grace-jr for CPU micro-architecture aarch64-nvidia-grace and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /p/project1/ceasybuilders/bot-rt/jobs/2025.04/pr_1030/13613238

date | job status | comment
Apr 15 12:29:47 UTC 2025 | submitted | job id 13613238 awaits release by job manager
Apr 15 12:30:04 UTC 2025 | released | job awaits launch by Slurm scheduler
Apr 15 12:31:07 UTC 2025 | running | job 13613238 is running
Apr 15 13:04:05 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13613238.out
❌ found message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-nvidia-grace-1744721408.tar.gz
size: 2010 MiB (2107824492 bytes)
entries: 3777
modules under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70/software
CUDA/12.1.1
other under 2023.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc70
no other files in tarball
Apr 15 13:04:05 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-13613238.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
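
The build "Details" checks in the report above can be sketched as a scan of the Slurm output file. This is a minimal sketch, assuming the build counts as a success only when every check passes; the patterns are copied from the report, while the function names are hypothetical and not taken from the bot's code:

```python
import re

# Patterns that must NOT appear in the job output.
ERROR_PATTERNS = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
# Patterns that MUST appear in the job output.
REQUIRED_PATTERNS = ["No missing installations", r"\.tar\.gz created!"]


def check_build_log(text: str) -> dict[str, bool]:
    """Map each pattern to True when its check passes."""
    results = {}
    for pattern in ERROR_PATTERNS:
        results[pattern] = re.search(pattern, text) is None
    for pattern in REQUIRED_PATTERNS:
        results[pattern] = re.search(pattern, text) is not None
    return results


def build_succeeded(text: str) -> bool:
    """Assumed rule: success requires every individual check to pass."""
    return all(check_build_log(text).values())
```

Applied to the job above, four error patterns were found and "No missing installations" was absent, so the overall result is a failure even though a tarball was created.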

@TopRichard
Collaborator Author

bot: build inst:eessi-bot-surf repo:eessi.io-2023.06-software accelerator:nvidia/cc70

eessi-bot bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build inst:eessi-bot-surf repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

eessi-bot bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build inst:eessi-bot-surf repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build inst:eessi-bot-surf repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-surf

eessi-bot-surf bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-surf (click for details)

@eessi-bot-toprichard

eessi-bot-toprichard bot commented Apr 15, 2025

Updates by the bot instance rt-Grace-jr (click for details)
  • received bot command build inst:eessi-bot-surf repo:eessi.io-2023.06-software accelerator:nvidia/cc70 from TopRichard

    • expanded format: build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70
  • handling command build instance:eessi-bot-surf repository:eessi.io-2023.06-software accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-surf

eessi-bot-surf bot commented Apr 15, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.04/pr_1030/11191697

date | job status | comment
Apr 15 13:01:37 UTC 2025 | submitted | job id 11191697 will be eligible to start in about 20 seconds
Apr 15 13:01:43 UTC 2025 | received | job awaits launch by Slurm scheduler
Apr 15 13:02:00 UTC 2025 | running | job 11191697 is running
Apr 15 13:12:33 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-11191697.out
❌ found message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-1744722429.tar.gz
size: 2067 MiB (2167673447 bytes)
entries: 5518
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc70/software
CUDA/12.1.1
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc70
no other files in tarball
Apr 15 13:12:33 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-11191697.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
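
The artefact listings use install prefixes of the form `<version>/software/<os>/<cpu target>/accel/<vendor>/<capability>/...`. A minimal sketch that splits such a prefix into its parts, inferred only from the paths shown in this thread; `parse_accel_prefix` is a hypothetical helper:

```python
def parse_accel_prefix(path: str) -> dict[str, str]:
    """Split an accelerator install prefix into its components."""
    parts = path.strip("/").split("/")
    # Locate the 'accel' marker; the CPU target sits between the OS and this marker.
    accel = parts.index("accel")
    return {
        "version": parts[0],
        "os": parts[2],
        "cpu": "/".join(parts[3:accel]),
        "accelerator": "/".join(parts[accel + 1 : accel + 3]),
        "subdir": "/".join(parts[accel + 3 :]),
    }
```

Searching for the `accel` marker rather than using fixed positions keeps the sketch working for CPU targets of different depth, such as `aarch64/nvidia/grace`.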

@eessi-bot-surf

eessi-bot-surf bot commented Apr 15, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.04/pr_1030/11191721

date | job status | comment
Apr 15 13:01:40 UTC 2025 | submitted | job id 11191721 will be eligible to start in about 20 seconds
Apr 15 13:01:46 UTC 2025 | received | job awaits launch by Slurm scheduler
Apr 15 13:02:15 UTC 2025 | running | job 11191721 is running
Apr 15 13:15:54 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-11191721.out
❌ found message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1744722600.tar.gz
size: 2067 MiB (2167671608 bytes)
entries: 5518
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc70/software
CUDA/12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc70
no other files in tarball
Apr 15 13:15:54 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-11191721.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Collaborator

laraPPr commented Apr 15, 2025

bot: help instance:eessi-bot-vsc-ugent

eessi-bot bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command help instance:eessi-bot-vsc-ugent from laraPPr

    • expanded format: help instance:eessi-bot-vsc-ugent
  • handling command help instance:eessi-bot-vsc-ugent resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot
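
The rules in the help text above (one command per line, starting at the beginning of the line, with the syntax `bot: COMMAND [ARGUMENTS]*`) can be sketched as a small extractor. This is a minimal sketch under the assumption that indented or unsupported commands are simply ignored; `extract_commands` is a hypothetical name and not the bot's actual parser:

```python
import re

# Commands listed as supported in the help text.
SUPPORTED = {"help", "build", "show_config", "status"}
# A command must start at the beginning of a line: 'bot: COMMAND [ARGUMENTS]*'.
COMMAND_RE = re.compile(r"^bot:[ \t]*(\S+)[ \t]*(.*)$", re.MULTILINE)


def extract_commands(comment: str) -> list[tuple[str, list[str]]]:
    """Return (command, arguments) pairs for every supported command in a comment."""
    found = []
    for match in COMMAND_RE.finditer(comment):
        name, args = match.group(1), match.group(2).split()
        if name in SUPPORTED:
            found.append((name, args))
    return found
```

A comment that mixes prose with two `bot: build ...` lines would yield two entries, and the prose lines are skipped.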

eessi-bot bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command help instance:eessi-bot-vsc-ugent from laraPPr

    • expanded format: help instance:eessi-bot-vsc-ugent
  • handling command help instance:eessi-bot-vsc-ugent resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-surf

eessi-bot-surf bot commented Apr 15, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command help instance:eessi-bot-vsc-ugent from laraPPr

    • expanded format: help instance:eessi-bot-vsc-ugent
  • handling command help instance:eessi-bot-vsc-ugent resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account laraPPr has NO permission to send commands to the bot

eessi-bot bot commented May 8, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/cascadelake accel:nvidia/cc70 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc70
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/icelake accel:nvidia/cc70 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc70
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-surf

eessi-bot-surf bot commented May 8, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/cascadelake accel:nvidia/cc70 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc70
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/icelake accel:nvidia/cc70 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc70
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

eessi-bot-toprichard bot commented May 8, 2025

Updates by the bot instance rt-Grace-jr (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/cascadelake accel:nvidia/cc70 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc70
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/icelake accel:nvidia/cc70 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc70
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc70 resulted in:

    • no jobs were submitted

eessi-bot bot commented May 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-cascadelake and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.05/pr_1030/61709

date | job status | comment
May 08 18:03:33 UTC 2025 | submitted | job id 61709 awaits release by job manager
May 08 18:03:58 UTC 2025 | released | job awaits launch by Slurm scheduler
May 08 18:09:32 UTC 2025 | running | job 61709 is running
May 08 19:18:53 UTC 2025 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-61709.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-cascadelake-17467297290.tar.gz
size: 4495 MiB (4713417378 bytes)
entries: 12167
modules under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
GDRCopy/2.4-GCCcore-13.2.0.lua
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0.lua
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1.lua
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0.lua
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0.lua
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1.lua
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0.lua
software under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
GDRCopy/2.4-GCCcore-13.2.0
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0
other under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc70
no other files in tarball
May 08 19:18:53 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-61709.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

eessi-bot bot commented May 8, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-icelake and accelerator nvidia/cc70 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.05/pr_1030/61710

date | job status | comment
May 08 18:03:38 UTC 2025 | submitted | job id 61710 awaits release by job manager
May 08 18:04:03 UTC 2025 | released | job awaits launch by Slurm scheduler
May 08 18:09:38 UTC 2025 | running | job 61710 is running
May 08 19:05:17 UTC 2025 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-61710.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-icelake-17467293450.tar.gz
size: 4495 MiB (4713433171 bytes)
entries: 12167
modules under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc70/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
GDRCopy/2.4-GCCcore-13.2.0.lua
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0.lua
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1.lua
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0.lua
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0.lua
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1.lua
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0.lua
software under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc70/software
CUDA/12.1.1
CUDA/12.4.0
GDRCopy/2.4-GCCcore-13.2.0
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0
other under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc70
no other files in tarball
May 08 19:05:17 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-61710.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator

Hm, this version of GDRCopy wasn't in there yet. It is now, so rebuilding again.

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/cascadelake accel:nvidia/cc80
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/icelake accel:nvidia/cc80

eessi-bot bot commented May 12, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)

@eessi-bot-surf

eessi-bot-surf bot commented May 12, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/cascadelake accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/icelake accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr (click for details)
  • account casparvl has NO permission to send commands to the bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented May 12, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/cascadelake accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/icelake accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

eessi-bot bot commented May 12, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-cascadelake and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.05/pr_1030/62582

date | job status | comment
May 12 08:33:50 UTC 2025 | submitted | job id 62582 awaits release by job manager
May 12 08:34:44 UTC 2025 | released | job awaits launch by Slurm scheduler
May 12 08:40:49 UTC 2025 | running | job 62582 is running
May 12 09:50:11 UTC 2025 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-62582.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-cascadelake-17470411870.tar.gz
size: 4497 MiB (4716216371 bytes)
entries: 12144
modules under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0.lua
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1.lua
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0.lua
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0.lua
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1.lua
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0.lua
software under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0
other under 2023.06/software/linux/x86_64/intel/cascadelake/accel/nvidia/cc80
no other files in tarball
May 12 09:50:11 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-62582.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 12 10:34:47 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-intel-cascadelake-17470411870.tar.gz to S3 bucket succeeded
May 12 14:12:29 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-intel-cascadelake-17470411870.tar.gz to S3 bucket succeeded

eessi-bot bot commented May 12, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-intel-icelake and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.05/pr_1030/62583

date | job status | comment
May 12 08:33:55 UTC 2025 | submitted | job id 62583 awaits release by job manager
May 12 08:34:48 UTC 2025 | released | job awaits launch by Slurm scheduler
May 12 08:40:59 UTC 2025 | running | job 62583 is running
May 12 09:35:55 UTC 2025 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-62583.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-intel-icelake-17470407390.tar.gz
size: 4497 MiB (4716169066 bytes)
entries: 12144
modules under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
CUDA/12.1.1.lua
CUDA/12.4.0.lua
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0.lua
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1.lua
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0.lua
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0.lua
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1.lua
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0.lua
software under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
CUDA/12.1.1
CUDA/12.4.0
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1
NCCL/2.20.5-GCCcore-13.2.0-CUDA-12.4.0
OSU-Micro-Benchmarks/7.2-gompi-2023a-CUDA-12.1.1
OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
UCC-CUDA/1.2.0-GCCcore-13.2.0-CUDA-12.4.0
UCX-CUDA/1.14.1-GCCcore-12.3.0-CUDA-12.1.1
UCX-CUDA/1.15.0-GCCcore-13.2.0-CUDA-12.4.0
other under 2023.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
May 12 09:35:55 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-62583.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
May 12 10:36:09 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-x86_64-intel-icelake-17470407390.tar.gz to S3 bucket succeeded
May 12 14:13:59 UTC 2025 uploaded transfer of eessi-2023.06-software-linux-x86_64-intel-icelake-17470407390.tar.gz to S3 bucket succeeded
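
The build-step checks in the report above (no `FATAL:`, `ERROR:`, `FAILED:` or `required modules missing:` messages, plus the presence of `No missing installations` and `.tar.gz created!`) amount to scanning the job output file for a handful of strings. A minimal sketch of that idea follows; the function name `check_build_output` and the two pattern lists are hypothetical and only mirror the messages shown in the report, they are not the bot's actual implementation.

```python
# Hypothetical sketch: reproduce the check marks of the build report
# by scanning the text of a job output file (e.g. slurm-62583.out).

# Strings that must be absent from / present in the output,
# copied from the check list in the report above.
FORBIDDEN = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
REQUIRED = ["No missing installations", ".tar.gz created!"]


def check_build_output(text):
    """Return (ok, report_lines) for the given job output text."""
    ok = True
    report = []
    for pattern in FORBIDDEN:
        if pattern in text:
            ok = False
            report.append(f"❌ found message matching {pattern}")
        else:
            report.append(f"✅ no message matching {pattern}")
    for pattern in REQUIRED:
        if pattern in text:
            report.append(f"✅ found message matching {pattern}")
        else:
            ok = False
            report.append(f"❌ no message matching {pattern}")
    return ok, report


if __name__ == "__main__":
    sample = "No missing installations\nexample.tar.gz created!\n"
    ok, report = check_build_output(sample)
    print("SUCCESS" if ok else "FAILURE")
    print("\n".join(report))
```

Under this scheme a single `ERROR:` line anywhere in the output is enough to flip a step to FAILURE, which is consistent with the test result above being reported as failed even though the build step itself succeeded.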

@casparvl added the bot:deploy label and removed the ready-to-review label on May 12, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user casparvl, but this person does not have permission to trigger deployments

@TopRichard added and removed the bot:deploy label on May 12, 2025

eessi-bot bot commented May 12, 2025

Label bot:deploy has been set by user TopRichard, which has no permission to trigger the action

@eessi-bot-deucalion

Label bot:deploy has been set by user TopRichard, but this person does not have permission to trigger deployments

eessi-bot bot commented May 12, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/cascadelake accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc80
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/intel/icelake accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/cascadelake accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/intel/icelake accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
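
The "expanded format" lines above show the short filter keys `repo:`, `arch:` and `accel:` being rewritten to `repository:`, `architecture:` and `accelerator:`, while `instance:` is left unchanged. A minimal sketch of such an expansion is given below; the alias table and the function name `expand_bot_command` are assumptions made for illustration and are not taken from the bot's source code.

```python
# Hypothetical sketch of expanding short filter keys in a bot command.
# The alias table is inferred from the "expanded format" lines above.
ALIASES = {
    "repo": "repository",
    "arch": "architecture",
    "accel": "accelerator",
}


def expand_bot_command(command):
    """Rewrite 'key:value' words whose key has a known long form."""
    expanded = []
    for word in command.split():
        key, sep, value = word.partition(":")
        # Only rewrite words of the form key:value with a known short key;
        # everything else (e.g. 'build', 'instance:...') passes through.
        if sep and key in ALIASES:
            word = f"{ALIASES[key]}:{value}"
        expanded.append(word)
    return " ".join(expanded)


if __name__ == "__main__":
    print(expand_bot_command(
        "build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws "
        "arch:x86_64/intel/icelake accel:nvidia/cc80"
    ))
```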

eessi-bot bot commented May 12, 2025

Label bot:deploy has been set by user TopRichard, which has no permission to trigger the action

@casparvl added and removed the bot:deploy label on May 12, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user casparvl, but this person does not have permission to trigger deployments

@casparvl merged commit 39bffa2 into EESSI:2023.06-software.eessi.io on May 12, 2025
59 checks passed

eessi-bot bot commented May 12, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.04/pr_1030/56936', '/project/def-users/SHARED/jobs/2025.04/pr_1030/56937', '/project/def-users/SHARED/jobs/2025.04/pr_1030/56939', '/project/def-users/SHARED/jobs/2025.04/pr_1030/56940', '/project/def-users/SHARED/jobs/2025.04/pr_1030/57149', '/project/def-users/SHARED/jobs/2025.04/pr_1030/57150', '/project/def-users/SHARED/jobs/2025.04/pr_1030/58375', '/project/def-users/SHARED/jobs/2025.04/pr_1030/58376', '/project/def-users/SHARED/jobs/2025.04/pr_1030/58377', '/project/def-users/SHARED/jobs/2025.04/pr_1030/58378', '/project/def-users/SHARED/jobs/2025.04/pr_1030/58379', '/project/def-users/SHARED/jobs/2025.04/pr_1030/58386', '/project/def-users/SHARED/jobs/2025.05/pr_1030/61252', '/project/def-users/SHARED/jobs/2025.05/pr_1030/61253', '/project/def-users/SHARED/jobs/2025.05/pr_1030/61709', '/project/def-users/SHARED/jobs/2025.05/pr_1030/61710', '/project/def-users/SHARED/jobs/2025.05/pr_1030/62582', '/project/def-users/SHARED/jobs/2025.05/pr_1030/62583'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.05.12

eessi-bot bot commented May 12, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.04/pr_1030/2424'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.05.12

@eessi-bot-surf

PR merged! Moved ['/projects/eessibot/eessi-bot-surf/jobs/2025.04/pr_1030/11191721', '/projects/eessibot/eessi-bot-surf/jobs/2025.04/pr_1030/11191697'] to /projects/eessibot/eessi-bot-surf/trash_bin/EESSI/software-layer/2025.05.12

@eessi-bot-toprichard

PR merged! Moved ['/p/project1/ceasybuilders/bot-rt/jobs/2025.04/pr_1030/13615161', '/p/project1/ceasybuilders/bot-rt/jobs/2025.04/pr_1030/13613825', '/p/project1/ceasybuilders/bot-rt/jobs/2025.04/pr_1030/13613238'] to /p/project1/ceasybuilders/bot-rt/trash_bin/EESSI/software-layer/2025.05.12

Labels
2023.06-software.eessi.io (2023.06 version of software.eessi.io), accel:nvidia, bot:deploy (Ask bot to deploy missing software installations to EESSI)
4 participants