
bwa-mem2 SIGILLs with SVE instruction cntd #1286

Closed
brainstorm opened this issue Feb 26, 2025 · 5 comments · Fixed by bioconda/bioconda-recipes#54134

Comments

@brainstorm

brainstorm commented Feb 26, 2025

Hello dear @mr-c (long time no see!) and SIMDE community!

I'm porting a bioinformatics pipeline and its dependencies to the ARMv8.2-A ISA (AWS Graviton2, c6g instance) and I believe I've hit an issue with SVE instructions leaking into a binary where they shouldn't be:

$ uname -a
Linux ip-172-31-19-10.ap-southeast-2.compute.internal 6.1.128-136.201.amzn2023.aarch64 #1 SMP Mon Feb 10 16:17:41 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

$ cat /proc/cpuinfo
(...)
processor       : 15
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

Here's my issue with bwa-mem2 as it comes out of bioconda, allegedly patched for AArch64:

$ which bwa-mem2
~/miniconda3/envs/libzlib/bin/bwa-mem2   (created to debug this, installed from bioconda)
$ bwa-mem2 version
2.2.1
$  gdb --args bwa-mem2 mem -Y -K 100000000 -R '@RG\tID:subject_a.normal.library_1.001\tSM:subject_a.normal' -t 1 GCA_000001405.15_GRCh38_no_alt_analysis_set.fna subject_a.normal.dna.R1.fastq.gz subject_a.normal.dna.R2.fastq.gz
(...)
Reading symbols from bwa-mem2...
(gdb) bt
No stack.
(gdb) run
Starting program: /home/ec2-user/miniconda3/envs/libzlib/bin/bwa-mem2 mem -Y -K 100000000 -R @RG\\tID:subject_a.normal.library_1.001\\tSM:subject_a.normal -t 1 GCA_000001405.15_GRCh38_no_alt_analysis_set.fna subject_a.normal.dna.R1.fastq.gz subject_a.normal.dna.R2.fastq.gz
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
-----------------------------
Executing in Scalar mode!!
-----------------------------
* SA compression enabled with xfactor: 8
* Ref file: GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
* Entering FMI_search
* Index file found. Loading index from GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bwt.2bit.64
* Reference seq len for bi-index = 6199845083

Program received signal SIGILL, Illegal instruction.
0x0000aaaaaaac122c in FMI_search::load_index (this=this@entry=0xaaaaaab23620) at src/FMI_search.cpp:433
433     src/FMI_search.cpp: No such file or directory.
(gdb) bt
#0  0x0000aaaaaaac122c in FMI_search::load_index (this=this@entry=0xaaaaaab23620) at src/FMI_search.cpp:433
#1  0x0000aaaaaaaa5db0 in main_mem (argc=argc@entry=11, argv=argv@entry=0xfffffffff140) at src/fastmap.cpp:854
#2  0x0000aaaaaaaa4584 in main (argc=12, argv=0xfffffffff138) at src/main.cpp:104

(gdb) disas
(...)
   0x0000aaaaaaac1220 <+396>:   mov     x3, x22
   0x0000aaaaaaac1224 <+400>:   blr     x21
   0x0000aaaaaaac1228 <+404>:   mov     x0, #0x0                        // #0
=> 0x0000aaaaaaac122c <+408>:   cntd    x2
   0x0000aaaaaaac1230 <+412>:   mov     w1, #0x5                        // #5
   0x0000aaaaaaac1234 <+416>:   whilelo p0.d, wzr, w1
   0x0000aaaaaaac1238 <+420>:   ld1d    {z0.d}, p0/z, [x20, x0, lsl #3]
(...)

So, what could be the reason to have cntd (AFAIK an SVE instruction) in the binary when SIMDE is told to compile as native, which on this AWS instance (no pun intended) shouldn't bring in SVE?
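To check how much SVE actually ended up in a function, one option is to scan the gdb disassembly for SVE-looking instructions. The following is only a rough sketch: it assumes the `disas` line layout shown above, and it uses a heuristic (z/p register operands plus a short, non-exhaustive mnemonic list) rather than a real decoder. The names `sve_instructions`, `GDB_LINE`, `SVE_OPERAND` and `SVE_SCALAR_MNEMONICS` are made up for this example.

```python
import re

# One line of `disas` output as gdb prints it in this thread:
#   [=>] 0xADDR <+OFFSET>:   mnemonic   operands
GDB_LINE = re.compile(r"^\s*(?:=>\s*)?0x[0-9a-f]+\s+<\+\d+>:\s+(\S+)\s*(.*)$")

# Heuristic, NOT a full decoder: SVE vector registers are written z0..z31 and
# predicate registers p0..p15; base AArch64 and NEON code use neither.
SVE_OPERAND = re.compile(r"\b(?:z(?:[12]?\d|3[01])|p(?:1[0-5]|\d))\b")

# A few SVE instructions that take only general-purpose operands (non-exhaustive).
SVE_SCALAR_MNEMONICS = {"cntb", "cnth", "cntw", "cntd", "rdvl", "addvl", "addpl"}


def sve_instructions(disassembly: str) -> list[str]:
    """Return the mnemonics of lines that look like SVE instructions."""
    hits = []
    for line in disassembly.splitlines():
        match = GDB_LINE.match(line)
        if not match:
            continue  # skip "(...)" and any other non-instruction lines
        mnemonic, operands = match.group(1), match.group(2)
        operands = operands.split("//")[0]  # drop gdb's trailing comment
        if mnemonic in SVE_SCALAR_MNEMONICS or SVE_OPERAND.search(operands):
            hits.append(mnemonic)
    return hits


# Disassembly copied from the gdb session above.
SAMPLE_DISAS = """\
   0x0000aaaaaaac1220 <+396>:   mov     x3, x22
   0x0000aaaaaaac1224 <+400>:   blr     x21
   0x0000aaaaaaac1228 <+404>:   mov     x0, #0x0                        // #0
=> 0x0000aaaaaaac122c <+408>:   cntd    x2
   0x0000aaaaaaac1230 <+412>:   mov     w1, #0x5                        // #5
   0x0000aaaaaaac1234 <+416>:   whilelo p0.d, wzr, w1
   0x0000aaaaaaac1238 <+420>:   ld1d    {z0.d}, p0/z, [x20, x0, lsl #3]
"""

print(sve_instructions(SAMPLE_DISAS))  # ['cntd', 'whilelo', 'ld1d']
```

On the snippet above this flags cntd, whilelo and ld1d, so the faulting cntd is not an isolated instruction: the compiler vectorised this code with SVE.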

Perhaps /proc/cpuinfo doesn't give enough info to tell whether this machine has SVE (wild guess)? I'll try to understand the guts of SIMDE next and test tomorrow; I just wanted to document this somewhere in case it's a known issue and somebody else is already working on it.
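On the /proc/cpuinfo question: on arm64 Linux the kernel lists `sve` in the Features line when the CPU supports it, so its absence above is already informative. A minimal sketch of that check follows; the helper name `parse_cpu_features` and the constant `GRAVITON2_CPUINFO` are made up for this example, and the sample text is trimmed from the output pasted earlier.

```python
def parse_cpu_features(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()  # no Features line found (e.g. not an arm64 Linux host)


# Trimmed from the Graviton2 (c6g) /proc/cpuinfo output above.
GRAVITON2_CPUINFO = """\
processor       : 15
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
"""

features = parse_cpu_features(GRAVITON2_CPUINFO)
print("sve" in features)    # False
print("asimd" in features)  # True
```

On a live machine the same function can be fed `open("/proc/cpuinfo").read()`.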

/cc @ohofmann @scwatts @martin-g @delagoya

@brainstorm
Author

brainstorm commented Feb 26, 2025

Just realised that bioconda's AArch64 CI builders must have SVE, and therefore the resulting binary contains those instructions. I'll check, but it makes sense!

That'd mean only AWS Graviton 3 and above are supported.

@martin-g

martin-g commented Feb 26, 2025

@brainstorm
Author

brainstorm commented Feb 26, 2025

> See https://github.com/bioconda/bioconda-recipes/blob/6c6b2171c937b03a334733d53bdff87c43ac16d5/recipes/bwa-mem2/build.sh#L24
>
> Indeed it builds with arch=native

Yes, Martin, that's exactly one of my links on the original post.

TL;DR: building with arch=native on the Bioconda CI builder machines targets the builder's own CPU, so the packaged binary ends up compiled for armv8.2-a+sve instead of just armv8.2-a.

So SIMDE is not at fault; the problem is how bwa-mem2's target architecture is parametrised, together with how bioconda packages it.

I didn't realise until later that the bioconda CI builder machines must have AArch64 processors with SVE, so the bioconda packages will SIGILL on AWS Graviton2 instances and older (which don't have SVE).

Closing. I'll focus on AWS Graviton 4 instances (e.g. r8g.4xlarge) for now, since they are more price/performance competitive anyway, and it'd be confusing to parametrise this at build time through Bioconda just to get Graviton 1 and Graviton 2 (non-SVE processors) working.

@brainstorm
Author

OTOH, I suspect that we are leaving quite some perf on the table by not adopting the work done by bwa-mem2/bwa-mem2#248 (instead of SIMDE), but I digress.

@mr-c
Collaborator

mr-c commented Feb 26, 2025

Repeating my comment here, for posterity:

Is there an official architecture baseline policy for bioconda?

Perhaps compiling for more than one ARM version and choosing one of them at runtime would be better.

https://wiki.debian.org/InstructionSelection (https://github.com/ekg/subarch-select could be easily taught about ARM features, it is just a thin wrapper around https://github.com/google/cpu_features/ which already knows about aarch64 CPU features on linux)
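The "compile several variants, pick one at runtime" idea could look roughly like the sketch below. It is only an illustration: the variant file names, the required-feature sets and the function `pick_variant` are hypothetical, not anything bioconda or subarch-select ships. The feature names are the flags Linux prints in the /proc/cpuinfo Features line.

```python
# Hypothetical build variants, most demanding first. The file names and the
# required-feature sets are illustrative assumptions, not bioconda artifacts.
VARIANTS = [
    ("bwa-mem2.sve", frozenset({"asimd", "sve"})),
    ("bwa-mem2.neon", frozenset({"asimd"})),
]


def pick_variant(host_features, variants=VARIANTS):
    """Return the first variant whose required features the host provides."""
    available = set(host_features)
    for name, required in variants:
        if required <= available:
            return name
    raise RuntimeError("no build variant matches this CPU")


# Features line from the Graviton2 (c6g) instance earlier in this thread.
GRAVITON2 = (
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm lrcpc dcpop asimddp"
).split()

print(pick_variant(GRAVITON2))            # bwa-mem2.neon
print(pick_variant(GRAVITON2 + ["sve"]))  # bwa-mem2.sve
```

A thin launcher would then replace itself with the chosen binary (for example via `os.execv`), so the SVE build only ever runs on hosts that advertise SVE.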

Or if a "fat" binary is desired, bwa-mem2 could be a good candidate to use in documenting how to combine SIMDe with CPU feature dispatch: #1268
