ACFL23 Instruction Support #425

JosephMoore25 · 2024-08-30T16:44:16Z

Merging work done towards enabling support for a few codes for ACFL 23, namely STREAM, miniBUDE, CloverLeaf, TeaLeaf, and Minisweep.

This PR is mostly made up of added instruction support. 58 instructions have been added: 24 are unique instructions and the remainder are variants. Most instructions are SVE, with some NEON added.

An additional feature of "infinite loop checking" has been added. This adds a counter in the ROB which throws an error if the same address has been at the head of the ROB for a very long time. This catches a few errors previously found where an erroneous config or broken logic can cause SimEng to get caught in a loop and sometimes eventually hit OOM.
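As a rough illustration of the idea described above, the following is a minimal sketch of a head-repeat counter. It is not the code in this PR: the class name `HeadRepeatTracker`, its methods, and the choice to key on both address and micro-op index are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical stand-alone tracker; names are illustrative, not SimEng's API.
class HeadRepeatTracker {
 public:
  explicit HeadRepeatTracker(uint64_t limit) : limit_(limit) {}

  // Call once per cycle with the instruction address and micro-op index
  // currently at the head of the ROB. Returns true once the same head has
  // been re-observed `limit_` times in a row.
  bool observe(uint64_t address, int microOpIndex) {
    if (seen_ && address == lastAddress_ && microOpIndex == lastMicroOp_) {
      repeats_++;
    } else {
      // A different head means forward progress, so start counting again.
      seen_ = true;
      lastAddress_ = address;
      lastMicroOp_ = microOpIndex;
      repeats_ = 0;
    }
    return repeats_ >= limit_;
  }

  // Lets a caller clear the count and carry on instead of aborting.
  void reset() { repeats_ = 0; }

 private:
  uint64_t limit_;
  uint64_t lastAddress_ = 0;
  int lastMicroOp_ = 0;
  uint64_t repeats_ = 0;
  bool seen_ = false;
};
```

A caller would typically print a diagnostic and stop (or call `reset()` and continue) when `observe` returns true; which of the two is appropriate is a policy decision outside this sketch.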

This also fixes an OpenMP bug that has previously popped up for ACFL 23 support, work that Jack had done in a separate branch.

Tests are still being added, and the new group tests need to be added for all instructions. The PR will leave draft stage once all tests have been added.

Here is a list of the instructions added:

Opcode	Inst Format	General Test added?	Group Test added?
Opcode::AArch64_UADDLVv8i8v: {	// uaddlv hd, vn.8b	Yes	Yes
Opcode::AArch64_FTSMUL_ZZZ_S: {	// ftsmul zd.s, zn.s, zm.s	Yes	Yes
Opcode::AArch64_FTSMUL_ZZZ_D: {	// ftsmul zd.d, zn.d, zm.d	Yes	Yes
Opcode::AArch64_FTSSEL_ZZZ_S: {	// ftssel zd.s, zn.s, zm.s	Yes	Yes
Opcode::AArch64_FTSSEL_ZZZ_D: {	// ftssel zd.d, zn.d, zm.d	Yes	Yes
Opcode::AArch64_FTMAD_ZZI_D: {	// ftmad zd.d, zn.d, zm.d, #imm	Yes	Yes
Opcode::AArch64_CMEQv2i32rz: {	// cmeq vd.2s, vn.2s, #0	Yes	Yes
Opcode::AArch64_CMHIv2i32: {	// cmhi vd.2s, vn.2s, vm.2s	Yes	Yes
Opcode::AArch64_CMPHS_PPzZZ_B: {	// cmphs pd.b, pg/z, zn.b, zm.b	Yes	Yes
Opcode::AArch64_CMPHS_PPzZZ_D: {	// cmphs pd.d, pg/z, zn.d, zm.d	Yes	Yes
Opcode::AArch64_CMPHS_PPzZZ_H: {	// cmphs pd.h, pg/z, zn.h, zm.h	Yes	Yes
Opcode::AArch64_CMPHS_PPzZZ_S: {	// cmphs pd.s, pg/z, zn.s, zm.s	Yes	Yes
Opcode::AArch64_CPY_ZPmV_B: {	// cpy zd.b, pg/m, vn.b	Yes	Yes
Opcode::AArch64_CPY_ZPmV_D: {	// cpy zd.d, pg/m, vn.d	Yes	Yes
Opcode::AArch64_CPY_ZPmV_H: {	// cpy zd.h, pg/m, vn.h	Yes	Yes
Opcode::AArch64_CPY_ZPmV_S: {	// cpy zd.s, pg/m, vn.s	Yes	Yes
Opcode::AArch64_FDIVv4f32: {	// fdiv vd.4s, vn.4s, vm.4s	Yes	Yes
Opcode::AArch64_LASTB_VPZ_D: {	// lastb dd, pg, zn.d	Yes	Yes
Opcode::AArch64_LASTB_VPZ_S: {	// lastb sd, pg, zn.s	Yes	Yes
Opcode::AArch64_LASTB_VPZ_H: {	// lastb hd, pg, zn.h	Yes	Yes
Opcode::AArch64_LASTB_VPZ_B: {	// lastb bd, pg, zn.b	Yes	Yes
Opcode::AArch64_CLASTB_VPZ_D: {	// clastb dd, pg, dn, zn.d	Yes	Yes
Opcode::AArch64_CLASTB_VPZ_S: {	// clastb sd, pg, sn, zn.s	Yes	Yes
Opcode::AArch64_CLASTB_VPZ_H: {	// clastb hd, pg, hn, zn.h	Yes	Yes
Opcode::AArch64_CLASTB_VPZ_B: {	// clastb bd, pg, bn, zn.b	Yes	Yes
Opcode::AArch64_LDAXRB: {	// ldaxrb wt, [xn]	Yes	Yes
Opcode::AArch64_LDRSWroW: {	// ldrsw xt, [xn, wm, {extend {#amount}}]	Yes	Yes
Opcode::AArch64_ORNv8i8: {	// orn vd.8b, vn.8b, vm.8b	Yes	Yes
Opcode::AArch64_PFIRST_B: {	// pfirst pdn.b, pg, pdn.b	Yes	Yes
Opcode::AArch64_PNEXT_B: {	// pnext pdn.b, pv, pdn.b	Yes	Yes
Opcode::AArch64_PNEXT_H: {	// pnext pdn.h, pv, pdn.h	Yes	Yes
Opcode::AArch64_PNEXT_S: {	// pnext pdn.s, pv, pdn.s	Yes	Yes
Opcode::AArch64_PNEXT_D: {	// pnext pdn.d, pv, pdn.d	Yes	Yes
Opcode::AArch64_SMAX_ZI_D: {	// smax zdn.d, zdn.d, #imm	Yes	Yes
Opcode::AArch64_SMAX_ZI_H: {	// smax zdn.h, zdn.h, #imm	Yes	Yes
Opcode::AArch64_SMAX_ZI_B: {	// smax zdn.b, zdn.b, #imm	Yes	Yes
Opcode::AArch64_SMAX_ZPmZ_D: {	// smax zd.d, pg/m, zn.d, zm.d	Yes	Yes
Opcode::AArch64_SMAX_ZPmZ_H: {	// smax zd.h, pg/m, zn.h, zm.h	Yes	Yes
Opcode::AArch64_SMAX_ZPmZ_B: {	// smax zd.b, pg/m, zn.b, zm.b	Yes	Yes
Opcode::AArch64_SMINV_VPZ_D: {	// sminv dd, pg, zn.d	Yes	Yes
Opcode::AArch64_SMINV_VPZ_H: {	// sminv hd, pg, zn.h	Yes	Yes
Opcode::AArch64_SMINV_VPZ_B: {	// sminv bd, pg, zn.b	Yes	Yes
Opcode::AArch64_SMIN_ZPmZ_D: {	// smin zd.d, pg/m, zn.d, zm.d	Yes	Yes
Opcode::AArch64_SMIN_ZPmZ_H: {	// smin zd.h, pg/m, zn.h, zm.h	Yes	Yes
Opcode::AArch64_SMIN_ZPmZ_B: {	// smin zd.b, pg/m, zn.b, zm.b	Yes	Yes
Opcode::AArch64_SPLICE_ZPZ_D: {	// splice zdn.d, pv, zdn.t, zm.d	Yes	Yes
Opcode::AArch64_SPLICE_ZPZ_S: {	// splice zdn.s, pv, zdn.t, zm.s	Yes	Yes
Opcode::AArch64_STLXRB:	// stlxrb ws, wt, [xn]	Yes	Yes
Opcode::AArch64_STLXRH:	// stlxrh ws, wt, [xn]	Yes	Yes
Opcode::AArch64_STLXR:	// stlxr ws, {w,x}t, [xn]	Yes	Yes
Opcode::AArch64_UMAXVv16i8v: {	// umaxv bd, vn.16b	Yes	Yes
Opcode::AArch64_UMAXVv4i16v: {	// umaxv hd, vn.4h	Yes	Yes
Opcode::AArch64_UMAXVv4i32v: {	// umaxv sd, vn.4s	Yes	Yes
Opcode::AArch64_UMAXVv8i16v: {	// umaxv hd, vn.8h	Yes	Yes
Opcode::AArch64_UMAXVv8i8v: {	// umaxv bd, vn.8b	Yes	Yes
Opcode::AArch64_WHILELS_PXX_B: {	// whilels pd.b, xn, xm	Yes	Yes
Opcode::AArch64_WHILELS_PXX_D: {	// whilels pd.d, xn, xm	Yes	Yes
Opcode::AArch64_WHILELS_PXX_H: {	// whilels pd.h, xn, xm	Yes	Yes
Opcode::AArch64_WHILELS_PXX_S: {	// whilels pd.s, xn, xm	Yes	Yes

ABenC377

Code all looks good. Though, I'm not sure about the infinite loop checker. If I've missed a discussion about this then please ignore me. But if it is being added as a workaround for problems encountered because of erroneous configs or broken logic, shouldn't we be fixing the causes, not the symptoms? Erroneous configs are kind of a user error, but we could update the documentation to help users avoid this, and if there is broken logic in SimEng we should be fixing it, not plastering over the resulting problem. I suppose I can see the value of this type of check in debug mode to flag a problem to the user if they are running into issues in release, but it seems like unnecessary overhead for Release.

FinnWilkinson

Most bits look good. Comments are mainly about adhering to the project's style and some confusion on SVE helpers.

src/include/simeng/pipeline/ReorderBuffer.hh

src/lib/arch/aarch64/ExceptionHandler.cc

src/lib/arch/aarch64/InstructionMetadata.cc

src/include/simeng/arch/aarch64/helpers/sve.hh

src/lib/arch/aarch64/Instruction_execute.cc

FinnWilkinson

All looking pretty good - just a few changes needed and some pedantic comments on comments 😅

src/include/simeng/arch/aarch64/helpers/neon.hh

src/include/simeng/arch/aarch64/helpers/sve.hh

FinnWilkinson · 2024-12-18T15:00:22Z

src/lib/pipeline/ReorderBuffer.cc

+        std::cerr << "[SimEng:ReorderBuffer] Infinite loop detected in rob "
+                     "commit at instruction address "
+                  << std::hex << uop->getInstructionAddress() << std::dec
+                  << " (" << uop->getMicroOpIndex() << ")." << std::endl;


What's the rationale for printing the micro-op index?

This may give the user additional context on what exactly is stuck at the head of the ROB if the instruction has been split into micro-ops. I have updated the comment generally, though we should have a discussion offline on what exactly we want to print out in one of these cases.

Following offline discussion, we agreed to keep this message in release mode, but add more detail so that the user is aware of why this is being triggered, what's triggering it, and what to do to resolve the issue. The micro-op index in particular just adds additional verbosity, so it remains in the message.

dANW34V3R · 2024-12-18T14:18:01Z

test/regression/aarch64/Syscall.cc

@@ -1080,7 +1080,7 @@ TEST_P(Syscall, sched_getaffinity) {
    )");
  EXPECT_EQ(getGeneralRegister<int64_t>(21), -1);
  EXPECT_EQ(getGeneralRegister<int64_t>(22), -1);
-  EXPECT_EQ(getGeneralRegister<int64_t>(23), 1);


What has caused this to change?

dANW34V3R · 2024-12-18T18:12:42Z

src/lib/arch/aarch64/ExceptionHandler.cc

-          stateChange = {ChangeType::REPLACEMENT, {R0}, {retval}};
-          stateChange.memoryAddresses.push_back({mask, 1});
+          uint64_t retval = static_cast<uint64_t>(bitmask);
+          stateChange = {ChangeType::REPLACEMENT, {R0}, {sizeof(retval)}};


The man page for sched_getaffinity states that the function returns 0 on success and -1 on failure. This seems to be returning the size of a uint64_t which will always be 8. I think this is incorrect.

What I think you have done is update the value being set in memory correctly on 434 (updating the size). But also updated the value returned to the program to be the size also on 433. Depending on the behaviour we want, 433 should be updated potentially in the way it was done previously i.e. set to 0 if pid == 0 and -1 otherwise.

What was the reason for the update?

This is worth @jj16791 investigating as it was his find/fix so he will know more than I do on the issue.

The reason given at the time was:

The assert you were triggering was KMP_ASSERT(__kmp_avail_proc == __kmp_topology->get_num_hw_threads());. Newer LLVM OMP runtimes require the affinity mask to be at least 8 bytes in length otherwise it will read the number of available cores out as 0 due to some casting. The affinity mask we were returning was 1 byte in length hence the assert triggered as __kmp_avail_proc was 0. Figured it out from a combination of isolating the instructions run leading up to this assert and then from GodBolt/SimEng figuring out why our mask was being converted to 0 procs available

I've been testing using a STREAM binary (with OpenMP support) compiled with ACFL23. With the current fix, this works. Removing the sizeof on 433 means that this fails. I do agree though that the current implementation doesn't line up with what I'd expect should work.

Not sure what man page you found, but most should say something along the lines of "but see "C library/kernel differences" below, which notes that the underlying sched_getaffinity() differs in its return value". The difference is that it returns the number of bytes used to represent the mask. With a 64-bit mask, that's 8 bytes.

Always check for things like this, since the non-wrapped version of a syscall can behave differently from the library wrapper.
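The difference between the two return conventions can be observed directly on a Linux host. The sketch below assumes Linux with glibc (it will not build elsewhere), and the two function names are made up for the example; only `sched_getaffinity`, `syscall`, `SYS_sched_getaffinity`, `cpu_set_t` and `CPU_ZERO` are real interfaces.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for cpu_set_t and sched_getaffinity on glibc
#endif
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

// Raw kernel interface: on success this returns the number of bytes the
// kernel wrote into the mask, not 0.
long rawAffinityReturn() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  return syscall(SYS_sched_getaffinity, 0, sizeof(mask), &mask);
}

// glibc wrapper: hides the byte count and returns 0 on success, -1 on error.
int wrappedAffinityReturn() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  return sched_getaffinity(0, sizeof(mask), &mask);
}
```

Since a simulator's exception handler stands in for the kernel, it is the raw convention (a positive byte count) that the emulated program's C library would expect to see in the return register.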

jj16791 · 2025-01-24T13:25:53Z

src/lib/pipeline/ReorderBuffer.cc

+                     "variable `robHeadRepeatLimit_`. Please raise "
+                     "an issue on GitHub if the problem persists."
+                  << std::endl;
+        exit(1);


Anything to be said here for resetting the counter and letting the simulation continue? Just thinking about a possible scenario where this triggers after days of simulation but it wasn't actually a failing state.

jj16791 · 2025-01-24T16:07:16Z

src/include/simeng/arch/aarch64/helpers/neon.hh

+  const U* n = sourceValues[0].getAsVector<U>();
+  T out = 0;
+  for (int i = 0; i < I; i++) {
+    out += n[i];


Should n[i] be explicitly cast here? Just aware we've had some issues with relying on implicit casting in prior years
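For reference, a version of the reduction with the widening made explicit might look like the sketch below. The function name `addAcrossWidened` and the array-reference parameter are assumptions for the example; the helper in the PR reads its input from `sourceValues` instead.

```cpp
#include <cstddef>
#include <cstdint>

// T: accumulator (destination) type, U: source element type,
// I: number of elements summed.
template <typename T, typename U, std::size_t I>
T addAcrossWidened(const U (&n)[I]) {
  T out = 0;
  for (std::size_t i = 0; i < I; i++) {
    // Widen each element to the accumulator type before adding, so the
    // intent does not depend on implicit integer promotion rules.
    out += static_cast<T>(n[i]);
  }
  return out;
}
```

With eight `uint8_t` lanes all set to 255 and `T = uint16_t`, the sum is 2040, which would not fit in the element type.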

jj16791 · 2025-01-24T16:57:12Z

src/include/simeng/arch/aarch64/helpers/sve.hh

+  const uint16_t partition_num = VL_bits / (sizeof(T) * 8);
+  T out[256 / sizeof(T)] = {0};
+
+  U bit_0_mask = static_cast<U>(1) << (sizeof(T) * 8 - 1);


We should be getting bit 0 for the sign not bit N-1

Check if there are any other instances of this
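To make the distinction concrete, here is a single-lane sketch, assuming the helper under review is the FTSMUL one and that FTSMUL squares the first operand and then copies bit 0 of the second operand into the result's sign bit (skipping NaN results). The function name `ftsmulLane` is hypothetical. Note that two different bits are involved: bit 0 is what gets read from `zm`, while bit N-1 is where it gets written in the result.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical single-lane model for 32-bit elements.
float ftsmulLane(float n, uint32_t m) {
  float squared = n * n;
  if (std::isnan(squared)) return squared;  // NaN results are left untouched

  uint32_t bits;
  std::memcpy(&bits, &squared, sizeof(bits));

  const uint32_t signMask = uint32_t{1} << 31;  // bit N-1 of the result
  const uint32_t bit0 = m & 1u;                 // bit 0 of the second source
  bits = (bits & ~signMask) | (bit0 << 31);

  std::memcpy(&squared, &bits, sizeof(bits));
  return squared;
}
```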

jj16791 · 2025-01-30T15:07:50Z

src/include/simeng/arch/aarch64/helpers/sve.hh

+    // If no active lane has been found, select highest element instead
+    if (i == 0) lastElem = partition_num - 1;
+  }
+  return {n[lastElem], 256};


Need to make sure we zero-extend here
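A minimal sketch of the behaviour being asked for, under simplifying assumptions: the predicate is modelled as one `bool` per element, the destination is a plain 256-byte buffer, and the name `lastActiveZeroExtended` is invented for the example. It selects the last active element (falling back to the highest element when none is active) and leaves every byte above the element as zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

template <typename T, std::size_t N>
std::array<uint8_t, 256> lastActiveZeroExtended(
    const std::array<T, N>& zn, const std::array<bool, N>& pg) {
  // Default to the highest element, used when no lane is active.
  std::size_t last = N - 1;
  for (std::size_t i = N; i-- > 0;) {
    if (pg[i]) {
      last = i;
      break;
    }
  }

  // Value-initialisation zeroes the whole buffer, so copying only
  // sizeof(T) bytes leaves the upper bytes zero-extended.
  std::array<uint8_t, 256> out{};
  std::memcpy(out.data(), &zn[last], sizeof(T));
  return out;
}
```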

jj16791 · 2025-01-30T15:11:54Z

src/include/simeng/arch/aarch64/helpers/sve.hh

+  return {n[lastElem], 256};
+}
+
+/** Helper function for SVE instructions with the format `clastb zd, pg, zd,


I think the mnemonic is wrong here (it reflects CLASTB (vectors)).

jj16791 · 2025-01-30T15:12:12Z

src/include/simeng/arch/aarch64/helpers/sve.hh

+  } else {
+    out = n[lastElem];
+  }
+  return {out, 256};


Need to make sure we zero-extend here

jj16791 · 2025-01-30T15:43:36Z

src/include/simeng/arch/aarch64/helpers/sve.hh

+ * T represents the type of sourceValues (e.g. for zn.d, T = uint64_t).
+ * Returns correctly formatted RegisterValue. */
+template <typename T>
+RegisterValue sveSplice(srcValContainer& sourceValues, const uint16_t VL_bits) {


I assume this is the destructive variant of splice? If so, we should denote this somewhere
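For readers unfamiliar with the instruction, the destructive form can be sketched as below. This is an illustration rather than the PR's helper: it uses `std::vector` instead of `sourceValues`, models the predicate as one `bool` per element, assumes all three vectors have the same length, and the name `spliceInPlace` is made up. The first source doubles as the destination, which is what makes the form destructive.

```cpp
#include <cstddef>
#include <vector>

template <typename T>
void spliceInPlace(std::vector<T>& zdn, const std::vector<bool>& pg,
                   const std::vector<T>& zm) {
  const std::size_t n = zdn.size();

  // Locate the first and last active elements of the predicate.
  bool any = false;
  std::size_t first = 0;
  std::size_t last = 0;
  for (std::size_t i = 0; i < n; i++) {
    if (pg[i]) {
      if (!any) first = i;
      last = i;
      any = true;
    }
  }

  std::vector<T> out;
  out.reserve(n);
  // Copy the segment [first, last] of zdn to the lowest result elements.
  if (any) {
    for (std::size_t i = first; i <= last; i++) out.push_back(zdn[i]);
  }
  // Fill the remaining elements from the lowest elements of zm.
  for (std::size_t i = 0; out.size() < n; i++) out.push_back(zm[i]);

  zdn = out;  // overwrite the first source with the result
}
```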

jj16791 · 2025-01-30T15:47:22Z

src/lib/arch/aarch64/Instruction_execute.cc

@@ -4359,6 +4588,14 @@ void Instruction::execute() {
                            sourceValues_[1].get<uint64_t>());
        break;
      }
+      case Opcode::AArch64_SPLICE_ZPZ_D: {  // splice zdn.d, pv, zdn.t, zm.d


Change .t in mnemonic to actual data type in use

jj16791 · 2025-01-30T15:47:28Z

src/lib/arch/aarch64/Instruction_execute.cc

+        results_[0] = sveSplice<double>(sourceValues_, VL_bits);
+        break;
+      }
+      case Opcode::AArch64_SPLICE_ZPZ_S: {  // splice zdn.s, pv, zdn.t, zm.s


Change .t in mnemonic to actual data type in use

JosephMoore25 added the 0.9.7 label (Part of SimEng Release 0.9.7) Aug 30, 2024

JosephMoore25 self-assigned this Aug 30, 2024

JosephMoore25 marked this pull request as ready for review December 10, 2024 13:35

JosephMoore25 requested review from jj16791, ABenC377, FinnWilkinson and dANW34V3R December 10, 2024 13:35

ABenC377 reviewed Dec 10, 2024

FinnWilkinson requested changes Dec 10, 2024

FinnWilkinson requested changes Dec 18, 2024

dANW34V3R reviewed Dec 18, 2024

JosephMoore25 and others added 19 commits December 20, 2024 12:41

Added LDRSWroW, LDAXRB, stlxrb insts

9a9ca3f

Magic OMP affinity fix (thanks Jack)

9adaeee

Added Cpy (Simd&FP scalar) instruction and alias, with tests for each size

70f0387

Fixed OMP getaffinity syscall for new fix. Fixed tests for CPY_ZPmV instructions

1873378

Added more instructions so stream+sve compiles with armclang23. Some instructions/helpers from neoverse-v2 branch.

3518327

Added a couple more instructions, working towards minibude armclang23

81889ab

Added ClastB instructions with tests that (finally) pass. More tests to come

c6c6000

Cleaned up clastb tests and added S,H,B cases

240ae68

Dirty WIP for pnext instruction

5e79850

Added pnext inst along with tests

f8ea7f2

Added NZCV changes to pnext and updated tests

5992cd1

Added weird FP Trig SVE insts (untested). Minibude now works with armclang23!

49dbbbe

Supported minisweep

2716a71

Added instructions to support CloverLeaf armclang23. Numerical error :O

8c56ee5

Added a test to start investigating what's wrong with cloverleaf

32d0d6c

Added test for LDRSWroW

2bb065b

Added mechanism to detect ROB loops. Also added FDIVv4f32 inst

b40d011

Clang format

14cc2e1

Fixed a couple build issues/warnings

bd3bfc8

JosephMoore25 added 19 commits December 20, 2024 12:41

Added uaddlv test, as well as rolled back a ROB fix

3e45c86

Added tests for cmphs and a couple other insts. Fixed a couple bugs to do with cmphs

96034a5

Added tests for FDIV and LASTB. Fixed LASTB logic.

466fc3d

Finally got smax tests

0aa2584

Also added smin tests

75f0d9f

Added tests for umaxv and whilels

02386a3

Added (or fixed) tests for pfirst and splice

4712ea4

Added tests for ftsmul and fixed some broken logic

c73b2d2

Added comment to ftsmul test

6fe35ec

Added FTSSEL tests. Nasty bugger....

f22be5a

Finally got ftmad sorted. Had issues with 32 bit for some reason

dad0467

Added LDAXRB and STLXR insts. STLXR took some fix in decode to flag as a store

5a611d3

Added test for ORN. Finished all base tests

a58409b

Added group tests to all added insts

0ffcd51

Cleaned up infinite ROB check and OpenMP bug

4361eab

Responded to PR comments. Cleaned up a lot of helper functions and fixed a few metadata issues

6345f08

Responded to more comments

6119ade

Updated naming for confusing lastb helper

6da7f5c

Fixed issues arising from merge conflicts on Capstone Update branch. Updated comment for infinite loop detector

c9f708b

JosephMoore25 force-pushed the spec-hpc branch from 62561bc to c9f708b Compare December 20, 2024 12:43

jj16791 requested changes Jan 30, 2025
