-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACFL23 Instruction Support #425
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code all looks good. Though, I'm not sure about the infinite loop checker. If I've missed a discussion about this then please ignore me. But if it is being added as a work around for problems encountered because of erroneous configs or broken logic, shouldn't we be fixing the causes not the symptoms? Erroneous configs are kind of a user error, but we could update the documentation to help them avoid this, and if there is broken logic in SimEng we should be fixing it not plastering over the resulting problem. I suppose I can see the value of this type of check in debug mode to flag a problem to the user if they are running into issues in release, but it seems like unnecessary overhead for Release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most bits look good. Comments mainly about adhearing to the project's style and some confusion on SVE helpers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looking pretty good - just a few changes needed and some pedantic comments on comments 😅
src/lib/pipeline/ReorderBuffer.cc
Outdated
std::cerr << "[SimEng:ReorderBuffer] Infinite loop detected in rob " | ||
"commit at instruction address " | ||
<< std::hex << uop->getInstructionAddress() << std::dec | ||
<< " (" << uop->getMicroOpIndex() << ")." << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats the rational for printing the Micro-op index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may give additional context to the user what exactly is stuck at the head of the ROB if the instruction is uopd. I have updated the comment generally, though we should have a discussion offline on what exactly we want to print out in one of these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following offline discussion, we agree to keep this message in release mode, but add more detail s.t. the user is aware of why this is being triggered, what's triggering it, and what to do to resolve the issue. The Micro-op index in particular just adds additional verbosity so remains in the message.
@@ -1080,7 +1080,7 @@ TEST_P(Syscall, sched_getaffinity) { | |||
)"); | |||
EXPECT_EQ(getGeneralRegister<int64_t>(21), -1); | |||
EXPECT_EQ(getGeneralRegister<int64_t>(22), -1); | |||
EXPECT_EQ(getGeneralRegister<int64_t>(23), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What has caused this to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below
stateChange = {ChangeType::REPLACEMENT, {R0}, {retval}}; | ||
stateChange.memoryAddresses.push_back({mask, 1}); | ||
uint64_t retval = static_cast<uint64_t>(bitmask); | ||
stateChange = {ChangeType::REPLACEMENT, {R0}, {sizeof(retval)}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The man page for sched_getaffinity
states that the function returns 0 on success and -1 on failure. This seems to be returning the size of a uint64_t which will always be 8. I think this is incorrect.
What I think you have done is update the value being set in memory correctly on 434 (updating the size). But also updated the value returned to the program to be the size also on 433. Depending on the behaviour we want, 433 should be updated potentially in the way it was done previously i.e. set to 0 if pid == 0
and -1 otherwise.
What was the reason for the update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth @jj16791 investigating as it was his find/fix so he will know more than I do on the issue.
The reason given at the time was:
The assert you were triggering was KMP_ASSERT(__kmp_avail_proc == __kmp_topology->get_num_hw_threads());. Newer LLVM OMP runtimes require the affinity mask to be at least 8 bytes in length otherwise it will read the number of available cores out as 0 due to some casting. The affinity mask we were returning was 1 byte in length hence the assert triggered as __kmp_avail_proc was 0. Figured it out from a combination of isolating the instructions run leading up to this assert and then from GodBolt/SimEng figuring out why our mask was being converted to 0 procs available
I've been testing using a STREAM binary (with OpenMP support) compiled with ACFL23. With the current fix, this works. Removing the sizeof
on 433 means that this fails. I do agree though that the current implementation doesn't line up with what I'd expect should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what manpage you found but most should say something along the lines of "but see "C library/kernel differences" below, which notes that the underlying sched_getaffinity() differs in its return value". The difference is that is returns the number of bytes used to represent the mask. With a 64bit mask that's 8 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always check for stuff like this re the non-wrapped version of the syscall does something different
…instructions/helpers from neoverse-v2 branch.
…xed a few metadata issues
…Updated comment for infinite loop detector
62561bc
to
c9f708b
Compare
"variable `robHeadRepeatLimit_`. Please raise " | ||
"an issue on GitHub if the problem persists." | ||
<< std::endl; | ||
exit(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything to be said here for resetting the counter and letting the simulation continue? Just thinking about a possible scenario where this triggers after days of simulation but it wasn't actually a failing state.
const U* n = sourceValues[0].getAsVector<U>(); | ||
T out = 0; | ||
for (int i = 0; i < I; i++) { | ||
out += n[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should n[i]
be explicitly cast here? Just aware we've had some issues with relying on implicit casting in prior years
const uint16_t partition_num = VL_bits / (sizeof(T) * 8); | ||
T out[256 / sizeof(T)] = {0}; | ||
|
||
U bit_0_mask = static_cast<U>(1) << (sizeof(T) * 8 - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be getting bit 0 for the sign not bit N-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if there are any other instances of this
// If no active lane has been found, select highest element instead | ||
if (i == 0) lastElem = partition_num - 1; | ||
} | ||
return {n[lastElem], 256}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to make sure we zero-extend here
return {n[lastElem], 256}; | ||
} | ||
|
||
/** Helper function for SVE instructions with the format `clastb zd, pg, zd, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the mnemonic is wrong here (reflects CLASTB (vectors)
} else { | ||
out = n[lastElem]; | ||
} | ||
return {out, 256}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to make sure we zero-extend here
* T represents the type of sourceValues (e.g. for zn.d, T = uint64_t). | ||
* Returns correctly formatted RegisterValue. */ | ||
template <typename T> | ||
RegisterValue sveSplice(srcValContainer& sourceValues, const uint16_t VL_bits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is the destructive variant of splice
? If so, we should denote this somewhere
@@ -4359,6 +4588,14 @@ void Instruction::execute() { | |||
sourceValues_[1].get<uint64_t>()); | |||
break; | |||
} | |||
case Opcode::AArch64_SPLICE_ZPZ_D: { // splice zdn.d, pv, zdn.t, zm.d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change .t
in mnemonic to actual data type in use
results_[0] = sveSplice<double>(sourceValues_, VL_bits); | ||
break; | ||
} | ||
case Opcode::AArch64_SPLICE_ZPZ_S: { // splice zdn.s, pv, zdn.t, zm.s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change .t in mnemonic to actual data type in use
Merging work done towards enabling support for a few codes for ACFL 23, namely STREAM, Minibude, Cloverleaf, Tealeaf, and Minisweep.
This PR is mostly made up of added instruction support. 58 instructions have been added, with 24 unique instructions with the remainder being variants. Most instructions are SVE, with some NEON added.
An additional feature of "infinite loop checking" has been added. This adds a counter in the ROB which throws an error if the same address has been at the head of the ROB for a very long time. This catches a few errors previously found where an erroneous config or broken logic can cause SimEng to get caught in a loop and sometimes eventually hit OOM.
This also fixes an OpenMP bug that has previously popped up for ACFL 23 support, work that Jack had done in a separate branch.
Tests are still being added, and the new group tests need to be added for all instructions. The PR will leave draft stage once all tests have been added.
Here are a list of instructions added: