-
Notifications
You must be signed in to change notification settings - Fork 84
[CORE] Reduce cost of XferCRC::xferImplementation by 60% #1228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Perhaps we could go for even fewer branches by using the data size as a template parameter. Most of the call sites use a size that's known at compile-time. |
Replacing the winsock call is good. Removing the hibit branch also makes sense, because the branch predictor likely does not work well there (50% chance). I would like to see a performance comparison when this works without any remaining logical mismatch. |
Fixed the logic on validity handling of the leftover bytes. |
9949da4
to
aa01517
Compare
rebased with main, just going to make some other tweaks |
639521e
to
11539cd
Compare
Tested and tweaked with the switch replacing the second loop, it doesn't mismatch with the golden replays and you would expect it to mismatch pretty quckly if it was going to. |
Are you seeing any improvements in overall performance? |
I still need to look into performance testing, need to make a game replay under debug, might just do it with some AI since they can really hit different systems hard. |
11539cd
to
f3c4219
Compare
tweaked the switch to catch when data is null, so we don't have valid data. |
Is this meant to be a Draft Review? |
Initially while the implementation was being checked, but i think it is in an okay place now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make XferCRC::xferImplementation a bit more readable? Essentially we trade a bit more runtime speed for a bit less maintainability.
Performance testXferCRC crc1;
XferCRC crc2;
{
std::srand(111);
int64_t begin = GetTickCount64();
UnsignedInt var2 = (UnsignedInt)std::rand();
Short var1 = (Short)var2;
struct S
{
char buf[255];
} var3;
memset(&var3, var2, sizeof(var3));
for (Int i = 0; i < 100000000; ++i)
{
crc1.xferShort_ORIGINAL(&var1);
crc1.xferUnsignedInt_ORIGINAL(&var2);
crc1.xferUser_ORIGINAL(&var3, sizeof(var3));
crc1.xferUser_ORIGINAL(NULL, 1);
crc1.xferUser_ORIGINAL(&var3, 0);
}
int64_t time = GetTickCount64() - begin;
DEBUG_LOG(("time old %lld ms\n", time));
}
{
std::srand(111);
int64_t begin = GetTickCount64();
UnsignedInt var2 = (UnsignedInt)std::rand();
Short var1 = (Short)var2;
struct S
{
char buf[255];
} var3;
memset(&var3, var2, sizeof(var3));
for (Int i = 0; i < 100000000; ++i)
{
crc2.xferShort(&var1);
crc2.xferUnsignedInt(&var2);
crc2.xferUser(&var3, sizeof(var3));
crc2.xferUser(NULL, 1);
crc2.xferUser(&var3, 0);
}
int64_t time = GetTickCount64() - begin;
DEBUG_LOG(("time new %lld ms\n", time));
}
DEBUG_ASSERTCRASH(crc1.getCRC() == crc2.getCRC(), ("ohoh")); Result
New implementation is about 55% faster. Bravo. |
f3c4219
to
cbb279b
Compare
Just a rebase with main first. |
cbb279b
to
5b755e5
Compare
So people thought this was too complicated, so i took offence to that and made it faster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks better now indeed. There is more we can do I think.
5b755e5
to
153b540
Compare
A quick rebase before making other tweaks. |
153b540
to
58d9071
Compare
Updated with further simplifications and small refactors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good to the eye now.
Did you test it again for mismatch or performance?
I tested for mismatches before every push and all versions ran without mismatching. |
I can run the benchmark one more time. |
New measurement passed with:
Around 60% faster now. Code is also reasonably understandable. |
This PR refactors the XferCRC to make the code branchless and to remove the winsock dependency within it.
There may be minimal performance improvement from this as other factors will need changing in the data being passed to the CRC to further improve the performance.
The winsock dependency has been replaced with the endian compat library instead.