[CORE] Reduce cost of XferCRC::xferImplementation by 60% #1228

Merged
xezon merged 1 commit into TheSuperHackers:main from refactor-xfercrc on Jul 19, 2025

Conversation

@Mauller Mauller commented Jul 5, 2025

This PR refactors the XferCRC to make the code branchless and to remove the winsock dependency within it.

The performance improvement from this alone may be minimal, because other factors in the data being passed to the CRC will need changing to improve performance further.

The winsock dependency has been replaced with the endian compat library instead.
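
For readers unfamiliar with why a CRC routine would depend on winsock at all: the usual reason is a call such as `htonl` to put each 32-bit value into big-endian byte order. A minimal sketch of a portable replacement follows. The helper name `hostToBigEndian32` is an assumption made for illustration; it is not the API of the project's endian compat library, which this thread does not show.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the project's endian compat API).
// Returns a value whose in-memory bytes are the big-endian encoding of v,
// which is the contract htonl() provides for 32-bit values.
inline std::uint32_t hostToBigEndian32(std::uint32_t v)
{
	const unsigned char bytes[4] = {
		static_cast<unsigned char>(v >> 24),
		static_cast<unsigned char>(v >> 16),
		static_cast<unsigned char>(v >> 8),
		static_cast<unsigned char>(v)
	};
	std::uint32_t out;
	std::memcpy(&out, bytes, sizeof(out)); // reinterpret the bytes in host order
	return out;
}
```

On a little-endian host this amounts to a byte swap, and on a big-endian host it is the identity, so no platform header is needed.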

@Mauller Mauller self-assigned this Jul 5, 2025
@Mauller Mauller added the labels Minor (Severity: Minor < Major < Critical < Blocker), Gen (Relates to Generals), ZH (Relates to Zero Hour), Refactor (Edits the code with insignificant behavior changes, is never user facing) Jul 5, 2025

Caball009 commented Jul 5, 2025

Perhaps we could go for even fewer branches by using the data size as a template parameter. Most of the call sites use a size that's known at compile-time.
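
A rough sketch of what this suggestion could look like, under stated assumptions: `CrcSketch`, `xfer`, and `xferFixed` are hypothetical names, and the word and tail handling is illustrative rather than the game's actual algorithm. The point is only that with `Size` as a template parameter, the loop bound and the leftover check become compile-time constants the optimizer can fold away.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the CRC class; not taken from the PR.
struct CrcSketch
{
	std::uint32_t crc = 0;

	void addWord(std::uint32_t v) { crc = (crc << 1) + v + (crc >> 31); }

	// Size known only at run time: loop bound and tail check are evaluated per call.
	void xfer(const void* data, std::size_t size)
	{
		const unsigned char* p = static_cast<const unsigned char*>(data);
		for (std::size_t i = 0; i < size / 4; ++i)
		{
			std::uint32_t w;
			std::memcpy(&w, p + i * 4, 4);
			addWord(w);
		}
		if (size % 4 != 0)
		{
			std::uint32_t tail = 0;
			std::memcpy(&tail, p + (size / 4) * 4, size % 4);
			addWord(tail);
		}
	}

	// Size known at compile time: Size / 4 and Size % 4 are constants, so the
	// compiler can unroll the loop and drop the tail branch entirely.
	template <std::size_t Size>
	void xferFixed(const void* data)
	{
		const unsigned char* p = static_cast<const unsigned char*>(data);
		for (std::size_t i = 0; i < Size / 4; ++i)
		{
			std::uint32_t w;
			std::memcpy(&w, p + i * 4, 4);
			addWord(w);
		}
		if (Size % 4 != 0)
		{
			std::uint32_t tail = 0;
			std::memcpy(&tail, p + (Size / 4) * 4, Size % 4);
			addWord(tail);
		}
	}
};
```

A call site with a fixed-size value would then read `crc.xferFixed<sizeof(value)>(&value);`, while variable-length buffers keep using the run-time overload.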

xezon commented Jul 5, 2025

Replacing the winsock call is good. Removing the hibit branch also makes sense, because the branch predictor likely does not work well there (50% chance). I would like to see a performance comparison when this works without any remaining logical mismatch.
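
To illustrate the hibit point, here is a minimal sketch that assumes the CRC step shifts the accumulator left by one, adds the new value, and carries the old top bit back in. That shape is an assumption made for this example, and both function names are hypothetical. The branchy form tests the top bit, which is close to random for hashed data, while the branchless form extracts it with a shift.

```cpp
#include <cstdint>

// Branchy form: the condition depends on the accumulator's top bit.
inline std::uint32_t stepBranchy(std::uint32_t crc, std::uint32_t val)
{
	std::uint32_t hibit = 0;
	if (crc & 0x80000000u)
	{
		hibit = 1;
	}
	crc <<= 1;
	crc += val;
	crc += hibit;
	return crc;
}

// Branchless form: (crc >> 31) is exactly the old top bit, 0 or 1.
inline std::uint32_t stepBranchless(std::uint32_t crc, std::uint32_t val)
{
	return (crc << 1) + val + (crc >> 31);
}
```

Optimizing compilers may already turn the first form into branch-free machine code, so a measurement is the only reliable way to tell whether the rewrite matters on a given toolchain.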

@Mauller Mauller force-pushed the refactor-xfercrc branch from fa239a4 to 9949da4 Compare July 6, 2025 09:14

Mauller commented Jul 6, 2025

Fixed the logic on validity handling of the leftover bytes.

@Mauller Mauller force-pushed the refactor-xfercrc branch from 9949da4 to aa01517 Compare July 10, 2025 19:41

Mauller commented Jul 10, 2025

Rebased with main; just going to make some other tweaks.

@Mauller Mauller force-pushed the refactor-xfercrc branch 2 times, most recently from 639521e to 11539cd Compare July 10, 2025 19:53

Mauller commented Jul 10, 2025

Tested and tweaked, with the switch replacing the second loop. It doesn't mismatch with the golden replays, and you would expect it to mismatch pretty quickly if it was going to.

Caball009 commented

Are you seeing any improvements in overall performance?

Mauller commented Jul 12, 2025

> Are you seeing any improvements in overall performance?

I still need to look into performance testing. I need to record a game replay under debug; I might just do it with some AI players, since they can really hit different systems hard.

@Mauller Mauller force-pushed the refactor-xfercrc branch from 11539cd to f3c4219 Compare July 12, 2025 17:26

Mauller commented Jul 12, 2025

Tweaked the switch to also catch the case where data is null, i.e. when there is no valid data.
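
The leftover-byte handling discussed in the last few comments can be sketched as follows. This is an illustration under assumptions, not the PR's code: all names are hypothetical, the accumulator step is a guess at the general shape, and the null check is written as a plain early return rather than being folded into the switch. It shows a loop-based tail next to a switch-based tail that produces the same value for one to three leftover bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Tail assembled with a loop: one iteration (and one branch) per leftover byte.
inline std::uint32_t tailWithLoop(const unsigned char* p, std::size_t leftover)
{
	std::uint32_t v = 0;
	for (std::size_t i = 0; i < leftover; ++i)
	{
		v += static_cast<std::uint32_t>(p[i]) << (i * 8);
	}
	return v;
}

// Tail assembled with a switch: a single dispatch, then straight-line code.
inline std::uint32_t tailWithSwitch(const unsigned char* p, std::size_t leftover)
{
	std::uint32_t v = 0;
	switch (leftover)
	{
		case 3: v |= static_cast<std::uint32_t>(p[2]) << 16; // fall through
		case 2: v |= static_cast<std::uint32_t>(p[1]) << 8;  // fall through
		case 1: v |= static_cast<std::uint32_t>(p[0]); break;
		default: break; // no leftover bytes
	}
	return v;
}

// Hypothetical driver: whole 32-bit words first, then the tail.
inline std::uint32_t crcBuffer(std::uint32_t crc, const void* data, std::size_t size)
{
	if (data == nullptr || size == 0)
	{
		return crc; // no valid data, leave the accumulator untouched
	}
	const unsigned char* p = static_cast<const unsigned char*>(data);
	const std::size_t words = size / 4;
	for (std::size_t i = 0; i < words; ++i)
	{
		std::uint32_t w;
		std::memcpy(&w, p + i * 4, 4);
		crc = (crc << 1) + w + (crc >> 31);
	}
	const std::size_t leftover = size & 3;
	if (leftover != 0)
	{
		crc = (crc << 1) + tailWithSwitch(p + words * 4, leftover) + (crc >> 31);
	}
	return crc;
}
```

Checking the null and zero-size cases matters because the benchmark later in this thread calls `xferUser(NULL, 1)` and `xferUser(&var3, 0)` and expects the old and new CRC values to match.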

xezon commented Jul 13, 2025

Is this meant to be a Draft Review?

Mauller commented Jul 13, 2025

> Is this meant to be a Draft Review?

Initially, while the implementation was being checked, but I think it is in an okay place now.

@Mauller Mauller marked this pull request as ready for review July 13, 2025 10:14

@xezon xezon left a comment

Can we make XferCRC::xferImplementation a bit more readable? Essentially we trade a bit more runtime speed for a bit less maintainability.

@xezon xezon changed the title [CORE] Refactor XferCRC to make it branchless and to remove winsock dependency [CORE] Reduce cost of XferCRC by 55% Jul 14, 2025

xezon commented Jul 14, 2025

Performance test

// Two accumulators: crc1 runs the original code path, crc2 the refactored one.
XferCRC crc1;
XferCRC crc2;
{
	// Baseline: time the original implementation.
	std::srand(111); // fixed seed so both runs hash identical data
	int64_t begin = GetTickCount64();
	UnsignedInt var2 = (UnsignedInt)std::rand();
	Short var1 = (Short)var2;
	struct S
	{
		char buf[255];
	} var3;
	memset(&var3, var2, sizeof(var3)); // memset only uses the low byte of var2

	for (Int i = 0; i < 100000000; ++i)
	{
		crc1.xferShort_ORIGINAL(&var1);
		crc1.xferUnsignedInt_ORIGINAL(&var2);
		crc1.xferUser_ORIGINAL(&var3, sizeof(var3)); // 255 bytes, not a multiple of 4
		crc1.xferUser_ORIGINAL(NULL, 1);  // edge case: null data
		crc1.xferUser_ORIGINAL(&var3, 0); // edge case: zero size
	}

	int64_t time = GetTickCount64() - begin;
	DEBUG_LOG(("time old %lld ms\n", time));
}
{
	// Same workload against the new implementation.
	std::srand(111);
	int64_t begin = GetTickCount64();
	UnsignedInt var2 = (UnsignedInt)std::rand();
	Short var1 = (Short)var2;
	struct S
	{
		char buf[255];
	} var3;
	memset(&var3, var2, sizeof(var3));

	for (Int i = 0; i < 100000000; ++i)
	{
		crc2.xferShort(&var1);
		crc2.xferUnsignedInt(&var2);
		crc2.xferUser(&var3, sizeof(var3));
		crc2.xferUser(NULL, 1);
		crc2.xferUser(&var3, 0);
	}

	int64_t time = GetTickCount64() - begin;
	DEBUG_LOG(("time new %lld ms\n", time));
}

// Both implementations must produce the same CRC for the same input.
DEBUG_ASSERTCRASH(crc1.getCRC() == crc2.getCRC(), ("ohoh"));

Result

time old 20547 ms
time new 9203 ms

New implementation takes about 55% less time (roughly 2.2x faster). Bravo.

@xezon xezon added the label Performance (Is a performance concern) and removed the label Refactor (Edits the code with insignificant behavior changes, is never user facing) Jul 15, 2025
@xezon xezon changed the title [CORE] Reduce cost of XferCRC by 55% [CORE] Reduce cost of XferCRC::xferImplementation by 55% Jul 15, 2025
@Mauller Mauller force-pushed the refactor-xfercrc branch from f3c4219 to cbb279b Compare July 15, 2025 17:30

Mauller commented Jul 15, 2025

Just a rebase with main first.

@Mauller Mauller force-pushed the refactor-xfercrc branch from cbb279b to 5b755e5 Compare July 15, 2025 18:24

Mauller commented Jul 15, 2025

So people thought this was too complicated, so I took offence to that and made it faster.

@xezon xezon left a comment

Code looks better now indeed. There is more we can do I think.

@Mauller Mauller force-pushed the refactor-xfercrc branch from 5b755e5 to 153b540 Compare July 16, 2025 17:32

Mauller commented Jul 16, 2025

A quick rebase before making other tweaks.

@Mauller Mauller force-pushed the refactor-xfercrc branch from 153b540 to 58d9071 Compare July 16, 2025 17:55

Mauller commented Jul 16, 2025

Updated with further simplifications and small refactors.

@xezon xezon left a comment

Looks very good to the eye now.

Did you test it again for mismatch or performance?

Mauller commented Jul 16, 2025

> Looks very good to the eye now.
>
> Did you test it again for mismatch or performance?

I tested for mismatches before every push, and all versions ran without mismatching.
Whenever it was going to mismatch, it always happened within the first few hundred frames.

xezon commented Jul 16, 2025

I can run the benchmark one more time.

xezon commented Jul 19, 2025

New measurement passed with:

time old 20484 ms
time new 8360 ms

Around 60% less time now (roughly 2.45x faster).

Code is also reasonably understandable.
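
For reference, the percentages quoted in this thread are reductions in elapsed time, which is a different number from the speedup factor. A small sketch of the arithmetic for both measurements is below; the function names are made up for this example.

```cpp
// Share of the old run time that was saved, in percent.
constexpr double costReductionPercent(double oldMs, double newMs)
{
	return (1.0 - newMs / oldMs) * 100.0;
}

// How many times faster the new run is compared with the old one.
constexpr double speedupFactor(double oldMs, double newMs)
{
	return oldMs / newMs;
}

// First measurement:  20547 ms -> 9203 ms, about 55% less time, about 2.2x.
// Second measurement: 20484 ms -> 8360 ms, about 59% less time, about 2.45x.
static_assert(costReductionPercent(20547.0, 9203.0) > 55.0, "first run saves more than 55%");
static_assert(costReductionPercent(20484.0, 8360.0) > 59.0, "second run saves more than 59%");
```

Note that `GetTickCount64` typically has a resolution in the range of 10 to 16 milliseconds, which is negligible against run times of several seconds like these.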

@xezon xezon changed the title [CORE] Reduce cost of XferCRC::xferImplementation by 55% [CORE] Reduce cost of XferCRC::xferImplementation by 60% Jul 19, 2025
@xezon xezon merged commit 3eda4ba into TheSuperHackers:main Jul 19, 2025
15 checks passed
@xezon xezon deleted the refactor-xfercrc branch July 19, 2025 09:43
Labels
Gen (Relates to Generals), Minor (Severity: Minor < Major < Critical < Blocker), Performance (Is a performance concern), ZH (Relates to Zero Hour)
3 participants