engine: platform: introduce Platform_NanoSleep, to be used for better sleeping in between frames for lowering CPU usage #2019

Open · a1batross wants to merge 1 commit into master
Conversation

@a1batross (Member) commented Feb 11, 2025

There have been reports of high CPU usage at high framerates, especially on servers.

It seems this might be caused by the insufficient resolution of Platform_Sleep, which only accepts milliseconds. If the game runs at 500+ FPS, a frame takes less than 2 ms, and Platform_Sleep instantly becomes useless.

With this PR, the sleeptime cvar resolution changes from milliseconds down to microseconds.
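
For context, here is a minimal sketch of what a nanosecond-resolution sleep can look like on the POSIX side, based on nanosleep(). The function name and signature are illustrative only, not necessarily what the PR actually adds:

#include <errno.h>
#include <time.h>

// Illustrative sketch, not the PR's code: sleep for the given number of
// nanoseconds, resuming the sleep if a signal interrupts it.
static void Platform_NanoSleep_Sketch( long long nsecs )
{
	struct timespec ts, rem;

	ts.tv_sec  = nsecs / 1000000000LL;
	ts.tv_nsec = nsecs % 1000000000LL;

	while( nanosleep( &ts, &rem ) < 0 && errno == EINTR )
		ts = rem; // keep sleeping for the remaining time
}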

@a1batross (Member, Author):

Not sure if the Windows version of nanosleep will even work. SetWaitableTimer wants a resolution of 100 ns, which is fine for us.
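
For reference, a sketch of that waitable-timer path: SetWaitableTimer takes the due time in 100-ns units, and a negative value means a relative delay. This is illustrative only and is not necessarily the PR's exact Windows code:

#include <windows.h>

// Illustrative sketch: relative wait expressed in 100-ns intervals.
static void Win32_NanoSleep_Sketch( LONGLONG nsecs )
{
	static HANDLE timer;
	LARGE_INTEGER due;

	if( !timer )
		timer = CreateWaitableTimer( NULL, FALSE, NULL );

	due.QuadPart = -( nsecs / 100 ); // negative = relative time, 100-ns units

	if( SetWaitableTimer( timer, &due, 0, NULL, NULL, FALSE ))
		WaitForSingleObject( timer, INFINITE );
}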

@SNMetamorph (Member):

There's some interesting material on this topic here:
https://blog.bearcats.nl/perfect-sleep-function/

@a1batross (Member, Author):

On my Linux server, this reduced CPU load from 100% to around 20% with sys_ticrate set to 1000, as expected.

@SNMetamorph (Member):

Pretty good result. This is very important when hosting more than one server on a VPS/VDS.

@a1batross (Member, Author):

What should we do on Windows, though? Is there a reliable way to get at least a microsecond timer? That would also help decrease CPU load.

@SNMetamorph (Member):

CreateWaitableTimerEx with the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag provides accuracy of something like 1-5 nanoseconds according to benchmarks. Its main downside is that it's only available since Windows 10, version 1803 (cheers to anybody who still uses Windows XP in 2025). So we need to somehow check the Windows version at runtime to decide whether or not to use this flag.

@a1batross (Member, Author):

Right now I'm resolving CreateWaitableTimerEx with GetProcAddress and passing 3 as the flags, which combines the high-resolution flag with manual reset.

I think it should be fine for those who still use Windows XP or 7.
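
For illustration, a sketch of that runtime lookup with a fallback for older Windows versions. The helper name is hypothetical; the flag values are the documented ones (CREATE_WAITABLE_TIMER_MANUAL_RESET = 0x1, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION = 0x2, so passing 3 combines both):

#include <windows.h>

#ifndef CREATE_WAITABLE_TIMER_MANUAL_RESET
#define CREATE_WAITABLE_TIMER_MANUAL_RESET    0x00000001
#endif
#ifndef CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002
#endif

typedef HANDLE (WINAPI *pfnCreateWaitableTimerExW)( LPSECURITY_ATTRIBUTES, LPCWSTR, DWORD, DWORD );

// Illustrative sketch: try the high-resolution timer at runtime and fall back
// to a plain waitable timer on Windows versions that don't support it.
static HANDLE CreateHighResTimer_Sketch( void )
{
	pfnCreateWaitableTimerExW pCreateWaitableTimerExW;
	HANDLE timer = NULL;

	pCreateWaitableTimerExW = (pfnCreateWaitableTimerExW)GetProcAddress(
		GetModuleHandleA( "kernel32.dll" ), "CreateWaitableTimerExW" );

	if( pCreateWaitableTimerExW )
		timer = pCreateWaitableTimerExW( NULL, NULL,
			CREATE_WAITABLE_TIMER_MANUAL_RESET | CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
			TIMER_ALL_ACCESS );

	if( !timer ) // older Windows, or the flag isn't supported
		timer = CreateWaitableTimer( NULL, FALSE, NULL );

	return timer;
}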

@SNMetamorph (Member):

For Windows versions older than that, we can fall back to timeBeginPeriod. It provides decent accuracy, but its main flaw is that it makes the OS scheduler tick more frequently, which consequently affects other running programs too. Sometimes this can lead to noticeably worse battery usage on laptops.

@a1batross (Member, Author):

@SNMetamorph timeBeginPeriod has millisecond accuracy, and that's useless for what I'm trying to do here.

We need a sleep function with at least microsecond accuracy or, better, nanosecond accuracy.

@SNMetamorph (Member):

I'm not sure about this.

#include <windows.h>
#include <chrono>
#include <math.h>

void timerSleep(double seconds) {
    using namespace std::chrono;

    static HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);
    static double estimate = 5e-3;
    static double mean = 5e-3;
    static double m2 = 0;
    static int64_t count = 1;

    while (seconds - estimate > 1e-7) {
        double toWait = seconds - estimate;
        LARGE_INTEGER due;
        due.QuadPart = -int64_t(toWait * 1e7);
        auto start = high_resolution_clock::now();
        SetWaitableTimerEx(timer, &due, 0, NULL, NULL, NULL, 0);
        WaitForSingleObject(timer, INFINITE);
        auto end = high_resolution_clock::now();

        double observed = (end - start).count() / 1e9;
        seconds -= observed;

        // Welford's online algorithm: track the mean and variance of how much
        // the timer overslept, and use mean + stddev as the margin at which to
        // stop sleeping and finish with the spin-wait below.
        ++count;
        double error = observed - toWait;
        double delta = error - mean;
        mean += delta / count;
        m2   += delta * (error - mean);
        double stddev = sqrt(m2 / (count - 1));
        estimate = mean + stddev;
    }

    // spin lock
    auto start = high_resolution_clock::now();
    auto spinNs = int64_t(seconds * 1e9);
    auto delay = nanoseconds(spinNs);
    while (high_resolution_clock::now() - start < delay);
}

[image]

@SNMetamorph (Member) commented Feb 11, 2025:

Ah, I'm wrong here. This benchmark uses a spin-lock to provide better accuracy, so it isn't relevant here :)

P.S.: but why don't we do the same trick?
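
For illustration, a POSIX C sketch of that hybrid trick: sleep coarsely for most of the interval, then spin-wait the last stretch. The helper names and the spin threshold are made up for the example:

#include <time.h>

// Hypothetical helper: monotonic time in nanoseconds.
static long long Sys_NowNs( void )
{
	struct timespec ts;
	clock_gettime( CLOCK_MONOTONIC, &ts );
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Sleep for most of the interval, then burn the remainder in a busy loop
// to get better accuracy at the cost of some CPU.
static void HybridSleep_Sketch( long long nsecs )
{
	const long long spin_ns = 200000; // spin for the last ~0.2 ms (arbitrary)
	long long deadline = Sys_NowNs() + nsecs;

	if( nsecs > spin_ns )
	{
		struct timespec ts;
		long long coarse = nsecs - spin_ns;

		ts.tv_sec  = coarse / 1000000000LL;
		ts.tv_nsec = coarse % 1000000000LL;
		nanosleep( &ts, NULL );
	}

	while( Sys_NowNs() < deadline )
		; // busy-wait the remainder
}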

@SNMetamorph (Member) commented Feb 11, 2025:

Also, there is a hack with undocumented ntdll functions for Windows 7 and older that makes it possible to push the timer resolution even further, down to ~0.5 ms:

#include <windows.h>
#include <winternl.h>  // NTSTATUS
#include <mmsystem.h>  // timeGetDevCaps, timeBeginPeriod (link with winmm.lib)

// NtSetTimerResolution / NtQueryTimerResolution are undocumented ntdll exports
typedef NTSTATUS (NTAPI *pfnNtSetTimerResolution)( ULONG DesiredResolution, BOOLEAN SetResolution, PULONG CurrentResolution );
typedef NTSTATUS (NTAPI *pfnNtQueryTimerResolution)( PULONG MinimumResolution, PULONG MaximumResolution, PULONG CurrentResolution );
static pfnNtSetTimerResolution pNtSetTimerResolution;
static pfnNtQueryTimerResolution pNtQueryTimerResolution;

void SetTimerHighResolution()
{
#if 0
	// undocumented NT API features
	HMODULE hndl = GetModuleHandle("ntdll.dll");
	pNtSetTimerResolution = (pfnNtSetTimerResolution)GetProcAddress(hndl, "NtSetTimerResolution");
	pNtQueryTimerResolution = (pfnNtQueryTimerResolution)GetProcAddress(hndl, "NtQueryTimerResolution");

	// query the supported range and request the finest resolution available
	ULONG min, max, cur;
	pNtQueryTimerResolution(&min, &max, &cur);
	pNtSetTimerResolution(max, 1, &cur);
#else
	// documented fallback: ask the multimedia timer for its minimum period
	TIMECAPS tc;
	timeGetDevCaps(&tc, sizeof(TIMECAPS));
	timeBeginPeriod(tc.wPeriodMin);
#endif
}

P.S.: For some reason, timeBeginPeriod behavior was significantly changed in Windows 10, version 2004:
https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/
