
New fps limiter mode: Accurate (sleep-yield) #5



Merged (2 commits, Mar 17, 2022)


anzz1 (Contributor) commented Mar 17, 2022

  • Less resource-intensive way of limiting fps
  • New configuration option FPSLimitMode
    • 1: original realtime-mode (busy-wait thread-lock) [default]
    • 2: new accurate-mode (sleep-yield)
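The new option is essentially a two-value switch. The sketch below is not the wrapper's actual parser. `FpsLimitMode`, `parse_fps_limit_mode` and the fallback for out-of-range values are hypothetical. Only the values 1 and 2 and the realtime default come from the list above.

```cpp
// Hypothetical mirror of the FPSLimitMode option; the numeric values match the PR description.
enum class FpsLimitMode : int {
    Realtime = 1, // busy-wait thread-lock (default)
    Accurate = 2  // sleep-yield
};

// Maps the raw config value to a mode. Falling back to Realtime for unknown
// values is an assumption, chosen because Realtime is the documented default.
FpsLimitMode parse_fps_limit_mode(int raw)
{
    switch (raw) {
        case 2:
            return FpsLimitMode::Accurate;
        case 1:
        default:
            return FpsLimitMode::Realtime;
    }
}
```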

Technical details:

The current FPS limiter times frames with a thread-locking busy-wait loop which, albeit super-accurate, is very CPU-intensive.
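For reference, a busy-wait limiter boils down to a loop like the one below. This is a portable sketch, not the wrapper's code. `busy_wait_until` is a hypothetical name, and std::chrono::steady_clock stands in for QueryPerformanceCounter.

```cpp
#include <chrono>
#include <cstdint>

// Busy-wait until the deadline. steady_clock replaces QPC here (an assumption for portability).
// Returns how many times the clock was polled while waiting.
std::uint64_t busy_wait_until(std::chrono::steady_clock::time_point deadline)
{
    std::uint64_t polls = 0;
    // The thread never gives up its time slice: it stays runnable and
    // keeps one core fully busy until the deadline passes.
    while (std::chrono::steady_clock::now() < deadline)
        ++polls;
    return polls;
}
```

The poll count grows with the length of the wait and with how fast each clock read is, which is exactly where the wasted CPU time goes.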

Because it calls QueryPerformanceCounter (QPC) and redoes its calculations as many times as it possibly can in the window between frames, it ends up, rather counterintuitively, using more resources:
- the lighter the game is
- the more powerful your CPU is
- the lower your target framerate is
In other words, the larger the gap between the FPS limit and how fast your CPU could churn out frames, the more time is spent spinning.
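The counterintuitive scaling follows from simple arithmetic on the frame budget. A minimal sketch; `wait_fraction` and its millisecond inputs are illustrative, not part of the wrapper:

```cpp
// Fraction of each frame interval that a limiter spends waiting, given the
// FPS cap and how long the game actually needs to produce a frame.
// With busy-waiting, this whole fraction is spent spinning at full load.
double wait_fraction(double fps_limit, double render_ms)
{
    const double budget_ms = 1000.0 / fps_limit; // e.g. 16.67 ms at 60 fps
    if (render_ms >= budget_ms)
        return 0.0; // the frame already fills the budget, nothing to wait for
    return (budget_ms - render_ms) / budget_ms;
}
```

A 4 ms frame at a 60 fps cap leaves 76% of every interval to wait; at a 30 fps cap the same frame leaves 88%. A lighter game or faster CPU (smaller `render_ms`) and a lower cap (larger budget) both push the fraction up.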

It essentially pins CPU usage at 100% while waiting until the next frame can be served. A faster CPU simply performs more QPC calls per wait, so the limiter uses 100% of a single thread's processing power regardless of the CPU. This may be less noticeable on CPUs with hyperthreading, since (in oversimplified terms) it then pins the core at "only" 50% instead of 100%.

Here is my (not very scientific) research, using Core Temp for profiling. Its minimum polling interval is 100 ms, so it's not super-accurate and there are better tools, but it serves the purpose of this demonstration just fine.

The game I used for testing was THPS4.
FPS Limit was set to 60 fps.
My CPU has 4 cores and 4 threads (no hyperthreading).

As you can see in the all-core "realtime" screenshot, the core that is pinned at 100% keeps jumping from one to another, presumably because of thread context switching (the scheduler moving the thread between cores). The CPU stays at its maximum power level and consumes 48 W, which is 25 W over the baseline of 23 W. Not too bad, but not great either.

The issue becomes more pronounced when the process affinity is locked to a single core (and thread): C0 stays at 100% all the time and power usage jumps to a whopping 187 W.

With the "accurate" (sleep-yield) method, the CPU stays at a lower power level even with affinity locked to C0: power usage is 32 W, which is much better. With all cores available, power (and CPU) usage drops even lower, hovering at 29 W, only 6 W over the baseline.

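[Screenshots: RT_1Core, RT_Allcore (realtime mode, affinity locked to C0 vs. all cores); SLP_1Core, SLP_Allcore (sleep-yield mode, same two setups)]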

On a desktop processor with adequate cooling this might not matter much. Where it definitely does matter, though, is gaming on laptops. First, more power means more heat, and you can quickly run into a situation where the heat cannot be dissipated and the CPU starts throttling, at which point the FPS limiter actually makes your performance worse. Second, higher power draw drains the battery quickly when running unplugged. This is how I found out about the issue in the first place: I was trying to figure out why my new-ish laptop was struggling to run an old game with its fans at 100% all the time, and the underlying cause was the FPS limiter (the tests above were obviously not done on that laptop).

Okay, so it uses (an order of magnitude) fewer CPU resources, but what about accuracy?

Well, while the realtime thread-lock is in theory obviously more accurate, in my testing I couldn't produce a measurable difference in accuracy. With both methods and a 60 fps limit, the frame time was a constant 16.67 ms without any variance whatsoever, which suggests both are accurate to at least 10 μs (0.01 ms). Going beyond that is hard: even a single call to QueryPerformanceCounter takes ~1 μs, and at that level of precision there are so many variables in play that I wouldn't even know how to measure it correctly. In any case, the difference in accuracy seems completely negligible.
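One way to put a number on "no variance at 10 μs resolution" is to log a timestamp per frame (e.g. QPC readings converted to microseconds) and compare each frame time with the 1000/60 ≈ 16 667 μs target. `frame_stats` and `FrameStats` are hypothetical helpers, not part of the wrapper:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameStats {
    double mean_us = 0.0;        // average frame time in microseconds
    std::int64_t max_dev_us = 0; // largest |frame time - target| in microseconds
};

// Computes frame-time statistics from per-frame timestamps in microseconds.
FrameStats frame_stats(const std::vector<std::int64_t>& timestamps_us, std::int64_t target_us)
{
    FrameStats s;
    if (timestamps_us.size() < 2)
        return s; // need at least one frame interval
    std::int64_t total = 0;
    for (std::size_t i = 1; i < timestamps_us.size(); ++i) {
        const std::int64_t dt = timestamps_us[i] - timestamps_us[i - 1];
        const std::int64_t dev = dt > target_us ? dt - target_us : target_us - dt;
        total += dt;
        if (dev > s.max_dev_us)
            s.max_dev_us = dev;
    }
    s.mean_us = static_cast<double>(total) / static_cast<double>(timestamps_us.size() - 1);
    return s;
}
```

Integer microseconds keep the comparison exact; a `max_dev_us` of 0 or 1 corresponds to the "constant 16.67 ms" reading described above.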

That being said, I cannot say with 100% certainty that this is, and always will be, the case, so YMMV. My testing was done on the last good version of Windows, 7, with most of the crap removed, resulting in an environment where not much happens in the background. Given the trend of Windows adding an ever-increasing amount of crap to the OS with each iteration, I can't know whether sleep-yield will always be this accurate. Frame-time variance and general unpredictability of performance have certainly been growing in newer versions of Windows so far. But it will probably be accurate enough anyway.

Why Sleep(1) instead of more?

  1. Even with timeBeginPeriod(1), which supposedly sets the timer resolution to 1 millisecond, Sleep is inherently NOT accurate.

    The inaccuracy adds up, so it's better to sleep only ~1 ms at a time and measure the actual elapsed time yourself with QPC to maintain accuracy, even though this costs some CPU time.

    This is also why the last Sleep(1) happens only when there is >2 ms of waiting left; otherwise the thread just yields its time slice with Sleep(0) for the last iterations of the loop.

  2. Sleeping for longer periods at once could result in the CPU constantly switching between lower and higher power states.
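The sleep-yield loop described in the list can be sketched portably. This is not the wrapper's code: it assumes std::chrono::steady_clock in place of QPC, std::this_thread::sleep_for(1 ms) in place of Sleep(1), and std::this_thread::yield() roughly in place of Sleep(0). `sleep_yield_until` is a hypothetical name.

```cpp
#include <chrono>
#include <thread>

// Waits until the deadline by sleeping ~1 ms at a time while plenty of time
// remains, then only yielding the time slice for the final stretch.
void sleep_yield_until(std::chrono::steady_clock::time_point deadline)
{
    using namespace std::chrono;
    for (;;) {
        const auto remaining = deadline - steady_clock::now(); // re-measure every iteration
        if (remaining <= steady_clock::duration::zero())
            return; // the frame may be presented now
        if (remaining > milliseconds(2))
            std::this_thread::sleep_for(milliseconds(1)); // coarse: never trust a longer sleep
        else
            std::this_thread::yield(); // last ~2 ms: give up the slice, then check the clock again
    }
}
```

The 2 ms threshold leaves headroom for a "1 ms" sleep that overshoots. On Windows, Sleep(1) only comes close to 1 ms after timeBeginPeriod(1), as noted in point 1.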

Conclusion:

Using sleep-yield instead of busy-wait to time the FPS limiter seems to be the better solution overall: it uses far fewer resources without a measurable penalty in accuracy. For now I kept the original "realtime" busy-wait method as the default, for the sake of compatibility and to avoid changing existing behavior, but it could be wise to make sleep-yield the default.

anzz1 added 2 commits March 17, 2022 01:41
* Less resource-intensive way of limiting fps
* New configuration option FPSLimitMode
  - 1: original realtime-mode (busy-wait thread-lock) [default]
  - 2: new accurate-mode (sleep-yield)
Don't use FPS limiter until it's initialized
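The second commit's guard can be illustrated with a small limiter object that refuses to wait before it has been initialized. Everything here is hypothetical (`FrameLimiter`, `init`, `wait`), and the wait itself uses std::this_thread::sleep_until for brevity, which is neither of the two modes above.

```cpp
#include <chrono>
#include <thread>

class FrameLimiter {
public:
    // Sets the frame interval; until this runs, wait() does nothing.
    void init(double fps)
    {
        interval_ = std::chrono::duration_cast<std::chrono::steady_clock::duration>(
            std::chrono::duration<double>(1.0 / fps));
        next_ = std::chrono::steady_clock::now() + interval_;
        initialized_ = true;
    }

    // Waits for the next frame slot. Returns false, without waiting,
    // if init() has not been called yet.
    bool wait()
    {
        if (!initialized_)
            return false;
        std::this_thread::sleep_until(next_);
        next_ += interval_;
        return true;
    }

private:
    bool initialized_ = false;
    std::chrono::steady_clock::duration interval_{};
    std::chrono::steady_clock::time_point next_{};
};
```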
ThirteenAG (Owner):

Oh, very nice, thanks. May I ask you to also pull request these changes to d3d9 wrapper?

ThirteenAG merged commit 7664747 into ThirteenAG:master on Mar 17, 2022
anzz1 (Contributor, Author) commented Mar 17, 2022

> Oh, very nice, thanks. May I ask you to also pull request these changes to d3d9 wrapper?

Sure

ThirteenAG (Owner):

@anzz1 actually I just remembered something, I think it was dxwnd that had the option "Do not notify app on alt tab", any idea how it can be implemented for windowed mode?

anzz1 (Contributor, Author) commented Jul 10, 2022

@ThirteenAG

One way to do it is to hook either the message queue or the WndProc callback (hook RegisterClass/RegisterClassEx and replace the WndProc with your own) and toss the relevant messages that are sent to the window when it loses focus. Off the top of my head, at least WM_ACTIVATE (0x06) with the low word of wParam equal to WA_INACTIVE (0x0) needs to be tossed. There are probably others that get sent too; they are easy to find with a tool like Spy++.
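Stripped of the Win32 plumbing, the filtering decision is a small predicate. A sketch: `should_swallow` and the mirrored constants are hypothetical, and in a real hook the predicate would run at the top of the replacement WndProc, which returns 0 instead of forwarding when it yields true. It covers only the WM_ACTIVATE/WA_INACTIVE case named above.

```cpp
#include <cstdint>

// Values mirrored from the Win32 headers so the sketch compiles anywhere.
constexpr std::uint32_t kWmActivate = 0x0006; // WM_ACTIVATE
constexpr std::uint32_t kWaInactive = 0;      // WA_INACTIVE

// True if the message tells the window it lost activation and should be
// hidden from the app. Other focus-loss messages would need to be added
// once identified (e.g. with Spy++).
bool should_swallow(std::uint32_t msg, std::uintptr_t wparam)
{
    // LOWORD(wParam) is the activation state; the high word is the minimized flag.
    const std::uint32_t state = static_cast<std::uint32_t>(wparam & 0xFFFF);
    return msg == kWmActivate && state == kWaInactive;
}
```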

edit:
example code from another project showing what the WndProc could look like

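#include <windows.h>
#include <stdint.h>

// Module base of the hooked app (assumed); the 0x001314A0 offset below is the
// original WndProc of that one particular app and is not portable.
extern HMODULE g_hModule;

LRESULT CALLBACK WndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    if (uMsg == WM_ACTIVATE)
    {
        switch (LOWORD(wParam))
        {
            case WA_INACTIVE:
                // Focus lost: drop out of topmost without activating anything
                SetWindowPos(hWnd, HWND_NOTOPMOST, 0, 0, 0, 0, SWP_NOACTIVATE | SWP_NOMOVE | SWP_NOSIZE);
                break;
            default: // WA_ACTIVE or WA_CLICKACTIVE
                SetWindowPos(hWnd, HWND_TOPMOST, 0, 0, 0, 0, SWP_NOACTIVATE | SWP_NOMOVE | SWP_NOSIZE);
                break;
        }
        // This snippet is from a project for a single particular app, 
        // better probably not to return 0 and completely skip app's own "on active" logic 
        // on a more general solution or at least let it run once before hooking or something
        return 0;
    }

    // Forward everything else to the app's original WndProc
    // (uintptr_t instead of DWORD so the address is not truncated on 64-bit)
    WNDPROC original = (WNDPROC)((uintptr_t)g_hModule + 0x001314A0);
    return CallWindowProc(original, hWnd, uMsg, wParam, lParam);
}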

anzz1 (Contributor, Author) commented Jul 10, 2022

Here is my super simple, C/C++-compatible IAT hooking library, which I use a lot and plan to eventually clean up and release but haven't gotten around to yet. It covers pretty much every IAT hooking use case (hooking in the process or in a loaded module, by function name or by module name and ordinal); the only gap is the rare edge case of duplicate IAT entries.

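Hooking GetMessage/PeekMessage for the message queue, or RegisterClass/RegisterClassEx for the WndProc, inside the process should be sufficient in this case. Note that WM_ACTIVATE is a nonqueued (sent) message that goes straight to the window procedure, so the WndProc hook is the one that will actually see it. I would recommend the IAT route, since IAT hooking only modifies pointers in the process's own import tables and pretty much always works, unlike inline code-patching hooks such as Microsoft Detours or system-wide hooks, which are a lot more finicky for a multitude of reasons.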
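IAT hooking stays in-process and robust because the patch only rewrites one pointer slot that callers already go through. The following is a portable analogy, not real PE parsing: `ImportTable`, `patch_slot`, `call_add` and the other names are hypothetical, and the slot swap mirrors what `detour_iat_ptr` in the header does once `find_iat_func` has located the entry.

```cpp
using AddFn = int (*)(int, int);

int real_add(int a, int b) { return a + b; }

// Stand-in for an import address table: one slot filled in by the "loader".
struct ImportTable {
    AddFn add = real_add;
};

ImportTable g_imports;
AddFn g_original_add = nullptr; // saved original, like detour_iat_ptr's return value

// The hook runs its own logic, then chains to the original via the saved pointer.
int hooked_add(int a, int b) { return g_original_add(a, b) * 10; }

// Swaps a slot and returns the previous pointer (nullptr if already hooked).
AddFn patch_slot(AddFn& slot, AddFn replacement)
{
    if (slot == replacement)
        return nullptr;
    AddFn previous = slot;
    slot = replacement;
    return previous;
}

// "Caller" code only ever calls through the table, like code compiled against an import.
int call_add(int a, int b) { return g_imports.add(a, b); }
```

The real version additionally has to lift the page protection with VirtualProtect before writing, because the IAT is typically not writable after the loader has filled it in.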

iathook.h

#ifndef IATHOOK_H
#define IATHOOK_H

#include <windows.h>
#include <stdint.h>

/*
 * simple iathook, unreleased version
 */

/* Default arguments only exist in C++; C callers must pass every argument. */
#ifdef __cplusplus
#define IATHOOK_DEFAULT(x) = x
#else
#define IATHOOK_DEFAULT(x)
#endif

#ifdef __cplusplus
namespace Iat_hook
{
#endif
    /* Returns the address of the IAT slot for `function` (by name) or `ordinal`
     * (by ordinal, requires chModule) in hModule's imports, or NULL if not found.
     * static: the header can be included in several translation units. */
    static void** find_iat_func(const char* function, HMODULE hModule, const char* chModule, const DWORD ordinal)
    {
        if (!hModule)
            hModule = GetModuleHandle(NULL);

        BYTE* base = (BYTE*)hModule;
        PIMAGE_DOS_HEADER img_dos_headers = (PIMAGE_DOS_HEADER)base;
        if (img_dos_headers->e_magic != IMAGE_DOS_SIGNATURE)
            return NULL;

        PIMAGE_NT_HEADERS img_nt_headers = (PIMAGE_NT_HEADERS)(base + img_dos_headers->e_lfanew);
        if (img_nt_headers->Signature != IMAGE_NT_SIGNATURE)
            return NULL;

        DWORD import_rva = img_nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;
        if (import_rva == 0)
            return NULL; /* module has no import table */

        for (PIMAGE_IMPORT_DESCRIPTOR iid = (PIMAGE_IMPORT_DESCRIPTOR)(base + import_rva); iid->Name != 0; iid++) {
            if (chModule != NULL && lstrcmpiA(chModule, (const char*)(base + iid->Name)))
                continue;
            if (iid->OriginalFirstThunk == 0)
                continue; /* no lookup table, so names/ordinals cannot be resolved */

            /* OriginalFirstThunk: names/ordinals; FirstThunk: the IAT slots callers go through */
            PIMAGE_THUNK_DATA lookup = (PIMAGE_THUNK_DATA)(base + iid->OriginalFirstThunk);
            PIMAGE_THUNK_DATA iat = (PIMAGE_THUNK_DATA)(base + iid->FirstThunk);
            for (; lookup->u1.AddressOfData != 0; lookup++, iat++) {
                if (IMAGE_SNAP_BY_ORDINAL(lookup->u1.Ordinal)) {
                    if (chModule != NULL && ordinal != 0 && ordinal == IMAGE_ORDINAL(lookup->u1.Ordinal))
                        return (void**)&iat->u1.Function;
                }
                else if (function != NULL) {
                    PIMAGE_IMPORT_BY_NAME by_name = (PIMAGE_IMPORT_BY_NAME)(base + lookup->u1.AddressOfData);
                    if (!lstrcmpA(function, (const char*)by_name->Name))
                        return (void**)&iat->u1.Function;
                }
            }
        }
        return NULL;
    }

    /* Points the IAT entry at newfunction; returns the previous pointer, or 0 on failure. */
    static uintptr_t detour_iat_ptr(const char* function, void* newfunction, HMODULE hModule IATHOOK_DEFAULT(NULL),
                                    const char* chModule IATHOOK_DEFAULT(NULL), const DWORD ordinal IATHOOK_DEFAULT(0))
    {
        void** func_ptr = find_iat_func(function, hModule, chModule, ordinal);
        if (!func_ptr || *func_ptr == newfunction || *func_ptr == NULL)
            return 0;

        DWORD old_rights, tmp_rights;
        if (!VirtualProtect(func_ptr, sizeof(void*), PAGE_READWRITE, &old_rights))
            return 0;
        uintptr_t ret = (uintptr_t)*func_ptr;
        *func_ptr = newfunction;
        VirtualProtect(func_ptr, sizeof(void*), old_rights, &tmp_rights);
        return ret;
    }
#ifdef __cplusplus
}
#endif

#endif /* IATHOOK_H */

dllmain.cpp

#include <windows.h>
#include <wininet.h>
#include "iathook.h"

typedef HINTERNET (WINAPI *InternetConnectW_fn)(HINTERNET hInternet, LPCWSTR lpszServerName, INTERNET_PORT nServerPort, LPCWSTR lpszUserName, LPCWSTR lpszPassword, DWORD dwService, DWORD dwFlags, DWORD_PTR dwContext);
InternetConnectW_fn oInternetConnectW;

typedef FARPROC (WINAPI *GetProcAddress_fn)(HMODULE hModule, LPCSTR lpProcName);
GetProcAddress_fn oGetProcAddress;

// Blocks connections to one host and passes everything else to the real InternetConnectW
HINTERNET WINAPI hk_InternetConnectW(HINTERNET hInternet, LPCWSTR lpszServerName, INTERNET_PORT nServerPort, LPCWSTR lpszUserName,
                                     LPCWSTR lpszPassword, DWORD dwService, DWORD dwFlags, DWORD_PTR dwContext)
{
    //MessageBoxW(0, lpszServerName, L"InternetConnectW", MB_OK);
    if (lpszServerName && !lstrcmpW(lpszServerName, L"www.example.com"))
    {
        SetLastError(ERROR_INTERNET_NAME_NOT_RESOLVED);
        return NULL;
    }

    return oInternetConnectW(hInternet, lpszServerName, nServerPort, lpszUserName, lpszPassword, dwService, dwFlags, dwContext);
}

// Catches dynamic lookups that bypass the IAT
FARPROC WINAPI hk_GetProcAddress(HMODULE hModule, LPCSTR lpProcName)
{
    //MessageBoxA(0, lpProcName, "GetProcAddress", MB_OK);

    // lpProcName may be an ordinal (high word zero) rather than a string pointer
    if (!IS_INTRESOURCE(lpProcName) && !lstrcmpA(lpProcName, "InternetConnectW"))
    {
        FARPROC real = oGetProcAddress(hModule, lpProcName);
        if (!real)
            return NULL;
        if (!oInternetConnectW)
            oInternetConnectW = (InternetConnectW_fn)real; // the hook needs a real target to forward to
        return (FARPROC)hk_InternetConnectW;
    }

    return oGetProcAddress(hModule, lpProcName);
}

extern "C" BOOL WINAPI DllMain(HINSTANCE hInst, DWORD reason, LPVOID)
{
    if (reason == DLL_PROCESS_ATTACH)
    {
        DisableThreadLibraryCalls(hInst);

        // hModule defaults to the main executable
        oInternetConnectW = (InternetConnectW_fn)Iat_hook::detour_iat_ptr("InternetConnectW", (void*)hk_InternetConnectW);
        oGetProcAddress = (GetProcAddress_fn)Iat_hook::detour_iat_ptr("GetProcAddress", (void*)hk_GetProcAddress);

        // Also hook urlmon.dll's own imports if it is already loaded
        HMODULE urlmon = GetModuleHandleA("urlmon.dll");
        if (urlmon)
        {
            uintptr_t orig = Iat_hook::detour_iat_ptr("InternetConnectW", (void*)hk_InternetConnectW, urlmon);
            if (!oInternetConnectW)
                oInternetConnectW = (InternetConnectW_fn)orig;
        }
    }

    return TRUE;
}

ThirteenAG (Owner):

I'd like to add this "do not notify on task switch" option to the wrappers because I noticed some games crash on Alt+Tab even in windowed mode, and there's no quick way to avoid that. I haven't tried implementing it yet.
