Cover Illustration by atomic_arctic

This research was done using software obtained by myself individually or through open-source projects, abiding by all licenses. There is no intention of harming any company’s product.

This post is not meant to be an attack towards any game or anticheat developer, and I am not tied to any game hack publisher or entities.

Everything here is constructed for educational purposes.

I've talked briefly before about my previous adventures as a cheat developer for various competitive games during high school. I learned a lot through those endeavors, and I've been applying those skills at work ever since. But since I got out of the game, anticheat systems have grown considerably more aggressive, and the backlash against them is getting more intense.

On one side, developers say that kernel-level anticheat drivers are no more invasive than antivirus/EDR software. On the other, consumers have taken notice of how absurd it is that, to enjoy a game, they have to install something akin to spyware on their devices. While I agree with some of the points raised by both sides, I think there is a conversation to be had about the need for these drivers.

On one hand, cheating software has moved on from the days of scrappy kids on UnknownCheats to a massive industry, with many developers making six-figure incomes from building cheats. On the other, there are also a lot of cases where anticheat drivers have been abused for offensive purposes, in much the same way as LOLBins.

In this blog post we're going to look at a few things: how kernel-level anticheats compare to EDR/AV solutions (a comparison many anticheat firms make), what telemetry they monitor, what protection methods they use, and what the alternatives to kernel-mode anticheats are.

This article would not be possible without:

- ItsGamerDoc's reflection on anticheats and their public perception
- Amazing research from the following individuals:
  - Writeup about Valorant's Guarded Memory Regions by @xyrem256
  - Writeup about Anti Cheat Extreme (ACE)'s Screenshotting Capabilities by @koyzdev
- The ac open source anticheat project on GitHub by donnaskiez
- A lot of writeups about Vanguard implementations on Valorant and League of Legends by Riot Games
- The UnknownCheats forum, lmfao

Comparing EDRs to Anticheats

To be honest, this entire article was prompted by an article by ItsGamerDoc on X (formerly Twitter) about anticheats and their public perception in the gaming community. He works as a Senior Anticheat Analyst at Riot Games and I've been following his work for some time. This article of mine is in no way an attack on him or his employer.

The article argues that concerns about kernel-level anticheats are overblown, and that they do much the same things as an antivirus (which I more commonly refer to as Endpoint Detection and Response (EDR), an enterprise-grade and more aggressive version of antivirus). At first, the comparison between EDR and anticheat systems makes sense. Both are programs that detect malicious activity against certain processes in the system, and to achieve this they both do seemingly similar things:

- Signature-based detection of known threats (usually anticheats will halt the execution of software like IDA Pro or x64dbg)
- Detection and prevention of mapped drivers and DLLs (similar to the prevention of LOLBins)
- Monitoring certain system binaries and processes through behavioral detection to spot malicious activity
- Obfuscation of certain processes to make tampering harder

But there is a fundamental difference between EDRs and anticheats that many vendors like to skip over: the adversarial nature of the relationship between the user and the software. With EDR/AV, the user works with the security vendor to prevent a breach, and the attackers are usually external actors. With anticheat software, the user is the attacker.

To effectively monitor and intervene in the activities of potential cheaters, anticheats often operate at the kernel level. This allows them to intercept system calls, monitor kernel objects, and prevent tampering with user-mode processes. Kernel-level drivers can provide a higher privilege level, enabling more robust protection mechanisms such as NMI (Non-Maskable Interrupt) stack walking and direct hardware access for integrity checks.

Continuous verification of the integrity of the game's executable and memory space is critical. Anticheats employ techniques such as cyclic redundancy checks (CRC), hash-based verifications, and periodic integrity checks on both static and dynamic code sections. This ensures that any unauthorized modifications or injections are detected and thwarted promptly.
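To make the CRC idea concrete, here is a minimal user-mode sketch. It assumes the anticheat recorded a baseline CRC-32 of a code region at load time and re-checks it periodically. The helper name region_is_intact is hypothetical and not taken from any real anticheat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib/Ethernet). */
uint32_t crc32_compute(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift one bit out; fold in the polynomial if that bit was set. */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Hypothetical helper: returns 1 if the region still matches its baseline. */
int region_is_intact(const uint8_t *region, size_t len, uint32_t baseline_crc) {
    assert(region != NULL);
    return crc32_compute(region, len) == baseline_crc;
}
```

A CRC only catches accidental or naive patches, since an attacker who can patch the code can usually also patch the stored baseline. That is one reason these checks tend to be layered with hashes and run from more than one place.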

Similar to EDRs, anticheat systems also rely on behavioral analysis but with a more aggressive stance. This includes monitoring for patterns indicative of cheating, such as abnormal input rates, suspicious memory modifications, and unauthorized API calls. But this has also extended into more aggressive, spyware-like behavior.

Unlike EDR systems that may prioritize logging and post-event analysis, anticheat solutions require real-time intervention capabilities. This means detecting and responding to cheating attempts as they occur, often resulting in the immediate suspension or banning of the offending player. Techniques such as immediate process termination, user session invalidation, and real-time communication with game servers for coordinated responses are employed.

Fundamentally, the adversarial relationship between the user and anticheat systems changes the entire dynamic of how the software operates and what measures it employs. In the case of EDR systems, the security model is built on trust. The assumption is that the endpoint user is cooperating with the security measures, and the focus is on detecting, analyzing, and responding to threats that typically come from external sources. In contrast, anticheat systems operate in an inherently hostile environment. The anticheat software must assume that the user (or a subset of users) will actively attempt to circumvent its measures.

This article is meant to introduce people to kernel-level anticheats: how they work and what they do. While I talk a lot about anticheats like MiHoYo Protect (mhyprot.sys) and Riot Games Vanguard (vgk.sys), the discussion of telemetry and protection methods here is a combination of methods I've found while tinkering with different anticheat programs, public documentation, and code from the ac open source anticheat by donnaskiez, licensed under AGPL-3.0.

Environment Verification and Fingerprinting

Anticheats must be able to validate the environment they are running on to make sure that they are running inside of a trusted environment. This is not only to do things such as enforcing game bans, but also to detect if they are being run in a virtual machine, debugger, or sandbox, which are common tools used by cheat developers to analyze and bypass anti-cheat protections.

This includes the fingerprinting of the hardware that they are running on, detecting malicious PCI devices, and also detecting virtualized environments. The latter reason is why games like Valorant are unable to be run through translation layers like Crossover (Mac) or Wine (Linux).

Hardware Fingerprinting Through TPM

Extraction of hardware identifiers involves identifying and collecting unique information from the hardware components of a computer system to ensure the integrity and authenticity of the system running the software. This is crucial for anti-cheat mechanisms as it helps in uniquely identifying a machine, preventing users from easily evading bans or other restrictions by simply reinstalling the software or changing user accounts.

To extract hardware identifiers, the kernel-level driver interacts directly with the hardware or low-level system APIs. For instance, the driver might query the system's BIOS, CPU, motherboard, network interfaces, and other components to gather unique identifiers such as serial numbers or MAC addresses.

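One common approach is to use the CPUID instruction to read the CPU's vendor string and model information. Note that modern CPUs do not expose a per-chip serial number through CPUID, so this data identifies the processor model rather than one specific chip, and it is only useful in combination with other identifiers. This can be done with the __cpuid compiler intrinsic in C:

#include <intrin.h>  // __cpuid (MSVC intrinsic)
#include <stdio.h>   // snprintf

// Writes the four CPUID leaf 0 registers as 32 hex digits; cpu_id needs at least 33 bytes.
void get_cpu_id(char* cpu_id, size_t cpu_id_size) {
    int cpu_info[4] = { 0 };  // EAX, EBX, ECX, EDX
    __cpuid(cpu_info, 0);     // leaf 0: highest supported leaf and vendor string
    snprintf(cpu_id, cpu_id_size, "%08X%08X%08X%08X",
             (unsigned)cpu_info[0], (unsigned)cpu_info[1], (unsigned)cpu_info[2], (unsigned)cpu_info[3]);
}

In the code, the __cpuid intrinsic executes the CPUID instruction, which fills the cpu_info array with the values of EAX, EBX, ECX, and EDX. For leaf 0, these hold the highest supported leaf and the vendor string (for example "GenuineIntel"). The values are then formatted into a string that can serve as one input to the hardware fingerprint.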

But there are more aggressive anticheats like Riot Games' Vanguard Anticheat, which requires TPM (Trusted Platform Module) to extract unique hardware-related information. TPM is used by apps to securely create and store cryptographic keys, and to confirm that the operating system and firmware on your device are what they're supposed to be, and haven't been tampered with.

Anti-cheat systems use hardware identifiers to ensure that the machine's hardware and firmware have not been tampered with. By regularly verifying these identifiers against known values, the system can detect if any unauthorized hardware changes have been made, which might indicate an attempt to compromise the game's security.

Some cheats also operate by modifying hardware-level interactions, such as manipulating memory or utilizing custom drivers to alter game behavior. By monitoring hardware identifiers, the anti-cheat system can detect unusual changes or unauthorized access at the hardware level, which is often a strong indicator of cheating.

We first need to ensure the current platform can support TPM operations by checking the CPU type. If the TPM hardware is present, we can go ahead and determine the type of TPM interface. We can read the TPM CRB (Command Response Buffer) interface identifier and FIFO interface capability from physical memory to identify the specific type of TPM interface.

STATIC NTSTATUS TpmGetPtpInterfaceType(_In_ PVOID Register, _Out_ TPM2_PTP_INTERFACE_TYPE* InterfaceType) {
    NTSTATUS                      status     = STATUS_UNSUCCESSFUL;
    PTP_CRB_INTERFACE_IDENTIFIER  identifier = {0};
    PTP_FIFO_INTERFACE_CAPABILITY capability = {0};

    *InterfaceType = 0;

    status = MapAndReadPhysical(
        (UINT64)(&((PTP_CRB_REGISTERS*)Register)->InterfaceId),
        sizeof(PTP_CRB_INTERFACE_IDENTIFIER),
        &identifier,
        sizeof(PTP_CRB_INTERFACE_IDENTIFIER));

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("MapAndReadPhysical: %x", status);
        return status;
    }

    status = MapAndReadPhysical(
        (UINT64) & ((PTP_FIFO_REGISTERS*)Register)->InterfaceCapability,
        sizeof(PTP_FIFO_INTERFACE_CAPABILITY),
        &capability,
        sizeof(PTP_FIFO_INTERFACE_CAPABILITY));

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("MapAndReadPhysical: %x", status);
        return status;
    }

    *InterfaceType = TpmExtractInterfaceTypeFromCapabilityAndId(&identifier, &capability);

    return status;
}

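We can then determine the TPM interface type as a step towards retrieving the TPM endorsement key, which serves as a unique hardware identifier. Note that the excerpt below stops once the interface type is known; the endorsement key read itself is not shown.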

NTSTATUS TpmExtractEndorsementKey() {
    NTSTATUS                status   = STATUS_UNSUCCESSFUL;
    BOOLEAN                 presence = FALSE;
    TPM2_PTP_INTERFACE_TYPE type     = {0};

    if (!TpmIsPlatformSupported())
        return STATUS_NOT_SUPPORTED;

    status = TpmCheckPtpRegisterPresence(TPM20_INTEL_BASE_PHYSICAL, &presence);

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("TpmCheckPtpRegisterPresence: %x", status);
        return status;
    }

    if (!presence) {
        DEBUG_INFO("TPM2.0 PTP Presence not detected.");
        return STATUS_UNSUCCESSFUL;
    }

    status = TpmGetPtpInterfaceType(TPM20_INTEL_BASE_PHYSICAL, &type);

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("TpmGetPtpInterfaceType: %x", status);
        return status;
    }

    DEBUG_INFO("TPM2.0 PTP Interface Type: %x", (UINT32)type);
    return status;
}

Once the TPM endorsement key is retrieved, the anti-cheat system can use this key to create a unique profile for the device, ensuring that each device can be reliably tracked. This helps in identifying and tracking users across different gaming sessions, even if they change their network identities or reinstall the game.

The endorsement key can be used as part of a larger integrity check process. By combining the endorsement key with other hardware and software identifiers, the anti-cheat system can create a comprehensive profile of the system. This profile can be checked against known good states to detect any unauthorized changes or tampering, ensuring that the system has not been compromised.
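As a rough illustration of combining identifiers into one profile value, the sketch below folds a list of identifier strings into a single 64-bit fingerprint with FNV-1a. FNV-1a is a stand-in I picked because it fits in a few lines. It is not cryptographic, and a real system would more plausibly use something like SHA-256 on the server side. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL

/* Feeds len bytes into a running 64-bit FNV-1a hash. */
uint64_t fnv1a64_update(uint64_t hash, const void *data, size_t len) {
    const uint8_t *bytes = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        hash ^= bytes[i];
        hash *= FNV64_PRIME;
    }
    return hash;
}

/* Hypothetical: hashes every identifier, in order, into one fingerprint. */
uint64_t device_fingerprint(const char *const *ids, size_t count) {
    assert(ids != NULL);
    uint64_t hash = FNV64_OFFSET;
    for (size_t i = 0; i < count; i++) {
        hash = fnv1a64_update(hash, ids[i], strlen(ids[i]));
        /* Separator byte so {"ab","c"} and {"a","bc"} hash differently. */
        hash = fnv1a64_update(hash, "\x1f", 1);
    }
    return hash;
}
```

Because the result depends on every input, swapping a single component (say, a new disk) changes the whole fingerprint. Real systems therefore tend to keep the individual identifiers too and score partial matches, rather than relying on one combined hash.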

One way to run this check is to read a Platform Configuration Register (PCR) value from the TPM (Trusted Platform Module) and compare it against a known-good value to detect whether the boot chain has been tampered with. This involves several steps: checking the buffer size, initializing a TBS (TPM Base Services) context, preparing and sending a TPM command, checking the TPM response, extracting the PCR value, and cleaning up resources.

The function starts by ensuring that the provided buffer is large enough to hold the PCR value, which is typically 32 bytes for SHA-256 hashed values.

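NTSTATUS ReadPcrValue(UINT32 pcrIndex, BYTE* pcrValue, UINT32 pcrValueSize) {
    if (pcrValueSize < PCR_VALUE_SIZE) {
        return STATUS_BUFFER_TOO_SMALL;
    }
    // A 3-byte selection bitmap covers PCR 0..23 only
    if (pcrIndex >= 24) {
        return STATUS_INVALID_PARAMETER;
    }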

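Next, a TBS context is initialized. The TBS context facilitates communication with the TPM hardware. The TBS_CONTEXT_PARAMS2 structure is configured to specify the use of TPM 2.0, as described in Microsoft's TBS documentation. Note that the Tbsi_* and Tbsip_* functions are user-mode APIs exported by tbs.dll, so this part would typically run in the anticheat's user-mode component rather than in the driver itself.

    TBS_HCONTEXT hContext = NULL;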
    TBS_RESULT result;
    NTSTATUS status = STATUS_UNSUCCESSFUL;

    // Initialize the TBS context
    TBS_CONTEXT_PARAMS2 contextParams;
    contextParams.version = TBS_CONTEXT_VERSION_TWO;
    contextParams.asUINT32 = 0;
    contextParams.includeTpm12 = 0;
    contextParams.includeTpm20 = 1;

    result = Tbsi_Context_Create((PCTBS_CONTEXT_PARAMS)&contextParams, &hContext);
    if (result != TBS_SUCCESS) {
        DEBUG_ERROR("Tbsi_Context_Create failed with result: %x", result);
        return STATUS_UNSUCCESSFUL;
    }

The function then prepares the TPM command buffer. This buffer holds the TPM command to read the PCR value. The command is structured according to the TPM 2.0 specification, beginning with a command header containing the command code for TPM2_PCR_Read.

    // Prepare TPM command and response buffers
    BYTE commandBuffer[1024] = { 0 };
    // 10-byte header + 4-byte selection count + 6-byte selection (assumes packed structs)
    UINT32 commandSize = 20;
    BYTE responseBuffer[1024] = { 0 };
    UINT32 responseSize = sizeof(responseBuffer);

    // TPM2_PCR_Read command header
    TPM2_COMMAND_HEADER* commandHeader = (TPM2_COMMAND_HEADER*)commandBuffer;
    commandHeader->tag = htons(TPM_ST_NO_SESSIONS);
    commandHeader->commandCode = htonl(TPM_CC_PCR_Read);
    commandHeader->commandSize = htonl(commandSize);

    // TPML_PCR_SELECTION.count: exactly one selection follows the header
    *(UINT32*)(commandBuffer + sizeof(TPM2_COMMAND_HEADER)) = htonl(1);

    TPM2_PCR_SELECTION* pcrSelection = (TPM2_PCR_SELECTION*)(commandBuffer + sizeof(TPM2_COMMAND_HEADER) + sizeof(UINT32));
    pcrSelection->hash = htons(TPM_ALG_SHA256);
    pcrSelection->sizeOfSelect = 3;
    memset(pcrSelection->pcrSelect, 0, 3);
    // One bit per PCR: byte index = pcrIndex / 8, bit index = pcrIndex % 8
    pcrSelection->pcrSelect[pcrIndex / 8] = (1 << (pcrIndex % 8));

The command is then sent to the TPM using the Tbsip_Submit_Command function. This function handles the low-level communication with the TPM, sending the command buffer and receiving the response buffer.

    // Send the TPM command
    result = Tbsip_Submit_Command(hContext, TBS_COMMAND_LOCALITY_ZERO, TBS_COMMAND_PRIORITY_NORMAL, commandBuffer, commandSize, responseBuffer, &responseSize);
    if (result != TBS_SUCCESS) {
        DEBUG_ERROR("Tbsip_Submit_Command failed with result: %x", result);
        Tbsip_Context_Close(hContext);
        return STATUS_UNSUCCESSFUL;
    }

Upon receiving the response, the function checks the TPM response header to ensure that the command was processed successfully. The response header contains the status of the TPM command.

    // Check the TPM response
    TPM2_RESPONSE_HEADER* responseHeader = (TPM2_RESPONSE_HEADER*)responseBuffer;
    if (ntohs(responseHeader->tag) != TPM_ST_NO_SESSIONS || ntohl(responseHeader->responseCode) != TPM_RC_SUCCESS) {
        DEBUG_ERROR("TPM command failed with response code: %x", ntohl(responseHeader->responseCode));
        Tbsip_Context_Close(hContext);
        return STATUS_UNSUCCESSFUL;
    }

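If the TPM command was successful, the function extracts the PCR value from the response buffer. The digest sits after the response header, the 4-byte update counter, the echoed PCR selection list (10 bytes), the digest count (4 bytes), and the 2-byte digest size. The function copies the PCR value into the provided buffer, closes the TBS context, and returns.

    // Extract the PCR value: skip pcrUpdateCounter (4), the echoed selection list (10),
    // the digest count (4), and the digest size (2)
    BYTE* pcrValues = responseBuffer + sizeof(TPM2_RESPONSE_HEADER) + 20;
    memcpy(pcrValue, pcrValues, PCR_VALUE_SIZE);
    Tbsip_Context_Close(hContext);
    return STATUS_SUCCESS;
}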

With this, the anticheat can compare the PCR values against known-good measurements to detect unauthorized changes to the system's firmware, bootloader, or other critical components, thereby checking the integrity of the device.
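To make the wire format easier to follow, here is a portable sketch that serializes a TPM2_PCR_Read command byte by byte and pulls the digest out of a response. The layout follows my reading of the TPM 2.0 specification: a 10-byte header, a 4-byte selection count, then a 6-byte SHA-256 selection. It also assumes the TPM echoes back a single 3-byte selection in its response. The function names are my own, and nothing here talks to a real TPM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCR_READ_CMD_SIZE 20u
#define PCR_DIGEST_SIZE   32u

static void put_be16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v) {
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)v);
}

static uint32_t get_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serializes TPM2_PCR_Read for one SHA-256 PCR. Returns bytes written, or 0. */
size_t build_pcr_read(uint32_t pcr_index, uint8_t *out, size_t out_size) {
    assert(out != NULL);
    if (pcr_index >= 24 || out_size < PCR_READ_CMD_SIZE)
        return 0;
    memset(out, 0, PCR_READ_CMD_SIZE);
    put_be16(out + 0, 0x8001u);            /* TPM_ST_NO_SESSIONS        */
    put_be32(out + 2, PCR_READ_CMD_SIZE);  /* total command size        */
    put_be32(out + 6, 0x0000017Eu);        /* TPM_CC_PCR_Read           */
    put_be32(out + 10, 1u);                /* one selection follows     */
    put_be16(out + 14, 0x000Bu);           /* TPM_ALG_SHA256            */
    out[16] = 3;                           /* sizeofSelect: 24 PCR bits */
    out[17 + pcr_index / 8] = (uint8_t)(1u << (pcr_index % 8));
    return PCR_READ_CMD_SIZE;
}

/* Copies the first digest out of a successful response. Returns 1 on success. */
int parse_pcr_read(const uint8_t *rsp, size_t rsp_size, uint8_t digest[32]) {
    assert(rsp != NULL && digest != NULL);
    /* header (10) + update counter (4) + selection list (10) +
       digest count (4) + digest size (2) = 30 bytes before the digest */
    if (rsp_size < 30u + PCR_DIGEST_SIZE)
        return 0;
    if (get_be32(rsp + 6) != 0u)   /* responseCode must be TPM_RC_SUCCESS */
        return 0;
    if (get_be32(rsp + 24) < 1u)   /* at least one digest returned */
        return 0;
    if (rsp[28] != 0 || rsp[29] != PCR_DIGEST_SIZE)
        return 0;
    memcpy(digest, rsp + 30, PCR_DIGEST_SIZE);
    return 1;
}
```

Writing the bytes out explicitly sidesteps struct packing and endianness questions, which are easy to get wrong when casting structs over a buffer.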

EPT Hook Detection for Hypervisor Fingerprinting

EPT (Extended Page Tables) hook detection is a technique used to identify hidden hypervisors or virtualization-based rootkits that manipulate memory access through EPT. This feature, part of Intel VT-x technology, allows a hypervisor to control guest physical address translation, enabling efficient virtualization but also providing a potential vector for malicious activity.

EPT hooks can monitor and modify memory accesses stealthily, making traditional anti-cheat mechanisms ineffective. To counter this, EPT hook detection involves measuring read latencies to identify anomalies indicative of EPT manipulation.

First, we can retrieve and store the addresses of both control functions and protected functions. Control functions serve as a baseline for normal read times, while protected functions are commonly targeted by EPT hooks.

STATIC
NTSTATUS
InitiateEptFunctionAddressArrays()
{
    PAGED_CODE();

    UNICODE_STRING current_function;

    for (INT index = 0; index < EPT_CONTROL_FUNCTIONS_COUNT; index++) {
        ImpRtlInitUnicodeString(&current_function, CONTROL_FUNCTIONS[index]);
        CONTROL_FUNCTION_ADDRESSES[index] =
            ImpMmGetSystemRoutineAddress(&current_function);

        if (!CONTROL_FUNCTION_ADDRESSES[index])
            return STATUS_UNSUCCESSFUL;
    }

    for (INT index = 0; index < EPT_PROTECTED_FUNCTIONS_COUNT; index++) {
        ImpRtlInitUnicodeString(&current_function, PROTECTED_FUNCTIONS[index]);
        PROTECTED_FUNCTION_ADDRESSES[index] =
            ImpMmGetSystemRoutineAddress(&current_function);

        if (!PROTECTED_FUNCTION_ADDRESSES[index])
            return STATUS_UNSUCCESSFUL;
    }

    return STATUS_SUCCESS;
}

The average read times of control functions are measured to establish a baseline. This is done by reading the function addresses multiple times and calculating the average time taken for these reads.

STATIC
UINT64
MeasureReads(_In_ PVOID Address, _In_ ULONG Count)
{
    UINT64 read_average = 0;
    KIRQL  irql         = {0};

    MeasureInstructionRead(Address);

    KeRaiseIrql(HIGH_LEVEL, &irql);
    _disable();

    for (ULONG iteration = 0; iteration < Count; iteration++)
        read_average += MeasureInstructionRead(Address);

    _enable();
    KeLowerIrql(irql);

    return read_average / Count;
}

STATIC
NTSTATUS
GetAverageReadTimeAtRoutine(_In_ PVOID    RoutineAddress,
                            _Out_ PUINT64 AverageTime)
{
    if (!RoutineAddress || !AverageTime)
        return STATUS_UNSUCCESSFUL;

    if (!MmIsAddressValid(RoutineAddress))
        return STATUS_INVALID_ADDRESS;

    *AverageTime = MeasureReads(RoutineAddress, EPT_CHECK_NUM_ITERATIONS);

    return *AverageTime == 0 ? STATUS_UNSUCCESSFUL : STATUS_SUCCESS;
}

The read times of protected functions are measured in the same way as control functions. These times are then compared to the baseline read times of the control functions.

NTSTATUS
DetectEptHooksInKeyFunctions()
{
    PAGED_CODE();

    NTSTATUS status           = STATUS_UNSUCCESSFUL;
    UINT32   control_fails    = 0;
    UINT64   instruction_time = 0;
    UINT64   control_time_sum = 0;
    UINT64   control_average  = 0;

    status = InitiateEptFunctionAddressArrays();

    if (!NT_SUCCESS(status)) {
        return status;
    }

    for (INT index = 0; index < EPT_CONTROL_FUNCTIONS_COUNT; index++) {
        status = GetAverageReadTimeAtRoutine(CONTROL_FUNCTION_ADDRESSES[index],
                                             &instruction_time);

        if (!NT_SUCCESS(status)) {
            control_fails += 1;
            continue;
        }

        control_time_sum += instruction_time;
    }

    if (control_time_sum == 0)
        return STATUS_UNSUCCESSFUL;

    control_average =
        control_time_sum / (EPT_CONTROL_FUNCTIONS_COUNT - control_fails);

    if (control_average == 0)
        return STATUS_UNSUCCESSFUL;

    for (INT index = 0; index < EPT_PROTECTED_FUNCTIONS_COUNT; index++) {
        status = GetAverageReadTimeAtRoutine(
            PROTECTED_FUNCTION_ADDRESSES[index], &instruction_time);

        if (!NT_SUCCESS(status)) {
            continue;
        }

        if (control_average * EPT_EXECUTION_TIME_MULTIPLIER <
            instruction_time) {
            DEBUG_WARNING(
                "EPT hook detected at function: %llx with execution time of: %llx",
                PROTECTED_FUNCTION_ADDRESSES[index],
                instruction_time);
        }
    }

    return status;
}

If the read time for a protected function significantly exceeds the baseline, it suggests that an EPT hook is present, which in turn is a strong indicator of a hypervisor.

if (control_average * EPT_EXECUTION_TIME_MULTIPLIER < instruction_time) {
    DEBUG_WARNING(
        "EPT hook detected at function: %llx with execution time of: %llx",
        PROTECTED_FUNCTION_ADDRESSES[index],
        instruction_time);
}

By measuring and comparing the read times, the detection mechanism can identify the additional latency introduced by EPT hooks. Since EPT hooks are typically used by hypervisors, detecting these hooks can effectively indicate the presence of a hypervisor.
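The decision logic is easier to reason about once it is separated from the timing primitive. The sketch below takes timing samples that have already been collected and applies the same "baseline times multiplier" rule as the driver code. The function names and the idea of returning flagged indices are my own simplification, not how the ac project structures it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer average of a set of cycle counts. */
uint64_t average_cycles(const uint64_t *samples, size_t count) {
    assert(samples != NULL && count > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i];
    return sum / count;
}

/*
 * Compares each monitored read time against baseline * multiplier.
 * Stores the indices of suspicious entries in flagged (which must have room
 * for monitored_count entries) and returns how many were flagged.
 */
size_t flag_slow_reads(const uint64_t *control, size_t control_count,
                       const uint64_t *monitored, size_t monitored_count,
                       uint64_t multiplier, size_t *flagged) {
    assert(monitored != NULL && flagged != NULL);
    uint64_t baseline = average_cycles(control, control_count);
    size_t hits = 0;
    for (size_t i = 0; i < monitored_count; i++) {
        if (monitored[i] > baseline * multiplier)
            flagged[hits++] = i;
    }
    return hits;
}
```

The multiplier is the tuning knob. Too low and ordinary cache misses or interrupts cause false positives; too high and a well-optimized hypervisor slips under the threshold.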

Malicious PCI Device Detection

Detecting malicious PCI devices is crucial in the context of kernel-level anti-cheats. PCI devices operate at a low level within the computer's architecture and can access system memory directly through DMA. This lets a malicious device read, alter, or inject data in system memory, and manipulate system operations in ways that are difficult to detect and counteract from higher levels of the operating system.

Anticheat systems usually perform malicious PCI device detection by scanning the configuration space of PCI devices. Every PCI device has a set of registers commonly referred to as the PCI configuration space. In modern PCI-e devices, an extended configuration space is implemented, which is mapped into the main memory, allowing the system to read/write to the registers. The configuration space consists of a standard header containing information such as the DeviceID, VendorID, Status, and other details.

This configuration space allows querying important information from PCI devices within the device tree using the IRP_MN_READ_CONFIG code, which reads from a PCI device's configuration space.

We can first start by enumerating all PCI device objects in the system.

NTSTATUS
ValidatePciDevices()
{
    NTSTATUS status = STATUS_UNSUCCESSFUL;

    status = EnumeratePciDeviceObjects(PciDeviceQueryCallback, NULL);

    if (!NT_SUCCESS(status))
        DEBUG_ERROR("EnumeratePciDeviceObjects failed with status %x", status);

    return status;
}

Windows splits DEVICE_OBJECTs into two main categories: Physical Device Objects (PDOs) and Functional Device Objects (FDOs). A PDO represents a device connected to a physical bus and has an associated DEVICE_NODE, while an FDO represents the functionality of the device and defines how the system interacts with it. The bus driver creates one PDO for every device it finds on its bus, so the pci.sys driver object owns a PDO for each PCI device. To access each PCI device on the system, the anti-cheat can therefore enumerate all device objects that belong to the pci.sys driver object.

We first retrieve the driver object associated with the PCI driver (pci.sys), then enumerate all device objects managed by this driver and store them in an array. For each device object, we check whether it is a valid Physical Device Object (PDO) by calling the IsDeviceObjectValidPdo function. If it is, the callback routine (PciDeviceQueryCallback) is invoked.

NTSTATUS
EnumeratePciDeviceObjects(_In_ PCI_DEVICE_CALLBACK CallbackRoutine,
                          _In_opt_ PVOID           Context)
{
    NTSTATUS        status             = STATUS_UNSUCCESSFUL;
    UNICODE_STRING  pci                = RTL_CONSTANT_STRING(L"\\Driver\\pci");
    PDRIVER_OBJECT  pci_driver_object  = NULL;
    PDEVICE_OBJECT* pci_device_objects = NULL;
    PDEVICE_OBJECT  current_device     = NULL;
    UINT32          pci_device_objects_count = 0;

    status = GetDriverObjectByDriverName(&pci, &pci_driver_object);

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("GetDriverObjectByDriverName failed with status %x",
                    status);
        return status;
    }

    status = EnumerateDriverObjectDeviceObjects(
        pci_driver_object, &pci_device_objects, &pci_device_objects_count);

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("EnumerateDriverObjectDeviceObjects failed with status %x",
                    status);
        return status;
    }

    for (UINT32 index = 0; index < pci_device_objects_count; index++) {
        current_device = pci_device_objects[index];

        /* make sure we have a valid PDO */
        if (!IsDeviceObjectValidPdo(current_device)) {
            ObDereferenceObject(current_device);
            continue;
        }

        status = CallbackRoutine(current_device, Context);

        if (!NT_SUCCESS(status))
            DEBUG_ERROR(
                "EnumeratePciDeviceObjects CallbackRoutine failed with status %x",
                status);

        ObDereferenceObject(current_device);
    }

    if (pci_device_objects)
        ExFreePoolWithTag(pci_device_objects, POOL_TAG_HW);

    return status;
}

Then we read the device's configuration space, starting from PCI_VENDOR_ID_OFFSET, and store the data in a PCI_COMMON_HEADER structure. This is the standard header described earlier, holding the DeviceID, VendorID, Status, and other details. The read itself is done by QueryPciDeviceConfigurationSpace, which uses an IRP with the IRP_MN_READ_CONFIG minor function code.

STATIC
NTSTATUS
PciDeviceQueryCallback(_In_ PDEVICE_OBJECT DeviceObject, _In_opt_ PVOID Context)
{
    UNREFERENCED_PARAMETER(Context);

    NTSTATUS          status = STATUS_UNSUCCESSFUL;
    PCI_COMMON_HEADER header = {0};

    status = QueryPciDeviceConfigurationSpace(
        DeviceObject, PCI_VENDOR_ID_OFFSET, &header, sizeof(PCI_COMMON_HEADER));

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("QueryPciDeviceConfigurationSpace failed with status %x",
                    status);
        return status;
    }

    if (IsPciConfigurationSpaceFlagged(&header)) {
        DEBUG_VERBOSE("Flagged DeviceID found. Device: %llx, DeviceId: %lx",
                      (UINT64)DeviceObject,
                      header.DeviceID);
        ReportBlacklistedPcieDevice(DeviceObject, &header);
    }
    else {
        DEBUG_VERBOSE("Device: %llx, DeviceID: %lx, VendorID: %lx",
                      DeviceObject,
                      header.DeviceID,
                      header.VendorID);
    }

    return status;
}

To perform the read, we send an IRP (I/O Request Packet) to the PCI device, wait for it to complete, and return the status of the operation. The configuration space contains registers such as DeviceID, VendorID, Status, and Command, which are crucial for identifying the device.

STATIC
NTSTATUS
QueryPciDeviceConfigurationSpace(_In_ PDEVICE_OBJECT DeviceObject,
                                 _In_ UINT32         Offset,
                                 _Out_opt_ PVOID     Buffer,
                                 _In_ UINT32         BufferLength)
{
    NTSTATUS           status = STATUS_UNSUCCESSFUL;
    KEVENT             event  = {0};
    IO_STATUS_BLOCK    io     = {0};
    PIRP               irp    = NULL;
    PIO_STACK_LOCATION packet = NULL;

    if (BufferLength == 0)
        return STATUS_BUFFER_TOO_SMALL;

    KeInitializeEvent(&event, NotificationEvent, FALSE);

    /*
     * we dont need to free this IRP as the IO manager will free it when the
     * request is completed
     */
    irp = IoBuildSynchronousFsdRequest(
        IRP_MJ_PNP, DeviceObject, NULL, 0, NULL, &event, &io);

    if (!irp) {
        DEBUG_ERROR("IoBuildSynchronousFsdRequest failed with no status.");
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    packet                = IoGetNextIrpStackLocation(irp);
    packet->MinorFunction = IRP_MN_READ_CONFIG;
    packet->Parameters.ReadWriteConfig.WhichSpace = PCI_WHICHSPACE_CONFIG;
    packet->Parameters.ReadWriteConfig.Offset     = Offset;
    packet->Parameters.ReadWriteConfig.Buffer     = Buffer;
    packet->Parameters.ReadWriteConfig.Length     = BufferLength;

    status = IoCallDriver(DeviceObject, irp);

    if (status == STATUS_PENDING) {
        KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
        status = io.Status;
    }

    if (!NT_SUCCESS(status))
        DEBUG_ERROR("Failed to read configuration space with status %x",
                    status);

    return status;
}

Once the configuration space is read, we can check if the device ID is among the flagged IDs. If the device ID matches any of the flagged IDs, we can report the blacklisted device.

BOOLEAN
IsPciConfigurationSpaceFlagged(_In_ PPCI_COMMON_HEADER Configuration)
{
    for (UINT32 index = 0; index < FLAGGED_DEVICE_ID_COUNT; index++) {
        if (Configuration->DeviceID == FLAGGED_DEVICE_IDS[index])
            return TRUE;
    }

    return FALSE;
}
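To show what the header check looks like on raw bytes, here is a small user-mode sketch that parses the first eight bytes of a PCI configuration space (VendorID at offset 0, DeviceID at 2, Command at 4, Status at 6, all little-endian) and matches the device ID against a flag list. The list contents are illustrative assumptions on my part: 0x0666 is the device ID I associate with default PCILeech FPGA firmware, and 0xFFFF is what a read returns when no device responds. A real list would be longer and vendor-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint16_t command;
    uint16_t status;
} pci_header_ids;

/* Reads a little-endian 16-bit register from the configuration space. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Parses the first 8 bytes of a standard PCI header. Returns 1 on success. */
int parse_pci_header(const uint8_t *cfg, size_t len, pci_header_ids *out) {
    assert(cfg != NULL && out != NULL);
    if (len < 8)
        return 0;
    out->vendor_id = read_le16(cfg + 0);
    out->device_id = read_le16(cfg + 2);
    out->command   = read_le16(cfg + 4);
    out->status    = read_le16(cfg + 6);
    return 1;
}

/* Illustrative flag list, not taken from any shipping anticheat. */
static const uint16_t flagged_device_ids[] = { 0x0666, 0xFFFF };

int is_device_flagged(const pci_header_ids *header) {
    assert(header != NULL);
    size_t count = sizeof(flagged_device_ids) / sizeof(flagged_device_ids[0]);
    for (size_t i = 0; i < count; i++) {
        if (header->device_id == flagged_device_ids[i])
            return 1;
    }
    return 0;
}
```

A plain ID match is trivial to evade, since custom firmware can report whatever vendor and device ID it likes. That is why this kind of check is usually only a first filter before looking at deeper configuration space inconsistencies.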

Game Binary and Driver Protection

One of the primary concerns for both EDR and anticheat developers is protecting their binaries and processes. While EDRs can sometimes fall back on protections granted to them through the Microsoft Virus Initiative (MVI) program, anticheat providers need to think of more creative ways to ward off static and dynamic analysis.

Usually anticheats don't operate alone, but alongside an antitamper solution. They also sometimes have really good ASCII art games, like this one from Packman, an antitamper for Vanguard Anticheat.

But Byfron's Hyperion, the antitamper for EasyAntiCheat in Roblox, is definitely winning brownie points from me for (to my knowledge) the first implementation of REpsych on production software.

Byfron Hyperion

But what are the protections these antitamper systems give?

Binary Packing

Binary packing is the technique of compressing or encrypting executable files and binaries to obscure their content, making them harder to detect or analyze statically. Ideally in games, only authorized processes can access and modify the game's code and assets, thereby protecting the game's integrity.

But encrypting and decrypting game assets and binaries can have a severe impact on performance, despite what some DRM providers would like you to believe. While there are many packers on the market today, they are often very performance heavy and offer little control for tuning performance in graphics-heavy applications.

For the aforementioned Packman, Riot Games explains some of how it works in a blog post. The code below is not an exact replica of their solution, but rather what I think such a solution looks like. Mind you, it is based on incomplete information from their blog post and likely doesn't work anymore.

The encryption process starts by initializing a structure that holds the cipher state for each decryption event. We then initialize the key in this structure using a randomly generated seed value.

struct GKey {
    uint8_t key[0x100];
    uint8_t count;
    uint8_t hold;
};

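void SpawnKey(GKey* gk, const uint8_t* seed, size_t len) {
    // Start from the identity permutation 0..255
    for (int i = 0; i < 0x100; i++) {
        gk->key[i] = (uint8_t)i;
    }
    // Shuffle the permutation under control of the seed
    uint8_t h = 0;
    for (int i = 0; i < 0x100; i++) {
        uint8_t j = gk->key[i];
        h += seed[i % len] + j;
        gk->key[i] = gk->key[h];
        gk->key[h] = j;
    }
    // Reset the stream counters so every fresh key starts from the same state
    gk->count = 0;
    gk->hold = 0;
}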

The encryption routine uses an initialized key to encrypt the data and the same function is used for both encryption and decryption due to the symmetry of the XOR operation. Each byte of the input data is encrypted by modifying the count and hold counters and performing a series of swaps and XOR operations.

void Encrypt(GKey* gk, const void* in, void* out, size_t len) {
    uint8_t t1, t2;
    uint8_t j;
    for (uint32_t i = 0; i < len; i++) {
        gk->count++;
        j = gk->count;
        gk->hold += gk->key[j];
        t1 = gk->key[j];
        t2 = gk->key[gk->hold];
        gk->key[j] = t2;
        gk->key[gk->hold] = t1;
        t1 += t2;
        ((uint8_t*)out)[i] = ((uint8_t*)in)[i] ^ gk->key[t1];
    }
}

When the game client is executed, the .text section of the executable file needs to be decrypted. The decryption routine involves re-initializing the key and decrypting the data in stages, starting with the primary seed.

void Decrypt(GKey* gk, const void* in, void* out, size_t len) {
    // XOR with the same keystream undoes the encryption, so decryption is
    // the exact same routine run from the same initial key state.
    Encrypt(gk, in, out, len);
}

Riot Games overcame the limitations of traditional stub code injection by using an external library for unpacking. This method allows for validating game dependencies before they are loaded, ensuring the integrity of the game’s libraries. The process involves modifying the game’s import descriptors to list only their custom library, which loads first and validates other dependencies.

// Pointers to the 'real' Import Table and array of name lengths
IMAGE_IMPORT_DESCRIPTOR* import_descriptor_ptr = (IMAGE_IMPORT_DESCRIPTOR*)(league + 0x13D4B10);
uint32_t* import_name_len_ptr = (uint32_t*)(stub + 0xBF5C8);

// Decrypt and validate the imports
for (int i = 0; i < 0x13; i++) {
    Decrypt(&gk, import_descriptor_ptr, import_descriptor_ptr, 0x14);
    size_t len = *import_name_len_ptr;
    uint8_t* name_ptr = league + import_descriptor_ptr->Name; // Name holds the RVA of the DLL name string
    Decrypt(&gk, name_ptr, name_ptr, len);
    // Validate and load libraries
    import_descriptor_ptr++;
    import_name_len_ptr++;
}

The .text section is decrypted in pages, allowing for non-sequential decryption. Each 4096-byte page is decrypted independently using a unique key derived from the primary seed, ensuring the security of the game code during execution.

uint32_t num_pages = ltext_len / 0x1000;
for (uint32_t i = 1; i <= num_pages; i++) {
    memset(&gk, 0, sizeof(GKey));
    uint8_t* seed = decrypt2_seed + ((i % 0x53) * decrypt2_seed_len);
    uint8_t* text = league + (i * 0x1000);
    SpawnKey(&gk, seed, decrypt2_seed_len);
    Decrypt(&gk, text, text, 0x1000);
}
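
To make the page scheme concrete, here is a minimal user-mode sketch in plain C. It reuses the RC4-style key schedule and XOR stream from above and shows why per-page keys allow pages to be processed in any order. The names spawn_key, crypt_bytes and crypt_page, the PAGE_BYTES constant, and the flat seed table layout are assumptions made for illustration, not Riot's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 0x1000u

/* RC4-style state, mirroring the GKey layout used above. */
typedef struct {
    uint8_t key[0x100];
    uint8_t count;
    uint8_t hold;
} GKey;

/* Key schedule: permute 0..255 under control of the seed, then reset counters. */
void spawn_key(GKey *gk, const uint8_t *seed, size_t len) {
    uint8_t h = 0;
    for (int i = 0; i < 0x100; i++)
        gk->key[i] = (uint8_t)i;
    for (int i = 0; i < 0x100; i++) {
        uint8_t j = gk->key[i];
        h = (uint8_t)(h + seed[i % len] + j);
        gk->key[i] = gk->key[h];
        gk->key[h] = j;
    }
    gk->count = 0;
    gk->hold = 0;
}

/* XOR keystream: the same call encrypts and decrypts. */
void crypt_bytes(GKey *gk, const uint8_t *in, uint8_t *out, size_t len) {
    for (size_t i = 0; i < len; i++) {
        uint8_t j = ++gk->count;
        gk->hold = (uint8_t)(gk->hold + gk->key[j]);
        uint8_t t1 = gk->key[j];
        uint8_t t2 = gk->key[gk->hold];
        gk->key[j] = t2;
        gk->key[gk->hold] = t1;
        out[i] = in[i] ^ gk->key[(uint8_t)(t1 + t2)];
    }
}

/* Process one page in place. The key depends only on the page index,
 * so no page needs the cipher state left behind by another page. */
void crypt_page(uint8_t *image, uint32_t page,
                const uint8_t *seeds, size_t seed_len, size_t seed_count) {
    GKey gk;
    spawn_key(&gk, seeds + (page % seed_count) * seed_len, seed_len);
    uint8_t *p = image + (size_t)page * PAGE_BYTES;
    crypt_bytes(&gk, p, p, PAGE_BYTES);
}
```

Calling crypt_page on every page of a buffer encrypts it, and calling it again on the same pages, in any order, restores the original bytes, since each page's keystream depends only on its index and the seed table.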

Anti-Debugging

While static analysis can be thwarted easily, the use of dynamic analysis tools presents a more complicated challenge. Anti-debugging techniques in anti-cheat systems are designed to detect and counteract these tools.

Windows itself has built-in debugger checks such as the IsDebuggerPresent and CheckRemoteDebuggerPresent functions. IsDebuggerPresent reads the PEB (Process Environment Block) of the calling process, which contains a flag named BeingDebugged that is set to 1 if a debugger is attached. CheckRemoteDebuggerPresent instead asks the kernel whether the target process has a debug port. The problem with these checks is that they are easily circumvented by patching the flag or hooking the call. This can sometimes be done directly from OllyDbg or x32dbg/x64dbg with plugins such as ScyllaHide.

This is why many packers protect binaries from debuggers using some more interesting methods, one of them being the use of INT 3. The INT 3 instruction has a single-byte encoding (0xCC) designed to signal a breakpoint, and a two-byte long form (0xCD 0x03), which is the one emitted in the example below. When the CPU executes this instruction, Windows raises an EXCEPTION_BREAKPOINT, which is dispatched to an exception handler through the system's structured exception handling (SEH) mechanism. If a debugger is attached, it gets the first chance to handle the breakpoint and may consume it, so the program's own handler never runs.

bool g_bDebugged = false;

int filter(unsigned int code, struct _EXCEPTION_POINTERS *ep)
{
    g_bDebugged = code != EXCEPTION_BREAKPOINT;
    return EXCEPTION_EXECUTE_HANDLER;
}

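bool IsDebugged()
{
    __try
    {
        // Long form of INT 3 (0xCD 0x03)
        __asm __emit(0xCD);
        __asm __emit(0x03);
    }
    __except (filter(GetExceptionCode(), GetExceptionInformation()))
    {
        return g_bDebugged;
    }

    // Our handler never ran, so something else consumed the exception
    return true;
}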

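When the EXCEPTION_BREAKPOINT occurs, Windows decrements the instruction pointer (EIP) by one so that it points to where it assumes the single-byte 0xCC opcode is. This adjustment is what lets a debugger recognize and process the breakpoint. With the two-byte long form, the decremented EIP ends up in the middle of the instruction (at the 0x03 byte), so a debugger that resumes from there will likely trigger a different exception. This is why the filter above treats any exception code other than EXCEPTION_BREAKPOINT as a sign of a debugger.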

When a process is being traced in a debugger, the inherent delays introduced between instructions and their execution can be significant. Detecting these delays can also help in identifying the presence of a debugger. For this purpose, we can use the RDPMC instruction to read the performance monitoring counters of the processor.

These counters keep track of various events such as the number of instructions executed, cache misses, and more. The usage of RDPMC requires the PCE (Performance-Monitoring Counter Enable) flag to be set in the CR4 register, which typically limits its usage to kernel mode.

bool IsDebugged(DWORD64 qwNativeElapsed)
{
    ULARGE_INTEGER Start, End;
    __asm
    {
        xor  ecx, ecx    // Select performance counter 0
        rdpmc            // Read performance counter
        mov  Start.LowPart, eax
        mov  Start.HighPart, edx
    }

    // ... some work ...

    __asm
    {
        xor  ecx, ecx    // Select performance counter 0
        rdpmc            // Read performance counter
        mov  End.LowPart, eax
        mov  End.HighPart, edx
    }

    return (End.QuadPart - Start.QuadPart) > qwNativeElapsed;
}

RDPMC is used to read the performance counter before and after executing some work. By comparing the difference with a predefined threshold, we can detect if a debugger or VM is slowing down execution.
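
RDPMC needs CR4.PCE and inline assembly, so the snippet above is hard to try outside a driver. The same threshold idea can be sketched in portable C11 with a wall-clock timer swapped in for the performance counter. took_too_long, now_ns and the two demo workloads are hypothetical names, and a wall clock is far noisier than a hardware counter, so treat this as an illustration of the comparison only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Current wall-clock time in nanoseconds (C11 timespec_get). */
uint64_t now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `work` and report whether it exceeded the expected budget. */
bool took_too_long(void (*work)(void), uint64_t budget_ns) {
    uint64_t start = now_ns();
    work();
    uint64_t end = now_ns();
    return (end - start) > budget_ns;
}

/* Demo workload that finishes almost immediately. */
void quick_work(void) {
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}

/* Demo workload that spins for about 20 ms, standing in for the delay
 * a single-stepping debugger would introduce. */
void stalled_work(void) {
    uint64_t until = now_ns() + 20000000ull;
    while (now_ns() < until) {
    }
}
```

The budget plays the role of qwNativeElapsed: it has to be calibrated on an undebugged run, and set generously enough that scheduler noise does not produce false positives.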

Telemetry & Defenses in Anticheats

Telemetry is important for detecting behaviors that are closely linked to cheating. This is where anticheats differ from EDRs, whose main line of telemetry is OS-provided event streams like the Microsoft-Windows-Threat-Intelligence ETW provider (ETW-TI), which is locked behind the Microsoft MVI program. Anticheats do not have access to these event streams. EDRs are also protected by things like Early Launch Anti-Malware (ELAM) drivers and Protected Process Light (PPL), which are also locked behind the MVI program.

Attached Thread Detection

Attached thread detection is a technique used to identify and monitor threads that are injected into a process. This process is crucial for anti-cheat systems, as it allows the detection of unauthorized threads that might be used to read or modify game memory, disrupt normal operations, or execute arbitrary code within the game's process space.

In the Windows operating system, threads can be created within a process using various methods, including CreateRemoteThread, NtCreateThreadEx, or through DLL injection. By monitoring these threads, one can identify and potentially prevent malicious activities.

To start, the driver sets up a notification routine for thread creation using the PsSetCreateThreadNotifyRoutine API. This routine gets called whenever a thread is created or deleted in the system.

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    NTSTATUS status;
    DriverObject->DriverUnload = DriverUnload;

    status = PsSetCreateThreadNotifyRoutine(ThreadCreateNotifyRoutine);
    if (!NT_SUCCESS(status))
    {
        DbgPrint("Failed to set thread creation notify routine\n");
        return status;
    }

    DbgPrint("Driver loaded successfully\n");
    return STATUS_SUCCESS;
}

PsSetCreateThreadNotifyRoutine registers ThreadCreateNotifyRoutine as the callback function that will be invoked whenever a thread is created or deleted. The DriverUnload function ensures that this callback is properly removed when the driver is unloaded.

When a new thread is created, the ThreadCreateNotifyRoutine function is called. This function retrieves the thread object using PsLookupThreadByThreadId and then validates the thread's context.

VOID ThreadCreateNotifyRoutine(HANDLE ProcessId, HANDLE ThreadId, BOOLEAN Create)
{
    if (Create)
    {
        PETHREAD Thread;
        NTSTATUS status = PsLookupThreadByThreadId(ThreadId, &Thread);
        if (NT_SUCCESS(status))
        {
            if (!ValidateThreadContext(Thread))
            {
                DbgPrint("Unauthorized thread detected in process %lu\n", HandleToULong(ProcessId));
            }
            ObDereferenceObject(Thread);
        }
    }
}

In the ThreadCreateNotifyRoutine, when a thread is created (Create is TRUE), the function looks up the thread object using PsLookupThreadByThreadId. Once the thread object is retrieved, it is passed to the ValidateThreadContext function to determine if the thread is legitimate.

The ValidateThreadContext function performs a basic check to see if the thread's starting address falls within a known valid range for the process. Real-world implementations would involve more complex checks, but this is just a lazy example.

BOOLEAN ValidateThreadContext(PETHREAD Thread)
{
    PVOID StartAddress = PsGetThreadStartAddress(Thread);
    PEPROCESS Process = IoThreadToProcess(Thread);
    PVOID BaseAddress = PsGetProcessSectionBaseAddress(Process);

    if ((ULONG_PTR)StartAddress >= (ULONG_PTR)BaseAddress &&
        (ULONG_PTR)StartAddress < (ULONG_PTR)BaseAddress + 0x1000000) // 16 MB range
    {
        return TRUE;
    }

    return FALSE;
}

ValidateThreadContext retrieves the thread's start address using PsGetThreadStartAddress and compares it with the base address of the process obtained through PsGetProcessSectionBaseAddress. If the start address falls within a 16 MB range of the base address, the thread is considered valid. Otherwise, it is flagged as potentially unauthorized.

This approach allows the detection of threads that do not originate from the expected code regions within the process, which is a common characteristic of threads injected by malicious actors. By integrating this detection mechanism into a kernel-mode driver, anti-cheat systems can effectively monitor and respond to unauthorized thread creation, thereby enhancing the security and integrity of the game.

DPC/APC Stackwalking

Stack walking via Asynchronous Procedure Calls (APC) and Deferred Procedure Calls (DPC) is an essential technique for identifying potentially malicious activities. Both APC and DPC are mechanisms that execute code asynchronously in the context of a particular thread, but they operate differently and serve different purposes within the Windows operating system.

APC stack walking via RtlCaptureStackBackTrace is used to capture the call stack of a thread at a specific point in time when an APC is executed. APCs are designed to allow user-mode applications and kernel-mode drivers to execute code in the context of a specific thread. They are commonly used for asynchronous I/O operations and other delayed execution tasks.

To implement APC stack walking, the driver can set an APC to be executed in the context of a thread and then use RtlCaptureStackBackTrace to capture the call stack. This allows the driver to examine the sequence of function calls that led to the execution of the APC and identify any suspicious or unauthorized code execution paths.

VOID APCFunction(KAPC *Apc, PKNORMAL_ROUTINE *NormalRoutine, PVOID *NormalContext,
                 PVOID *SystemArgument1, PVOID *SystemArgument2)
{
    UNREFERENCED_PARAMETER(NormalRoutine);
    UNREFERENCED_PARAMETER(NormalContext);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);

    ULONG framesToCapture = 10;
    PVOID stackBackTrace[10];
    ULONG capturedFrames = RtlCaptureStackBackTrace(0, framesToCapture, stackBackTrace, NULL);

    for (ULONG i = 0; i < capturedFrames; i++)
    {
        DbgPrint("APC Stack Frame[%lu]: %p\n", i, stackBackTrace[i]);
    }

    // The APC object was allocated in SetAPC and is no longer needed
    ExFreePool(Apc);
}

VOID SetAPC(PETHREAD Thread)
{
    PKAPC apc = (PKAPC)ExAllocatePool(NonPagedPool, sizeof(KAPC));
    if (apc)
    {
        KeInitializeApc(apc, Thread, OriginalApcEnvironment, APCFunction, NULL, NULL, KernelMode, NULL);
        // If the APC could not be queued, the routine never runs, so free it here
        if (!KeInsertQueueApc(apc, NULL, NULL, 0))
        {
            ExFreePool(apc);
        }
    }
}

In this example, APCFunction is the APC routine that captures the stack trace using RtlCaptureStackBackTrace and prints the captured stack frames. The SetAPC function initializes and inserts the APC into the queue of the specified thread.

DPC stack walking via RtlCaptureStackBackTrace operates similarly but is used for Deferred Procedure Calls. DPCs are designed to handle high-priority tasks that need to be executed promptly but at a lower priority than interrupt service routines (ISRs). They are commonly used for deferred processing of I/O operations and other time-sensitive tasks that do not require immediate execution in the context of an interrupt.

To capture the stack trace during a DPC execution, the driver can set up a DPC and use RtlCaptureStackBackTrace in the DPC routine to examine the call stack. This allows the driver to analyze the sequence of function calls that led to the DPC execution and detect any anomalies.

VOID DPCFunction(KDPC *Dpc, PVOID DeferredContext, PVOID SystemArgument1, PVOID SystemArgument2)
{
    UNREFERENCED_PARAMETER(DeferredContext);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);

    ULONG framesToCapture = 10;
    PVOID stackBackTrace[10];
    ULONG capturedFrames = RtlCaptureStackBackTrace(0, framesToCapture, stackBackTrace, NULL);

    for (ULONG i = 0; i < capturedFrames; i++)
    {
        DbgPrint("DPC Stack Frame[%lu]: %p\n", i, stackBackTrace[i]);
    }

    // The DPC object was allocated in ScheduleDPC and has already been dequeued
    ExFreePool(Dpc);
}

VOID ScheduleDPC()
{
    PKDPC dpc = (PKDPC)ExAllocatePool(NonPagedPool, sizeof(KDPC));
    if (dpc)
    {
        KeInitializeDpc(dpc, DPCFunction, NULL);
        // If the DPC could not be queued, the routine never runs, so free it here
        if (!KeInsertQueueDpc(dpc, NULL, NULL))
        {
            ExFreePool(dpc);
        }
    }
}

In this example, DPCFunction is the DPC routine that captures the stack trace using RtlCaptureStackBackTrace and prints the captured stack frames. The ScheduleDPC function initializes and inserts the DPC into the system DPC queue.

Comparing the two approaches, APC stack walking is performed in the context of a specific thread, which allows for a more granular inspection of thread-specific execution paths. This is particularly useful for detecting malicious code execution within user-mode threads or within the context of specific kernel-mode threads. On the other hand, DPC stack walking is performed in the context of system-wide deferred procedure calls, which are generally executed at a higher priority than normal thread execution. This makes DPC stack walking more suitable for detecting anomalies in high-priority, time-sensitive operations, such as those related to interrupt handling or critical I/O processing. Both approaches leverage RtlCaptureStackBackTrace to capture the call stack and provide valuable insights into the execution paths leading to APC or DPC execution.

NMI Stackwalking

NMI (Non-Maskable Interrupt) stackwalking via ISR (Interrupt Service Routine) IRETQ involves a sequence of operations to validate the integrity of the system by inspecting the call stack during an NMI. This is achieved by handling NMIs and capturing the call stack through the interrupt service routine, ensuring no unauthorized modifications have been made to critical sections of the kernel.

The check is driven by the HandleNmiIOCTL function, which is responsible for setting up the environment needed to capture the stack trace. When an NMI is delivered, the ISR saves the processor state, including the instruction pointer, stack pointer, and other critical registers, which gives us a reliable context for stackwalking.

A related routine, shown further below, dispatches a DPC (Deferred Procedure Call) to each CPU. This DPC is used to walk the stack and capture the instruction pointers at each frame. The captured stack frames are then validated against known good regions of the code to detect any anomalies.

NTSTATUS
HandleNmiIOCTL()
{
    PAGED_CODE();

    NTSTATUS       status  = STATUS_UNSUCCESSFUL;
    PVOID          handle  = NULL;
    SYSTEM_MODULES modules = {0};
    PNMI_CONTEXT   context = NULL;

    UINT32 size = ImpKeQueryActiveProcessorCount(0) * sizeof(NMI_CONTEXT);

    if (IsNmiInProgress())
        return STATUS_ALREADY_COMMITTED;

    status = ValidateHalDispatchTables();

The HandleNmiIOCTL function prepares the system to handle an NMI, ensuring that all necessary resources are allocated and the environment is correctly configured. The per-CPU stackwalking shown next is performed by dispatching a DPC to each CPU. The DPC callback function captures the stack frames, which are then validated.

NTSTATUS
DispatchStackwalkToEachCpuViaDpc()
{
    NTSTATUS       status  = STATUS_UNSUCCESSFUL;
    PDPC_CONTEXT   context = NULL;
    SYSTEM_MODULES modules = {0};
    UINT32 size = ImpKeQueryActiveProcessorCount(0) * sizeof(DPC_CONTEXT);
    context = ImpExAllocatePool2(POOL_FLAG_NON_PAGED, size, POOL_TAG_DPC);
    if (!context)
        return STATUS_MEMORY_NOT_ALLOCATED;
    status = GetSystemModuleInformation(&modules);
    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("GetSystemModuleInformation failed with status %x", status);
        goto end;
    }
    ImpKeGenericCallDpc(DpcStackwalkCallbackRoutine, context);
    while (!CheckForDpcCompletion(context))
        YieldProcessor();
    ValidateDpcCapturedStack(&modules, context);
    DEBUG_VERBOSE("Finished validating cores via dpc");
end:
    if (modules.address)
        ImpExFreePoolWithTag(modules.address, SYSTEM_MODULES_POOL);
    if (context)
        ImpExFreePoolWithTag(context, POOL_TAG_DPC);
    return status;
}

In the context of an NMI, the ISR captures the processor state and obtains the stack trace. This is typically done within the ISR or the NMI callback. We can retrieve the interrupted instruction pointer (RIP) and stack pointer (RSP), ensuring that even at high interrupt levels, critical information can be captured without relying on potentially unsafe functions. We can then access the specific NMI context for the current processor core from the Context array. Several variables are initialized, including kpcr for the kernel processor control region, tss for the task state segment, and machine_frame for the interrupted machine state.

STATIC BOOLEAN
NmiCallback(_Inout_opt_ PVOID Context, _In_ BOOLEAN Handled)
{
    UNREFERENCED_PARAMETER(Handled);

    ULONG                  core          = KeGetCurrentProcessorNumber();
    PNMI_CONTEXT           context       = &((PNMI_CONTEXT)Context)[core];
    UINT64                 kpcr          = 0;
    TASK_STATE_SEGMENT_64* tss           = NULL;
    PMACHINE_FRAME         machine_frame = NULL;

    if (!ARGUMENT_PRESENT(Context))
        return TRUE;
    kpcr          = __readmsr(IA32_GS_BASE);
    tss           = GetTaskStateSegment(kpcr);
    machine_frame = GetIsrMachineFrame(tss);

To locate the IRETQ frame, which contains the interrupted instruction pointer (RIP), the function must find the top of the NMI ISR stack. This stack top is stored in the TSS (Task State Segment) at TSS->Ist[3]. The TSS itself can be obtained from the KPCR->TSS_BASE. After obtaining the TSS, the function reads the value at TSS->Ist[3], which points to the top of the ISR stack, and then subtracts the size of the MACHINE_FRAME structure. This allows the function to read the interrupted RIP.
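
The pointer arithmetic is easier to see in isolation. Below is a simplified user-mode model, assuming the standard five-field frame that the CPU pushes on interrupt entry in 64-bit mode and a cut-down TSS that only carries the IST array. MachineFrame, ToyTss and isr_machine_frame are illustrative names, and the IST index of 3 is simply taken from the description above.

```c
#include <stdint.h>

/* Frame pushed by the CPU on interrupt entry in 64-bit mode; IRETQ pops it. */
typedef struct {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
} MachineFrame;

/* Only the IST array matters for this sketch; a real TSS has more fields. */
typedef struct {
    uint64_t ist[8];
} ToyTss;

#define NMI_IST_INDEX 3 /* index taken from the text, not verified here */

/* The frame sits directly below the top of the dedicated interrupt stack. */
const MachineFrame *isr_machine_frame(const ToyTss *tss) {
    uintptr_t top = (uintptr_t)tss->ist[NMI_IST_INDEX];
    return (const MachineFrame *)(top - sizeof(MachineFrame));
}
```

Because the CPU always starts pushing at the stack top recorded in the TSS, the frame location is fixed, which is what lets the callback find the interrupted RIP and RSP without unwinding anything.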

Using the __readmsr function, we retrieve kpcr, which is the base address of the KPCR. The GetTaskStateSegment function is then called to obtain the TSS from the kpcr. Finally, the GetIsrMachineFrame function retrieves the machine frame from the TSS. We can then check if the interrupted RIP belongs to user mode using IsUserModeAddress. If it does, we set the user_thread flag in the context to TRUE.

    if (IsUserModeAddress(machine_frame->rip))
        context->user_thread = TRUE;

    context->interrupted_rip = machine_frame->rip;
    context->interrupted_rsp = machine_frame->rsp;
    context->kthread         = PsGetCurrentThread();
    context->callback_count++;

    DEBUG_VERBOSE(
        "[NMI CALLBACK]: Core Number: %lx, Interrupted RIP: %llx, Interrupted RSP: %llx",
        core,
        machine_frame->rip,
        machine_frame->rsp);

    return TRUE;
}

The interrupted RIP and RSP are stored in the context. The current thread is obtained using PsGetCurrentThread and stored in the context's kthread field. The callback_count is incremented to keep track of the number of times this callback has been invoked. To determine the validity of an instruction pointer, we can check whether the captured instruction pointer (RIP) falls within an invalid region, indicating potential tampering.

BOOLEAN
IsInstructionPointerInInvalidRegion(_In_ UINT64          RIP,
                                    _In_ PSYSTEM_MODULES SystemModules)
{
    PAGED_CODE();

    PRTL_MODULE_EXTENDED_INFO modules =
        (PRTL_MODULE_EXTENDED_INFO)SystemModules->address;

    for (INT index = 0; index < SystemModules->module_count; index++) {
        UINT64 base = (UINT64)modules[index].ImageBase;
        UINT64 end  = base + modules[index].ImageSize;

        if (RIP >= base && RIP < end) {
            return FALSE;
        }
    }

    return TRUE;
}
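
As a standalone illustration of the same range check, here is a small sketch in plain C with a hypothetical ModuleRange type standing in for RTL_MODULE_EXTENDED_INFO. It treats each image as the half-open range [base, base + size), so the first byte past an image is not counted as inside it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the system module list. */
typedef struct {
    uint64_t image_base;
    uint64_t image_size;
} ModuleRange;

/* True when rip lies outside every module's [base, base + size) range. */
bool rip_outside_all_modules(uint64_t rip, const ModuleRange *mods, size_t count) {
    for (size_t i = 0; i < count; i++) {
        uint64_t base = mods[i].image_base;
        /* Comparing the offset avoids overflow in base + size. */
        if (rip >= base && rip - base < mods[i].image_size)
            return false;
    }
    return true;
}
```

An instruction pointer that is flagged by this check is executing from memory that no loaded image accounts for, which is the typical footprint of a manually mapped driver.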

Memory Section Integrity Checks

Cheaters often attempt to modify game code or system modules. We can detect these attempts by making sure that the executable code in memory matches the original, untampered code.

Once we have obtained the module information, we can store each module's executable sections in a buffer for further analysis. We do this by iterating through the module's sections, identifying the executable ones, and copying them into the buffer.

NTSTATUS StoreModuleExecutableRegionsInBuffer(_Out_ PVOID* Buffer,
                                              _In_ PVOID ModuleBase,
                                              _In_ SIZE_T ModuleSize,
                                              _Out_ PSIZE_T BytesWritten,
                                              _In_ BOOLEAN IsModulex86) {
    if (!ModuleBase || !ModuleSize)
        return STATUS_INVALID_PARAMETER;
    if (!IsModuleAddressSafe(ModuleBase, IsModulex86))
        return STATUS_UNSUCCESSFUL;

    *BytesWritten = 0;
    *Buffer = ImpExAllocatePool2(POOL_FLAG_NON_PAGED,
                                 ModuleSize + sizeof(INTEGRITY_CHECK_HEADER),
                                 POOL_TAG_INTEGRITY);
    if (*Buffer == NULL)
        return STATUS_MEMORY_NOT_ALLOCATED;

    nt_header = PeGetNtHeader(ModuleBase);
    num_sections = GetSectionCount(nt_header);
    section = IMAGE_FIRST_SECTION(nt_header);
    buffer_base = (UINT64)*Buffer + sizeof(INTEGRITY_CHECK_HEADER);

    for (ULONG index = 0; index < num_sections - 1; index++) {
        if (!IsSectionExecutable(section)) {
            section++;
            continue;
        }
        address.VirtualAddress = section;
        status = ImpMmCopyMemory((UINT64)buffer_base + total_packet_size,
                                 address,
                                 sizeof(IMAGE_SECTION_HEADER),
                                 MM_COPY_MEMORY_VIRTUAL,
                                 &bytes_returned);
        if (!NT_SUCCESS(status)) {
            ImpExFreePoolWithTag(*Buffer, POOL_TAG_INTEGRITY);
            *Buffer = NULL;
            return status;
        }
        address.VirtualAddress = (UINT64)ModuleBase + section->PointerToRawData;
        status = ImpMmCopyMemory((UINT64)buffer_base + total_packet_size +
                                     sizeof(IMAGE_SECTION_HEADER),
                                 address,
                                 section->SizeOfRawData,
                                 MM_COPY_MEMORY_VIRTUAL,
                                 &bytes_returned);
        if (!NT_SUCCESS(status)) {
            ImpExFreePoolWithTag(*Buffer, POOL_TAG_INTEGRITY);
            *Buffer = NULL;
            return status;
        }
        total_packet_size += GetSectionTotalPacketSize(section);
        num_executable_sections++;
        section++;
    }
    InitIntegrityCheckHeader(&header, num_executable_sections, total_packet_size);
    RtlCopyMemory(*Buffer, &header, sizeof(INTEGRITY_CHECK_HEADER));
    *BytesWritten = total_packet_size + sizeof(INTEGRITY_CHECK_HEADER);
    return status;
}

We first check whether a section is executable, and if it is, we copy the section header and its content into the buffer. The buffer now contains all the executable sections of the module, which will be used for integrity verification. The next crucial step is to map the disk image of the module into the virtual address space. This allows the integrity checker to access the module's original, unmodified state directly from the disk.

NTSTATUS MapDiskImageIntoVirtualAddressSpace(_Inout_ PHANDLE SectionHandle,
                                             _Out_ PVOID* Section,
                                             _In_ PUNICODE_STRING Path,
                                             _Out_ PSIZE_T Size) {
    NTSTATUS status = STATUS_UNSUCCESSFUL;
    HANDLE file_handle = NULL;
    IO_STATUS_BLOCK pio_block = {0};
    OBJECT_ATTRIBUTES object_attributes = {0};
    UNICODE_STRING path = {0};
    *Section = NULL;
    *Size = 0;
    ImpRtlInitUnicodeString(&path, Path->Buffer);
    InitializeObjectAttributes(&object_attributes, &path, OBJ_KERNEL_HANDLE, NULL, NULL);
    status = ImpZwOpenFile(&file_handle, GENERIC_READ, &object_attributes, &pio_block, NULL, NULL);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    object_attributes.ObjectName = NULL;
    status = ImpZwCreateSection(SectionHandle, SECTION_ALL_ACCESS, &object_attributes, NULL, PAGE_READONLY, SEC_IMAGE, file_handle);
    if (!NT_SUCCESS(status)) {
        ImpZwClose(file_handle);
        *SectionHandle = NULL;
        return status;
    }
    status = ImpZwMapViewOfSection(*SectionHandle, ZwCurrentProcess(), Section, NULL, NULL, NULL, Size, ViewUnmap, MEM_TOP_DOWN, PAGE_READONLY);
    if (!NT_SUCCESS(status)) {
        ImpZwClose(file_handle);
        ImpZwClose(*SectionHandle);
        *SectionHandle = NULL;
        return status;
    }
    ImpZwClose(file_handle);
    return status;
}

After mapping the disk image, we can call the storing function again, this time with the disk image as the source. This ensures that we have a buffer containing the executable sections from the disk image, which can be directly compared to the in-memory buffer.

NTSTATUS ComputeHashOfSections(_In_ PIMAGE_SECTION_HEADER DiskSection,
                               _In_ PIMAGE_SECTION_HEADER MemorySection,
                               _Out_ PVOID* DiskHash,
                               _Out_ PULONG DiskHashSize,
                               _Out_ PVOID* MemoryHash,
                               _Out_ PULONG MemoryHashSize) {
    NTSTATUS status = STATUS_UNSUCCESSFUL;

    if (DiskSection->SizeOfRawData != MemorySection->SizeOfRawData) {
        return STATUS_INVALID_BUFFER_SIZE;
    }
    status = ComputeHashOfBuffer((UINT64)DiskSection + sizeof(IMAGE_SECTION_HEADER),
                                 DiskSection->SizeOfRawData,
                                 DiskHash,
                                 DiskHashSize);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    status = ComputeHashOfBuffer((UINT64)MemorySection + sizeof(IMAGE_SECTION_HEADER),
                                 MemorySection->SizeOfRawData,
                                 MemoryHash,
                                 MemoryHashSize);
    return status;
}

We check if the sizes of the sections match before computing their hashes, then generate the SHA-256 hashes of the section contents. Finally, we can compare the two results. If the hashes do not match, it indicates that the in-memory section has been modified and we can trigger an integrity violation.

FORCEINLINE
STATIC
BOOLEAN
CompareHashes(_In_ PVOID Hash1, _In_ PVOID Hash2, _In_ UINT32 Length) {
    return RtlCompareMemory(Hash1, Hash2, Length) == Length ? TRUE : FALSE;
}
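
The compare step can be sketched in user mode as follows. To keep it short, FNV-1a is swapped in for SHA-256, so this is not the hash the driver uses and it is not collision resistant. fnv1a64 and section_matches_disk are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a, used here only as a compact stand-in for SHA-256. */
uint64_t fnv1a64(const uint8_t *data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ull; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ull;          /* FNV prime */
    }
    return h;
}

/* Sizes must agree before the digests are worth comparing. */
bool section_matches_disk(const uint8_t *disk, size_t disk_len,
                          const uint8_t *mem, size_t mem_len) {
    if (disk_len != mem_len)
        return false;
    return fnv1a64(disk, disk_len) == fnv1a64(mem, mem_len);
}
```

A single patched byte in the in-memory copy, such as an inline hook overwriting the first instruction of a function, is enough to change the digest and fail the comparison.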

Detection of `PspCidTable` Entry Removal

The PspCidTable (Process Structure CID Table) is a critical data structure in the Windows kernel that maintains mappings of process and thread IDs to their respective structures. By removing or modifying entries in this table, a cheat can effectively hide its own threads and processes from system monitoring tools. This makes it difficult for anticheat systems to detect the presence of the cheat software.

Previously, we captured the state of the interrupted thread and stored the relevant information in the NMI_CONTEXT structure. This includes the kthread pointer, which points to the current thread's kernel structure. We can in turn use this pointer to detect threads whose PspCidTable entries have been removed, which is crucial for identifying attempts to hide threads from the operating system.

After capturing the thread context via NMI, the AnalyseNmiData function is used to validate the presence of each thread in the PspCidTable. The function iterates through each core's NMI_CONTEXT and checks if the captured thread is listed in the PspCidTable.

STATIC
NTSTATUS
AnalyseNmiData(_In_ PNMI_CONTEXT NmiContext, _In_ PSYSTEM_MODULES SystemModules)
{
    PAGED_CODE();

    NTSTATUS status = STATUS_UNSUCCESSFUL;

    if (!NmiContext || !SystemModules)
        return STATUS_INVALID_PARAMETER;

    for (INT core = 0; core < ImpKeQueryActiveProcessorCount(0); core++) {
        if (!NmiContext[core].callback_count) {
            ReportNmiBlocking();
            return STATUS_SUCCESS;
        }

        DEBUG_VERBOSE(
            "Analysing Nmi Data for: cpu number: %i callback count: %lx",
            core,
            NmiContext[core].callback_count);

        if (!DoesThreadHaveValidCidEntry(NmiContext[core].kthread)) {
            ReportMissingCidTableEntry(&NmiContext[core]);
        }

        if (NmiContext[core].user_thread)
            continue;

        if (IsInstructionPointerInInvalidRegion(
                NmiContext[core].interrupted_rip, SystemModules))
            ReportInvalidRipFoundDuringNmi(&NmiContext[core]);
    }

    return STATUS_SUCCESS;
}

We can then verify if the thread has a valid entry in the PspCidTable. If the thread is not found in the PspCidTable, it indicates that the thread might have been hidden or unlinked, which is a common technique used by cheat programs to avoid detection.
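
DoesThreadHaveValidCidEntry is not shown here, so the following is only a toy model of the idea, not the driver's implementation: look the thread up by its own ID and require that the table hands back the very same object. ToyThread, ToyCidEntry and thread_has_valid_cid_entry are made-up names, and a flat array stands in for the real handle table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy thread object that only knows its own ID. */
typedef struct {
    uint32_t id;
} ToyThread;

/* Toy table entry mapping an ID to a thread object. */
typedef struct {
    uint32_t id;
    const ToyThread *object;
} ToyCidEntry;

/* A running thread whose ID does not resolve back to itself has been unlinked. */
bool thread_has_valid_cid_entry(const ToyThread *thread,
                                const ToyCidEntry *table, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (table[i].id == thread->id)
            return table[i].object == thread;
    }
    return false;
}
```

The key point is the direction of the check: the thread was observed running on a core through the NMI, so it exists regardless of what the table says, and a failed lookup is therefore evidence of tampering rather than of a thread that has exited.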

DirectX Graphics Kernel Monitoring

The gDxgkInterface table is part of the Windows graphics subsystem, specifically used by the DirectX Graphics Kernel (dxgkrnl.sys). By hooking into these interfaces, cheaters can manipulate the graphics rendering pipeline to achieve various forms of cheating, such as:

Wallhacks: Allowing players to see through walls by modifying how graphics are rendered, making certain objects transparent or highlighting players through walls.
Aimbots: Automatically aiming at targets by altering the input handling routines.
ESP (Extrasensory Perception): Displaying additional information on the screen, such as player names, health, and locations.

Monitoring this involves a validation routine for Win32kBase_DxgInterface that ensures the functions within the gDxgkInterface table are legitimate and reside within valid memory regions.

We start by searching for the win32kbase.sys and dxgkrnl.sys modules.

PRTL_MODULE_EXTENDED_INFO FindModuleByName(_In_ PSYSTEM_MODULES Modules, _In_ PCHAR ModuleName) {
    for (UINT32 index = 0; index < Modules->module_count; index++) {
        PRTL_MODULE_EXTENDED_INFO entry =
            &((PRTL_MODULE_EXTENDED_INFO)(Modules->address))[index];
        if (strstr(entry->FullPathName, ModuleName))
            return entry;
    }

    return NULL;
}

We can then attach to the winlogon process using KeStackAttachProcess, which switches the current thread into that process's address space. This is needed because win32kbase.sys lives in session space, which is only mapped in processes that belong to a session. Within this context, the function locates the gDxgkInterface table in win32kbase.sys.

KeStackAttachProcess(winlogon, &apc);
dxg_interface = PeFindExportByName(win32kbase->ImageBase, "gDxgkInterface");

if (!dxg_interface) {
    status = STATUS_UNSUCCESSFUL;
    goto detatch;
}

The entries in gDxgkInterface are then iterated over, starting from the fourth entry (the first three entries are housekeeping).

for (UINT32 index = 3; index < WIN32KBASE_DXGKRNL_INTERFACE_FUNC_COUNT + 3; index++) {
    if (!dxg_interface[index])
        continue;

    PVOID entry = FindChainedPointerEnding(dxg_interface[index]);

FindChainedPointerEnding follows the chain of pointers, checking that each is a valid kernel address, and returns the last valid pointer. Each entry is then validated to ensure it resides within the dxgkrnl.sys module's memory region.

PVOID FindChainedPointerEnding(_In_ PVOID* Start) {
    PVOID* current = *Start;
    PVOID  prev    = Start;

    while (IsValidKernelAddress(current)) {
        __try {
            prev    = current;
            current = *current;
        }
        __except (EXCEPTION_EXECUTE_HANDLER) {
            return prev;
        }
    }

    return prev;
}
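
The final bounds check is not shown in the excerpt above. A minimal sketch of it follows, assuming the module's base address and image size are already known. The names module_range_t, address_in_module and find_hooked_entry are hypothetical, and plain integers stand in for kernel pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a loaded module's address range. */
typedef struct {
    uintptr_t base;
    size_t    size;
} module_range_t;

/* True if addr lies in [base, base + size). */
int address_in_module(const module_range_t *mod, uintptr_t addr)
{
    assert(mod != NULL);
    return addr >= mod->base && (addr - mod->base) < mod->size;
}

/* Scans a table of resolved function addresses and returns the index of
 * the first non-null entry that points outside the module, or -1 if all
 * entries are inside it. Null entries are skipped, as in the loop above. */
long find_hooked_entry(const module_range_t *mod,
                       const uintptr_t *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i] != 0 && !address_in_module(mod, table[i]))
            return (long)i;
    }
    return -1;
}
```

Writing the range test as `addr - base < size` rather than `addr < base + size` avoids an overflow when a module sits near the top of the address space.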

HAL Dispatch Table Validation

HalDispatch and HalPrivateDispatch are structures in the Windows operating system kernel that contain function pointers to various hardware abstraction layer (HAL) routines. These tables are critical for the operation of the HAL, which abstracts hardware-specific details from the rest of the operating system, providing a consistent interface for hardware interaction. As these structures contain pointers to essential HAL functions that manage hardware resources, cheats can work by hooking into these structures.

For HalDispatch, we can iterate through predefined function pointers, verifying that they reside within valid kernel memory regions.

STATIC VOID ValidateHalDispatchTable(_Out_ PVOID* Routine, _In_ PSYSTEM_MODULES Modules) {
    *Routine = NULL;
    DEBUG_VERBOSE("Validating HalDispatchTable.");

    if (IsInstructionPointerInInvalidRegion(HalQuerySystemInformation, Modules)) {
        *Routine = HalQuerySystemInformation;
        goto end;
    }

    if (IsInstructionPointerInInvalidRegion(HalSetSystemInformation, Modules)) {
        *Routine = HalSetSystemInformation;
        goto end;
    }

    // ...

end:
    return;
}

The function checks whether each pointer (HalQuerySystemInformation, HalSetSystemInformation, etc.) lies within a valid region of memory. The pointers are validated sequentially; as soon as one is found to be invalid, Routine is set to that pointer and the function exits.

For HalPrivateDispatchTable this task is a bit more difficult, as it's not as well documented as HalDispatch because it's reserved for hardware-specific functions that are not exposed through standard HAL interfaces. This table is also slightly more complex because its size varies depending on the Windows version.

STATIC NTSTATUS ValidateHalPrivateDispatchTable(_Out_ PVOID* Routine, _In_ PSYSTEM_MODULES Modules) {
    NTSTATUS status = STATUS_UNSUCCESSFUL;
    PVOID table = NULL;
    UNICODE_STRING string = RTL_CONSTANT_STRING(L"HalPrivateDispatchTable");
    PVOID* base = NULL;
    RTL_OSVERSIONINFOW os_info = {0};
    UINT32 count = 0;

    DEBUG_VERBOSE("Validating HalPrivateDispatchTable.");

    table = ImpMmGetSystemRoutineAddress(&string);

    if (!table) return status;

    status = GetOsVersionInformation(&os_info);

    if (!NT_SUCCESS(status)) {
        DEBUG_ERROR("GetOsVersionInformation failed with status %x", status);
        return status;
    }

    base  = (PVOID*)((UINT64)table + sizeof(UINT64)); // skip the first 8-byte slot of the table
    count = GetHalPrivateDispatchTableRoutineCount(&os_info);

    ValidateTableDispatchRoutines(base, count, Modules, Routine);
    return status;
}

We first retrieve the address of the HalPrivateDispatchTable, then determine the number of entries in the table based on the OS version obtained by calling GetOsVersionInformation. The routine count is computed by GetHalPrivateDispatchTableRoutineCount, which checks the OS build number and returns the appropriate size.
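
GetHalPrivateDispatchTableRoutineCount itself is not shown here. One way such a lookup could be structured is a table sorted by minimum build number. In the sketch below, the build numbers and counts are placeholders invented for illustration, not real HalPrivateDispatchTable sizes, and build_count_t and routine_count_for_build are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per range of builds. All values are PLACEHOLDERS, not real
 * table sizes; a real driver would fill these in from reverse engineering. */
typedef struct {
    uint32_t min_build;     /* first build this row applies to */
    uint32_t routine_count; /* number of pointers to validate  */
} build_count_t;

/* Rows must be sorted by ascending min_build. */
const build_count_t kBuildCounts[] = {
    { 10000, 100 }, /* placeholder */
    { 20000, 110 }, /* placeholder */
    { 30000, 120 }, /* placeholder */
};

/* Returns the count for the newest row whose min_build <= build, or 0
 * when the build predates every row. Returning 0 means "validate
 * nothing", which avoids walking past the end of an unknown table. */
uint32_t routine_count_for_build(const build_count_t *rows, size_t n,
                                 uint32_t build)
{
    uint32_t count = 0;

    assert(rows != NULL);

    for (size_t i = 0; i < n; i++) {
        if (build >= rows[i].min_build)
            count = rows[i].routine_count;
    }
    return count;
}
```

Falling back to zero for unknown builds is a design choice of this sketch; it trades coverage on unrecognised systems for the guarantee that the validation loop never reads beyond the table.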

We then iterate through each entry in the HalPrivateDispatchTable in the same way, checking whether each pointer resides within a valid memory region. If an invalid pointer is found, Routine is set to it.

STATIC VOID ValidateTableDispatchRoutines(_In_ PVOID* Base, _In_ UINT32 Entries, _In_ PSYSTEM_MODULES Modules, _Out_ PVOID* Routine) {
    for (UINT32 index = 0; index < Entries; index++) {
        if (!Base[index]) continue;

        if (IsInstructionPointerInInvalidRegion(Base[index], Modules))
            *Routine = Base[index];
    }
}

Handle Stripping via Object Callbacks

Cheat programs often try to open handles to game processes to read or write memory, inject code, or manipulate the game's execution. By intercepting handle creation and duplication requests through object callbacks, the anti-cheat driver can inspect these requests and deny access if they are deemed unauthorized.

By stripping handles and denying unauthorized access, the anti-cheat driver ensures that the game process and other related processes maintain their integrity. We can start by registering callback routines for process objects; thread objects can be covered the same way with a second registration entry that uses PsThreadType.

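OB_CALLBACK_REGISTRATION callbackRegistration;
OB_OPERATION_REGISTRATION operationRegistration[1];
PVOID callbackHandle = NULL; // receives the registration handle, needed later for ObUnRegisterCallbacks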

RtlZeroMemory(&callbackRegistration, sizeof(OB_CALLBACK_REGISTRATION));
RtlZeroMemory(&operationRegistration, sizeof(OB_OPERATION_REGISTRATION));

operationRegistration[0].ObjectType = PsProcessType;
operationRegistration[0].Operations = OB_OPERATION_HANDLE_CREATE | OB_OPERATION_HANDLE_DUPLICATE;
operationRegistration[0].PreOperation = ObPreOpCallbackRoutine;
operationRegistration[0].PostOperation = ObPostOpCallbackRoutine;

callbackRegistration.Version = OB_FLT_REGISTRATION_VERSION;
callbackRegistration.OperationRegistrationCount = 1;
RtlInitUnicodeString(&callbackRegistration.Altitude, L"320000"); // Altitude is a UNICODE_STRING, not a raw wide string
callbackRegistration.RegistrationContext = NULL;
callbackRegistration.OperationRegistration = operationRegistration;

NTSTATUS status = ObRegisterCallbacks(&callbackRegistration, &callbackHandle);
if (!NT_SUCCESS(status)) {
    // handle errors
}

Then we need to check if the object type is a process and if the operation is handle creation or duplication. After that we need to inspect the handle attributes to determine if the handle request is unauthorized. If it is, we strip the handle by modifying the desired access rights.

OB_PREOP_CALLBACK_STATUS ObPreOpCallbackRoutine(
    PVOID RegistrationContext,
    POB_PRE_OPERATION_INFORMATION OperationInformation)
{
    PAGED_CODE();

    UNREFERENCED_PARAMETER(RegistrationContext);

    ACCESS_MASK deny_access = SYNCHRONIZE | PROCESS_TERMINATE;

    PEPROCESS process_creator = PsGetCurrentProcess();
    PEPROCESS target_process = (PEPROCESS)OperationInformation->Object;
    HANDLE process_creator_id = ImpPsGetProcessId(process_creator);
    LPCSTR process_creator_name = ImpPsGetProcessImageFileName(process_creator);
    LPCSTR target_process_name = ImpPsGetProcessImageFileName(target_process);

    if (!process_creator_name || !target_process_name)
        return OB_PREOP_SUCCESS;

    // check if the process is whitelisted
    if (IsWhitelistedHandleOpenProcess(process_creator_name) ||
        !strcmp(process_creator_name, target_process_name)) {
        return OB_PREOP_SUCCESS;
    }

    // deny access if the process is not whitelisted
    OperationInformation->Parameters->CreateHandleInformation.DesiredAccess = deny_access;
    OperationInformation->Parameters->DuplicateHandleInformation.DesiredAccess = deny_access;

    return OB_PREOP_SUCCESS;
}
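
Note that the routine above overwrites DesiredAccess with a fixed mask. A gentler variant clears only the dangerous bits and never grants a right the caller did not ask for. The sketch below shows that bit arithmetic in portable C. The constants mirror the documented Windows access-right values, while strip_process_access and the choice of which rights count as dangerous are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Values match the documented Windows process access rights. They are
 * redefined here (with an AC_ prefix) so the sketch builds anywhere. */
#define AC_PROCESS_TERMINATE      0x0001u
#define AC_PROCESS_CREATE_THREAD  0x0002u
#define AC_PROCESS_VM_OPERATION   0x0008u
#define AC_PROCESS_VM_READ        0x0010u
#define AC_PROCESS_VM_WRITE       0x0020u
#define AC_PROCESS_DUP_HANDLE     0x0040u
#define AC_PROCESS_QUERY_LIMITED  0x1000u
#define AC_SYNCHRONIZE            0x00100000u

/* Rights this example treats as dangerous for a protected game process. */
#define AC_DANGEROUS_RIGHTS \
    (AC_PROCESS_CREATE_THREAD | AC_PROCESS_VM_OPERATION | \
     AC_PROCESS_VM_READ | AC_PROCESS_VM_WRITE | AC_PROCESS_DUP_HANDLE)

/* Clears the dangerous bits and leaves everything else as requested. */
uint32_t strip_process_access(uint32_t desired_access)
{
    uint32_t stripped = desired_access & ~AC_DANGEROUS_RIGHTS;

    /* Stripping may only remove rights, never add them. */
    assert((stripped & ~desired_access) == 0);
    return stripped;
}
```

In a real pre-operation callback the result would be written back to the DesiredAccess field of the create or duplicate handle information, in place of the fixed mask used above.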

But in stripping handles to the game process, we also want to exempt certain processes. Common user-installed programs like Discord and Steam, or essential Windows services, should be whitelisted to avoid system instability.

#define PROCESS_HANDLE_OPEN_WHITELIST_COUNT 3

CHAR PROCESS_HANDLE_OPEN_WHITELIST[PROCESS_HANDLE_OPEN_WHITELIST_COUNT]
                                  [MAX_PROCESS_NAME_LENGTH] = {"Discord.exe",
                                                               "svchost.exe",
                                                               "explorer.exe"};

STATIC
BOOLEAN
IsWhitelistedHandleOpenProcess(_In_ LPCSTR ProcessName)
{
    for (UINT32 index = 0; index < PROCESS_HANDLE_OPEN_WHITELIST_COUNT;
         index++) {
        if (!strcmp(ProcessName, PROCESS_HANDLE_OPEN_WHITELIST[index]))
            return TRUE;
    }

    return FALSE;
}
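
One caveat with a name-based whitelist like this: the comparison only sees the short image name, so any binary renamed to svchost.exe would pass, and long names are cut off before they are compared. The following sketch illustrates the truncation, assuming a 15-character image-name field as shown in public EPROCESS layouts. The names short_image_name and AC_IMAGE_NAME_MAX are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the kernel keeps at most 15 characters of the image name. */
#define AC_IMAGE_NAME_MAX 15

/* Copies at most AC_IMAGE_NAME_MAX characters of `full` into `out`
 * (which must hold AC_IMAGE_NAME_MAX + 1 bytes) and NUL-terminates it,
 * imitating what a name-based whitelist would actually get to compare. */
void short_image_name(const char *full, char out[AC_IMAGE_NAME_MAX + 1])
{
    size_t len;

    assert(full != NULL && out != NULL);

    len = strlen(full);
    if (len > AC_IMAGE_NAME_MAX)
        len = AC_IMAGE_NAME_MAX;

    memcpy(out, full, len);
    out[len] = '\0';
}
```

Under that assumption, a whitelist entry longer than 15 characters could never match, and two different executables that share their first 15 characters would be indistinguishable.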

Screenshot Gathering

This is probably one of the most egregious cases of anticheat overreach and is probably what comes to mind when people talk about kernel-level anticheats. The most popular example is when @w_sted reported that Valorant was taking full-display screenshots of user devices, but this was subsequently debunked by @daaximus, who said screenshotting only covers the active window (i.e., only the game).

In the context of a kernel-level anti-cheat system, the code in question is designed to capture screenshots of a user's desktop or a specific window at regular intervals, likely to monitor the integrity of the gameplay environment. It also minimizes or hides the capture window under certain conditions, such as when the user presses specific keys, which is common in anti-cheat mechanisms to prevent tampering.

Impressively, @daaximus provided a full approximate recreation of the function based on the reverse-engineered snippet. The core of the implementation revolves around GDI+ for capturing and saving screenshots. The GDI+ library is initialized at the beginning of the screenshot capture function to facilitate image encoding and saving in PNG format.

GdiplusStartupInput gdipsi;
ULONG_PTR token;
GdiplusStartup(&token, &gdipsi, nullptr);

The get_encoder_clsid function fetches the CLSID of the image encoder for PNG files, which is necessary for saving the captured images in the correct format. This function iterates through the available image encoders and matches the requested format to return the appropriate CLSID.

int get_encoder_clsid(const WCHAR* format, CLSID* clsid) {
    UINT num_encoders = 0;
    UINT size_encoders = 0;
    ImageCodecInfo* codec_info = nullptr;
    GetImageEncodersSize(&num_encoders, &size_encoders);

    if (size_encoders == 0) return -1;

    codec_info = static_cast<ImageCodecInfo*>(malloc(size_encoders));

    if (codec_info == nullptr) return -1;

    GetImageEncoders(num_encoders, size_encoders, codec_info);

    for (UINT it = 0; it < num_encoders; ++it) {
        if (wcscmp(codec_info[it].MimeType, format) == 0) {
            *clsid = codec_info[it].Clsid;
            free(codec_info);
            return it;
        }
    }
    free(codec_info);
    return -1;
}

The capture_screenshot function takes a file name and an optional window handle (hwnd). It determines the screen dimensions to capture, either the entire virtual screen or the dimensions of the specified window. The function creates a compatible bitmap and device context to store the captured screen content.

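void capture_screenshot(const std::wstring& filename, HWND hwnd) {
    // Initialize GDI+ (the snippet shown earlier); token is released by GdiplusShutdown below.
    GdiplusStartupInput gdipsi;
    ULONG_PTR token;
    GdiplusStartup(&token, &gdipsi, nullptr);

    const HDC hdc_screen = GetDC(nullptr);
    const HDC hdc_capture = CreateCompatibleDC(hdc_screen);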

    int left = GetSystemMetrics(SM_XVIRTUALSCREEN);
    int top = GetSystemMetrics(SM_YVIRTUALSCREEN);
    int width = GetSystemMetrics(SM_CXVIRTUALSCREEN);
    int height = GetSystemMetrics(SM_CYVIRTUALSCREEN);

    if (hwnd != nullptr) {
        RECT window_rect;
        GetWindowRect(hwnd, &window_rect);
        left = window_rect.left;
        top = window_rect.top;
        width = window_rect.right - window_rect.left;
        height = window_rect.bottom - window_rect.top;
    }

    const HBITMAP hbm = CreateCompatibleBitmap(hdc_screen, width, height);
    SelectObject(hdc_capture, hbm);
    BitBlt(hdc_capture, 0, 0, width, height, hdc_screen, left, top, SRCCOPY);

    Bitmap* bitmap = Bitmap::FromHBITMAP(hbm, nullptr);

    CLSID png_clsid;
    get_encoder_clsid(L"image/png", &png_clsid);

    bitmap->Save(filename.c_str(), &png_clsid, nullptr);

    delete bitmap;
    DeleteObject(hbm);
    DeleteDC(hdc_capture);
    ReleaseDC(nullptr, hdc_screen);
    GdiplusShutdown(token);
}

The application includes a custom window procedure (window_proc) to handle various Windows messages. This procedure ensures that the capture window minimizes or hides itself under certain conditions, such as when the user presses the Alt+Tab combination or the Windows key. This behavior prevents the window from interfering with the user's actions and hides the presence of the anti-cheat mechanism.

LRESULT CALLBACK window_proc(HWND hwnd, UINT message, WPARAM wparam, LPARAM lparam) {
    switch (message) {
        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        case WM_ACTIVATE:
            if (wparam == WA_INACTIVE) {
                ShowWindow(hwnd, SW_MINIMIZE);
                return 0;
            }
            break;
        case WM_KEYDOWN:
            switch (wparam) {
                case VK_TAB:
                    if ((GetKeyState(VK_MENU) & 0x1) != 0)
                        ShowWindow(hwnd, SW_MINIMIZE);
                    return 0;
                case VK_LWIN:
                case VK_RWIN:
                    ShowWindow(hwnd, SW_MINIMIZE);
                    return 0;
                case VK_ESCAPE:
                    PostQuitMessage(0);
                    return 0;
                default: break;
            }
            break;
        default: break;
    }
    return DefWindowProc(hwnd, message, wparam, lparam);
}

The main functionality of continuously capturing screenshots is managed by the capture_gamers function, which runs in an infinite loop on a separate thread. This function checks if the user has pressed the F1 key to toggle between capturing the specified window (if any) and capturing the entire screen. The function increments a counter to generate unique file names for each screenshot and calls capture_screenshot to perform the capture and saving process. The loop includes a delay (using Sleep(1000)) to capture screenshots at one-second intervals.

void capture_gamers() {
    HWND backup_hwnd = main_window;

    for (auto n = 0;; n++) {
        if (GetAsyncKeyState(VK_F1) & 0x1) {
            if (main_window)
                main_window = nullptr;
            else
                main_window = backup_hwnd;
        }

        std::wstring filename = L"ss_" + std::to_wstring(n) + L".png";
        capture_screenshot(filename, main_window);
        Sleep(1000);
    }
}

The main entry point of the application, WinMain, sets up and registers a window class and creates a window. It then shows and updates the window, and starts the screenshot capture thread by creating a new thread running the capture_gamers function. The application enters a message loop to handle messages sent to the window, ensuring it remains responsive.

int WINAPI WinMain(HINSTANCE instance, HINSTANCE prev_instance, LPSTR cmd_line, int cmd_show) {
    WNDCLASSEX wcex;
    wcex.cbSize = sizeof(WNDCLASSEX);
    wcex.style = CS_HREDRAW | CS_VREDRAW;
    wcex.lpfnWndProc = window_proc;
    wcex.cbClsExtra = 0;
    wcex.cbWndExtra = 0;
    wcex.hInstance = instance;
    wcex.hIcon = LoadIcon(NULL, IDI_APPLICATION);
    wcex.hCursor = LoadCursor(NULL, IDC_ARROW);
    wcex.hbrBackground = HBRUSH(COLOR_WINDOW + 1);
    wcex.lpszMenuName = NULL;
    wcex.lpszClassName = L"OhNoScreenshots";
    wcex.hIconSm = LoadIcon(NULL, IDI_APPLICATION);
    RegisterClassEx(&wcex);

    main_window = CreateWindowEx(
        0,
        L"OhNoScreenshots",
        L"FairFight Never Did This! /s",
        WS_POPUP | WS_VISIBLE,
        0, 0,
        GetSystemMetrics(SM_CXSCREEN),
        GetSystemMetrics(SM_CYSCREEN),
        nullptr, nullptr,
        instance, nullptr);

    ShowWindow(main_window, cmd_show);
    UpdateWindow(main_window);

    CreateThread(nullptr, 0, reinterpret_cast<LPTHREAD_START_ROUTINE>(capture_gamers), nullptr, 0, nullptr);

    MSG msg;
    while (GetMessage(&msg, nullptr, 0, 0)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }

    return int(msg.wParam);
}

If the game window handle is valid (i.e., hwnd is not null), the application captures only that window, which is typically the game window. This ensures the anti-cheat mechanism monitors only the game environment. If the user switches away from the game using Alt+Tab, the capture window becomes inactive, and the application stops capturing to avoid recording irrelevant content. If hwnd is null, the application captures the entire screen, which is not normal behavior but serves as a fallback to ensure continuous monitoring even if the specific window handle becomes invalid.

Other anticheats use more aggressive methods, as @koyzdev's findings on ACE (Anti Cheat Expert) show. ACE is used in the game Arena Breakout: Infinite, made by Morefun Studio, a direct subsidiary of Tencent Games.

The particular issue comes from the screenshot-taking capabilities of ACE's user-mode component, ACE-Safe.dll. The function in ACE-Safe.dll begins by checking the Windows version with the following line.

if (GetVersion() >= 0x80000000 || (result = check_window_station(), result <= 0))
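
The first half of that condition relies on how GetVersion packs its result into 32 bits. A small sketch of the decoding in portable C follows. The names win_version_t, decode_get_version and is_service_window_station are hypothetical, and the window-station helper is only modelled on the "Service-0x" substring test described in this section, not taken from the ACE binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fields packed into the 32-bit value returned by GetVersion(). */
typedef struct {
    bool     is_nt;  /* high bit clear: NT-based Windows       */
    unsigned major;  /* low byte of the low word               */
    unsigned minor;  /* high byte of the low word              */
    unsigned build;  /* high word (only meaningful when is_nt) */
} win_version_t;

win_version_t decode_get_version(uint32_t raw)
{
    win_version_t v;

    v.is_nt = (raw & 0x80000000u) == 0;
    v.major = raw & 0xFFu;
    v.minor = (raw >> 8) & 0xFFu;
    v.build = v.is_nt ? (raw >> 16) & 0xFFFFu : 0;
    return v;
}

/* True if the window station name contains the "Service-0x" marker. */
bool is_service_window_station(const char *name)
{
    assert(name != NULL);
    return strstr(name, "Service-0x") != NULL;
}
```

For example, decode_get_version(0x0A280105) yields an NT-based system with version 5.1 and build 2600, so the `>= 0x80000000` comparison is false and the window station check decides the outcome.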

The high bit of the value returned by GetVersion() is set only on non-NT platforms (Windows 95/98/Me), so the first test is false on any NT-based system, including Windows NT, 2000, XP, and later. In that case the function goes on to check the Window Station name. If the Window Station name does not contain "Service-0x", the function proceeds. This is a preliminary check to ensure compatibility and execution context. Next, the function creates a device context for the primary display.

hdcSrc = CreateDCW(L"DISPLAY", 0, 0, 0);
CompatDC = CreateCompatibleDC(hdcSrc);

CreateDCW creates a device context handle for the display, while CreateCompatibleDC creates a compatible memory device context. These device contexts are essential for capturing the screen content. Interestingly, the function lacks thorough error checking, which might lead to unexpected behavior in certain scenarios. The function then retrieves the display's horizontal and vertical resolutions.

HorizontalRes = GetDeviceCaps(hdcSrc, HORZRES);
VerticalRes = GetDeviceCaps(hdcSrc, VERTRES);

These values are used to determine the dimensions of the screenshot. Subsequently, a compatible bitmap is created with these dimensions, although the height is fixed at 16 pixels. The use of a height of 16 pixels is unusual and indicates that the screenshot will be captured in segments rather than as a whole.

CompatibleBitmap = CreateCompatibleBitmap(hdcSrc, HorizontalRes, 16);
v8 = VerticalRes - 16;
for (y1 = 0; y1 < v8; y1 += 16)

Here, v8 is the vertical resolution minus 16, and a loop iterates over the screen height in 16-pixel increments. This method is reminiscent of old CRT raster scanning, capturing the entire screen width but only 16 pixels in height per iteration. This segmentation could potentially be used for low FPS streaming as well.
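
To see what that loop bound implies, the sketch below counts the strips visited by the loop exactly as it appears in the decompiled fragment. It is plain arithmetic in portable C with no GDI calls, and strips_visited and rows_covered are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define STRIP_HEIGHT 16

/* Counts how many strips the loop `for (y = 0; y < height - 16; y += 16)`
 * visits, i.e. the loop exactly as it appears in the decompiled fragment. */
size_t strips_visited(int height)
{
    size_t n = 0;

    assert(height >= 0);

    for (int y = 0; y < height - STRIP_HEIGHT; y += STRIP_HEIGHT)
        n++;
    return n;
}

/* Number of screen rows those strips cover. */
size_t rows_covered(int height)
{
    return strips_visited(height) * STRIP_HEIGHT;
}
```

For a display 1080 rows high this gives 67 strips covering rows 0 to 1071, so the loop as quoted leaves the bottom few rows either to code outside the excerpt or uncaptured.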

The actual screenshot capture occurs with the BitBlt function. BitBlt transfers pixel data from the source device context to the compatible memory device context, capturing a 16-pixel high segment of the screen.

BitBlt(CompatDC, 0, 0, HorizontalRes, 16, hdcSrc, 0, y1, 0xCC0020);
GetBitmapBits(CompatibleBitmap, v6, v7);

GetBitmapBits then retrieves the bitmap's bits, storing them in v7. This raw pixel data is processed further, although the exact processing steps involve obfuscated functions likely related to OpenSSL for encryption, as hinted by the sub-functions like sub_180002D80. After capturing and processing the screenshot, the function cleans up the resources.

v11 = SelectObject(CompatDC, v5);
DeleteObject(v11);
DeleteDC(CompatDC);
return DeleteDC(DCW);

The cleanup ensures that device contexts and objects are properly released, preventing resource leaks.

All of this suggests that ACE has the capability to capture and potentially transmit comprehensive screenshots, including sensitive information unrelated to the game. This functionality could be triggered under specific conditions, such as when cheating is detected or reported, which is way more aggressive than what Vanguard does.

Conclusions

This is definitely a non-exhaustive list of the techniques being used. I have glossed over a lot of other protection methods, like Vanguard's guarded memory regions, which are too complex for a simple subsection to explain. But this should provide you guys with a general overview of how most of these systems work.

To be fully honest, even though I do not trust Riot Games (owned by Tencent, a Chinese entity) or MiHoYo (itself a Chinese entity), if they were stealing personal user information it would probably have been found out by now, given the number of people trying to crack apart mhyprot and Vanguard. The same applies to other anticheat systems like BattlEye, EAC, VAC, etc.

Is the fear of kernel-level anticheats overblown? Perhaps, but one must consider that, based on the techniques discussed above, kernel-level anticheats can:

Have nearly limitless access to your device and data, including connected devices and data in memory
Conduct aggressive integrity checks that can lead to false positives, causing legitimate processes to crash or behave unpredictably
Hinder attempts to play games on alternative platforms, such as Linux through Proton/Wine
Interfere with the operation of legitimate security software that relies on accessing certain core Windows systems, potentially reducing the effectiveness of these tools

Anticheats also get away with hooking things that would usually trigger EDR alerts, which gives them some interesting EDR bypass capabilities.

For me personally, these are not tradeoffs that I'm willing to make in order to play games on the internet. I've also blacklisted common anticheat processes and binaries in major EDR platforms.

But I've never been fond of competitive or gacha games anyway, as I usually enjoy more singleplayer-oriented games. However, other people's tolerance and preferences might be different, and I do think kernel-level anticheats are important for ensuring fair play in competitive games that have professional leagues with million-dollar prize pools, or accounts that can sell for up to six figures.

This article was only meant to introduce people to the concept of kernel-level anticheats, which are often surrounded by mystery due to their sensitive nature and the heavy obfuscation that developers put into these systems. Most of you guys have probably already made up your mind about this issue, and if you haven't, I hope this article helped you form an informed opinion.
