Control-Flow Enforcement on Windows With CFG and Intel CET

During the Skylake era, Intel introduced Memory Protection Extensions (MPX) to prevent memory errors and attacks by checking pointer references whose normal compile-time intentions are maliciously exploited at runtime through a buffer overflow. MPX was designed to give legacy C/C++ programs transparent bounds checking without a significant performance impact, ensuring that no program can put more data in a buffer than it can hold.

In practice, however, it introduced a lot of performance penalties compared to software-based solutions and was ineffective against certain types of attacks. MPX was deprecated in Intel hardware released after 2019.

That doesn’t mean Intel gave up on the idea: a year later it introduced Control-flow Enforcement Technology (CET) in Tiger Lake. CET is a hardware-assisted stopgap that needs software support to be put to use. On Windows, this is achieved by integrating CET functions into Control Flow Guard (CFG).

[Figure: Arbitrary code execution strategy]

One of CET's components is the shadow stack, which detects tampering with return addresses and raises an exception that crashes the application. This stops exploits from hijacking back-edge control transfers (ret), the mechanism that Return Oriented Programming depends on.

Shadow stacks are nothing new and have previously been implemented in software. CET’s ability to tackle back-edge transfer violations is what Microsoft was looking for after its own software-level attempt at the problem, Return Flow Guard (RFG), was broken by attackers' ability to modify the shadow stack itself.

A shadow stack is a second stack used exclusively for control transfer operations; it can only hold return addresses. CET keeps it in step with the program's call stack by recording every return address there. On every CALL instruction, the return address is pushed onto both the call stack and the shadow stack, and on every RET instruction the two copies are compared. If the addresses do not match, the processor raises a control protection exception (#CP). This traps into the kernel, and the system terminates the process to preserve the integrity and security of the system.
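
The CALL/RET behaviour described above can be modelled in a few lines of C. This is only a toy simulation, not how CET is programmed: the toy_cpu struct and the toy_call and toy_ret functions are hypothetical names invented for illustration, and a false return value stands in for the control protection exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 16

/* Toy model (hypothetical): two parallel stacks of return addresses. */
typedef struct {
    uintptr_t call_stack[STACK_DEPTH];   /* ordinary stack, writable by the program */
    uintptr_t shadow_stack[STACK_DEPTH]; /* protected copy of the return addresses */
    size_t depth;                        /* number of frames currently on both stacks */
} toy_cpu;

/* CALL: push the return address onto both stacks. Returns false if full. */
static bool toy_call(toy_cpu *cpu, uintptr_t return_address) {
    assert(cpu != NULL);
    if (cpu->depth == STACK_DEPTH) {
        return false;
    }
    cpu->call_stack[cpu->depth] = return_address;
    cpu->shadow_stack[cpu->depth] = return_address;
    cpu->depth++;
    return true;
}

/* RET: pop both stacks and compare the two copies.
 * Returns false on a mismatch, which stands in for the exception. */
static bool toy_ret(toy_cpu *cpu, uintptr_t *target) {
    assert(cpu != NULL && target != NULL);
    if (cpu->depth == 0) {
        return false;
    }
    cpu->depth--;
    uintptr_t from_call = cpu->call_stack[cpu->depth];
    uintptr_t from_shadow = cpu->shadow_stack[cpu->depth];
    if (from_call != from_shadow) {
        return false; /* return address was tampered with */
    }
    *target = from_call;
    return true;
}
```

In this model, overwriting cpu.call_stack[cpu.depth - 1] between toy_call and toy_ret plays the role of a stack buffer overflow: the shadow copy still holds the original address, so toy_ret reports the mismatch instead of handing back the attacker's target.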

Intel’s shadow stack implementation in CET is protected from tampering by page table protections, so regular store instructions cannot modify the contents of the shadow stack.

Writes to the shadow stack are restricted to control transfer instructions and shadow stack management instructions. According to Intel, the management instructions are:

INCSSP: Increment the shadow stack pointer
RDSSP: Read the shadow stack pointer
SAVEPREVSSP/RSTORSSP: Save the previous shadow stack pointer/restore the saved shadow stack pointer
WRSS/WRUSS: Write to the shadow stack
SETSSBSY/CLRSSBSY: Mark the shadow stack busy/clear the shadow stack busy flag (supervisor shadow stack token management)

The architecture provides a mechanism to switch shadow stacks using a pair of instructions: RSTORSSP and SAVEPREVSSP. RSTORSSP verifies a restore token located at the top of the new shadow stack and referenced by the instruction's memory operand. Once RSTORSSP has determined that the restore point on the new shadow stack is valid, it switches the SSP to point to the token.

Once the switch to the new shadow stack is complete, the system can create a restore point on the old shadow stack by executing the SAVEPREVSSP instruction. So that SAVEPREVSSP can determine the address at which to save the “shadow stack restore” token, RSTORSSP replaces the restore token with a previous-ssp token that holds the value of the SSP at the time RSTORSSP was invoked.
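
The token handshake is easier to follow as a small simulation. The C sketch below is a deliberately simplified model under several assumptions: slots are array indices rather than addresses, token types are explicit tags rather than bits inside a 64-bit value, and SAVEPREVSSP is assumed to consume the previous-ssp token. All names (toy_cpu, toy_rstorssp, toy_saveprevssp, toy_place_restore_token) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 32

/* Hypothetical tags for this model only. */
typedef enum {
    SLOT_EMPTY,
    SLOT_RETURN_ADDRESS,
    SLOT_RESTORE_TOKEN,
    SLOT_PREVIOUS_SSP_TOKEN
} slot_kind;

typedef struct {
    slot_kind kind;
    size_t value; /* for tokens: a slot index standing in for an SSP value */
} slot;

typedef struct {
    slot memory[SLOTS]; /* every shadow stack lives somewhere in this array */
    size_t ssp;         /* index of the current top slot; stacks grow downward */
} toy_cpu;

/* Setup helper: plant a restore token that records the slot just above it. */
static void toy_place_restore_token(toy_cpu *cpu, size_t index) {
    assert(index < SLOTS);
    cpu->memory[index] = (slot){ SLOT_RESTORE_TOKEN, index + 1 };
}

/* Model of RSTORSSP: validate the token, swap it for a previous-ssp token
 * holding the old SSP, then point the SSP at the token's slot. */
static bool toy_rstorssp(toy_cpu *cpu, size_t token_index) {
    if (token_index >= SLOTS) {
        return false;
    }
    slot token = cpu->memory[token_index];
    if (token.kind != SLOT_RESTORE_TOKEN || token.value != token_index + 1) {
        return false; /* invalid restore point: stands in for a fault */
    }
    cpu->memory[token_index] = (slot){ SLOT_PREVIOUS_SSP_TOKEN, cpu->ssp };
    cpu->ssp = token_index;
    return true;
}

/* Model of SAVEPREVSSP: read the previous-ssp token on top of the new stack,
 * write a restore token just below the old stack's top, then pop the token. */
static bool toy_saveprevssp(toy_cpu *cpu) {
    slot top = cpu->memory[cpu->ssp];
    if (top.kind != SLOT_PREVIOUS_SSP_TOKEN) {
        return false;
    }
    size_t old_ssp = top.value;
    if (old_ssp == 0 || old_ssp > SLOTS) {
        return false; /* no room for a restore token on the old stack */
    }
    cpu->memory[old_ssp - 1] = (slot){ SLOT_RESTORE_TOKEN, old_ssp };
    cpu->memory[cpu->ssp] = (slot){ SLOT_EMPTY, 0 };
    cpu->ssp += 1;
    return true;
}
```

Starting with the SSP at slot 20 and a restore token planted at slot 10, calling toy_rstorssp(&cpu, 10) followed by toy_saveprevssp(&cpu) moves execution to the second stack and leaves a restore token at slot 19, so the same two calls with index 19 switch back to where the first stack left off.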

From here we can understand how CET and MPX fundamentally differ. Below is a C loop that iterates over an array of object pointers and sums each object's len field.

for (i=0; i<X; i++) {
   total+= a[i]->len;
}

When compiled with Intel MPX enabled, bounds checks are added and the code turns into the snippet below.

obj* a[10]
a_b = bndmk a, a+79          // Make bounds
total = 0
for (i=0; i<X; i++):
    ai = a + i
    bndcl a_b, ai            // Lower-bound check of a[i]
    bndcu a_b, ai+7          // Upper-bound check of a[i]
    objptr = load ai
    objptr_b = bndldx ai     // Bounds for pointer at a[i]
    lenptr = objptr + 100
    bndcl objptr_b, lenptr   // Lower-bound check of obj.len
    bndcu objptr_b, lenptr+3 // Upper-bound check of obj.len
    len = load lenptr
    total += len
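
To make the cost of those checks concrete, here is a rough software equivalent in C. It is only a sketch: the bounds struct and the make_bounds, in_bounds, and checked_total helpers are hypothetical names, not MPX intrinsics. Only the array-slot check (the first bndcl/bndcu pair) is modelled, since plain C has no counterpart to the per-pointer bounds lookup done by bndldx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int len;
} obj;

/* Hypothetical software stand-in for an MPX bounds register. */
typedef struct {
    uintptr_t lower; /* address of the first valid byte */
    uintptr_t upper; /* address of the last valid byte */
} bounds;

/* Plays the role of bndmk: record the valid range of a buffer. */
static bounds make_bounds(const void *base, size_t size) {
    assert(size > 0);
    bounds b = { (uintptr_t)base, (uintptr_t)base + size - 1 };
    return b;
}

/* Plays the role of a bndcl/bndcu pair: both ends of the access must fit. */
static bool in_bounds(bounds b, uintptr_t address, size_t access_size) {
    uintptr_t last = address + access_size - 1;
    return address >= b.lower && last <= b.upper;
}

/* Sums a[i]->len for i < count. Returns false, leaving *total untouched,
 * as soon as a slot falls outside the array's real capacity. */
static bool checked_total(obj *const *a, size_t capacity, size_t count, long *total) {
    bounds a_b = make_bounds(a, capacity * sizeof *a);
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        uintptr_t slot = (uintptr_t)a + i * sizeof *a;
        if (!in_bounds(a_b, slot, sizeof *a)) {
            return false; /* would have read past the end of the array */
        }
        sum += a[i]->len;
    }
    *total = sum;
    return true;
}
```

With a three-element array, checked_total(arr, 3, 3, &total) sums the lengths, while asking for a fourth element fails before the out-of-bounds slot is ever read. Every iteration pays for two comparisons on top of the load it protects, which is the kind of overhead the MPX version carries as well.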

Because MPX adds instructions to programs, it increases program overhead by a significant margin, even compared to software-based solutions. Although Intel MPX implemented these instructions in silicon, its complicated design made the performance gained from the hardware approach meaningless.

CET does not bake protections into the program's code; instead it creates a shadow stack against which every return address can be verified securely. The solution isn't exactly bulletproof, though.

Intel only began rolling out CET with its 11th-generation (Tiger Lake) CPUs, and methods of exploiting buffer overflows still exist. They mainly rely on the weaker links inside Windows, such as Microsoft’s insistence on not implementing Indirect Branch Tracking (IBT) and instead enforcing control flow integrity on indirect calls through its own software-based solution.

While there are certain deficiencies in delegating some control flow guarding to software, the net result is positive: buffer overflows are now a significantly harder way to attack Windows systems.
