Control-Flow Enforcement on Windows With CFG and Intel CET
A dive into the purported final solution against buffer overflow attacks
During the Skylake era, Intel introduced Memory Protection Extensions (MPX) to prevent memory errors and attacks by checking pointer references whose normal compile-time intentions are maliciously exploited at runtime through a buffer overflow. MPX was designed to give legacy C/C++ programs transparent bounds checking without a significant performance impact, making sure that no program can put more data in a buffer than it can hold.
In practice, however, it introduced a lot of performance overhead compared to software-based solutions and was ineffective against certain types of attacks. MPX was deprecated in Intel hardware released after 2019.
That doesn’t mean Intel gave up on the idea: a year later it introduced Control-flow Enforcement Technology (CET) in Tiger Lake. CET is a hardware-assisted stopgap that needs software support to take effect. In Windows, this is achieved by integrating CET functions into Control Flow Guard (CFG).
One of CET’s components is the shadow stack, which detects tampering with return addresses and raises an exception that crashes the application. This stops exploits from hijacking back-edge control-flow transfers (ret), the mechanism that Return Oriented Programming relies on.
Shadow stacks are nothing new and have previously been implemented in software. CET’s ability to tackle back-edge transfer violations is what Microsoft was looking for after its own software-level attempt at the problem, Return Flow Guard (RFG), was broken because attackers could modify the shadow stack itself.
A shadow stack is a second stack used exclusively for control transfer operations; it can only hold return addresses. CET keeps it in parity with the program’s call stack by recording every return address. On every CALL instruction, the return address is pushed onto both the call stack and the shadow stack, and on every RET instruction the two copies are compared to ensure integrity has not been compromised. If the addresses do not match, the processor raises a control protection exception. This traps into the kernel, and the system terminates the process to guarantee the integrity and security of the system.
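The CALL/RET behaviour described above can be modelled in a few lines of C. This is only a toy simulation, not how CET is programmed: the names cpu_model, model_call, model_ret and model_overflow are hypothetical, and the "stacks" are plain arrays that hold nothing but return addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 64

/* Toy model (hypothetical names): two parallel stacks of return addresses. */
typedef struct {
    uint64_t call_stack[STACK_DEPTH];   /* ordinary stack, writable by the program */
    uint64_t shadow_stack[STACK_DEPTH]; /* shadow stack, written only by call/ret */
    size_t sp;                          /* number of entries on the call stack */
    size_t ssp;                         /* number of entries on the shadow stack */
} cpu_model;

typedef enum { RET_OK, RET_CONTROL_PROTECTION_FAULT } ret_status;

/* CALL: push the return address onto both stacks. */
void model_call(cpu_model *cpu, uint64_t return_address) {
    assert(cpu->sp < STACK_DEPTH && cpu->ssp < STACK_DEPTH);
    cpu->call_stack[cpu->sp++] = return_address;
    cpu->shadow_stack[cpu->ssp++] = return_address;
}

/* RET: pop both copies and compare them before transferring control. */
ret_status model_ret(cpu_model *cpu, uint64_t *target) {
    assert(cpu->sp > 0 && cpu->ssp > 0);
    uint64_t from_stack = cpu->call_stack[--cpu->sp];
    uint64_t from_shadow = cpu->shadow_stack[--cpu->ssp];
    if (from_stack != from_shadow) {
        /* Real hardware raises a control protection exception here. */
        return RET_CONTROL_PROTECTION_FAULT;
    }
    *target = from_stack;
    return RET_OK;
}

/* Stand-in for a buffer overflow that overwrites the topmost return
 * address on the ordinary stack. The shadow copy is left untouched. */
void model_overflow(cpu_model *cpu, uint64_t attacker_address) {
    assert(cpu->sp > 0);
    cpu->call_stack[cpu->sp - 1] = attacker_address;
}
```

Calling model_call followed by model_ret returns RET_OK, while slipping a model_overflow in between makes model_ret report RET_CONTROL_PROTECTION_FAULT, which is the point at which the kernel would terminate the process.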
Intel’s shadow stack is protected from tampering by page table protections, so regular store instructions cannot modify its contents.
Writes to the shadow stack are restricted to control transfer instructions and shadow stack management instructions. According to Intel, the management instructions are:
INCSSP: Increment the shadow stack pointer
RDSSP: Read the shadow stack pointer
SAVEPREVSSP/RSTORSSP: Save the previous shadow stack pointer / restore the saved shadow stack pointer
WRSS/WRUSS: Write to the shadow stack
SETSSBSY/CLRSSBSY: Mark the shadow stack busy / clear the shadow stack busy flag (supervisor shadow stack token management)
The architecture provides a mechanism to switch shadow stacks using a pair of instructions: RSTORSSP and SAVEPREVSSP. RSTORSSP verifies a restore token located at the top of the new shadow stack and referenced by the instruction’s memory operand. Once RSTORSSP has determined that the restore point on the new shadow stack is valid, it switches the SSP to point to the token.
Once the shadow stack has been switched, the system can create a restore point on the old shadow stack by executing the SAVEPREVSSP instruction. So that SAVEPREVSSP can determine where to save the “shadow stack restore” token, RSTORSSP replaces the restore token with a previous-SSP token that holds the value the SSP had when RSTORSSP was invoked.
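The token hand-off is easier to follow as a small simulation. The sketch below is a simplified model and not the real instruction semantics: slots are array entries rather than 8-byte memory words, the token type is a tag instead of flag bits, and all names (shadow_model, model_rstorssp, model_saveprevssp) are hypothetical. It also assumes that stacks grow toward lower indices and that a valid restore token records the slot directly above itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 32 /* size of the toy shadow-stack memory, in slots */

typedef enum {
    SLOT_EMPTY,
    SLOT_RETURN_ADDRESS,
    SLOT_RESTORE_TOKEN,
    SLOT_PREVIOUS_SSP_TOKEN
} slot_kind;

typedef struct {
    slot_kind kind;
    size_t value; /* for tokens: a slot index standing in for an SSP value */
} slot;

typedef struct {
    slot mem[SLOTS];
    size_t ssp; /* index of the current top; stacks grow toward lower indices */
} shadow_model;

/* RSTORSSP (simplified): verify the restore token, swap it for a
 * previous-SSP token holding the old SSP, then point SSP at the token. */
bool model_rstorssp(shadow_model *m, size_t token_index) {
    assert(token_index + 1 < SLOTS);
    slot *token = &m->mem[token_index];
    /* Assumption of this sketch: a valid token records the slot above it. */
    if (token->kind != SLOT_RESTORE_TOKEN || token->value != token_index + 1) {
        return false; /* real hardware would raise a fault instead */
    }
    size_t old_ssp = m->ssp;
    token->kind = SLOT_PREVIOUS_SSP_TOKEN;
    token->value = old_ssp;
    m->ssp = token_index;
    return true;
}

/* SAVEPREVSSP (simplified): read the previous-SSP token on the current
 * stack, push a restore token onto the old stack, then pop the token. */
bool model_saveprevssp(shadow_model *m) {
    assert(m->ssp < SLOTS);
    slot *token = &m->mem[m->ssp];
    if (token->kind != SLOT_PREVIOUS_SSP_TOKEN) {
        return false;
    }
    size_t old_ssp = token->value;
    assert(old_ssp > 0 && old_ssp < SLOTS);
    m->mem[old_ssp - 1].kind = SLOT_RESTORE_TOKEN;
    m->mem[old_ssp - 1].value = old_ssp;
    m->ssp += 1;
    return true;
}
```

After the pair has run, the old stack carries a restore token of its own, so a later model_rstorssp on that token switches back the same way.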
This makes the fundamental difference between CET and MPX easier to see. Below is a C loop that iterates over an array of object pointers and sums each object’s len field.
for (i = 0; i < X; i++) {
    total += a[i]->len;
}
When compiled with Intel MPX support, bounds checks are added and the code turns into the snippet below.
obj* a[10]
a_b = bndmk a, a+79                 // Make bounds
total = 0
for (i=0; i<X; i++):
    ai = a + i
    bndcl a_b, ai                   // Lower-bound check of a[i]
    bndcu a_b, ai+7                 // Upper-bound check of a[i]
    objptr = load ai
    objptr_b = bndldx ai            // Bounds for pointer at a[i]
    lenptr = objptr + 100
    bndcl objptr_b, lenptr          // Lower-bound check of obj.len
    bndcu objptr_b, lenptr+3        // Upper-bound check of obj.len
    len = load lenptr
    total += len
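To make the cost of those checks concrete, here is roughly what they amount to when written out by hand in C. This is a software stand-in, not MPX itself: the bounds struct and the helpers make_bounds, in_bounds and checked_total are hypothetical names, and the obj layout (a 100-byte buffer followed by an int) is an assumption inferred from the offset of 100 in the snippet above. Real MPX keeps bounds in dedicated registers and loads per-pointer bounds from a table with bndldx; the sketch simply rebuilds them from sizeof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: len sits at offset 100, matching "objptr + 100". */
typedef struct { char buf[100]; int len; } obj;

/* Software stand-in for a bounds register: inclusive byte addresses. */
typedef struct { uintptr_t lower; uintptr_t upper; } bounds;

/* Plays the role of bndmk: bounds covering `size` bytes from `base`. */
bounds make_bounds(const void *base, size_t size) {
    bounds b = { (uintptr_t)base, (uintptr_t)base + size - 1 };
    return b;
}

/* Plays the role of bndcl + bndcu for an access of `size` bytes. */
bool in_bounds(bounds b, uintptr_t addr, size_t size) {
    return addr >= b.lower && addr + size - 1 <= b.upper;
}

/* Sums len over the first `count` entries of an array that really holds
 * `capacity` pointers. Returns false where MPX would raise an exception. */
bool checked_total(obj *const *a, size_t capacity, size_t count, long *total) {
    bounds a_b = make_bounds(a, capacity * sizeof a[0]);
    *total = 0;
    for (size_t i = 0; i < count; i++) {
        uintptr_t ai = (uintptr_t)a + i * sizeof a[0];
        if (!in_bounds(a_b, ai, sizeof a[0])) {
            return false; /* a[i] lies outside the array */
        }
        obj *objptr = a[i];
        /* Stand-in for bndldx: rebuild the object's bounds from its size. */
        bounds objptr_b = make_bounds(objptr, sizeof *objptr);
        uintptr_t lenptr = (uintptr_t)objptr + offsetof(obj, len);
        if (!in_bounds(objptr_b, lenptr, sizeof objptr->len)) {
            return false; /* obj.len lies outside the object */
        }
        *total += objptr->len;
    }
    return true;
}
```

With ten objects in the array, asking for ten entries succeeds, while asking for eleven fails on the first out-of-range slot before it is ever read. Every iteration pays for two range checks on top of the two loads the original loop needed.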
The fact that MPX adds instructions to programs increases overhead by a significant margin, even compared to software-based solutions. While Intel MPX baked in instructions to perform these checks at the silicon level, its complicated design made the performance gained from its hardware approach meaningless.
CET is not focused on baking extra checks into the code, but rather on creating a shadow stack that ensures all return addresses can be verified in a secure manner. The solution isn’t exactly bulletproof, though. Intel has only rolled out CET starting with its 11th generation (Tiger Lake) CPUs, and methods of exploiting buffer overflows still exist. They mainly rely on the weaker links inside Windows, such as Microsoft’s insistence on not implementing Indirect Branch Tracking (IBT) and instead enforcing control-flow integrity on indirect calls through its own software-based solution.
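Shadow stacks and IBT are reported by the processor as two separate capabilities, which is part of why an operating system can adopt one without the other. The sketch below decodes the two CPUID feature bits (leaf 7, subleaf 0: ECX bit 7 for shadow stacks, EDX bit 20 for IBT). The names cet_features, decode_cet_features and query_cet_features are hypothetical, and the query path assumes a GCC or Clang toolchain on x86 that provides <cpuid.h>; on any other target it simply reports no support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

typedef struct {
    bool shadow_stack;             /* CET_SS */
    bool indirect_branch_tracking; /* CET_IBT */
} cet_features;

/* Decodes the ECX and EDX values returned by CPUID leaf 7, subleaf 0. */
cet_features decode_cet_features(uint32_t ecx, uint32_t edx) {
    cet_features f;
    f.shadow_stack = (ecx >> 7) & 1u;              /* ECX bit 7 */
    f.indirect_branch_tracking = (edx >> 20) & 1u; /* EDX bit 20 */
    return f;
}

/* Queries the running CPU where the toolchain allows it. */
cet_features query_cet_features(void) {
    cet_features none = { false, false };
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return decode_cet_features(ecx, edx);
    }
#endif
    return none;
}
```

These bits only describe what the silicon can do. Whether a given process actually runs with shadow stacks enabled is decided by the operating system.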
While delegating some control-flow guarding to software-based solutions has its deficiencies, the net result is positive: buffer overflows are now a significantly harder way to attack Windows systems.