Quick Analysis of the CrowdStrike Situation
itoldyouso.jpg about kernel-mode drivers, duh

Cover Illustration by ireneparamithaa
DISCLAIMER : This research was done using software I obtained myself, analyzed on hardware I own. Some code may be simplified or edited for clarity or to maintain confidentiality.
On Friday, July 19th, 2024, a faulty update pushed by CrowdStrike to its Falcon EDR caused machines worldwide to bluescreen and become stuck in an infinite bootloop. The havoc was global, with Microsoft estimating around 8.5 million endpoints affected.
So instead of relaxing like a normal person on a weekend, I decided to take a look (because I wanted Twitter clout). The main questions: what caused the BSOD and bootloop situation, and why did removing a .sys file help?
This article would not be possible without :
Patrick Wardle, who provided the CrowdStrike kernel driver and the offending channel files
Tavis Ormandy, for the initial analysis
Aliz Hammond and Kyle Avery, for final checking and editing because I wrote this half asleep
Analysis
Checking the Crash Logs and Official Workaround
DISCLAIMER : Some screenshots and crash logs were edited for confidentiality purposes
Checking the crash logs for the csagent.sys driver revealed some interesting clues.
```
EXCEPTION_RECORD: fffffb0d18d3ec28 -- (.exr 0xfffffb0d18d3ec28)
ExceptionAddress: fffff80d21df335a1 (csagent+0x0000000000035a1)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 000000000000009c
Attempt to read from address 000000000000009c
CONTEXT: fffffb0d18d3e460 -- (.cxr 0xfffffb0d18d3e460)
rax=ffffffb0d18d43f0 rbx=0000000000000240 rcx=0000000000000234
rdx=ffffffb0d18d5430 rsi=ffffff9a815b7835 rdi=ffffff9a815b7925
rip=ffff80d21df335a1 rsp=ffffffb0d18d3ef40 rbp=ffffffb0d18d3f50
r8=000000000000009c r9=ffffffb0d18d3e75 r10=ffffffb0d18d4f2c
r11=ffffffb0d18d4124 r12=ffffffb0d18d4128 r13=ffffffb0d18d41d0
r14=0000000000000030 r15=0000000000000120
iopl=0 nv up ei pl nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00050206
csagent+0x35a1:
fffff80d21df335a1 458b08 mov r9d,dword ptr [r8] ds:002b:00000000`0000009c=????????
Resetting default scope
```
An examination of the context record reveals the state of the CPU registers at the time of the exception. The instruction pointer (RIP) was at fffff80d21df335a1, and it attempted to execute the instruction mov r9d, dword ptr [r8]. This instruction tried to move data from the memory address pointed to by r8 into the r9d register. However, the address in r8 was 000000000000009c, which is invalid and resulted in the access violation.
CrowdStrike's official workaround post also provided some interesting clues.

The instructions advised users to boot into safe mode, locate a file called C-00000291*.sys in CrowdStrike's driver directory, and delete it, which is interesting. This directory is where CrowdStrike's kernel-mode drivers are stored, and we have talked at length about the risks of kernel-mode drivers, both in EDRs and in anticheat programs. But the folder itself contains tons of .sys files, which raises the question: why does CrowdStrike need so many kernel-mode drivers?
A later technical writeup published by CrowdStrike explained that, despite their .sys extensions, these files are actually what are called Channel Files, which contain behavioral detection signatures for the EDR platform.
Checking the Channel Files and Kernel Driver
The aforementioned technical writeup stated that the problematic Channel File (C-00000291-* with a .sys extension) controls how CrowdStrike evaluates named pipes on Windows, and was being updated at the time to account for a new named-pipe technique being leveraged by common C2 frameworks. However, the post didn't go into much detail about what it was actually trying to detect.
That said, the timing closely coincided with the release of Cobalt Strike's new Aggressor function, which can send arbitrary data to a custom post-exploitation job via a named pipe, providing significant flexibility in how post-ex tasks are executed.

This function empowers operators to send specific data strings to custom post-exploitation jobs, which can be used to build dynamic, context-specific execution of post-ex tasks, such as adaptive data exfiltration or automated response triggers. The general consensus is that C-00000291-00000000-00000032.sys is the offending channel file for this issue.

There was an early tweet with a screenshot claiming this file is full of zeroes, which would have implied that CrowdStrike pushed a signature file filled with null values. But as far as I can tell, this is definitely not the case for the many samples I've gathered so far. The files do appear to be obfuscated, probably to stop competitors from reading the channel files and figuring out the heuristic detection models.

Inside CrowdStrike's driver, we can see how it interacts with the channel files.

The driver searches for a filename using the format string C-%08u-%08u-%08u.sys and then loads it into rdx. It's not possible to know for sure what logic error inside the channel file caused the condition, since the file is obfuscated, but most likely invalid signature data triggered a fault in the kernel-mode driver.

The driver tried to read the pointer at index 0x14 from a pointer table (mov r8, [rax+r11*8]), with rax as the base address and r11*8 as the offset. The code does include a check that r8 is non-null before reading from it: test r8, r8 checks whether r8 is zero, and if it's non-zero, execution jumps to loc_1400E14F4.
The problem is that this check only ensures r8 is not zero; it doesn't verify that r8 points to valid, accessible memory. An address like 0x9c therefore passes the check, and execution continues to the next instruction.
The instruction movzx r9d, word ptr [r8] then reads a word from the address in r8, and the invalid address causes a crash when the read is attempted.
Why the bootloop?
Usually, even when third-party kernel-mode drivers are present, they are loaded only after the system has completed the boot process. But these crashes hit machines before they even get into the OS, so what's actually happening?
As part of the Microsoft Virus Initiative (MVI), CrowdStrike has special privileges through Windows's Early Launch Anti-Malware (ELAM) feature. ELAM allows security vendors to build kernel-mode drivers that load early during bootup, before other third-party drivers initialize. This enables detection of malicious third-party kernel-mode drivers and rootkits.
When the driver read the new channel file, the invalid memory access caused a bluescreen. On each subsequent restart, the driver reloads the channel file and triggers another BSOD during boot, which drops the system into Recovery Mode and causes the bootloop.
Building ELAM drivers comes with strict requirements: a vendor's drivers must pass Microsoft's testing and certification. But since these channel files weren't considered part of the driver, they didn't need to be signed off by Microsoft.
Solution
EDIT : Microsoft has released a dedicated tool for this exact purpose; check it out here
There isn't a magic bullet solution to this, really. Since the driver loads before other third-party drivers like MDM agents, it's not like you can push out a GPO or a command via an MDM to remove these files. Most likely you'd have to physically go to the machines one by one, boot into safe mode, and delete the faulty channel files manually.
You can use something like the Windows Preinstallation Environment (WinPE) to significantly speed up the procedure, which makes this much easier to do at scale. The steps below create a bootable USB from WinPE media with a startup script that automatically runs, deletes the problematic CrowdStrike files, and reboots the system.
Prerequisites
Windows Assessment and Deployment Kit (ADK) for Windows 10 or later
Windows PE add-on for the ADK
Administrative privileges on a Windows 10 or later system
Steps
1. Install Windows ADK and Windows PE add-on
Download and install both from the Microsoft website.

2. Create a working copy of Windows PE files
Open Command Prompt as Administrator and run:

```
copype amd64 C:\WinPE_amd64
```

3. Mount the Windows PE image

```
Dism /Mount-Image /ImageFile:"C:\WinPE_amd64\media\sources\boot.wim" /Index:1 /MountDir:"C:\WinPE_amd64\mount"
```

4. Create a startup script
Create a file named startnet.cmd in C:\WinPE_amd64\mount\Windows\System32 with the following content:

```
wpeinit
powershell -Command "Remove-Item -Path $env:SystemDrive\Windows\System32\drivers\CrowdStrike\C-00000291*.sys -Force"
shutdown /f /r /t 0
```

The path is left unquoted so PowerShell expands $env:SystemDrive; a single-quoted path would be treated literally.

5. Modify the Windows PE configuration
Edit C:\WinPE_amd64\mount\Windows\System32\winpeshl.ini:

```
[LaunchApps]
%SYSTEMROOT%\System32\startnet.cmd
```

6. Add PowerShell support to WinPE

```
Dism /Add-Package /Image:"C:\WinPE_amd64\mount" /PackagePath:"C:\Program Files (x86)\Windows Kits\10\Assessment and Deployment Kit\Windows Preinstallation Environment\amd64\WinPE_OCs\WinPE-PowerShell.cab"
```

WinPE-PowerShell depends on the WinPE-WMI, WinPE-NetFx, and WinPE-Scripting packages, which can be added the same way from the same WinPE_OCs directory.

7. Unmount and commit changes

```
Dism /Unmount-Image /MountDir:"C:\WinPE_amd64\mount" /Commit
```

8. Create bootable media

```
MakeWinPEMedia /UFD C:\WinPE_amd64 E:
```

Replace E: with your USB drive letter.
For devices with BitLocker, there are some workarounds to retrieve the BitLocker key using various methods.
Conclusion
Who is to blame? Probably Crowdstrike.
But building kernel drivers is hard, and they are harder to QA test. This isn't to excuse CrowdStrike, who essentially built a kernel driver that loads hotpatches from userland. Because of the nature of the ELAM certification process and lapses in QA practices at CrowdStrike, this likely slipped past a lot of eyes and was deployed haphazardly to production.
I've talked previously in my post about EndpointSecurity on macOS about how building kernel-mode drivers carries tremendous risks to security and OS stability. While there have recently been more discussions about whether EndpointSecurity might be too restrictive, I still maintain that it is a good solution for avoiding these types of disasters.
Many argue that creating usermode telemetry pipelines creates a single point of failure, but this is no different from the approaches vendors use today, like kernel callbacks and ETW-TI.
Microsoft tried to introduce security boundaries like this with Kernel Patch Protection (KPP), but was met with stiff opposition from cybersecurity vendors. The European Commission also targeted Microsoft with an antitrust investigation over its inclusion of KPP.
To quote :
"The second Vista security area causing the EC concern was PatchGuard, or kernel patch protection, the code that prevents access to the Vista kernel. Security vendors McAfee and Symantec were incensed they were banned from the kernel. The EC wanted Microsoft to disable this feature but Microsoft refused."
The issue was that when PatchGuard was introduced, Microsoft didn't provide a userland telemetry equivalent for security vendors. I think the goal shouldn't be to remove all third-party code from the kernel, but at least to provide more usermode telemetry so these kinds of kernel operations aren't as necessary as before.
Are people going to move off CrowdStrike over this? Probably. Management types, especially non-technical ones, are impatient and think they can buy their way out of any problem. They treat EDRs as magic black boxes that can stop breaches, but many forget that CrowdStrike is market-leading not because its software has some secret sauce (it doesn't; all EDRs are basically the same product skinned with a different UI) but because of the human capital behind it, from its research and managed defense teams.
But alas, one can dream.






