Canary in the Kernel Mine: Exploiting and Defending Against Same-Type Object Reuse
By Mathias Krause
October 21, 2022
Introduction
Our goal at Open Source Security Inc. is to constantly push the envelope in Linux kernel security, as attackers won’t be resting either. The team has been doing so for over two decades now and is still hungry. Our work involves not only analyzing and auditing existing hardening and exploit mitigations, but also monitoring current state-of-the-art exploits to get feedback, so to speak, on which technique or neat little trick we might have missed so far. As the state of the art in exploitation moves rather slowly (at least in public), we work on topics we know have been troublesome in the past or will be in the future, and research possible defenses. This blog entry is about one of these research topics, namely handling a special case of a use-after-free bug: same-type, same-address reuse.
The Bug-ground Story
During a brief code audit quite a while ago, we noticed a bug in the Nitro Enclaves driver. Faulty error handling code would leave a stale file pointer behind in a process’ file descriptor table. If the underlying file object then gets reallocated by a privileged process that is opening a restricted file, the stale file descriptor entry in the exploiting process can be abused to access that file as well.
Our proof-of-concept exploit made use of passwd as the privileged helper program to reallocate the dangling file object and gain access to /etc/shadow, a sensitive file containing the hashed passwords of all local user accounts, including the root user. But the bug isn’t limited to that. Virtually any file opened on the system can be accessed this way: emails processed by a mail server, SSH private keys, root’s nobly maintained .bashrc. It’s just a matter of waiting for it to happen.
It’s a severe bug on affected systems, but it also exposed a blind spot in our implementation of AUTOSLAB at the time. AUTOSLAB is a kernel heap hardening mechanism that separates kernel memory allocations from each other. That separation ensures that type confusion isn’t possible for a given slab page, as its allocations will always be of the same type. But neither type confusion nor a buffer overflow, not even a KASLR leak, is needed to successfully exploit this bug. All that’s required is to make the kernel reallocate the released memory for the same type of object, which is likely to happen because file objects use a dedicated slab cache.
Defining the Bug Class
The underlying bug is not of the usual type confusion or buffer overflow kind which AUTOSLAB already helps tremendously to tackle. It’s “only” a violation of the temporal memory integrity by using an object after it was released and later reallocated – preserving the object’s type, bounds and address. It’s a same-type, same-address use-after-free bug.
The threat model to handle boils down to an attacker abusing a dangling pointer to a reallocated object of the same type at the same address.
Evaluating Existing Defenses
Exploiting such a bug relies on the underlying memory being reallocated by a more privileged process, so instrumentation-based approaches like software-based KASAN only partially help to detect it. An attacker can simply wait long enough to be sure the object has passed through the mandatory quarantine delay and has been reallocated before attempting to exploit the hijacked file object.
Arm’s Memory Tagging Extension (MTE) would be capable of handling the temporal violation aspect by assigning a new tag for each reallocation. But it’s only available for Armv8.5+, limiting its applicability to a narrow set of mostly mobile devices. There is even a KASAN mode that makes use of MTE. But aside from that, there’s no kernel-side use of MTE to protect kernel heap allocations for production-level workloads.
Designing a New Defense
Software-based tagging schemes that use otherwise unused address bits are doomed to perform poorly: each pointer indirection needs to be instrumented to do a tag check and then either mangle the pointer back into canonical form for the final memory access or rely on a feature like Top Byte Ignore (TBI), which lets the hardware ignore the tag bits. TBI, however, is again an Arm-only feature so far.
The fortunate nature of the bug at hand is that the userspace API doesn’t allow a process to directly refer to a file object. There’s an indirection layer via file descriptors, which are simple integer values used as indices into a table. This allows us to implement a pointer verification mechanism at the file descriptor lookup level. But to detect the reuse of a dangling file pointer, some kind of allocation generation still needs to be embedded somewhere, either encoded in the pointer or in the pointed-to object itself.
We went with the latter, as instrumenting pointers in software brings an overhead for all users of such typed pointers, which we want to avoid for performance reasons. We only need to verify a pointer’s validity at the few transition points where userspace file descriptors get converted to file pointers. To do so, we added a canary member to struct file that can have three values, FILE_MAGIC_ALIVE, FILE_MAGIC_DYING and FILE_MAGIC_DEAD, corresponding to the object’s current state: “alive”, as in being a valid object; “dying”, to mark invalid objects that could still be found via RCU lookups; and “dead”, for objects that are no longer valid.
The canary member gets updated and verified during a file object’s lifecycle. Verification happens in __fget_light() and __fget_files_rcu(), as these are the two core lookup functions that convert file descriptors into file objects.[1] If a validation violation occurs, the offending process gets killed. This doesn’t happen immediately, as the process is still holding critical locks, but before returning from the currently executing system call.[2]
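To illustrate the lookup-time check, here is a minimal userspace sketch in C. It is not grsecurity’s implementation: struct file_sketch, struct fd_table, file_set_state() and fd_lookup() are names made up for this illustration, the numeric FILE_MAGIC_* values are invented, and a simple flag stands in for the deferred task termination. How the real code treats a “dying” object is not spelled out above, so the sketch assumes it just fails the lookup without counting as a violation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: the article names the states but not their constants. */
enum file_magic {
    FILE_MAGIC_ALIVE = 0x414c4956,
    FILE_MAGIC_DYING = 0x4459494e,
    FILE_MAGIC_DEAD  = 0x44454144,
};

/* Stand-in for struct file, reduced to the canary and some payload. */
struct file_sketch {
    uint32_t canary;
    int      data;
};

#define FD_TABLE_SIZE 16

/* Stand-in for a per-process file descriptor table. */
struct fd_table {
    struct file_sketch *slots[FD_TABLE_SIZE];
    int violation;  /* set where the kernel would mark the task to be killed */
};

/* Lifecycle transitions only ever write the canary. */
void file_set_state(struct file_sketch *f, enum file_magic state)
{
    assert(f != NULL);
    f->canary = (uint32_t)state;
}

/* Convert a descriptor into a file pointer, verifying the canary on the way. */
struct file_sketch *fd_lookup(struct fd_table *t, int fd)
{
    struct file_sketch *f;

    assert(t != NULL);
    if (fd < 0 || fd >= FD_TABLE_SIZE)
        return NULL;
    f = t->slots[fd];
    if (f == NULL)
        return NULL;            /* unused slot */
    if (f->canary == FILE_MAGIC_ALIVE)
        return f;               /* valid object */
    if (f->canary != FILE_MAGIC_DYING)
        t->violation = 1;       /* dead or garbage: treat as exploit attempt */
    return NULL;                /* assumption: "dying" fails the lookup quietly */
}
```

A stale table entry pointing at an object marked FILE_MAGIC_DEAD makes fd_lookup() return NULL and set the violation flag, while a live object is returned unchanged.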
An attentive reader would be right to point out that the canary check alone is still not sufficient to detect a malicious reuse of a dangling file pointer, as the reallocation will, for sure, turn a “dead” object into a valid one again. To address this part of the problem we use a probabilistic approach by adding a random factor to the second aspect of the bug class: the memory address.
Under our new defense, reallocated objects will, with a certain probability, use a different memory address, making the dangling pointer no longer point to the beginning of the reallocated object. The now offset dangling pointer will make the canary check logic look at the wrong memory location and make it fail, as the magic value can no longer be found.
With both mechanisms in place we ensure that:
- File descriptors can only be turned into file pointers for live file objects and
- Reallocating file objects will likely get them a different memory address.
The first mechanism on its own is a cheap use-after-free detection, but we already mentioned that an attacker can simply wait until an object gets reallocated to overcome that. Combined with the second mechanism, however, this will lead to the canary check detecting the invalid use of a dangling pointer. The load of the canary member won’t actually read the file object’s canary value but some “random” (but safe) memory instead. Once detected, the current task gets terminated and further exploit handling is initiated.
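The interplay of the two mechanisms can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the slot layout, the OBJ_MAGIC value, the 0xFE poison byte used for freed memory and all function names are hypothetical, and a saved byte offset into the slot plays the role of the dangling pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLACK      64u          /* slack bytes per slot (one cache line) */
#define ALIGNMENT  8u           /* object alignment inside the slot */
#define OBJ_SIZE   8u           /* 4-byte canary followed by a 4-byte uid */
#define OBJ_MAGIC  0xC0FFEE01u  /* hypothetical canary value */
#define POISON     0xFE         /* hypothetical fill byte for freed memory */

/* One allocation slot: room for the object plus the slack space. */
struct slot {
    unsigned char mem[SLACK + OBJ_SIZE];
};

void slot_init(struct slot *s)
{
    memset(s->mem, POISON, sizeof(s->mem));
}

/* Place a live object at `offset` within the slot and return its "address". */
unsigned slot_alloc(struct slot *s, unsigned offset, uint32_t uid)
{
    uint32_t magic = OBJ_MAGIC;

    assert(offset <= SLACK && offset % ALIGNMENT == 0);
    memcpy(s->mem + offset, &magic, sizeof(magic));
    memcpy(s->mem + offset + sizeof(magic), &uid, sizeof(uid));
    return offset;
}

/* Release the object: its canary is overwritten along with the rest. */
void slot_free(struct slot *s, unsigned addr)
{
    assert(addr <= SLACK);
    memset(s->mem + addr, POISON, OBJ_SIZE);
}

/* The check done at the lookup point, through a possibly stale address. */
int canary_ok(const struct slot *s, unsigned addr)
{
    uint32_t canary;

    assert(addr <= SLACK);
    memcpy(&canary, s->mem + addr, sizeof(canary));
    return canary == OBJ_MAGIC;
}
```

If an object is allocated at offset 16, freed and reallocated at offset 40, canary_ok() on the stale offset 16 reads poisoned but in-bounds memory and fails. Only a reallocation that lands on offset 16 again lets the stale address pass the check.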
This second defense comes at the cost of slightly higher memory usage to support the random object placement within a given allocation slot. By default, we use up to a cache line of slack space (64 bytes for most systems). We do, however, have to respect the object type’s alignment, which reduces the total number of possible addresses per allocation slot.
For file objects on a 64-bit grsecurity kernel this leads to 64 / 8 + 1 = 9 possible addresses, or a probability of ~11% of reallocating to the same address. That might seem disappointing at first, but the flip side is that the canary check will detect abuse of an invalid pointer with a probability of almost 90%. If a higher probability is wanted, one can get it by accepting a higher memory overhead and changing the slack space through a kernel command line parameter.
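The arithmetic above generalizes to other slack sizes and alignments. The following helper functions are a small illustration of it and nothing more; their names are made up, and they assume the placement is uniformly random over all aligned offsets.

```c
#include <assert.h>

/* Number of aligned start offsets in [0, slack], both ends included. */
unsigned placement_count(unsigned slack, unsigned align)
{
    assert(align != 0);
    return slack / align + 1;
}

/* Chance that a reallocation lands on the old address again,
 * assuming a uniformly random choice among all placements. */
double same_address_probability(unsigned slack, unsigned align)
{
    return 1.0 / placement_count(slack, align);
}
```

With 64 bytes of slack and 8-byte alignment, placement_count() yields 9 and same_address_probability() yields 1/9, about 0.111. Doubling the slack to 128 bytes would give 17 placements and push the chance of a same-address reuse below 6%.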
Trial by Fire
Eager to know whether the mitigation would detect the bug class it was targeting, I went looking for a test case. I could have simply reintroduced the Nitro bug, but as that requires loading a driver for a special PCI device, more tweaks would have been needed and I didn’t want to do these again. So I started looking at some kernel code. Maybe there were more instances of the bug pattern?
Going through the list of files that “git grep fd_install” flagged as potential candidates, I found one in the vmwgfx driver. I knew QEMU had support for a VMware virtual graphics card, so I tried to target that bug. It still needed some tweaks to the driver to get it loaded in QEMU and to be able to trigger the bug, but I could reuse most of last year’s PoC. I just had to change some ioctl() arguments and the path of the device node to operate on.
user@box:~$ gcc -o vmwgfx vmwgfx.c
user@box:~$ ./vmwgfx
[~] vmwgfx setup using /dev/dri/card0...
[i] confirmed to be targeting the right driver
[~] forking helper process...
[~] gathering stat info of '/etc/shadow'...
[i] predicted fence fd = 8
[~] signaling helper to get busy...
[~] triggering fence fd export...
[~] monitoring stale fd...Killed
The exploit attempt was prevented and the process terminated with more details to be found in the kernel log (trimmed for readability):
[ 79.442649] grsec: exploit attempt detected, please report to support@grsecurity.net
[ 79.442668] WARNING: CPU: 1 PID: 20765 at fs/file.c:862 __fdget_raw+0x105/0x140
[ 79.445309] Modules linked in:
[ 79.445934] CPU: 1 PID: 20765 Comm: vmwgfx Tainted: G T 5.4.171-grsec-02834-gb64d26c913d9-dirty #234
[ 79.447614] RIP: 0010:[<ffffffff818ac195>] __fdget_raw+0x105/0x140
[ :::: ]
[ 79.499088] grsec: banning user with uid 1000 until system restart for suspicious kernel crash
The canary check in __fget_light() (which was inlined into __fdget_raw()) detected the invalid file pointer and triggered the termination of the current process, vmwgfx. Grsecurity’s exploit handling then banned the user, preventing any further exploit attempts from that user. The system is otherwise still functional: no locks are pending, no state is corrupted.
Fortunately, my tests were successful and we were able to deploy the defense in grsecurity earlier this year.
Covering More Object Types
Even though the technique is very effective for file objects, as these go through the file-descriptor-to-file conversion where we placed the canary check, it’s less so for objects without such a clear transition point. If a malicious process can refer to a freed object and is able to probe its validity without triggering the canary check logic, an attacker can just use this probe primitive to retry reallocating the targeted object until the address matches that of the dangling pointer again.
A task’s credentials are, no doubt, an object worth protecting, preferably by using a canary check. Unfortunately, cred objects trivially have such a probe primitive in the form of the get*id() system calls. An attacker can just loop over geteuid() until the reallocated object lands at the dangling pointer’s offset again and the call returns a real user ID. Zero, root’s UID, is what an attacker would be waiting for.
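Why an unchecked probe primitive undermines the probabilistic placement can be shown with a short simulation. It is a sketch only: probes_until_match() and its parameters are hypothetical, and a xorshift32 generator stands in for whatever randomness the allocator really uses. Each loop iteration models one reallocation followed by one side-effect-free probe.

```c
#include <assert.h>
#include <stdint.h>

/* Small deterministic PRNG standing in for the allocator's randomness. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Count how many reallocations it takes until the object lands on the
 * stale position again when a failed probe has no consequences.
 * Returns 0 if no match occurred within max_tries.
 */
unsigned probes_until_match(unsigned positions, unsigned stale_pos,
                            uint32_t seed, unsigned max_tries)
{
    uint32_t state = seed ? seed : 1;  /* xorshift needs a nonzero state */

    assert(positions > 0 && stale_pos < positions);
    for (unsigned tries = 1; tries <= max_tries; tries++) {
        unsigned new_pos = xorshift32(&state) % positions;

        if (new_pos == stale_pos)
            return tries;  /* probe sees a valid object: the attack proceeds */
        /* otherwise the probe sees garbage and the attacker simply retries */
    }
    return 0;
}
```

With nine positions, each attempt succeeds with probability 1/9, so on average about nine attempts are enough. The placement randomization therefore only pays off when the first failed access already terminates the attacker, as it does for file descriptor lookups.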
A task’s credentials are accessed directly by following a pointer in the task_struct object describing the various resources and properties attached to the corresponding thread of execution. One could think of adding canary checks to all the various wrappers like current_cred(), current_real_cred(), __task_cred() and so on. However, this would miss cases where the cred pointer only gets read and stored for later consumption, like the f_cred member in struct file, used to check the capabilities of the opener of a file, or within the io_uring subsystem, to be able to asynchronously execute an I/O operation with the right credentials: those of the original task issuing the system call instead of those of the kernel worker thread handling it.
Cred objects already do have a canary-based scheme behind the kernel configuration option CONFIG_DEBUG_CREDENTIALS. Its checks, though, are only exercised when a task’s credentials are intentionally modified, e.g. when reference counts change or credentials are temporarily overridden. Exploiting a dangling cred pointer, however, needs nothing like that. It’s just a matter of (ab)using the hijacked credentials to do privileged operations, like adding an entry to /etc/shadow or creating a suid root binary with the attacker’s code.
Compiler Plugins to the Rescue!
Instead of trying to find and patch all relevant places in the Linux kernel source where a cred object gets dereferenced, we can do better with the help of a compiler plugin that adds these checks prior to using a cred object. This not only allows deferring the canary check to the point where a cred object actually gets used, instead of where its pointer merely gets taken, it also allows finer-grained control over which kinds of access need to be validated.
For the compiler-based approach, we added another annotation in the form of a structure member attribute and added such an enriched member to struct cred:
struct cred {
[...]
#ifdef ....
unsigned int canary __canary(CRED_MAGIC);
#endif
[...]
} __randomize_layout;
This annotation makes the compiler plugin handle instances of struct cred specially: it tests that the canary member has the value CRED_MAGIC before the object is dereferenced for a read operation. We explicitly care about read operations only, as a write to one of the members implies that the process’ privilege to change, for example, the uid value has already been verified. This verification, in turn, implies a read operation of the current credentials to do the necessary checks. Security-relevant writes are therefore strictly preceded by a read operation, which will do the canary check as well.
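Written out by hand, the effect of such instrumentation on a single field read might look like the following userspace sketch. It is an assumption-laden illustration, not the plugin’s output: struct cred_sketch, cred_init() and cred_read_euid() are hypothetical names, the CRED_MAGIC value is invented, and a counter plus an error return stand in for the trap and the exploit handling of the real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRED_MAGIC 0x43524544u  /* hypothetical; the real value is not given */

/* Stand-in for struct cred, reduced to two IDs and the canary. */
struct cred_sketch {
    uint32_t uid;
    uint32_t euid;
    uint32_t canary;
};

/* Counts failed checks where the kernel would start its exploit handling. */
int canary_violations;

/* Initialization consists of writes only, so no check is needed here. */
void cred_init(struct cred_sketch *c, uint32_t uid)
{
    assert(c != NULL);
    c->uid = uid;
    c->euid = uid;
    c->canary = CRED_MAGIC;
}

/* A read is preceded by the canary test; the field is only loaded on success. */
int cred_read_euid(const struct cred_sketch *c, uint32_t *euid)
{
    assert(c != NULL && euid != NULL);
    if (c->canary != CRED_MAGIC) {
        canary_violations++;
        return -1;
    }
    *euid = c->euid;
    return 0;
}
```

Because cred_init() only writes, it can set the canary without first needing a valid one, while every later read through cred_read_euid() fails as soon as the memory behind the pointer no longer starts a live object.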
This scheme also simplifies object initialization, as that’s just a series of writes to the object’s memory, allowing the canary to be initialized without creating a chicken-and-egg problem of needing a valid canary value to do the canary initialization.
The plugin in its current form instruments all read operations and relies on GCC’s dead code elimination pass to drop superfluous ones. GCC is actually pretty good at doing so and even moves checks out of loops if it can prove that no writes to a cred object can happen within the loop. The overall instrumentation is thereby condensed to the checks that are actually necessary, which happen to still be plentiful. However, as a canary check always precedes a real structure field read, its performance impact is benign, while its security impact is huge.
Fishing Credentials
I wanted to use CVE-2022-1043 as a litmus test for the canary plugin. It is a silently fixed (at the time) vulnerability in the io_uring subsystem that leads to a premature release of a process’ credentials and affects Linux kernel versions v5.12-rc3 to v5.14-rc6. As the bug was already fixed last year, I simply reverted the fix on top of grsecurity’s current stable tree for Linux kernel v5.15. I also needed to lift the Kconfig restrictions on CONFIG_IO_URING to be able to actually enable the vulnerable subsystem, as it’s disabled by default in grsecurity.[3]
Writing a bug trigger was easy; turning it into a functional exploit, not so much. After a day, I had cooked up something and made sure it could, despite all the subtleties that needed to be handled, “pop a shell”:
user@box:~$ uname -r
5.15.74-grsec+revert_a30f895a+
user@box:~$ zgrep -E 'OBJREUSE|IO_URING' /proc/config.gz
CONFIG_IO_URING=y
# CONFIG_GRKERNSEC_SLAB_OBJREUSE_HARDEN is not set
user@box:~$
user@box:~$ # We're running on a modified grsecurity kernel with commit a30f895ad323
user@box:~$ # ("io_uring: fix xa_alloc_cycle() error return value check") reverted,
user@box:~$ # IO_URING re-enabled and the canary plugin disabled.
user@box:~$
user@box:~$ gcc -pthread -o cve-2022-1043 cve-2022-1043.c
user@box:~$ ./cve-2022-1043
[~] forking helper process...
[~] creating worker threads...
[~] ID wrapped after 65536 allocation attempts! (id = 1)
[~] ID wrapped again after 131071 allocation attempts! (id = 1)
[~] waiting for creds to get reallocated...
[.] reused by uninteresting EUID -16843010 (PaX MEMORY_SANITIZE?)
[.] reused by uninteresting EUID 1000
[*] waiting for root shell...
# id
uid=0(root) gid=0(root) groups=0(root),1000(user)
The above log shows the power of a same-type use-after-free bug: no ROP, no info leak, no SMEP / SMAP / PTI bypass is needed to exploit the bug. It’s a data-only attack violating the temporal integrity of an object allowing an attacker to escalate privileges if there are no mitigations in place.
Let’s try again with the canary plugin enabled:
user@box:~$ uname -r
5.15.74-grsec+revert_a30f895a+
user@box:~$ zgrep -E 'OBJREUSE|IO_URING' /proc/config.gz
CONFIG_IO_URING=y
CONFIG_GRKERNSEC_SLAB_OBJREUSE_HARDEN=y
CONFIG_GRKERNSEC_SLAB_OBJREUSE_HARDEN_PLUGIN=y
user@box:~$
user@box:~$ # We're again running a modified grsecurity kernel with the bug fix
user@box:~$ # reverted and IO_URING re-enabled but this time with the canary
user@box:~$ # plugin enabled.
user@box:~$
user@box:~$ gcc -pthread -o cve-2022-1043 cve-2022-1043.c
user@box:~$ ./cve-2022-1043
[~] forking helper process...
[~] creating worker threads...
[~] ID wrapped after 65536 allocation attempts! (id = 1)
[~] ID wrapped again after 131071 allocation attempts! (id = 1)
[~] waiting for creds to get reallocated...
Killed
The exploit was killed, the privilege escalation prevented.
Below is a trimmed-down kernel log relating to the failed exploit attempt. After some deciphering, one can see that the process that gets killed is cve-2022-1043, i.e. our exploit. It trips over the canary check in the __do_sys_geteuid() function, which implements the geteuid() system call. This demonstrates that the plugin is instrumenting the code as expected and that grsecurity’s exploit handling is working by banning the offending user:
[ 166.638648] invalid opcode: 0000 [#1] PREEMPT SMP
[ 166.639209] CPU: 1 PID: 809 Comm: cve-2022-1043 Not tainted 5.15.74-grsec+revert_a30f895a+ #203
[ 166.640024] RIP: 0010:[<ffffffff811e7c06>] __do_sys_geteuid+0x66/0x80
[ 166.640606] Code: 31 c0 e8 4d 16 12 00 89 c0 48 8b 55 08 4c 31 e3 48 39 da 75 1a 48 81 7a f0 51 20 00 e7 75 0d 48 8b 5d f8 c9 c3 cc 0f 1f 40 00 <0f> 0b 0f b9 10 0f b9 10 66 90 cc cc cc cc cc cc 48 b8 af df ff 18
[ :::: ]
[ 166.694859] grsec: current credential structure corrupted, determining user via fallback method
[ 166.695513] grsec: banning user with uid 1000 until system restart for suspicious kernel crash
Having proven the effectiveness of the canary plugin to protect credentials, we were able to deploy this feature in grsecurity earlier this month in all supported kernels.
Supporting Features
As reusing file objects or credentials is a guessing game for an attacker with the defense in place, handling the violations is what really supports killing exploits. A canary check violation will always be handled as an exploit attempt. Grsecurity’s lockout feature will respond to such an event by terminating not only the triggering task, but also all other processes of the user that ran the exploit. The user also gets banned, preventing them from logging in again and attempting any other exploits while the system continues to operate.
For other bug classes and exploit scenarios, proper structure layout randomization also deters attackers from crafting fake objects with spoofed canary values to defeat the canary checks.
Relevance
As shown by exercising real exploits, the new defense is effective in preventing exploitation of same-type, same-address use-after-free bugs for file objects and process credentials. Interestingly, the very same two object types are also specifically targeted in Zhenpeng Lin’s DirtyCred-themed presentation. Both of his exploitation methods will fail on grsecurity kernels because the object offsetting applied when a memory slot is reallocated, together with the canary checks, will prevent exploiting the underlying use-after-free bug that’s needed to start off the attack.
Closing Remarks
The general problem of dangling file descriptors became mostly a non-issue in grsecurity, while for upstream Linux it still means patching one vulnerability at a time. Dangling cred pointers got tamed as well, thanks to the new type canary plugin, thwarting a new bug class before it becomes a widespread threat.
We’ll keep on doing what we’ve been doing successfully for over two decades: pushing Linux kernel security to handle new threats as they arise and iteratively improving our own production-grade defenses to make them stronger. Stay tuned for more to come!
[1] There are two functions just because the kernel optimizes for the common case of a single-threaded process where the file descriptor table isn’t shared with other tasks. As a process cannot interrupt itself, no concurrent modifications to the file descriptor table can happen, so a simple table access is sufficient to get a stable result: either a live file object pointer or NULL for an unused slot.
[2] In grsecurity this is implemented by DEFERRED_BUG_ON(…), which allows the current control flow to continue executing but signals an error to the caller and marks the current task to be terminated as soon as it’s safe to do so.
[3] grsecurity forcibly disables CONFIG_IO_URING, as it has a track record of serious exploitable vulnerabilities ever since its introduction, with no end in sight.