An Ancient Kernel Hole is (Not) Closed
June 19, 2017
The following is an analysis of a vulnerability in multiple OSes involving the userland heap/stack gap, currently (at the time of this writing in early June) under an unnecessary embargo, and of how it affects grsecurity (or rather, how it doesn't affect grsecurity and hasn't for many years). The embargo is unnecessary because the issue was already discussed publicly and in full in Gaël Delalleau's underappreciated 2005 CanSecWest presentation, available here. A patch for this problem (available here), created by Andrea Arcangeli in 2004, even predated that presentation and was included in at least SUSE Linux Enterprise 9 and 11 (as mentioned here).
I am not a member of the private distros or linux-distros mailing lists, and though this particular issue is leaking through various channels like a sieve, I have no interest in digging out more details or breaking embargoes I'm not party to. As a matter of principle, we choose not to participate in embargoes (even for issues reported to us), as they distort public perception of security: differences in response time, in the level of proactive measures, and in the quality of in-house security talent are obscured by the activity and delay happening behind closed doors. Effective security shouldn't depend on hoping the "bad guys" don't have early access to vulnerability information, since early attacker access is exactly what a zero-day is. This article is therefore based on my impression of what the "newly" discovered vulnerability is, knowing nothing about the particular userland application that prompted the sudden intense interest, and will be published immediately after the embargo expires.
Andrea's patch mentioned above was never included in upstream Linux in 2004, and no improved version took its place in the years that followed. That changed in 2010, when a highly publicized instance of the kind of vulnerability Andrea's patch was designed to prevent was published against the X server by Rafal Wojtczuk. Rafal had worked with distributions on the now-shuttered private vendor-sec mailing list to ensure a fix for the issue was created at the kernel level. Seven weeks later, based purely on private discussions, Linus attempted to silently fix the issue via this commit, yet the fix had to be revised repeatedly afterward (at least as late as 2013; see this commit) because it broke userland applications and oopsed the kernel. More importantly, Linus' patch didn't address the wider problem, only the particular reported instance of it: attacker-controlled recursion, which a single guard page could stop. We pointed out the problems with the patch publicly in comments on an LWN article about it, but those comments, technical explanations, and C/assembly examples of issues not covered by Linus' patch fell on deaf ears; the upstream Linux stack/heap guard code saw no improvements in the following years. And now, as they say, the chickens have come home to roost.
I imagine someone has again noticed that GCC and LLVM, on non-Windows OSes, don't perform stack probing by default, and has found a new instance of a vulnerable, widely used application or service to exploit. The following summarizes the main issues, all of which were already explained in detail in Gaël's 2005 presentation from slide 24 onward (under the heading "Jumping the stack gap"). Calls to alloca() larger than a page, large local variables, variable-length arrays (VLAs), and similar constructs can all easily skip over a single guard page and read from or write to an attacker-controlled mmap-based heap allocation, and can cause deeper stack frames to do the same, with all the implications for saved instruction pointers, etc. This problem affects not only the main process stack but thread stacks as well. It was part of the motivation for creating GRKERNSEC_RAND_THREADSTACK over four years ago, in response to this alloca()-based vulnerability/exploit by Exodus Intelligence. Exploitation is further eased by Linux honoring mmap hints, which could conceivably let an attacker, through some rare vulnerable application, request an allocation just below the stack guard page without needing to exhaust virtual address space to put it there (which would expose more 64-bit apps to this vulnerability, since exhausting a 64-bit address space is far harder).
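The geometry of "jumping the stack gap" can be sketched as a small C model. This is a minimal sketch under stated assumptions, not kernel or compiler code. Addresses are plain integers, the stack grows down, and a single guard region sits directly below the lowest mapped stack byte. The first store after an allocation is assumed to be an 8-byte return address pushed by the next call. The function names (`classify`, `touch_without_probing`, `touch_with_probing`) are hypothetical. The model contrasts one large, unprobed stack-pointer adjustment with a probed allocation that touches every page from the top down.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not kernel or compiler code. The stack grows toward
 * lower addresses, its lowest mapped byte is stack_lo, and a guard region of
 * guard_size bytes sits directly below it. Anything below the guard is some
 * other mapping, such as an attacker-influenced mmap-based heap chunk. */
enum region { IN_STACK, IN_GUARD, BELOW_GUARD };

static enum region classify(uint64_t addr, uint64_t stack_lo, uint64_t guard_size)
{
    assert(guard_size <= stack_lo);
    if (addr >= stack_lo)
        return IN_STACK;
    if (addr >= stack_lo - guard_size)
        return IN_GUARD;    /* the access faults: the overflow is caught */
    return BELOW_GUARD;     /* the access silently hits the other mapping */
}

/* No probing: sp drops by alloc_size in a single step (alloca(), a large
 * local array or a VLA), and the first store is modeled as an 8-byte return
 * address pushed by the next call, just below the new stack pointer. */
static enum region touch_without_probing(uint64_t sp, uint64_t stack_lo,
                                         uint64_t guard_size, uint64_t alloc_size)
{
    assert(sp >= stack_lo && alloc_size + 8 <= sp);
    return classify(sp - alloc_size - 8, stack_lo, guard_size);
}

/* Probing: touch the allocation at least once per page, from the top down,
 * before anything is stored below it. Consecutive probes are at most one page
 * apart, so the first access that leaves the stack lands in the guard as long
 * as the guard is at least one page. */
static enum region touch_with_probing(uint64_t sp, uint64_t stack_lo,
                                      uint64_t guard_size, uint64_t alloc_size,
                                      uint64_t page_size)
{
    assert(page_size >= 8 && guard_size >= page_size);
    assert(sp >= stack_lo && alloc_size + 8 <= sp);
    for (uint64_t off = page_size; off < alloc_size; off += page_size) {
        enum region r = classify(sp - off, stack_lo, guard_size);
        if (r != IN_STACK)
            return r;
    }
    enum region r = classify(sp - alloc_size, stack_lo, guard_size);
    if (r != IN_STACK)
        return r;
    return touch_without_probing(sp, stack_lo, guard_size, alloc_size);
}
```

For example, take `stack_lo = 0x100000`, a 4KB guard and `sp = 0x102000`. In this model, a 64KB unprobed allocation classifies as `BELOW_GUARD`, while the same allocation probed every 4KB stops at `IN_GUARD`. With probing, any guard at least one probe interval in size catches the overflow. Without probing, only a gap larger than the allocation itself helps.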
Under PaX's ASLR, this is infeasible, since mmap hints are ignored. MAP_FIXED requests are of course honored as required by the standard, but an attacker controlling those can just as easily blow away any existing allocation and replace it with their own content. Further, much like Andrea's original patch, our heap/stack protection (implemented via an enforced gap instead of Linus' guard page) can be adjusted in size at runtime. Andrea's patch defaulted to a single-page gap (for compatibility reasons that aren't a problem for the PaX implementation), whereas the PaX implementation enforces a gap of at least 64KB by default. Without stack probing in place, any alloca() of unchecked size could be abused, so the chosen gap size is a tradeoff between wasted virtual address space and security-based assumptions about how large a stack allocation an attacker might control without it being fully unbounded. It should be clear that kernel-only attempts to solve this problem will necessarily always be incomplete, as the real issue lies in the lack of stack probing. Since that real solution depends on rebuilding all of userland, an adjustable kernel-enforced gap is likely the only feasible approach for the foreseeable future. On grsecurity systems, the size of the heap/stack gap can be adjusted via the /proc/sys/vm/heap_stack_gap sysctl entry, which takes a size in bytes. For instance, the following command will enforce a 1MB main stack gap for all new allocations:
echo 1048576 > /proc/sys/vm/heap_stack_gap
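The gap-size tradeoff can be expressed as a small C model. This is an illustrative sketch, not the PaX implementation. The function names are hypothetical, and the gap is taken in bytes, as written to /proc/sys/vm/heap_stack_gap. The first store after an unprobed allocation is again assumed to be an 8-byte word just below the new stack pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an enforced heap/stack gap; names and structure are
 * illustrative, not taken from PaX. stack_lo is the lowest mapped byte of a
 * stack and gap is the configured gap in bytes (e.g. 65536 or 1048576).
 * Only mappings placed below the stack are modeled; one that ends above
 * stack_lo is treated as rejected. */
static bool mapping_respects_gap(uint64_t map_end, uint64_t stack_lo, uint64_t gap)
{
    return map_end <= stack_lo && stack_lo - map_end >= gap;
}

/* The largest single unprobed stack allocation whose first store (an 8-byte
 * word just below the new stack pointer) still lands in the stack or in the
 * gap, assuming the nearest mapping below sits exactly gap bytes under
 * stack_lo. Anything larger can reach that mapping. */
static uint64_t max_caught_alloc(uint64_t sp, uint64_t stack_lo, uint64_t gap)
{
    assert(sp >= stack_lo && gap >= 8);
    return (sp - stack_lo) + gap - 8;
}
```

Put the stack pointer at the very bottom of the stack. In this model, the 64KB default then catches single allocations of up to 65528 bytes, and the 1MB setting catches up to 1048568 bytes. The price is an equal amount of address space below the stack that can never be mapped.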
An interesting historical note: looking through the current upstream Linux kernel code, one will find what is perhaps a remnant of Andrea's never-merged original implementation: a single "int heap_stack_gap = 0;" line, unreferenced by anything else in the kernel, introduced by accident via an unrelated nommu commit by David Howells in 2005. In Andrea's implementation, this variable held the size in pages of the adjustable heap/stack gap, something Linus' later implementation crucially lacked. Despite several public comments and LKML threads about this line, it still stands alone as a reminder of the dangers of NIH.
One might now be wondering: doesn't this same issue also apply to the kernel stack? Yes, it does. Here too, upstream developers failed to notice, or didn't care about, this excerpt from our KSTACKOVERFLOW configuration help:
This introduces guard pages that in combination with the alloca checking of the STACKLEAK feature and removal of thread_info from the kernel stack prevents all forms of kernel process stack overflow abuse.
The PaX STACKLEAK plugin was added to grsecurity before my work on KSTACKOVERFLOW, so KSTACKOVERFLOW built upon it. Importantly, the STACKLEAK plugin instruments all implicit and explicit alloca() calls in the kernel, ensuring that the requests can't step outside the expected stack boundaries. One might recall the STACKLEAK plugin from when it was "ported" by a member of the KSPP. Neither the commit description nor the Kconfig help made any mention of this functionality. The port didn't even enable the plugin at all, because some lines copied and pasted from the Makefile were never adjusted to actually enable it. This functionality also went unmentioned in a more recent "port". This is but one of many examples that seriously raise the question of how security functionality will be properly implemented and maintained upstream if the maintainers don't understand what the code they've copied and pasted from grsecurity does in the first place.
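The alloca checking described above can be illustrated with a hypothetical C model of the bounds test an instrumented alloca() might perform. It is not the STACKLEAK plugin's code, and the plugin's exact check and failure handling may differ. `alloca_request_ok` and its `reserve` parameter are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an instrumented alloca() bounds check; it is not the
 * STACKLEAK plugin's code. The kernel stack occupies
 * [stack_base, stack_base + stack_size) and grows down; sp is the current
 * stack pointer. A request is accepted only if at least reserve bytes of
 * stack remain after it. */
static bool alloca_request_ok(uint64_t sp, uint64_t stack_base, uint64_t stack_size,
                              uint64_t size, uint64_t reserve)
{
    assert(sp >= stack_base && sp - stack_base <= stack_size);
    uint64_t left = sp - stack_base;   /* bytes between sp and the stack's end */
    /* Written to avoid unsigned underflow for oversized requests. */
    return left >= reserve && size <= left - reserve;
}
```

In this sketch, a false return marks the point where an instrumented build would stop the request. The check compares the request against the space actually left on the stack before anything is touched. An oversized alloca() is therefore refused whether or not it would have jumped a guard page, which is why such a check complements guard pages rather than duplicating them.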
The upstream CONFIG_VMAP_STACK has been claimed by many to be equivalent to what's present in grsecurity. Its Kconfig description includes the following:
This causes kernel stack overflows to be caught immediately rather than causing difficult-to-diagnose corruption.
This claim was repeated by various news outlets reporting on the upstream VMAP_STACK feature. One may also recall that VMAP_STACK was responsible for over a dozen kernel CVEs, introducing potential memory corruption and denial of service through its design, and that the fixes for those CVEs led to several additional CVEs for memory handling errors. Recently defending VMAP_STACK against my claims that the implementation is objectively bad, Kees Cook of the KSPP said "With this implementation in place, now those kernel stack exploit methods are dead." Remember that although stack overflow vulnerabilities are quite rare in the first place, and exploitable ones rarer still, at least one of the published exploits for a Linux kernel stack overflow vulnerability (CVE-2010-3848) targeted exactly the kind of vulnerability VMAP_STACK is unable to protect against. Unless there were at least 99 other exploitable stack overflow vulnerabilities in the kernel, the characterization by another KSPP and linux-hardened contributor that VMAP_STACK fixes 99% of the issue is also patently false.
In fact, because VMAP_STACK lacks the equivalent of grsecurity's alloca checking, not only does it leave certain VLA-based overflows exploitable, it may ironically make these lingering forms of kernel stack overflow even easier to exploit. Vmalloc allocations are infrequent enough that an attacker could more reliably target an adjacent victim kernel stack or other large allocation, without needing to know its absolute address.
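The adjacency concern can be sketched in C under loudly stated assumptions. Two kernel stacks of an assumed 16KB each are allocated back to back in the vmalloc area. Each allocation is assumed to be followed by a single 4KB guard page. Real placement depends on allocator state, and the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, assumed for illustration only: two kernel stacks of
 * stack_size bytes allocated back to back in the vmalloc area, each area
 * followed by guard_size bytes of unmapped guard. Stacks grow down, so an
 * overflow below one stack's lowest address first crosses the guard that
 * trails the lower neighbor, then enters the top of the neighbor stack. */
enum landing { LANDS_IN_GUARD, LANDS_IN_NEIGHBOR, LANDS_BEYOND_NEIGHBOR };

/* depth: how many bytes below the overflowing stack's lowest address the
 * write lands (1 = the byte immediately below). On LANDS_IN_NEIGHBOR,
 * *below_top receives how far below the neighbor's top end it lands. */
static enum landing overflow_landing(uint64_t depth, uint64_t stack_size,
                                     uint64_t guard_size, uint64_t *below_top)
{
    assert(depth >= 1 && below_top != NULL);
    if (depth <= guard_size)
        return LANDS_IN_GUARD;          /* faults: caught by the guard */
    uint64_t off = depth - guard_size;  /* 1 = neighbor's highest byte */
    if (off > stack_size)
        return LANDS_BEYOND_NEIGHBOR;
    *below_top = off;
    return LANDS_IN_NEIGHBOR;
}
```

Nothing in the computation involves an absolute address: only the distance below the overflowing stack matters. In this model, a write 4608 bytes below the stack skips the 4KB guard and lands 512 bytes below the top of the neighboring stack.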
As should now be clear, the kinds of kernel stack overflows grsecurity can prevent are not at all dead upstream, nor for that matter in the recent linux-hardened project. Under the heading "Prevent kernel stack overflows", that project's matrix comparing upstream to grsecurity suggests that upstream's reimplementation of grsecurity's protection for this class of vulnerabilities is "complete". In our own comparison matrix we marked the associated KSPP feature with an orange minus symbol, denoting "watered-down features that differ significantly in their implementation and security benefits". We'd been called misleading for this, while I held my tongue knowing the facts of the matter. By now it should be evident how much faith to put in security claims and comparisons to grsecurity by developers who don't understand the basics of the code they're copying and pasting. It also demonstrates what we've said all along about the synergistic benefits of various grsecurity and PaX features, benefits that mindless piecemeal extraction fails to realize.
A careful reader may have noticed that the title of this article is a tongue-in-cheek reference to the LWN article linked above. This post is being published as a teachable moment. Articles by non-security experts that simply repeat other non-security experts' security claims, about their own code or others', should be taken with a large grain of salt. The "ancient kernel hole" that LWN grandiosely proclaimed closed has in fact remained open for the past seven years. The facts destroying the myth have been available there for everyone to see all this time. Consider a news site that cares so little about accuracy in reporting that it doesn't (for instance) contact the subject of an article prior to publication, or correct glaring factual errors it has been made aware of, and instead requires readers to wade through pages of third-party comments. It's no wonder that the public is fooled by such authoritative-sounding articles and doesn't bother investigating further. Some months ago I stopped commenting publicly on LWN due to its lack of concern for accuracy in reporting and commenters' lack of interest in learning. When the next issue is reported incorrectly, readers can't assume an expert will chime in with a proper analysis, as happened in this case seven years ago.
Technical debt always finds a way to be repaid, with interest.