grsecurity - The Reports of CVE's Death Have Been Greatly Exaggerated

Today we'll be discussing a recent presentation given by Greg Kroah-Hartman at the 2019 Kernel Recipes conference in Paris, France. The title of the talk was "CVEs are dead, long live the CVE!". Slides for the presentation are available here and a video of it is available here. Greg is a Linux Foundation Fellow, a prominent upstream Linux kernel developer, maintainer of several LTS trees, and a former employee of the security industry (for those who still remember his work at WireX from twenty years ago). For those unaware of my background, I am someone who has been following CVEs for as long as Greg. I am a kernel developer and a backporter of Linux kernel commits. However, by being outside the upstream community, I have a unique perspective resulting in different conclusions that I believe more closely match those of other downstream distributions and end-users.

CVE is Broken, but the Best We Have

First: points of agreement. The CVE process is fairly broken in general, mainly due to contributors to the process and not anything intrinsic to CVE itself. For a project like the Linux kernel, the existence of CVEs is misleading in multiple directions. On the one hand, there's the problem of non-vulnerabilities being given CVEs from time to time, CVEs being given incorrectly-amplified CVSS scores, and CVEs for vulnerabilities that aren't necessarily applicable to the general end-user. On the other hand, there is the documented practice by some upstream Linux kernel developers (including Linus Torvalds at the very top) of intentionally obfuscating commit messages, along with silent security fixes, fixes that unknowingly addressed security issues, and all other security issues that don't receive a CVE for whatever reason.

According to CVE Details, 170 CVEs were allocated for Linux kernel vulnerabilities so far this year. Does this mean all users were affected by 170 vulnerabilities if they last updated their systems on Dec 31st 2018? No, and not only for the reasons just mentioned. The Linux kernel has a huge problem with fragmentation that only worsened with the change from a model of separate stable and development trees to one where the latest Linux kernel is automatically deemed "stable" after receiving very little real-world testing (certainly in comparison to all N-1 versions in wide use). Versions closer to ones in active use by distributions are more likely to have CVEs assigned for their vulnerabilities. This is because outside of a security researcher filing their own CVE, much of the work of filing for CVEs is done by the major Linux distributions, themselves large users of CVEs.

Distributions use this information to inform their users, in a standardized way, of vulnerabilities they've recognized or had reported to them by customers or the wider community. The CVE identifier, CVSS information, and other attached information like links to upstream commits fixing the issue are visible to the public and useful to those outside that particular distribution's users. It is useful to other distributions and their users, as well as to other downstream users of the Linux kernel that don't ship distribution kernels but nevertheless need some sort of information to help inform when to ship new kernels. While inferring importance is relatively easy when exploits are published and widely discussed, it is less obvious for cases where exploits don't exist. Most companies don't have access to security experts that are also kernel developers and able to provide individualized context on these issues, so for these companies what they have is deeply flawed, but it is all they have.

One generally sees CVEs phrased as issues discovered in the "Linux kernel through <version number>", but in reality, it is extremely rarely the case that all kernel versions prior to the one listed are affected. Often, even though it is easy to determine which upstream kernel version introduced the vulnerability, this information is not included in the CVE description itself. It is possible that this may be a way of reducing confusion among users of heavily-modified distribution kernels that, while perhaps based on an older kernel version, contain a significant amount of newer driver code and other functionality. Several distributions ship multiple versions of similar kernels, making it easier to just give a single description to a CVE than to list every single affected version being shipped. That reduction of confusion, though, comes at the cost of exaggerating the impact of a vulnerability that doesn't affect nearly the number of kernel versions implied by the CVE description (an exaggeration propagated by some journalists who repeat such information verbatim).

Aside from affected versions, of the vulnerabilities that did receive CVEs, not all users may be affected. On heavily stripped-down and custom-built kernels, the vulnerable drivers and other functionality may simply be unused, and on highly modular kernels, features like grsecurity's MODHARDEN can prevent these vulnerabilities from being reached by an unprivileged user. It should be clear by now that the presence or absence of CVEs (and associated CVSS info) does very little on its own to assess true risk for a given user.

Problematic Analysis

This brings us back to the subject of Greg's presentation. In it, he laments the problems of CVEs from the perspective of a kernel developer that has hundreds of CVEs filed for code upstream kernel developers are collectively responsible for. The solution proposed by Greg to the kernel developers in the audience on how to "fix" CVEs is to "[i]gnore them!" and "[r]e-brand what we have been doing for 14 years" by referencing vulnerabilities with a Change ID (CID). However, as we'll illustrate below, his arguments have several flaws and his proposed solution doesn't actually address any of the problems he laments.

In addition to the problems with CVEs mentioned above, Greg also cites "inability to handle ongoing/complex problems requiring multiple fixes over extended periods of time." As an example, he uses Spectre, which has a single CVE assigned but for which there are currently several dozen individual manual fixes applied (in contrast to something like grsecurity's Respectre plugin which addresses the problem more comprehensively without needing individual patches). This is a legitimate complaint, but it also seems that Greg would not be happy with the alternative of hundreds of CVEs being allocated for each individual Spectre v1 instance fixed in the kernel. He follows up that complaint with an odd remark suggesting something untoward about the fact that the US government runs the National Vulnerability Database (NVD) and sponsors the maintenance of the CVE list and the oversight of CVE Numbering Authorities (CNAs) like the various Linux distributions. The insinuation seems to be that the US government is somehow sitting on vulnerability information before publishing it to the world. The problem with this critique is that CVEs are often assigned without providing information to MITRE at all; this is exactly how the CNA process works. Distributions and other CNAs receive a pool of numbers that they can allocate as necessary on their own. Greg also cites "abuse" of CVE by Red Hat engineers to "circumvent foolish management policies", but in neither the recorded live presentation nor the slides linked above did he provide any evidence for such a claim (despite claiming "proof" of such activity).

It is not clear whether the slide immediately following, which discusses 41% of CVEs having a negative "fix date" (that is, the CVE was assigned after the vulnerability was fixed upstream), is supposed to be the "proof" of Greg's claim, particularly given that it is a natural consequence of many upstream kernel developers' behavior of not being explicit about security vulnerabilities being fixed. The 41% also covers the large case where researchers and distros work with upstream (via security@kernel.org or otherwise) to publish a fix before any CVE is allocated/announced. From a statistical standpoint, the data he's provided on slide 24 is problematic. He mentions CVEs being assigned on average 100 days after the vulnerability was already fixed upstream. However, with only 41% having a negative fix date, that means 59% were fixed after the CVE was assigned. So the "average" provided was a mean, produced as a result of a handful of vulnerabilities being assigned CVEs long after they were fixed. A better statistic to provide here (but which would give the opposite impression of what Greg was trying to portray) would be a median, especially given that the standard deviation of fix dates in his statistics was 405 days.
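To make the mean-versus-median point concrete, here is a minimal sketch in Python. The offsets below are invented purely for illustration; they are not the data behind Greg's slide. They are merely chosen so that a minority of large negative values drags the mean well below zero while most entries remain positive. The sign convention assumed here is "fix date minus CVE assignment date, in days", so a negative value means the fix landed before the CVE was assigned.

```python
import statistics

# Hypothetical offsets in days (fix date minus CVE assignment date).
# Negative: the upstream fix predates the CVE. These values are made up
# for illustration only and are NOT taken from the presentation.
offsets = [-900, -150, -40, -10, 5, 10, 15, 20, 20, 30]

mean = statistics.mean(offsets)        # pulled down by a few large outliers
median = statistics.median(offsets)    # reflects the typical entry
negative_share = sum(1 for d in offsets if d < 0) / len(offsets)

print(f"mean={mean} median={median} negative share={negative_share:.0%}")
```

With these made-up numbers, only 40% of the entries are negative, yet the mean is -100 days while the median is 7.5 days. A skewed distribution like this is exactly the case where quoting the mean alone gives a misleading picture.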

Greg claims that the negative fix dates and large standard deviation show that CVEs don't matter. Perhaps to upstream (if they ignore the other 59%), but CVEs clearly do matter to end-users. Nearly the entire world uses a Linux kernel version older than the very latest upstream release. While some CVEs are requested to pad resumes, the majority are allocated by distributions to inform their users and customers about issues in versions they're shipping. The number of CVEs against the Linux kernel would expand greatly if there were a concerted effort to assign them for very recent kernel vulnerabilities. Many of these end up being caught and fixed within a number of months, so filing CVEs to track these for the small number of non-customers using the latest kernels is not a productive use of limited resources. What matters to end-users is whether a vulnerability affects them. The fact that in some cases a fix may already exist upstream and wasn't communicated as such simply does not matter to their current situation.

Greg describes a situation where a CVE was filed for a non-vulnerability, and the fix for it introduced a memory leak of its own. Disputing the CVE took some time, and in the meantime some distributions shipped the buggy fix. This situation can hardly be blamed on CVE. Kees Cook listed himself as the reviewer of the bad fix, and Greg committed it. This kind of situation happens all the time in the kernel, regardless of the presence of any CVE. I've mentioned on Twitter before a case where a known broken fix causing a remote DoS was shipped in 3 stable kernel releases before finally being fixed properly. A bad fix was also naively backported to stable kernels, causing three separate errors including boot failures. Here's a still-unfixed example I mentioned early this month: due to the "Fixes:" tag, it should be pulled into stable trees, and when it is, it'll introduce the same double-free currently present upstream. The problems here at several levels are insufficient review, process failures, and lack of funding. Monitoring, reviewing, and backporting fixes for the Linux kernel requires constant full-time effort; it's not a process that begins and ends with the issuance of a CVE.

The Old New Thing

In the presentation, Greg admits Linux security fixes "happen at least once a week" and "look like any other bugfix." He also admits "very few CVEs ever get assigned for kernel issues." These are all problems caused by upstream development standards and processes. How then does upstream plan to address these problems and the other issues mentioned with CVEs? As Greg mentions, by rebranding their use of git commit IDs. Every Linux kernel commit has an associated commit ID, a hexadecimal SHA1 hash value. He proposes referring to vulnerabilities then with CID-<truncated 12 char hash> (the Change ID) and using a script to determine what kernel version introduced the vulnerability and fixed it. There are a number of problems with Greg's proposal, however, that will be outlined below.

CID is proposed as some kind of CVE replacement, but it fundamentally misunderstands what a CVE is in the first place. A CVE identifies a vulnerability, while a CID identifies a fix, more specifically a single commit involved in a fix. It is quite common for a vulnerability to require several commits to fix. It is also common for a single commit to resolve multiple vulnerabilities.

As every commit has a commit ID, there is no value in the existence of a CID alone: vulnerability fixes, non-vulnerability fixes, and new features alike will all have such an ID. The promotion of CID thus unfairly provides the illusion that security vulnerabilities are being separately tracked upstream, when that is explicitly not the case (at least at the level of top management). The relevance of a CID can only exist through the presence of some other existing mutable database interested only in security vulnerabilities, i.e., the CVE system that we already have. It will thus not be possible for Greg and others to ignore CVEs, as despite upstream's protestations, they are still and will continue to be in use by the rest of the world. CVEs will continue to be assigned against the Linux kernel, continuing the same problems that exist with CVEs filed today. CVEs are easily recognized by end-users, and there's a wide array of existing infrastructure and security management products tracking CVE information. Furthermore, looking at any random sampling of Linux kernel CVEs (take for example this one), if one scrolls down to the bottom of the page where the "references" are listed, there will generally be a link to the upstream commit that fixes the issue, if one exists yet. The first 12 characters of the id in that URL are what would be this "new" CID, information that has been present already for many years.
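To show how little is involved in deriving a CID from a reference that CVE entries already carry, here is a minimal sketch in Python. The function name `cid_from_reference` is my own invention for illustration, and the commit hash in the example URL is a made-up placeholder, not a real kernel commit. The only assumptions are that the reference URL contains a full 40-character SHA1 and that the CID takes the form "CID-" followed by its first 12 characters.

```python
import re
from typing import Optional

# A full SHA1 commit ID is 40 hexadecimal characters.
_SHA1_RE = re.compile(r"\b[0-9a-f]{40}\b")


def cid_from_reference(url: str) -> Optional[str]:
    """Return 'CID-<first 12 hex chars>' for the first commit ID in url.

    Hypothetical helper: returns None if no 40-character hash is present.
    """
    match = _SHA1_RE.search(url.lower())
    if match is None:
        return None
    return "CID-" + match.group(0)[:12]


# Placeholder hash, NOT a real commit.
example_url = (
    "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"
    "/commit/?id=0123456789abcdef0123456789abcdef01234567"
)
print(cid_from_reference(example_url))
```

Nothing here depends on any new upstream infrastructure: the identifier is simply a truncation of data that existing CVE references already include.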

What may perhaps appear new for some people (though certainly not the distribution creators and consumers of CVEs) is the ability to use that commit ID and some simple scripting to find what kernel version includes that fix, and in some cases, what kernel version introduced the vulnerability. Why only in some cases? This is because that functionality depends on the presence of a "Fixes:" tag in the commit that fixed a particular vulnerability. Without that, some analysis needs to be performed to find the original cause. Even in the presence of such a tag, the information it provides can be incorrect or unusable in an automated fashion. I provided one such example on Twitter recently. A recent talk by Dmitry Vyukov at the Linux Kernel Summit mentioned that in 2019, only 13.8% of commits had "Fixes:" tags. In the LTS 4.14 tree, only 40% of the backports have "Fixes:" tags. Therefore, while commit IDs can be used, just as they could in the past, to find chains of "Fixes:" tags to provide more assurance that a fix in a particular tree is the most up-to-date one currently available upstream, commit IDs alone can't yet be used for determining vulnerable kernel versions. As backporting security fixes is something we do regardless of any stable@kernel.org CC or "Fixes:" tag, we are uniquely familiar with the amount of work involved in tracking down what kernel versions are affected by a particular vulnerability.
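The dependence on the "Fixes:" tag can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `introduced_by` is hypothetical, the commit message and hash are invented, and the tag is assumed to follow the conventional form of a hexadecimal commit ID followed by the quoted subject of the commit being fixed. It is not the script Greg refers to.

```python
import re
from typing import List

# Assumed conventional form: Fixes: <12 or more hex chars> ("commit subject")
_FIXES_RE = re.compile(
    r"^Fixes:\s*([0-9a-f]{12,40})\b", re.IGNORECASE | re.MULTILINE
)


def introduced_by(commit_message: str) -> List[str]:
    """Return the 12-char IDs named in any 'Fixes:' tags (hypothetical helper).

    An empty list means the message carries no usable tag, so the
    introducing commit has to be found by manual analysis instead.
    """
    return [m.group(1).lower()[:12] for m in _FIXES_RE.finditer(commit_message)]


# Invented commit message with a placeholder hash.
tagged = """net: fix hypothetical use-after-free

Fixes: 0123456789ab ("net: add hypothetical feature")
Signed-off-by: A Developer <dev@example.org>
"""
untagged = "net: fix hypothetical use-after-free\n\nNo tag present.\n"

print(introduced_by(tagged))
print(introduced_by(untagged))
```

When the list comes back empty, as it would for the large share of commits without a tag, automation stops and someone has to read the code. Even a non-empty result only repeats what the commit author claimed, which may itself be wrong.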

Running scripts to determine what kernel versions are affected by a vulnerability is simply not something the general end-user is going to do. No end-user is going to refer to a vulnerability by some long 12-character hexadecimal value; the longer-form CVE values these days are already not memorizable. Furthermore, for the reasons outlined earlier, CID won't provide relevant information for the majority of end-users who are using distribution-provided kernels. CID can't provide any information for a CVE where there's not yet some agreed-upon upstream fix for the vulnerability that has made its way into Linus' git repository. An upstream fix has never been any kind of guarantee that the security fix is correct (as Greg himself notes), and so CID provides less information than a CVE where the fix has been posted on a mailing list or elsewhere, as happens fairly regularly. As most users are running older kernels, the presence of that upstream fix is also no guarantee that it has been backported correctly to the older kernels. The CID may be listed, but the code changes actually applied may be significantly different. Absent any third-party expert being involved, users' entire awareness of risk in the Linux kernel they're using is informed and controlled by their particular distribution.

Wrapping Up

CVE is broken, but CID does nothing to fix either CVE's problems or anything else about the Linux kernel development and security processes. CID is less a real solution and more yet another entry in a long string of half-baked upstream marketing attempts, following the 'a bug is a bug' mantra, to handwave away flaws in their handling of security vulnerabilities and development processes that prioritize the introduction of new code above all else. From Greg's perspective at least, the problem is not with upstream, and he seemingly cannot comprehend why the rest of the world still stubbornly recognizes that security vulnerabilities deserve special attention, or why the world doesn't update its kernels every three days as it is instructed it "must" do. Greg's presentation contains nothing about how the underlying problems could be addressed, or in fact anything about how upstream could improve. It is entirely about how they're already doing a great job, using misleading statistics as the basis, and that the problem is that the rest of the world just needs help in recognizing it.

Changes to upstream processes and standards are what could produce real results, but not simple rebranding of information that's already been readily available to everyone for many years. More collaboration with distributions and other security practitioners that have to deal with the repercussions of upstream development processes, rather than the anti-security positions that continue to promote the status quo, could result in better understanding and real solutions. Since the problems cited with CVE have to do with certain contributors to it, setting up processes and providing additional documentation to help them be better contributors is one way of making progress. Any solution to these problems will require actual work and funding. With the apparent unannounced demise of the Linux Foundation's much-hyped Core Infrastructure Initiative, that outcome is not looking especially likely for the average Linux user.

Thanks to my anonymous reviewers and those on Twitter I discussed the presentation with extensively before writing this post.