Faster Multi-core Linux Kernel Build Testing
May 6, 2020
There are several groups that find themselves repeatedly building the Linux kernel. These include:
- Linux kernel developers and Linux distribution developers
- Folks like Dmitry Vyukov who do great work fuzzing the kernel via the syzkaller project
- Enthusiasts who want to bisect down to the smallest possible kernel configuration for their hardware
- Companies who create patches for enhancing kernel security
On smaller multicore systems, compilation itself dominates the build time. On larger multicore systems (like the recent AMD EPYC 2 7742, capable of 256 threads in a dual-CPU configuration), serialized, single-threaded operations in the build begin to take up a larger percentage of the build time. These serialized operations include tools automatically run against generated vmlinux/module objects (e.g. modpost), the instruction decoder self-test (which looks for discrepancies in instruction lengths between the in-kernel instruction decoder and the installed objdump version), linking, and compression.
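To get a feel for why the serialized steps matter more as thread counts grow, here is a minimal sketch based on Amdahl's law. The function name `amdahl_speedup` and the 5% serial fraction are assumptions made for illustration, not figures measured from a kernel build.

```shell
# amdahl_speedup SERIAL_FRACTION THREADS
# Prints the ideal speedup predicted by Amdahl's law: 1 / (s + (1 - s) / n).
amdahl_speedup() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / (s + (1 - s) / n) }'
}

# Hypothetical build in which 5% of the single-core work cannot be parallelized.
for threads in 8 64 256; do
    printf '%3d threads: %sx\n' "$threads" "$(amdahl_speedup 0.05 "$threads")"
done
```

With a 5% serial fraction the speedup can never exceed 20x no matter how many threads are added, which is why shaving time off the single-threaded steps pays off on large machines.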
In CI environments that operate on the concept of 'executors', single-threaded operations extend the amount of time the executor is reserved, preventing more efficient workloads from being performed on it and delaying the overall job completion time.
So, if you find yourself building the kernel more than a couple of times a day (especially with large kernel configurations such as allyesconfig), you probably yearn for a way to speed up your workflow. There are several solutions available, but the ones we're discussing today are multithreaded linking, configuration changes, and multithreaded compression.
Out of the box, the Linux kernel source code uses purely GNU tools like ld. By switching to LLVM's LLD linker (something not commonly done with the Linux kernel until relatively recently), we get time savings by enabling multithreaded linking phases. Next we'll demonstrate how to build the Linux 5.4 kernel with gcc but link it with LLD. In the process, we'll apply a few other tweaks to improve the kernel build times.
The first step is to get a recent copy of LLD. The easiest way is to build and install it from source, which we show below on Debian Bullseye.
~$ sudo apt-get install build-essential gcc-9-plugin-dev clang ninja-build cmake
~$ sudo apt-get install libncurses5-dev libelf-dev libssl-dev flex bison bc git pigz
~$ git clone https://github.com/llvm/llvm-project.git
~$ cd llvm-project
~/llvm-project$ git checkout release/10.x
~/llvm-project$ mkdir build
~/llvm-project$ cd build
~/llvm-project/build$ cmake -G Ninja -DLLVM_ENABLE_PROJECTS='clang;lld;compiler-rt' \
    -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_WARNINGS=OFF \
    -DCMAKE_INSTALL_PREFIX=/usr/local/llvm-10 ../llvm
~/llvm-project/build$ ninja
~/llvm-project/build$ sudo ninja install
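Before moving on, it can help to confirm that the tools used later in this post are actually reachable. The following is a small sketch, not part of our build scripts; the helper name `missing_tools` is hypothetical, and the PATH entry assumes the /usr/local/llvm-10 install prefix used above.

```shell
# missing_tools NAME...
# Prints each named command that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: LLVM was installed under the prefix passed to cmake.
PATH=/usr/local/llvm-10/bin:$PATH
missing=$(missing_tools ld.lld clang pigz)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so each name gets its own line.
    printf 'not found on PATH: %s\n' $missing
else
    ld.lld --version
fi
```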
With LLD installed, the next step is to prepare the Linux 5.4 source tree to support this type of build; without these changes the build will fail with errors. You'll need to apply these commits to the source code:
Raw diffs, to be applied with patch -p1:
~$ wget https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.4.38.tar.xz
~$ tar -xf linux-5.4.38.tar.xz
~$ cd linux-5.4.38
~/linux-5.4.38$ wget -O patch1.patch 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/rawdiff/?id=7273ad2b08f8ac9563579d16a3cf528857b26f49'
~/linux-5.4.38$ wget -O patch2.patch 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/rawdiff/?id=163159aad74d3763b350861b879b41e8f64121fc'
~/linux-5.4.38$ wget 'https://grsecurity.net/gzip.diff'
~/linux-5.4.38$ patch -p1 < ./patch1.patch
~/linux-5.4.38$ patch -p1 < ./patch2.patch
~/linux-5.4.38$ patch -p1 < ./gzip.diff
The gzip.diff above is a small patch that allows overriding the command used for gzip compression (the default compression used for Linux kernels, via CONFIG_KERNEL_GZIP). This allows us to use a multi-threaded version of gzip called pigz. In our testing, pigz was roughly twice as fast as pbzip2 and 9x as fast as pixz with the default options.
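Since pigz writes standard gzip streams, it can stand in wherever gzip is expected. Here is a minimal sketch of choosing the compressor and checking that its output still decompresses with stock gzip; the helper name `pick_kgzip` is hypothetical, and the `-9` level is picked only for illustration.

```shell
# pick_kgzip
# Prints "pigz" when it is installed, otherwise falls back to plain "gzip".
pick_kgzip() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

KGZIP=$(pick_kgzip)
# Round-trip check: whichever tool was picked, stock gzip must be able to
# decompress its output.
printf 'kernel image stand-in\n' | "$KGZIP" -9 -c | gzip -dc
```

The fallback keeps a build script usable on machines where pigz is not installed, at the cost of single-threaded compression there.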
We can now build the Linux kernel with the aforementioned changes. Note that we disable three particular configuration options because they slow the build down and don't provide benefits for constant automated build testing. We're also not building with clang, as outside tests have shown it to be roughly 50% slower in compiling the kernel, and we'd miss out on the benefit of testing the GCC plugins that exist for the Linux kernel.
~/linux-5.4.38$ make allyesconfig
~/linux-5.4.38$ scripts/config --disable CONFIG_X86_DECODER_SELFTEST
~/linux-5.4.38$ scripts/config --disable CONFIG_MODULE_SIG
~/linux-5.4.38$ scripts/config --disable CONFIG_DEBUG_INFO
~/linux-5.4.38$ export PATH=/usr/local/llvm-10/bin:$PATH
~/linux-5.4.38$ make -j`nproc --all` LD=ld.lld HOSTLDFLAGS=-fuse-ld=lld KGZIP=pigz
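If you want to double-check that the three options really ended up disabled, a quick look at .config does the job. This is a sketch under the assumption that it is run from the top of the kernel tree; the helper name `config_state` is hypothetical.

```shell
# config_state FILE OPTION
# Prints "enabled" if OPTION is set to y or m in FILE, otherwise "disabled"
# (which covers both "# OPTION is not set" and an absent option).
config_state() {
    if grep -q -E "^$2=[ym]\$" "$1"; then
        echo enabled
    else
        echo disabled
    fi
}

# Only report when a .config is present in the current directory.
if [ -f .config ]; then
    for opt in CONFIG_X86_DECODER_SELFTEST CONFIG_MODULE_SIG CONFIG_DEBUG_INFO; do
        printf '%s: %s\n' "$opt" "$(config_state .config "$opt")"
    done
fi
```

If an option shows up as enabled again after the build has started, a likely cause is that another option selects it, so it is worth re-running the check at that point.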
In our testing with a dual-CPU AMD EPYC 7601 system (128 threads total) and the above changes, the wall clock time of an x86_64 build was reduced by 35%, though your mileage may vary. With the number of kernels we build and test on a constant basis, even smaller improvements quickly add up. One such change we made on our end in 2018 was to make modpost multi-threaded, as it takes a significant amount of time and (in addition to kallsyms) is one of the few remaining single-threaded operations in building Linux kernels.
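To compare your own before and after builds, you can record the wall clock seconds of each (for example with `date +%s` before and after `make`) and compute the reduction. A minimal sketch follows; the helper name `pct_reduction` is hypothetical, and the timings are placeholders rather than our measurements.

```shell
# pct_reduction BEFORE_SECONDS AFTER_SECONDS
# Prints the wall clock reduction as a percentage of the baseline time.
pct_reduction() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

# Placeholder timings in seconds, NOT measurements from this post.
pct_reduction 2000 1300
```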
If you have other tips that offer significant savings, let us know on Twitter and we'll do our best to update this post with the information. Happy building!