The LLVM Sanitizers stage accomplished

I've managed to get the Memory Sanitizer to work for elementary base system utilities such as ps(1), awk(1) and ksh(1). This means that the toolchain is ready for tests and improvements. I've iterated over the base system utilities and looked for bugs, both in the programs and in the sanitizers. The number of bugs detected in the userland programs was low: there was only one read of an uninitialized variable, in ps(1).
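To illustrate the class of bug mentioned above, here is a minimal sketch in C. It is not the actual ps(1) code: the struct proc_info type and both functions are hypothetical names made up for this example. It shows a record whose field is only assigned on one path, together with the fix (zeroing the record first). Without the memset(3) call, building with Clang's -fsanitize=memory would be expected to report the later branch on the flags field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record, loosely modelled on per-process data a tool might print. */
struct proc_info {
    int pid;
    int flags; /* only assigned when the process has a terminal */
};

/*
 * Fill in the fields known for this process.
 * Without the memset() below, 'flags' would stay uninitialized whenever
 * has_tty is 0, and a later branch on it is the kind of read that the
 * Memory Sanitizer reports.
 */
void fill_proc_info(struct proc_info *pi, int pid, int has_tty)
{
    assert(pi != NULL);
    memset(pi, 0, sizeof(*pi)); /* the fix: every field gets a defined value */
    pi->pid = pid;
    if (has_tty)
        pi->flags = 1;
}

/* Branches on 'flags'; safe only because fill_proc_info() zeroed it. */
int proc_has_tty(const struct proc_info *pi)
{
    assert(pi != NULL);
    return pi->flags != 0;
}
```

Calling fill_proc_info() with has_tty set to 0 and then proc_has_tty() returns 0 in the fixed version; in the unfixed version the result would depend on whatever happened to be on the stack.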

A prebuilt LLVM toolchain

I've prepared a prebuilt toolchain with Clang, LLVM, LLDB and compiler-rt for NetBSD/amd64. I prepared the toolchain on 8.99.12; however, I have received reports that it also works on older releases.

Link: llvm-clang-compilerrt-lldb-7.0.0beta_2018-01-24.tar.bz2

The archive has to be extracted to /usr/local (it might work to some extent in other paths).

This toolchain contains a prebuilt tree of the LLVM projects from a snapshot of 7.0.0(svn). It is a pristine snapshot of HEAD with patches from pkgsrc-wip for llvm, clang, compiler-rt and lldb.

Sanitizers

Notable changes in the sanitizers, all of them in the context of NetBSD support.

Base system changes

I've tidied up inclusion of the internal namespace.h header in libc. This has hidden the usage of public global symbol names of:

I've also reverted the previous month's removal of the vadvise(2) syscall. The removal caused a regression in legacy code recompiled against still-supported compat layers. Newly compiled code will use a libc stub of vadvise(2).

The purpose of these changes was to stop triggering interceptors recursively. Such recursion leads to sanitization of the internals of unprepared prebuilt code (code that was not recompiled with sanitizers). It's not trivial to sanitize libc's internals, and the sanitizers are not designed to do so. This means that they are not a full replacement for Valgrind-like software, but a supplement in the developer toolbox. Valgrind translates native code to a bytecode virtual machine, while sanitizers are designed to work with interceptors placed in front of the pristine libraries and to embed functionality into the executable's code.
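The idea of keeping library internals away from interceptors can be sketched with a toy model in C. This is only an illustration, not libc or compiler-rt code: public_strlen, internal_strlen, library_helper and interceptor_calls are hypothetical names. The public entry point stands in for an intercepted symbol, and the library helper calls the internal name directly, so the interceptor's bookkeeping is not re-entered from inside the library.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the toy interceptor ran. */
static int intercept_count;

/* Internal implementation, analogous to a library-private symbol name. */
static size_t internal_strlen(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/*
 * Public entry point as user code sees it. In this toy model the
 * "interceptor" only bumps a counter before forwarding the call.
 */
size_t public_strlen(const char *s)
{
    intercept_count++;
    return internal_strlen(s);
}

/*
 * Library-internal helper. It uses the internal name, so calling it
 * does not trigger the interceptor again.
 */
size_t library_helper(const char *a, const char *b)
{
    return internal_strlen(a) + internal_strlen(b);
}

/* Accessor so callers can observe the interceptor activity. */
int interceptor_calls(void)
{
    return intercept_count;
}
```

In this model a call to public_strlen() increments the counter once, while library_helper() leaves it unchanged. Had the helper called public_strlen() instead, every internal use would have passed through the interceptor as well.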

Future directions and goals

Possible paths in random order:

  1. To the quartet of UBSan (Undefined Behavior Sanitizer), ASan (Address Sanitizer), TSan (Thread Sanitizer) and MSan (Memory Sanitizer) we need to add a fifth basic sanitizer: LSan (Leak Sanitizer). The Leak Sanitizer (a detector of memory leaks) demands a stable ptrace(2) interface for processes with multiple threads (unless we want to build a custom kernel interface).
  2. Integrate the sanitizers with the userland framework in order to ship with the native toolchain to users.
  3. Port sanitizers from LLVM to GCC.
  4. Allow sanitizing programs linked against userland libraries other than libc, librt, libm and libpthread, by means of a global option (like MKSANITIZER) that produces a userland partially prebuilt with a desired sanitizer. This is required to run e.g. MSanitized programs against editline(3). So far, no operating system distribution has native integration with sanitizers. There are third-party scripts that build a stack of software dependencies for validation purposes.
  5. Execute ATF tests with the userland rebuilt with supported flavors of sanitizers and catch regressions.
  6. Finish porting modern linkers designed for large C++ software, such as GNU GOLD and LLVM LLD. Today the bottleneck in building the LLVM toolchain is the suboptimal GNU ld(1) linker.

I've decided not to open new battlefields and to return now to porting LLDB and fixing ptrace(2).

Plan for the next milestone

Keep upstreaming a pile of local compiler-rt patches.

Restore the LLDB support for traced programs with a single thread.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL and chipping in what you can:

http://netbsd.org/donations/#how-to-donate