The LLVM Sanitizers stage accomplished

I've managed to get the Memory Sanitizer to work for the elementary base system utilities, like ps(1), awk(1) and ksh(1). This means that the toolchain is ready for tests and improvements. I've iterated over the base system utilities and looked for bugs, both in the programs and in the sanitizers. The number of bugs detected in the userland programs was low: there was merely one read of an uninitialized variable in ps(1).
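To make the class of bug concrete, here is a minimal sketch of a read of an uninitialized variable of the kind the Memory Sanitizer is designed to catch. The structure and function names are hypothetical and are not taken from the ps(1) sources; the actual bug is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record; not taken from the ps(1) sources. */
struct proc_info {
    int pid;
    int nice;
};

/* Buggy variant: sets pid but forgets nice, which stays uninitialized. */
void fill_buggy(struct proc_info *p, int pid)
{
    assert(p != NULL);
    p->pid = pid;
}

/* Fixed variant: zeroes the whole record before filling it in. */
void fill_fixed(struct proc_info *p, int pid)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
    p->pid = pid;
}

/*
 * Branches on p->nice. If the record came from fill_buggy(), this
 * branch depends on an uninitialized value, which is the situation
 * MSan is meant to report.
 */
const char *nice_label(const struct proc_info *p)
{
    assert(p != NULL);
    return p->nice < 0 ? "high" : "normal";
}
```

In a program built with clang -fsanitize=memory, calling nice_label() on a stack-allocated record filled by fill_buggy() is expected to produce a use-of-uninitialized-value report, while the fill_fixed() path should stay silent.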

A prebuilt LLVM toolchain

I've prepared a prebuilt toolchain with Clang, LLVM, LLDB and compiler-rt for NetBSD/amd64. I prepared the toolchain on 8.99.12; however, I have received reports that it also works on older releases.

Link: llvm-clang-compilerrt-lldb-7.0.0beta_2018-01-24.tar.bz2

The archive has to be extracted to /usr/local (it might work to some extent in other paths).

This toolchain contains a prebuilt tree of the LLVM projects from a snapshot of 7.0.0(svn). It is a pristine snapshot of HEAD with patches from pkgsrc-wip for llvm, clang, compiler-rt and lldb.

Sanitizers

Notable changes in the sanitizers, all of them in the context of NetBSD support:

Added fstat(2) MSan interceptor.
Support for kvm(3) interceptors in the common sanitizer code.
Added devname(3) and devname_r(3) interceptors to the common sanitizer code.
Added interceptors for the sysctl(3) family of functions in the common sanitizer code.
Added strlcpy(3)/strlcat(3) interceptors in the common sanitizer code.
Added getgrouplist(3)/getgroupmembership(3) interceptors in the common sanitizer code.
Corrected the ctype(3) interceptors in code using Native Language Support.
Corrected the tzset(3) interceptor in MSan.
Corrected the localtime(3) interceptor in the common sanitizer code.
Added paccept(2) interceptor to the common sanitizer code.
Added access(2) and faccessat(2) interceptors to the common sanitizer code.
Added acct(2) interceptor to the common sanitizer code.
Added accept4(2) interceptor to the common sanitizer code.
Added fgetln(3) interceptor to the common sanitizer code.
Added interceptors for the pwcache(3)-style functions in the common sanitizer code.
Added interceptors for the getprotoent(3)-style functions in the common sanitizer code.
Added interceptors for the getnetent(3)-style functions in the common sanitizer code.
Added interceptors for the fts(3)-style functions in the common sanitizer code.
Added lstat(2) interceptor in MSan.
Added strftime(3) interceptor in the common sanitizer code.
Added strmode(3) interceptor in the common sanitizer code.
Added interceptors for the regex(3)-style functions in the common sanitizer code.
Disabled unwanted interceptor __sigsetjmp in TSan.
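
As a rough illustration of what an interceptor from the list above does, here is a toy model in plain C: the wrapper calls the real function and then records in a shadow map which destination bytes are now initialized. Everything here (the arena, the shadow array, and the toy_* and real_strlcpy names) is a hypothetical stand-in; this is not the compiler-rt implementation, which uses its own macros and shadow memory layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 64

/* Toy model: one arena plus one shadow byte per arena byte (1 = uninitialized). */
static char arena[ARENA_SIZE];
static unsigned char shadow[ARENA_SIZE];

char *toy_arena(void) { return arena; }

void toy_poison_all(void) { memset(shadow, 1, sizeof(shadow)); }

static void toy_unpoison(const char *p, size_t n)
{
    size_t off = (size_t)(p - arena);
    assert(off <= ARENA_SIZE && n <= ARENA_SIZE - off);
    memset(shadow + off, 0, n);
}

/* Returns 1 if any of the n bytes at p is still marked uninitialized. */
int toy_is_poisoned(const char *p, size_t n)
{
    size_t off = (size_t)(p - arena);
    assert(off <= ARENA_SIZE && n <= ARENA_SIZE - off);
    for (size_t i = 0; i < n; i++)
        if (shadow[off + i])
            return 1;
    return 0;
}

/* Portable stand-in for the real strlcpy(3), so the sketch builds anywhere. */
static size_t real_strlcpy(char *dst, const char *src, size_t dsize)
{
    size_t srclen = strlen(src);
    if (dsize != 0) {
        size_t n = srclen < dsize - 1 ? srclen : dsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* The "interceptor": call the real function, then unpoison what it wrote. */
size_t toy_intercept_strlcpy(char *dst, const char *src, size_t dsize)
{
    size_t srclen = real_strlcpy(dst, src, dsize);
    if (dsize != 0) {
        size_t copied = srclen < dsize - 1 ? srclen : dsize - 1;
        toy_unpoison(dst, copied + 1); /* copied bytes plus the NUL */
    }
    return srclen;
}
```

After toy_poison_all(), a call such as toy_intercept_strlcpy(toy_arena(), "ps", 8) leaves the first three bytes marked as initialized and the rest of the buffer still poisoned. Without such a wrapper the sanitizer would not know that the library call wrote to the buffer.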

Base system changes

I've tidied up the inclusion of the internal namespace.h header in libc. As a result, libc now uses internal aliases instead of the public global symbol names for:

strlcat -> _strlcat
sysconf -> __sysconf
closedir -> _closedir
fparseln -> _fparseln
kill -> _kill
mkstemp -> _mkstemp
reallocarr -> _reallocarr
strcasecmp -> _strcasecmp
strncasecmp -> _strncasecmp
strptime -> _strptime
strtok_r -> _strtok_r
sysctl -> _sysctl
dlopen -> __dlopen
dlclose -> __dlclose
dlsym -> __dlsym
strlcpy -> _strlcpy
fdopen -> _fdopen
mmap -> _mmap
strdup -> _strdup

I've also reverted last month's removal of the vadvise(2) syscall. The removal caused a regression in legacy code recompiled against still-supported compat layers. Newly compiled code will use a libc stub of vadvise(2).

The purpose of the namespace changes was to stop triggering interceptors recursively. Such interceptors lead to sanitization of the internals of unprepared (not recompiled with sanitizers) prebuilt code. It's not trivial to sanitize libc's internals and the sanitizers are not designed to do so. This means that they are not a full replacement for Valgrind-like software, but a supplement in the developer toolbox. Valgrind translates native code to bytecode executed by a virtual machine, while sanitizers are designed to work with interceptors around the pristine libraries and to embed their functionality into the executable's code.
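
The effect of calling an internal alias instead of the public name can be sketched in a single file. This is a simplified model with hypothetical names: real libc uses underscore-prefixed names such as _strdup, which are reserved for the implementation, so the sketch uses internal_strdup as a stand-in, and a call counter stands in for the interceptor's work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts how often the "interceptor" on the public name was entered. */
static int intercepted_calls;

int toy_intercepted_calls(void) { return intercepted_calls; }

/* Implementation under its internal name (stand-in for libc's _strdup). */
char *internal_strdup(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Public name as seen by a sanitized program: the interceptor sits here. */
char *toy_strdup(const char *s)
{
    intercepted_calls++;
    return internal_strdup(s);
}

/*
 * A libc-internal routine. Because it calls the internal name, it never
 * enters the interceptor attached to the public name.
 */
char *toy_libc_internal_copy(const char *s)
{
    return internal_strdup(s);
}
```

A call to toy_strdup() from the program bumps the counter, while toy_libc_internal_copy() does not. That is the behaviour the symbol renames listed above aim for.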

Future directions and goals

Possible paths in random order:

To the quartet of UBSan (Undefined Behavior Sanitizer), ASan (Address Sanitizer), TSan (Thread Sanitizer) and MSan (Memory Sanitizer) we need to add a fifth basic sanitizer: LSan (Leak Sanitizer). The Leak Sanitizer (a detector of memory leaks) demands a stable ptrace(2) interface for processes with multiple threads (unless we want to build a custom kernel interface).
Integrate the sanitizers with the userland framework in order to ship them to users with the native toolchain.
Port sanitizers from LLVM to GCC.
Allow sanitizing programs linked against userland libraries other than libc, librt, libm and libpthread, by means of a global option (like MKSANITIZER) producing a userland that is partially prebuilt with a desired sanitizer. This is required to run e.g. MSanitized programs against editline(3). So far, there is no Operating System distribution in existence with a native integration with sanitizers. There are only 3rd party scripts that build a stack of software dependencies for validation.
Execute ATF tests with the userland rebuilt with supported flavors of sanitizers and catch regressions.
Finish porting modern linkers designed for large C++ software, such as GNU GOLD and LLVM LLD. Today the bottleneck in building the LLVM toolchain is the suboptimal GNU ld(1) linker.

I've decided not to open new battlefields and to return now to porting LLDB and fixing ptrace(2).

Plan for the next milestone

Keep upstreaming a pile of local compiler-rt patches.

Restore the LLDB support for traced programs with a single thread.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate