The LLVM Memory Sanitizer support work in progress

In the past 31 days, I've managed to get the core functionality of MSan to work. MSan is a detector of uninitialized memory usage. It is a special sanitizer because it requires knowledge of every entry to the base system libraries and every entry to the kernel through public interfaces. This is mandatory in order to mark memory regions as initialized.
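To illustrate why every entry point matters, here is a minimal sketch of what an interceptor-style wrapper does: it calls the real routine and then tells MSan that the output buffer is initialized. The names demo_opaque_fill and demo_fill_and_unpoison are hypothetical and exist only for this example; __msan_unpoison is the public call from <sanitizer/msan_interface.h>. In this toy the stand-in routine is itself compiled with instrumentation, so the unpoison call is redundant here; it only shows where the call goes when the callee is invisible to MSan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Detect an MSan build. __has_feature is a compiler extension, so guard it.
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define DEMO_HAVE_MSAN 1
#  endif
#endif

// Hypothetical stand-in for a library or kernel entry point that writes
// `len` bytes to `dst` without MSan being able to observe the stores.
static void demo_opaque_fill(void *dst, std::size_t len) {
  std::memset(dst, 0x2a, len);
}

// Wrapper playing the role of an interceptor: run the real routine, then
// mark the output range as initialized in MSan's shadow memory.
static void demo_fill_and_unpoison(void *dst, std::size_t len) {
  assert(dst != nullptr);
  demo_opaque_fill(dst, len);
#ifdef DEMO_HAVE_MSAN
  __msan_unpoison(dst, len);
#endif
}
```

Without the sanitizer the wrapper is a plain pass-through, so the same source builds with or without -fsanitize=memory.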

Most of the work has been done directly for MSan. However, part of it also improved generic features in compiler-rt.

Sanitizers

Changes in the sanitizer are listed below in chronological order. Almost all of the changes mentioned here landed upstream. A few small patches were reverted due to breaking non-NetBSD hosts and are rescheduled for further investigation. I maintain these patches locally and have moved on for now to work on the remaining features.

NetBSD syscall hooks

I wrote a large patch (815KB!) adding support for NetBSD syscall hooks for use with sanitizers. The patch is still pending review, and I wrote the following description for it:

Implement the initial set of NetBSD syscall hooks for use with sanitizers.

Add a script that generates the rules to handle syscalls
on NetBSD: generate_netbsd_syscalls.awk. It has been written
in NetBSD awk(1) (patched nawk) and is compatible with gawk.

Generate lib/sanitizer_common/sanitizer_platform_limits_netbsd.h
that is a public header for applications, and included as:
<sanitizer_common/sanitizer_platform_limits_netbsd.h>.

Generate sanitizer_netbsd_syscalls.inc that defines all the
syscall rules for NetBSD. This file is modeled after the Linux
specific file: sanitizer_common_syscalls.inc.

Start recognizing NetBSD syscalls with existing sanitizers:
ASan, ESan, HWASan, TSan, MSan.

Update the list of platform (NetBSD OS) specific structs
in lib/sanitizer_common/sanitizer_platform_limits_netbsd.

This patch does contain the most wanted structs
and handles the most wanted syscalls as of now, the rest
of them will be implemented in future when needed.

This patch is 815KB, therefore I will restrict the detailed
description to a demo:

$ uname -a
NetBSD chieftec 8.99.9 NetBSD 8.99.9 (GENERIC) #0: Mon Dec 25 12:58:16 CET 2017  root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
$ cat s.cc                                                                                                                   
#include <assert.h>
#include <errno.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

#include <sanitizer/netbsd_syscall_hooks.h>

int main(int argc, char *argv[]) {
  char buf[1000];
  __sanitizer_syscall_pre_recvmsg(0, buf - 1, 0);
  // CHECK: AddressSanitizer: stack-buffer-{{.*}}erflow
  // CHECK: READ of size {{.*}} at {{.*}} thread T0
  // CHECK: #0 {{.*}} in __sanitizer_syscall{{.*}}recvmsg
  return 0;
}
$ ./a.out   
=================================================================
==18015==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7f7fffe9c2ff at pc 0x000000467798 bp 0x7f7fffe9c2d0 sp 0x7f7fffe9ba90
WRITE of size 48 at 0x7f7fffe9c2ff thread T16777215
    #0 0x467797 in __sanitizer_syscall_pre_impl_recvmsg /public/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_netbsd_syscalls.inc:394:3
    #1 0x4abeb2 in main (/public/llvm-build/./a.out+0x4abeb2)
    #2 0x419bba in ___start (/public/llvm-build/./a.out+0x419bba)

Address 0x7f7fffe9c2ff is located in stack of thread T0 at offset 31 in frame
    #0 0x4abd7f in main (/public/llvm-build/./a.out+0x4abd7f)

  This frame has 1 object(s):
    [32, 1032) 'buf' <== Memory access at offset 31 partially underflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-underflow /public/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_netbsd_syscalls.inc:394:3 in __sanitizer_syscall_pre_impl_recvmsg
Shadow bytes around the buggy address:
  0x4feffffd3800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x4feffffd3850: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1[f1]
  0x4feffffd3860: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3870: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd3890: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x4feffffd38a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==18015==ABORTING
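The shadow row marked with => in the report can be cross-checked by hand. The following is a minimal sketch, assuming the usual ASan mapping shadow = (address >> 3) + offset. The scale of 3 follows from the legend (one shadow byte per 8 application bytes); the offset of 1 << 46 is inferred from the addresses printed in this report rather than quoted from the patch. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions: scale 3 (one shadow byte per 8 application bytes, per the
// legend) and offset 1 << 46 (inferred from the addresses in the report).
constexpr std::uint64_t kDemoShadowScale = 3;
constexpr std::uint64_t kDemoShadowOffset = 1ULL << 46;

// Map an application address to the address of its shadow byte.
constexpr std::uint64_t demo_shadow_address(std::uint64_t app_addr) {
  return (app_addr >> kDemoShadowScale) + kDemoShadowOffset;
}

// Faulting address from the report, and the frame base 31 bytes below it
// ("located in stack of thread T0 at offset 31 in frame").
constexpr std::uint64_t kDemoFaultAddr = 0x7f7fffe9c2ffULL;
constexpr std::uint64_t kDemoFrameBase = kDemoFaultAddr - 31;

// The bracketed [f1] byte is the last one in the row at 0x4feffffd3850.
static_assert(demo_shadow_address(kDemoFaultAddr) == 0x4feffffd385fULL,
              "fault address maps to the bracketed shadow byte");
// The first of the four f1 bytes sits at column 12 of the same row.
static_assert(demo_shadow_address(kDemoFrameBase) == 0x4feffffd385cULL,
              "frame base maps to the first left-redzone shadow byte");
```

Under these assumptions the four f1 bytes in the marked row cover frame offsets 0 to 31, which is the left redzone in front of 'buf' at [32, 1032), so an access at offset 31 lands on the last redzone byte.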

NetBSD ioctl(2) hooks

Similar to the syscall hooks, there is a need to handle every ioctl(2) call. I've created the needed patch, this time a shorter one at less than 300KB. This code is still pending upstream review:

Introduce handling of 1200 NetBSD specific ioctl(2) calls.
Over 100 operations are disabled as unavailable or conflicting
with the existing ones (the same operation number).

Add a script that generates the rules to detect ioctls on NetBSD.
The generate_netbsd_ioctls.awk script has been written
in NetBSD awk(1) (patched nawk) and is compatible with gawk.

Generate lib/sanitizer_common/sanitizer_netbsd_interceptors_ioctl.inc
with the awk(1) script.

Update sanitizer_platform_limits_netbsd accordingly to add the needed
definitions.
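The idea of per-request rules, and of disabling operations that share a request number, can be sketched with a small lookup table. This is a toy model under stated assumptions, not the generated sanitizer_netbsd_interceptors_ioctl.inc: the type names, the direction enum, and the request numbers used with it are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Direction of the data transfer as seen by the kernel (hypothetical enum).
enum class DemoIoctlDir { None, KernelReads, KernelWrites, Both };

// One rule: how many bytes behind the argument pointer the kernel touches.
struct DemoIoctlRule {
  std::uint32_t request;
  DemoIoctlDir dir;
  std::size_t size;
  std::string name;
};

class DemoIoctlTable {
 public:
  // Register a rule. If the request number is already taken, the new
  // operation is recorded as disabled and the existing rule is kept.
  bool add(const DemoIoctlRule &rule) {
    bool inserted = rules_.insert(std::make_pair(rule.request, rule)).second;
    if (!inserted) disabled_.push_back(rule.name);
    return inserted;
  }

  // Look up the rule for a request number; nullptr when it is unknown.
  const DemoIoctlRule *find(std::uint32_t request) const {
    std::map<std::uint32_t, DemoIoctlRule>::const_iterator it =
        rules_.find(request);
    return it == rules_.end() ? nullptr : &it->second;
  }

  const std::vector<std::string> &disabled() const { return disabled_; }

 private:
  std::map<std::uint32_t, DemoIoctlRule> rules_;
  std::vector<std::string> disabled_;
};
```

With a rule in hand, an interceptor would check the argument range before the call when the kernel reads it, and mark the range initialized after the call when the kernel writes it.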

New patches still pending for upstream review

There are two corrections that I've created, and they are still pending upstream for review:

I've got a few more local patches that require cleanup before submitting to review.

NetBSD basesystem corrections

I've introduced a few corrections in the NetBSD codebase:

Sanitizers in Go

I've prepared a scratch port of TSan and MSan to the Go language environment. This code mostly works. However, there are remaining bugs that must be fixed.

Results of ./race.bash:

Passed 340 of 347 tests (97.98%, 0+, 7-)
0 expected failures (0 has not fail)

The MSan state as of today

Although I managed to pass more than 90% of the tests in the check-msan target within approximately one week, the gap between passing most of the tests and sanitizing real-world applications, even small ones like cat(1), is large. This pushed me towards supporting most of the important NetBSD syscalls and ioctls, and it requires me to support most of the libc, libutil, librt, libm and libkvm entry points.

********************
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Testing Time: 27.92s
********************
Failing Tests (7):
    MemorySanitizer-X86_64 :: chained_origin_with_signals.cc
    MemorySanitizer-X86_64 :: dtls_test.c
    MemorySanitizer-X86_64 :: ioctl_custom.cc
    MemorySanitizer-X86_64 :: sem_getvalue.cc
    MemorySanitizer-X86_64 :: signal_stress_test.cc
    MemorySanitizer-X86_64 :: textdomain.cc
    MemorySanitizer-X86_64 :: tzset.cc

  Expected Passes    : 97
  Expected Failures  : 1
  Unsupported Tests  : 27
  Unexpected Failures: 7

I can already execute cat(1) under sanitizers, and this milestone was achieved just at the end of the past month.

Most other programs from the NetBSD base system are not usable yet. A few examples:

There are also general stability problems with signals and forks. This all makes MSan not ready for larger applications like LLDB.

Solaris support in sanitizers

I've helped the Solaris team add basic support for Solaris to the sanitizers (ASan, UBSan). This does not help NetBSD directly; however, it indirectly improves the overall support for non-Linux hosts and helps to catch more Linuxisms in the code.

Plan for the next milestone

I plan to continue the work on MSan and on correctly sanitizing the NetBSD base system utilities. This requires me to iterate over the base system libraries, implementing the missing interceptors and correcting the existing ones. My milestone is to build all src/*bin* programs against Memory Sanitizer and, when possible, execute them cleanly.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate