One year checkpoint and Thread Sanitizer update

The past year has been started with bugfixes and the development of regression tests for ptrace(2) and related kernel features, as well as the continuation of bringing LLDB support and LLVM sanitizers (ASan + UBsan and partial TSan + Msan) to NetBSD.

My plan for the next year is to finish implementing TSan and MSan support, followed by a long run of bug fixes for LLDB, ptrace(2), and other related kernel subsystems

TSan

In the past month, I've developed Thread Sanitizer far enough to have a subset of its tests pass on NetBSD, started with addressing breakage related to the memory layout of processes. The reason for this breakage was narrowed down to the current implementation of ASLR, which was too aggressive and which didn't allow enough space to be mapped for Shadow memory. The fix for this was to either force the disabling of ASLR per-process, or globally on the system. The same will certainly happen for MSan executables. After some other corrections, I got TSan to work for the first time ever on October 14th. This was a big achievement, so I've made a snapshot available.

$ gdb ./a.out                                  
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
(gdb) r
Starting program: /public/llvm-build/a.out 
[New LWP 2]
==================
WARNING: ThreadSanitizer: data race (pid=1621)
  Write of size 4 at 0x000001475d70 by thread T1:
    #0 Thread1 /public/llvm-build/tsan.c:4:10 (a.out+0x46bf71)

  Previous write of size 4 at 0x000001475d70 by main thread:
    #0 main /public/llvm-build/tsan.c:10:10 (a.out+0x46bfe6)

  Location is global 'Global' of size 4 at 0x000001475d70 (a.out+0x000001475d70)

  Thread T1 (tid=2, running) created by main thread at:
    #0 pthread_create /public/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:930:3 (a.out+0x412120)
    #1 main /public/llvm-build/tsan.c:9:3 (a.out+0x46bfd1)

SUMMARY: ThreadSanitizer: data race /public/llvm-build/tsan.c:4:10 in Thread1
==================

Thread 2 received signal SIGSEGV, Segmentation fault.

I was able to get the above execution results around 10% of the time (being under a tracer had no positive effect on the frequency of successful executions). Getting the snapshot of execution under GDB was pure hazard.

I've managed to hit the following final results for this month, with another set of bugfixes and improvements:

check-tsan:
Expected Passes    : 248
Expected Failures  : 1
Unsupported Tests  : 83
Unexpected Failures: 44

At the end of the month, TSan can now reliably executabe the same (already-working) program every time. The majority of failures are in tests verifying sanitization of correct mutex locking usage.

There are still problems with NetBSD-specific libc and libpthread bootstrap code that conflicts with TSan. Certain functions (pthread_create(3), pthread_key_create(3), _cxa_atexit()) cannot be started early by TSan initialization, and must be deffered late enough for the sanitizer to work correctly.

MSan

I've prepared a scratch support for MSan on NetBSD to help in researching how far along it is. I've also cloned and adapted the existing FreeBSD bits; however, the code still needs more work and isn't functional yet. The number of passed tests (5) is negligible and most likely does not work at all.

The conclusion after this research is that TSan shall be finished first, as it touches similar code.

In the future, there will be likely another round of iterating the system structs and types and adding the missing ones for NetBSD. So far, this part has been done before executing the real MSan code. I've added one missing symbol that was missing and was dettected when attempting to link a test program with MSan.

Sanitizers

The GCC team has merged the LLVM sanitizer code, which has resulted in almost-complete support for ASan and UBsan on NetBSD. It can be found in the latest GCC8 snapshot, located in pkgsrc-wip/gcc8snapshot. Though, do note that there is an issue with getting backtraces from libasan.so, which can be worked-around by backtracing ASan events in a debugger. UBsan also passes all GCC regression tests and appears to work fine. The code enabling sanitizers on the GCC/NetBSD frontend will be submitted upstream once the backtracing issue is fixed and I'm satisfied that there are no other problems.

I've managed to upstream a large portion of generic+TSan+MSan code to compiler-rt and reduce local patches to only the ones that are in progress. This deals with any rebasing issues, and allows me to just focus on the delta that is being worked on.

I've tried out the LLDB builds which have TSan/NetBSD enabled, and they built and started fine. However, there were some false positives related to the mutex locking/unlocking code as TSan still isn't reliable.

Plans for the next milestone

The general goals are to finish TSan and MSan and switch back to LLDB debugging. I plan to verify the impact of the TSan bootstrap initialization on the observed crashes and research the remaining failures.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate