Completion of forking code support in ptrace(2)

I've finished all the planned tasks regarding fork(2), vfork(2), clone(2)/__clone(2) and posix_spawn(3) in the context of debuggers. There are no longer any known kernel issues with any of these calls, and all of them are covered by ATF regression tests.

Debugger-related changes

NetBSD is probably the only mainstream OS that implements posix_spawn(3) with a dedicated syscall. All other well-known kernels implement it as a libc wrapper around fork(2), vfork(2) or clone(2). I've introduced a new ptrace(2) event, PTRACE_POSIX_SPAWN, improved the posix_spawn(3) documentation and added new ATF regression tests for posix_spawn(3).
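As an illustration, the following minimal sketch spawns a child with posix_spawn(3) and, on NetBSD only, shows how a tracer could ask the kernel to report such spawns through PT_SET_EVENT_MASK. The helper names spawn_and_wait() and enable_spawn_events() are hypothetical; the event-mask part assumes a NetBSD 9 or newer <sys/ptrace.h> that defines PTRACE_POSIX_SPAWN and compiles to an ENOSYS stub everywhere else.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <spawn.h>
#include <unistd.h>
#ifdef __NetBSD__
#include <sys/ptrace.h>
#endif

extern char **environ;

/* Spawn argv[0] (a path) and return its exit status, or -1 on error. */
int spawn_and_wait(char *const argv[])
{
    pid_t child;
    int status;

    /* posix_spawn(3) returns an error number instead of setting errno. */
    int err = posix_spawn(&child, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;
        return -1;
    }
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Hypothetical tracer helper: ask the kernel to stop the traced (and
 * currently stopped) process `pid` and report fork(2), vfork(2) and
 * posix_spawn(3) events. Returns 0 on success, -1 on error.
 */
int enable_spawn_events(pid_t pid)
{
#if defined(__NetBSD__) && defined(PTRACE_POSIX_SPAWN)
    ptrace_event_t pe;

    pe.pe_set_event = PTRACE_FORK | PTRACE_VFORK | PTRACE_VFORK_DONE |
        PTRACE_POSIX_SPAWN;
    return ptrace(PT_SET_EVENT_MASK, pid, &pe, sizeof(pe));
#else
    /* No PTRACE_POSIX_SPAWN on this system. */
    (void)pid;
    errno = ENOSYS;
    return -1;
#endif
}
```

When the event fires, the tracee stops with SIGTRAP; per ptrace(2), the tracer can then use PT_GET_PROCESS_STATE to obtain a ptrace_state_t whose pe_report_event is PTRACE_POSIX_SPAWN and whose pe_other_pid names the newly spawned child.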

The new ptrace(2) code has been exercised under load with the LLDB test suite, picotrace and NetBSD truss.

I intended to resume the edb-debugger porting effort from 2 years ago and use the debugger to verify FPU registers. I've spent some time porting it to NetBSD/amd64 and managed to get process attaching working. Unfortunately, the code is highly specific to a single operating system, the only one on which it is really functional, and it needs reworking to be more agnostic to the differing semantics of kernels.

Issues with threaded debuggees

I've analyzed the problems with debugging programs that have multiple threads. They can be classified into the following groups:

I've found that strengthening kernel correctness now and fixing bugs in thread handling can, paradoxically, have a random impact on GDB/NetBSD. The problem of multiple events reported concurrently from multiple threads was fixed quickly, but the fix caused some fallout in GDB support, so I have decided to revert it for now. This led me to conclude that before fixing LWP events, the priority is to streamline GDB support: modernize it, upstream it and run its regression tests.

Meanwhile, there is ongoing work on covering more kernel code paths with fuzzers, and we catch and address the problems found there. I'm supervising 3 ongoing projects in this domain: syzkaller support enhancements, TriforceAFL porting and AFL+KCOV integration. This work gives a clearer picture of what still needs to be fixed and lowers the cost of improving quality.

LSan/NetBSD

The original implementation of Leak Sanitizer for NetBSD was developed for GCC by Christos Zoulas. The key ingredient of a functional LSan is the Stop-The-World operation: a process suspends all of its other threads and is able to inspect the register frames and stacks of the suspended threads.

Until recently there were two implementations of LSan: one for Linux, (ab)using ptrace(2), and one for Darwin, using low-level Mach interfaces. Furthermore, the Linux version needs a special version of fork(2) and makes assumptions about the semantics of the kernel's signal code.

The original Linux code must attach to each thread (a separate pid on Linux) individually. The same applies to all other operations, such as detaching or killing a process through ptrace(2). Additionally, listing the threads of a debuggee on Linux is troublesome, as it requires reading directories in /proc.
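For comparison, the Linux thread-discovery step can be sketched as follows. This is a Linux-specific illustration with a hypothetical function name, not the LSan code: each numeric entry under /proc/<pid>/task is a thread ID that a tracer would then have to attach to one by one.

```c
#include <sys/types.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/*
 * Linux-specific sketch: count the threads of `pid` by listing
 * /proc/<pid>/task, where each numeric entry is one thread ID.
 * Returns -1 if the directory cannot be opened (e.g. no procfs).
 */
int count_linux_threads(pid_t pid)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    int count = 0;

    snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip "." and ".."; thread IDs are all-digit names. */
        if (isdigit((unsigned char)ent->d_name[0]))
            count++;
    }
    closedir(dir);
    return count;
}
```

Because threads can be created or exit while the directory is being read, a tracer relying on this listing has to re-read it until the set of threads stops changing, which is one reason this step is troublesome.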

The implementation in GCC closely reused the Linux semantics, and there was room for improvement. I took this code as inspiration and wrote a clean implementation reflecting NetBSD kernel behavior and interfaces. In the end, the NetBSD code is simpler than the Linux code, needs fewer recovery fallbacks (actually none) and needs no port-specific kludges (on Linux every CPU needs dedicated treatment).
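To make the contrast concrete, here is a minimal sketch of how a Stop-The-World step can look on NetBSD, assuming the documented ptrace(2) requests PT_ATTACH, PT_LWPINFO, PT_GETREGS and PT_DETACH; it is not the code that went into GCC or LLVM. The function and callback names are hypothetical, the caller must be a different process from pid (a process cannot trace itself), and on other systems the function compiles to an ENOSYS stub.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stddef.h>
#ifdef __NetBSD__
#include <sys/ptrace.h>
#include <machine/reg.h>
#endif

/* Hypothetical callback: receives each LWP id and the caller's context. */
typedef void (*lwp_visitor)(int lwpid, void *ctx);

/*
 * Sketch of a NetBSD-style Stop-The-World: a single PT_ATTACH stops every
 * LWP of `pid`, PT_LWPINFO enumerates them and PT_GETREGS reads the
 * registers of each one (the LWP id goes in the `data` argument).
 * Returns the number of LWPs visited, or -1 on error.
 */
int stop_the_world(pid_t pid, lwp_visitor visit, void *ctx)
{
#ifdef __NetBSD__
    struct ptrace_lwpinfo pl;
    struct reg regs;
    int status, n = 0;

    if (ptrace(PT_ATTACH, pid, NULL, 0) == -1)
        return -1;
    /* Wait until the whole process is reported as stopped. */
    if (waitpid(pid, &status, 0) == -1) {
        (void)ptrace(PT_DETACH, pid, (void *)1, 0);
        return -1;
    }

    pl.pl_lwpid = 0; /* 0 asks for the first LWP */
    while (ptrace(PT_LWPINFO, pid, &pl, sizeof(pl)) != -1 &&
        pl.pl_lwpid != 0) {
        if (ptrace(PT_GETREGS, pid, &regs, pl.pl_lwpid) == -1)
            break;
        /* A real leak checker would scan regs and this LWP's stack here. */
        if (visit != NULL)
            visit(pl.pl_lwpid, ctx);
        n++;
    }
    /* Resume all LWPs; (void *)1 means "continue where it stopped". */
    if (ptrace(PT_DETACH, pid, (void *)1, 0) == -1)
        return -1;
    return n;
#else
    (void)pid;
    (void)visit;
    (void)ctx;
    errno = ENOSYS;
    return -1;
#endif
}
```

The single PT_ATTACH replaces the per-thread attach loop that Linux needs, and no /proc access is required to find the LWPs.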

Unfortunately, I don't like this approach very much: in my opinion it abuses the ptrace(2) interface, making programs sanitized for leaks incompatible with debuggers, and the whole StopTheWorld() operation could be cleanly implemented on the kernel side as a new syscall. I actually have a semi-completed implementation of this syscall, but I really want to resolve all the threading issues under ptrace(2) before moving on; threading must be fully reliable in one domain of debugging before I implement more kernel code. LLVM 9.0 and NetBSD-9 are both branching soon, and the ptrace(2)-based approach will be good enough for the time being. There is still one behavioral difference in atexit(3) semantics that triggers a false positive in LSan tests, and my current goal before the LLVM branch is to address just this. A request for comments on this specific atexit(3) issue is awaiting feedback from upstream.

Plan for the next milestone

Modernize GDB/NetBSD support, upstream it and run the GDB regression tests for NetBSD/amd64. Then switch back to addressing kernel threading issues under a debugger.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:

http://netbsd.org/donations/#how-to-donate