NetBSD: the first BSD introducing a modern process plugin framework in LLDB

A featureset for debugging NetBSD applications (without threads) has been merged with upstream LLDB! The number of passing tests this month has been increased from 267/1235 to 622/1247. This is +133% within one month and approximately 50% of successfully passed tests in total! As usual regular housekeeping of ptrace(2) interfaces has been done on the NetBSD side.

During this month I've finished the needed Native Process Plugin with breakpoints fully supported. In order to achieve this I had to address bugs, add missing features and diligently debug the debugger sniffing on the GDB Remote Protocol line. Since NetBSD-8 is approaching, I have performed the also needed housekeeping on the base system distribution side.

What has been done in NetBSD

I've managed to achieve the following goals:

Clean up in ptrace(2) ATF tests

We have created some maintanance burden for the current ptrace(2) regression tests. The main issues with them is code duplication and the splitting between generic (Machine Independent) and port-specific (Machine Dependent) test files. I've eliminated some of the ballast and merged tests into the appropriate directory tests/lib/libc/sys/. The old location (tests/kernel) was a violation of the tests/README recommendation:

When adding new tests, please try to follow the following conventions.

1. For library routines, including system calls, the directory structure of
   the tests should follow the directory structure of the real source tree.
   For instance, interfaces available via the C library should follow:

        src/lib/libc/gen -> src/tests/lib/libc/gen
        src/lib/libc/sys -> src/tests/lib/libc/sys
        ...

The ptrace(2) interface inhabits src/lib/libc/sys so there is no reason not to move it to its proper home.

PTRACE_FORK on !x86 ports

Along with the motivation from Martin Husemann we have investigated the issue with PTRACE_FORK ATF regression tests. It was discovered that these tests aren't functional on evbarm, alpha, shark, sparc and sparc64 and likely on other non-x86 ports. We have discovered that there is a missing SIGTRAP emitted from the child, during the fork(2) handshake. The proper order of operations is as follows:

Only the x86 ports were emitting the second SIGTRAP signal.

The culprit reason has been investigated and narrowed down to the child_return() function in src/sys/arch/x86/x86/syscall.c:

void
child_return(void *arg)
{
    struct lwp *l = arg;
    struct trapframe *tf = l->l_md.md_regs;
    struct proc *p = l->l_proc;
    if (p->p_slflag & PSL_TRACED) {
        ksiginfo_t ksi;
        mutex_enter(proc_lock);
        KSI_INIT_EMPTY(&ksi);
        ksi.ksi_signo = SIGTRAP;
        ksi.ksi_lid = l->l_lid;
        kpsignal(p, &ksi, NULL);
        mutex_exit(proc_lock);
    }
    X86_TF_RAX(tf) = 0;
    X86_TF_RFLAGS(tf) &= ~PSL_C;
    userret(l);
    ktrsysret(SYS_fork, 0, 0);
}

This child_return() function was the only one among all the existing ones for other platforms to contain the needed code for SIGTRAP. The appropriate solution was installed by Martin, as we taught our featured signal routing subsystem to handle early signals from the fork(2) calls.

PT_SYSCALL and PT_SYSCALLEMU

Christos Zoulas addressed the misbehavior with tracing syscall entry and syscall exit code. We can again get an event inside a debugger that a debuggee attempts to trigger a syscall and later return from it. This means that a debugger can access the register layout before executing the appropriate kernel code and read it again after executing the syscall. This allows to monitor exact sysentry arguments and return the values afterwards. Another option is to fake the trap frame with new values, it's sometimes useful for debugging.

With the addition of PT_SYSCALLEMU we can implement a virtual kernel syscall monitor. It means that we can fake syscalls within a debugger. In order to achieve this feature, we need to use the PT_SYSCALL operation, catch SIGTRAP with si_code=TRAP_SCE (syscall entry), call PT_SYSCALLEMU and perform an emulated userspace syscall that would have been done by the kernel, followed by calling another PT_SYSCALL with si_code=TRAP_SCX.

This interface makes it possible enables possibile to introduce the following into NetBSD: truss(1) from FreeBSD and strace(1) from Linux. There used to be a port of at least strace(1) in the past, but it's time to refresh this code. Another immediate consumer is of course in DTrace/libproc... as there are facilities within this library to trace the system call entry and exit in order to catch fork(2) events. Why to catch fork(2)? It can be useful to detach software breakpoints in order to detach them before cloning address space of forker for forkee; and after the operation reapply them again.

What has been done in LLDB

A lot of work has been done with the goal to get breakpoints functional. This target penetrated bugs in the existing local patches and unveiled missing features required to be added. My initial test was tracing a dummy hello-world application in C. I have sniffed the GDB Remote Protocol packets and compared them between Linux and NetBSD. This helped to streamline both versions and bring the NetBSD support to the required Linux level.

As a bonus I helped the OpenBSD people to get their initial code into the LLDB tree. At the moment this system supports only opening core(5) files with a single-thread. This inspired me to add the same capability for NetBSD.

By the end of March all local patches for LLDB have been merged upstream! This turned NetBSD into the first open-source operating system next to Linux & Android to use a Native Process Plugin framework with the debugserver capability. The current FreeBSD support in LLDB is dated and lagging behind and limited to local debugging. At some point FreeBSD should catch up with NetBSD's LLDB support. I predict that the majority of the work will be restricted to... s/NetBSD/FreeBSD/.

Among the features in the upstreamed NetBSD Process Plugin:

Demo

It's getting rather difficult to present all the features of the NetBSD Process Plugin without making the example overly long. This is why I will restrict it to a very basic debugging session hosted on NetBSD.

$ cat crashme.c 
#include 

int
main(int argc, char **argv)
{
        int i = argv;

        while (i-- != 0)
                printf("argv[%d]=%s\n", i, argv[i]);

        return 0;
}
$ gcc -w -g -o crashme crashme.c # -w disable all warnings                                                                                          
$ ./crashme
Memory fault (core dumped) 
chieftec$ lldb ./crashme
(lldb) target create "./crashme"
Current executable set to './crashme' (x86_64).
(lldb) r
Process 612 launched: './crashme' (x86_64)
Process 612 stopped
* thread #1, stop reason = signal SIGSEGV: address access protected (fault address: 0x7f7ffec88518)
    frame #0: 0x000000000040089c crashme`main(argc=1, argv=0x00007f7fffdd6420) at crashme.c:9
   6            int i = argv;
   7    
   8            while (i-- != 0)
-> 9                    printf("argv[%d]=%s\n", i, argv[i]);
   10   
   11           return 0;
   12   }
(lldb) frame var
(int) argc = 1
(char **) argv = 0x00007f7fffdd6420
(int) i = -2268129
(lldb) # i looks wrong, checking argv...
(lldb) p *argv
(char *) $0 = 0x00007f7fffdd6940 "./crashme"
(lldb) # set a brakpoint and restart
(lldb) b main
Breakpoint 1: where = crashme`main + 15 at crashme.c:6, address = 0x000000000040087f
(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 612 exited with status = 6 (0x00000006) got unexpected response to k packet: Sff
Process 80 launched: './crashme' (x86_64)
Process 80 stopped
* thread #1, stop reason = breakpoint 1.1
    frame #0: 0x000000000040087f crashme`main(argc=1, argv=0x00007f7fff3d17c8) at crashme.c:6
   3    int
   4    main(int argc, char **argv)
   5    {
-> 6            int i = argv;
   7    
   8            while (i-- != 0)
   9                    printf("argv[%d]=%s\n", i, argv[i]);
(lldb) frame var
(int) argc = 1
(char **) argv = 0x00007f7fff3d17c8
(int) i = 0
(lldb) n
Process 80 stopped
* thread #1, stop reason = step over
    frame #0: 0x0000000000400886 crashme`main(argc=1, argv=0x00007f7fff3d17c8) at crashme.c:8
   5    {
   6            int i = argv;
   7    
-> 8            while (i-- != 0)
   9                    printf("argv[%d]=%s\n", i, argv[i]);
   10   
   11           return 0;
(lldb) p i
(int) $1 = -12773432
(lldb) # gotcha i = argv, instead of argc!
(lldb) bt
* thread #1, stop reason = step over
  * frame #0: 0x0000000000400886 crashme`main(argc=1, argv=0x00007f7fff3d17c8) at crashme.c:8
    frame #1: 0x000000000040078b crashme`___start + 229
(lldb) disassemble --frame --mixed

   4    main(int argc, char **argv)
** 5    {

crashme`main:
    0x400870 <+0>:  pushq  %rbp
    0x400871 <+1>:  movq   %rsp, %rbp
    0x400874 <+4>:  subq   $0x20, %rsp
    0x400878 <+8>:  movl   %edi, -0x14(%rbp)
    0x40087b <+11>: movq   %rsi, -0x20(%rbp)

** 6            int i = argv;
   7    

    0x40087f <+15>: movq   -0x20(%rbp), %rax
    0x400883 <+19>: movl   %eax, -0x4(%rbp)

-> 8            while (i-- != 0)

->  0x400886 <+22>: jmp    0x4008b3                  ; <+67> at crashme.c:8

** 9                    printf("argv[%d]=%s\n", i, argv[i]);
   10   

    0x400888 <+24>: movl   -0x4(%rbp), %eax
    0x40088b <+27>: cltq   
    0x40088d <+29>: leaq   (,%rax,8), %rdx
    0x400895 <+37>: movq   -0x20(%rbp), %rax
    0x400899 <+41>: addq   %rdx, %rax
    0x40089c <+44>: movq   (%rax), %rdx
    0x40089f <+47>: movl   -0x4(%rbp), %eax
    0x4008a2 <+50>: movl   %eax, %esi
    0x4008a4 <+52>: movl   $0x400968, %edi           ; imm = 0x400968 
    0x4008a9 <+57>: movl   $0x0, %eax
    0x4008ae <+62>: callq  0x400630                  ; symbol stub for: printf

** 8            while (i-- != 0)

    0x4008b3 <+67>: movl   -0x4(%rbp), %eax
    0x4008b6 <+70>: leal   -0x1(%rax), %edx
    0x4008b9 <+73>: movl   %edx, -0x4(%rbp)
    0x4008bc <+76>: testl  %eax, %eax
    0x4008be <+78>: jne    0x400888                  ; <+24> at crashme.c:9

** 11           return 0;

    0x4008c0 <+80>: movl   $0x0, %eax

** 12   }

    0x4008c5 <+85>: leave  
    0x4008c6 <+86>: retq   
(lldb) version
lldb version 5.0.0 (http://llvm.org/svn/llvm-project/lldb/trunk revision 299109)
(lldb) platform status 
  Platform: host
    Triple: x86_64-unknown-netbsd7.99
OS Version: 7.99.67 (0799006700)
    Kernel: NetBSD 7.99.67 (GENERIC) #1: Mon Apr  3 08:09:29 CEST 2017  root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC
  Hostname: 127.0.0.1
WorkingDir: /public/lldb_devel
    Kernel: NetBSD
   Release: 7.99.67
   Version: NetBSD 7.99.67 (GENERIC) #1: Mon Apr  3 08:09:29 CEST 2017  root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC
(lldb) # thank you!
(lldb) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y

Plan for the next milestone

I've listed the following goals for the next milestone.

Beyond the next milestone is x86 32-bit support.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate