Porting ptrace(2) software to NetBSD

Presenter Notes

netbsd

Target audience: system level developers

Author: Kamil Rytarowski

uname: NetBSD 7.99.62 amd64

Date: 26th February 2017

Agenda

  • Overview
  • ptrace(2) principles
  • How to start to trace a process
  • Introspecting tracee
  • Breakpoints and watchpoints
  • Monitoring events
  • Managing threads
  • ELF Auxiliary Vector
  • Tracing syscalls
  • News in NetBSD 8.0 (state in progress)
  • Porting Linux code
  • Porting FreeBSD code
  • Porting OpenBSD code

Overview

ptrace(2) - process tracing and debugging facility

With ptrace(2) one process (the tracer) can take control of another (the tracee): inspect and manipulate its register context and address space, and steer its control flow.

Underneath, the interface relies on the reparenting mechanism: the tracer becomes the new parent and the tracee its child. Userland processes mostly do not notice the change.

The interface is available on the BSD operating systems and on Linux, and the implementations are relatively closely related.

Other operating systems may use different mechanisms, including Mac OS X, the SunOS family and others. These are out of scope for this talk.

Historically, 4.4BSD introduced procfs-based tracing to address performance issues in the original ptrace(2) design. Back then ptrace(2) was limited to peeking and poking data in sizeof(int) chunks between the tracer and its tracee. With the advent of PT_IO that is no longer true, so /proc adds no value beyond compatibility with existing software.

ptrace(2) first appeared in Version 7 AT&T UNIX.

Function prototypes

NetBSD

    int
    ptrace(int request, pid_t pid, void *addr, int data);

Linux

    long
    ptrace(enum __ptrace_request request,
           pid_t pid, void *addr, void *data);

FreeBSD

    int
    ptrace(int request, pid_t pid, caddr_t addr, int data);

OpenBSD

    int
    ptrace(int request, pid_t pid, caddr_t addr, int data);

Included headers

NetBSD

    #include <sys/types.h>
    #include <sys/ptrace.h>

Linux

    #include <sys/ptrace.h>

FreeBSD

    #include <sys/types.h>
    #include <sys/ptrace.h>

OpenBSD

    #include <sys/types.h>
    #include <sys/ptrace.h>

ptrace(2) principles

ptrace(2) is a signal-based monitoring mechanism. The tracer intercepts signals destined for its tracee with a wait(2)-like function or a SIGCHLD signal handler.

On NetBSD ptrace(2) is used to manage the whole process, unlike Linux, where it is used to manage each thread separately.

ptrace(2) is a nonstandard extension found on POSIX-like systems.

Most of the time the tracee works normally without performance overhead, but when it receives a signal it stops.

For most ptrace(2) operations the tracee must be stopped.

Security

The traced process must have the same real UID as the tracing process, and it must not be executing a setuid or setgid executable. (These restrictions do not apply if the tracing process is running as root.)

Three restrictions apply to all tracing processes, even those running as root.

  • First, no process may trace a system process.
  • Second, no process may trace the process running init(8).
  • Third, if a process has its root directory set with chroot(2), it may not trace another process unless that process' root directory is at or below the tracing process' root.

To insert a trap into the tracee's program text, the debugger must be able to override the PaX MPROTECT restrictions. A sysctl(7) knob controls whether this is allowed:

    $ sysctl -d security.pax.mprotect.ptrace
    security.pax.mprotect.ptrace: When enabled, allow ptrace(2) to \
    override mprotect permissions on traced processes

Events

A debugger can receive additional events - besides regularly intercepted signals:

  • tracee termination
  • LWP events (thread creation and termination)
  • child events (process fork(2), vfork(2) and vfork(2) done)
  • software breakpoint
  • hardware assisted breakpoint or watchpoint (debug register based)
  • single step trap
  • execve(2) trap - transforming the calling process into a new program
  • system call trap (on syscall entry and on syscall exit)

How to start to trace a process

Good solution

Ask our parent to trace us:

    ptrace(PT_TRACE_ME, 0, NULL, 0);
    /* usually followed by execve(2)-like functions */
    execve(...); /* when a process is traced this call generates SIGTRAP */

Attach to child or unrelated process:

    ptrace(PT_ATTACH, pid, NULL, 0);
    /* process referenced with 'pid' generates SIGSTOP and stops */

Bad solution

Old, broken and/or unportable ways:

  • exect(3)

    4.2BSD-specific solution that no longer works. It turned on single-step mode in the process. Formally marked deprecated in NetBSD 8.0.

  • clone(2) with flags CLONE_PTRACE and CLONE_UNTRACED

    Linux-specific. Neither implemented nor applicable on NetBSD.

Example 1 - PT_ATTACH

    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <sys/wait.h>
    #include <err.h>
    #include <signal.h>
    #include <stdlib.h>

    /* Attach to the process identified by "pid" and wait until it stops */
    static void
    attach(pid_t pid)
    {
        int status;
        pid_t wpid;
        int rv;

        /* Attach to a process referenced with unique "pid" identity */
        rv = ptrace(PT_ATTACH, pid, NULL, 0);
        if (rv == -1)
            err(EXIT_FAILURE, "ptrace");

        /* Receive SIGSTOP from the new child (after reparenting process) */
        wpid = waitpid(pid, &status, 0);
        if (wpid == -1)
            err(EXIT_FAILURE, "waitpid");
        if (wpid != pid)
            errx(EXIT_FAILURE, "unexpected process has been reported");

        /* Sanity check the SIGSTOP event */
        if (!WIFSTOPPED(status))
            errx(EXIT_FAILURE, "child not stopped");
        if (WSTOPSIG(status) != SIGSTOP)
            errx(EXIT_FAILURE, "child not stopped with SIGSTOP");
    }

Example 2 - PT_TRACE_ME (1/2)

    int status;
    pid_t wpid;
    pid_t child;

    child = fork();
    if (child == -1)
        err(EXIT_FAILURE, "fork");
    if (child == 0) {
        /* Let our parent trace us */
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
            err(EXIT_FAILURE, "ptrace");

        /*
         * call exec()-like functions, when traced it will raise SIGTRAP
         * (execve(2) always triggers a SIGTRAP in this scenario unless
         * PT_SYSCALL is in use, which blacklists TRAP_EXEC events)
         */
        execlp("/bin/echo", "/bin/echo", (char *)NULL);

        /* Reached only if the exec call failed */
        err(EXIT_FAILURE, "execlp");
    }

Example 2 - PT_TRACE_ME (2/2)

    /* PT_TRACE_ME does not generate any signal to its parent */

    /* Receive SIGTRAP from the child after the exec() call */
    wpid = waitpid(child, &status, 0);
    if (wpid == -1)
        err(EXIT_FAILURE, "waitpid");
    if (wpid != child)
        errx(EXIT_FAILURE, "unexpected process has been reported");

    /* Sanity check the SIGTRAP event */
    if (!WIFSTOPPED(status))
        errx(EXIT_FAILURE, "child not stopped");
    if (WSTOPSIG(status) != SIGTRAP)
        errx(EXIT_FAILURE, "child not stopped with SIGTRAP");
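
Once the exec SIGTRAP has been observed, the tracee stays stopped until the tracer resumes it. The following is a minimal sketch, not taken from the original slides, of a loop that keeps resuming the tracee with PT_CONTINUE until it exits. The helper name run_traced and the policy of swallowing SIGTRAP while forwarding every other signal are assumptions made for illustration.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "prog" under ptrace(2) and return its exit
 * status, or -1 on error or if the tracee was killed by a signal.
 */
static int
run_traced(const char *prog)
{
    int status, sig;
    pid_t child;

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
            _exit(127);
        execlp(prog, prog, (char *)NULL);
        _exit(127);    /* reached only if exec failed */
    }

    for (;;) {
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;

        /* The tracee is stopped: swallow SIGTRAP, forward the rest */
        sig = WSTOPSIG(status) == SIGTRAP ? 0 : WSTOPSIG(status);

        /* addr (void *)1 resumes execution where the tracee stopped */
        if (ptrace(PT_CONTINUE, child, (void *)1, sig) == -1)
            return -1;
    }
}
```

For example, run_traced("true") is expected to return 0, while run_traced("false") should return a non-zero exit status.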

Introspecting tracee

Old solution

  • PT_READ_I, PT_READ_D - These requests read a single int of data from the traced process' address space.
  • PT_WRITE_I, PT_WRITE_D - These requests parallel PT_READ_I and PT_READ_D, except that they write rather than read.

Read example:

    errno = 0;
    var_value = ptrace(PT_READ_D, child, var_address, 0 /* unused */);
    /* -1 may be valid data, so an error is signalled only by "errno != 0" */

Write example:

    int rv;
    rv = ptrace(PT_WRITE_D, child, var_address, var_value);
    /* check if rv == -1 */

This API is slow by design and should not be used in new code, unless only a single integer needs to be transferred between the tracer and the tracee.

Today there is no difference between data and instruction space operations, so PT_READ_I and PT_READ_D behave exactly the same, as do PT_WRITE_I and PT_WRITE_D.

New solution PT_IO

This request is a more general interface that can be used instead of PT_READ_D, PT_WRITE_D, PT_READ_I, and PT_WRITE_I. The I/O request is encoded in a struct ptrace_io_desc defined as:

    struct ptrace_io_desc {
        int     piod_op;
        void    *piod_offs;
        void    *piod_addr;
        size_t  piod_len;
    };

where piod_offs is the offset within the traced process where the I/O operation should take place, piod_addr is the buffer in the tracing process, and piod_len is the length of the I/O request. The piod_op field specifies which type of I/O operation to perform.

Possible values to read and write regular data from tracee are:

  • PIOD_READ_D (replaces PT_READ_D)
  • PIOD_WRITE_D (replaces PT_WRITE_D)
  • PIOD_READ_I (replaces PT_READ_I)
  • PIOD_WRITE_I (replaces PT_WRITE_I)
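
The descriptor above can be exercised with a short sketch. The helper names tracee_read and demo_read_secret are hypothetical, and the check of piod_len after the call assumes that the kernel stores the number of bytes actually transferred there. The #else branch is only a fallback to the old word-sized reads so that the sketch also builds on systems without PT_IO.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Copy len bytes at address "remote" in the stopped tracee into "local" */
static int
tracee_read(pid_t pid, void *remote, void *local, size_t len)
{
#ifdef PT_IO
    struct ptrace_io_desc io;

    io.piod_op = PIOD_READ_D;
    io.piod_offs = remote;    /* address inside the tracee */
    io.piod_addr = local;     /* buffer inside the tracer */
    io.piod_len = len;
    if (ptrace(PT_IO, pid, (void *)&io, 0) == -1)
        return -1;
    /* Assumption: piod_len now holds the number of bytes transferred */
    return io.piod_len == len ? 0 : -1;
#else
    /* No PT_IO on this system: fall back to the old word-sized reads */
    size_t off, n;
    long word;

    for (off = 0; off < len; off += n) {
        n = len - off < sizeof(word) ? len - off : sizeof(word);
        errno = 0;
        word = ptrace(PT_READ_D, pid, (char *)remote + off, 0);
        if (word == -1 && errno != 0)
            return -1;
        memcpy((char *)local + off, &word, n);
    }
    return 0;
#endif
}

static volatile long secret;    /* same address in parent and child */

/* Fork a tracee that sets "secret" to 42 and read the value back */
static long
demo_read_secret(void)
{
    int status;
    long value = -1;
    pid_t child;

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        secret = 42;        /* modified in the child only */
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
            _exit(1);
        raise(SIGSTOP);     /* stop so that the parent can inspect us */
        _exit(0);
    }
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    if (tracee_read(child, (void *)&secret, &value, sizeof(value)) == -1)
        value = -1;
    ptrace(PT_KILL, child, NULL, 0);
    waitpid(child, &status, 0);
    return value;
}
```

Because the variable is modified only after fork(2), a result of 42 can only come from the tracee's address space, not from the tracer's own copy.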

Breakpoints and watchpoints

Breakpoints

Software breakpoints are set with PIOD_WRITE_I or PT_WRITE_I operations, by modifying the instruction segment of the tracee.

Hardware breakpoints are set through machine-dependent debug registers, which are written with PT_SETDBREGS.

Watchpoints

Hardware watchpoints are set through machine-dependent debug registers, which are written with PT_SETDBREGS.

Monitoring events

Debugger Monitor

A debugger can catch and differentiate the following events:

  • Process software breakpoint - SIGTRAP with si_code TRAP_BRKPT
  • Process child trap (fork(2), vfork(2), vfork(2) done) - SIGTRAP with si_code TRAP_CHLD
  • Process hardware debug register trap - SIGTRAP with si_code TRAP_DBREG
  • Process exec trap - SIGTRAP with si_code TRAP_EXEC
  • Process LWP trap (thread creation and termination) - SIGTRAP with si_code TRAP_LWP
  • Process trace trap (single step trap) - SIGTRAP with si_code TRAP_TRACE
  • System call entry - SIGTRAP (planned si_code TRAP_SCE)
  • System call exit - SIGTRAP (planned si_code TRAP_SCX)

Additionally a tracer can catch a signal emitted to the tracee and the process termination with its exit status.

Debugger Monitor - siginfo(2) (1/2)

To differentiate the event that was caught on NetBSD one must call a wait(2)-like function.

Then if a process was stopped with SIGTRAP the appropriate solution to detect the exact event that caused the process to stop is to use the PT_GET_SIGINFO call.

The PT_SET_SIGINFO request can be used to specify signal information emitted to the tracee. This signal information is specified in struct ptrace_siginfo defined as:

    typedef struct ptrace_siginfo {
        siginfo_t       psi_siginfo;
        lwpid_t         psi_lwpid;
    } ptrace_siginfo_t;

Here psi_siginfo holds the signal information structure and psi_lwpid identifies the LWP that the signal is addressed to. The value 0 means the whole process (the signal is routed to all LWPs).

A pointer to this structure is passed in addr. The data argument should be set to sizeof(struct ptrace_siginfo).

Debugger Monitor - siginfo(2) (2/2)

In order to pass a faked signal to the tracee, the signal in the faked signal information must match the signal passed to the process with PT_CONTINUE or PT_DETACH.

The PT_GET_SIGINFO request can be used to determine signal information that was received by a debugger. The information is read into the struct ptrace_siginfo pointed to by addr. The data argument should be set to sizeof(struct ptrace_siginfo).

Debugger Monitor - EVENT_MASK (1/3)

The PT_SET_EVENT_MASK request can be used to specify which events in the traced process should be reported to the tracing process. These events are specified in a struct ptrace_event defined as:

    typedef struct ptrace_event {
        int     pe_set_event;
    } ptrace_event_t;

Where pe_set_event is the set of events to be reported. This set is formed by OR'ing together the following values:

  • PTRACE_FORK - Report fork(2).
  • PTRACE_VFORK - Report vfork(2).
  • PTRACE_VFORK_DONE - Report parent resumed after vfork(2).
  • PTRACE_LWP_CREATE - Report thread birth.
  • PTRACE_LWP_EXIT - Report thread termination.

Debugger Monitor - EVENT_MASK (2/3)

The fork(2) and vfork(2) events can also be triggered by similar operations, such as clone(2) or posix_spawn(3). PTRACE_FORK means that the process gives birth to a child without waiting for its termination or its execve(2) operation. If enabled, the child is also traced by the debugger and SIGTRAP is generated twice, first for the parent and then for the child. PTRACE_VFORK is the same as PTRACE_FORK, except that the parent blocks after giving birth to the child. PTRACE_VFORK_DONE can be used to report the unblocking of the parent.

A pointer to struct ptrace_event is passed in addr. The data argument should be set to sizeof(struct ptrace_event).

Debugger Monitor - EVENT_MASK (3/3)

PT_GET_EVENT_MASK can be used to determine which events in the traced process will be reported. The information is read into the struct ptrace_event pointed to by addr. The data argument should be set to sizeof(struct ptrace_event).

PT_GET_PROCESS_STATE reads the state information associated with the event that stopped the traced process. The information is reported in a struct ptrace_state defined as:

    typedef struct ptrace_state {
        int     pe_report_event;
        pid_t   pe_other_pid;
    } ptrace_state_t;

A pointer to this structure is passed in addr. The data argument should be set to sizeof(struct ptrace_state).
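
The event mask and process state calls can be combined as in the following sketch. The helper names trace_forks and forked_peer are hypothetical. The code assumes that pe_report_event carries the PTRACE_* value of the reported event and that pe_other_pid holds the PID of the other process involved in the fork. The calls are NetBSD-specific, so stubs are provided to keep the sketch building elsewhere.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>
#include <string.h>

#if defined(__NetBSD__) && defined(PTRACE_VFORK_DONE)
/* Ask for SIGTRAP reports on fork(2) and vfork(2) events in "pid" */
static int
trace_forks(pid_t pid)
{
    ptrace_event_t ev;

    memset(&ev, 0, sizeof(ev));
    ev.pe_set_event = PTRACE_FORK | PTRACE_VFORK | PTRACE_VFORK_DONE;
    return ptrace(PT_SET_EVENT_MASK, pid, &ev, sizeof(ev));
}

/* After a TRAP_CHLD stop: PID of the other side of the fork, or -1 */
static pid_t
forked_peer(pid_t pid)
{
    ptrace_state_t st;

    if (ptrace(PT_GET_PROCESS_STATE, pid, &st, sizeof(st)) == -1)
        return -1;
    /* Assumption: pe_report_event names the event that was reported */
    if ((st.pe_report_event & (PTRACE_FORK | PTRACE_VFORK)) == 0)
        return -1;
    return st.pe_other_pid;
}
#else
/* Stubs so that the sketch still builds on other systems */
static int
trace_forks(pid_t pid)
{
    (void)pid;
    errno = ENOSYS;
    return -1;
}

static pid_t
forked_peer(pid_t pid)
{
    (void)pid;
    errno = ENOSYS;
    return -1;
}
#endif
```

A debugger would call trace_forks once after attaching, and forked_peer each time a SIGTRAP with si_code TRAP_CHLD is reported.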

Managing threads

Listing threads

The PT_LWPINFO call can be used to list the threads (LWPs) in a process.

PT_LWPINFO returns information about a thread from the list of threads for the process specified in the pid argument. The addr argument should contain a struct ptrace_lwpinfo defined as:

    struct ptrace_lwpinfo {
        lwpid_t pl_lwpid;
        int pl_event;
    };

where pl_lwpid contains a thread LWP ID. Information is returned for the thread following the one with the specified ID in the process thread list, or for the first thread if pl_lwpid is 0. Upon return pl_lwpid contains the LWP ID of the thread that was found, or 0 if there is no thread after the one whose LWP ID was supplied in the call. pl_event contains the event that stopped the thread. Possible values are:

  • PL_EVENT_NONE
  • PL_EVENT_SIGNAL
  • PL_EVENT_SUSPENDED

The data argument should contain sizeof(struct ptrace_lwpinfo).
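
The iteration protocol described above can be sketched as a loop that starts with pl_lwpid set to 0 and stops when the kernel hands 0 back. The helper name list_lwps is hypothetical. The loop is compiled on NetBSD only, because PT_LWPINFO has different semantics elsewhere; other systems get a stub that fails with ENOSYS.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Print every LWP of the stopped tracee "pid"; return the count or -1 */
static int
list_lwps(pid_t pid)
{
#ifdef __NetBSD__
    struct ptrace_lwpinfo pl;
    int n = 0;

    /* pl_lwpid == 0 asks for the first thread in the list */
    memset(&pl, 0, sizeof(pl));
    for (;;) {
        if (ptrace(PT_LWPINFO, pid, &pl, sizeof(pl)) == -1)
            return -1;
        if (pl.pl_lwpid == 0)    /* no thread after the previous one */
            break;
        printf("LWP %d, pl_event %d\n", (int)pl.pl_lwpid, pl.pl_event);
        n++;
    }
    return n;
#else
    /* Stub: the NetBSD thread listing semantics are not available */
    (void)pid;
    errno = ENOSYS;
    return -1;
#endif
}
```

Each call reuses the structure filled in by the previous one, so the LWP ID returned last becomes the starting point of the next lookup.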

Listing threads nits

The PT_LWPINFO call on FreeBSD and NetBSD must not be used interchangeably. The FreeBSD call is used to retrieve the event that stopped the process, while the NetBSD version is used to retrieve the list of threads within a process.

The pl_event field in struct ptrace_lwpinfo must not be used to detect if the whole process received a signal or other type of an event (process termination).

Where in FreeBSD PT_LWPINFO shows which event stopped a child, use PT_GET_SIGINFO on NetBSD instead.

To retrieve the thread that was signaled one should use PT_GET_SIGINFO and read the psi_lwpid member of struct ptrace_siginfo.

Note that PT_LWPINFO is not guaranteed to return the threads of a process in any particular order, such as by thread identity (lwpid_t) increasing from the lowest to the highest.

Resume and suspend a thread

To allow a thread to execute one can use PT_RESUME.

To prevent a thread from execution a debugger can call PT_SUSPEND.

Traditionally PT_CONTINUE, PT_SYSCALL and PT_STEP could be used to continue only a specific thread. New code should use the PT_RESUME and PT_SUSPEND operations instead, as they do not prevent the debugger from emitting a signal to the tracee.

To retrieve the list of suspended threads in a process the PT_LWPINFO call should be used.

ELF Auxiliary Vector

The ELF Auxiliary vector of a tracee can be retrieved with PT_IO and option PIOD_READ_AUXV.

Tracing syscalls

The PT_SYSCALL and PT_SYSCALLEMU operations exist, but both need more work and tests.

Changes in NetBSD 8.0 (in progress)

ptrace(2) news in NetBSD 8.0

The following features have been prepared for the upcoming NetBSD 8.0 release:

  • The pl_event member of struct ptrace_lwpinfo can take the new value PL_EVENT_SIGNAL
  • The pe_set_event member of struct ptrace_event, used by PT_SET_EVENT_MASK, accepts new flags: PTRACE_VFORK, PTRACE_VFORK_DONE, PTRACE_LWP_CREATE, PTRACE_LWP_EXIT
  • PT_SET_SIGINFO and PT_GET_SIGINFO
  • PT_SET_SIGMASK and PT_GET_SIGMASK
  • PT_GETDBREGS and PT_SETDBREGS on amd64 and i386
  • PT_RESUME and PT_SUSPEND
  • new siginfo(2) codes: TRAP_EXEC, TRAP_CHLD, TRAP_DBREG, TRAP_LWP

Porting Linux code

Porting Linux ptrace(2) code is complicated as the ptrace(2) calls are used to control threads on Linux and a process on NetBSD.

A debugger for Linux has to perform the job of synchronizing threads - stopping them, terminating, etc in its own code; on NetBSD there is no need for this as a process is a single entity from the ptrace(2) point of view.

Some of the Linux ptrace(2) calls have no direct or indirect equivalents on NetBSD, but these calls are not really applicable there.

The only feature really missing on NetBSD is concurrent PT_SYSCALL and PT_STEP execution.

  • PTRACE_TRACEME -> PT_TRACE_ME
  • PTRACE_PEEKTEXT and PTRACE_PEEKDATA -> PT_READ_I and PT_READ_D
  • PTRACE_PEEKUSER - there is no direct equivalent on NetBSD; user data on NetBSD may consist of mcontext (general purpose registers, floating point registers and Thread Local Storage) and debug registers
  • PTRACE_POKETEXT, PTRACE_POKEDATA -> PT_WRITE_I and PT_WRITE_D
  • PTRACE_GETREGS -> PT_GETREGS
  • PTRACE_GETFPREGS -> PT_GETFPREGS
  • PTRACE_GETREGSET - there is no direct equivalent on NetBSD, use appropriate PT_GET*REGS
  • PTRACE_SETREGS -> PT_SETREGS
  • PTRACE_SETFPREGS -> PT_SETFPREGS
  • PTRACE_SETREGSET - there is no direct equivalent on NetBSD, use appropriate PT_SET*REGS

  • PTRACE_GETSIGINFO -> PT_GET_SIGINFO
  • PTRACE_SETSIGINFO -> PT_SET_SIGINFO
  • PTRACE_PEEKSIGINFO - there is no equivalent on NetBSD, not really applicable
  • PTRACE_GETSIGMASK -> PT_GET_SIGMASK
  • PTRACE_SETSIGMASK -> PT_SET_SIGMASK
  • PTRACE_SETOPTIONS -> PT_SET_EVENT_MASK
  • PTRACE_SETOPTIONS option PTRACE_O_EXITKILL - there is no direct equivalent on NetBSD, a tracer should just call PT_KILL before termination
  • PTRACE_SETOPTIONS option PTRACE_O_TRACECLONE - PTRACE_FORK and/or PTRACE_VFORK, clone(2) works like fork(2) (parent not waiting on child) or vfork(2) (parent waiting on child)
  • PTRACE_SETOPTIONS option PTRACE_O_TRACEEXEC - nothing is needed, on NetBSD SIGTRAP is always triggered unless traced with PT_SYSCALL

  • PTRACE_SETOPTIONS option PTRACE_O_TRACEEXIT - not applicable on NetBSD, if really needed there is sysctl(7) proc.$PID.stopexit
  • PTRACE_SETOPTIONS option PTRACE_O_TRACEFORK - PTRACE_FORK
  • PTRACE_SETOPTIONS option PTRACE_O_TRACESYSGOOD - not applicable on NetBSD, to trace syscalls use PT_SYSCALL
  • PTRACE_SETOPTIONS option PTRACE_O_TRACEVFORK - PTRACE_VFORK
  • PTRACE_SETOPTIONS option PTRACE_O_TRACEVFORKDONE - PTRACE_VFORK_DONE
  • PTRACE_GETEVENTMSG -> mostly PT_GET_PROCESS_STATE
  • PTRACE_CONT -> PT_CONTINUE
  • PTRACE_SYSCALL -> PT_SYSCALL

  • PTRACE_SINGLESTEP -> PT_STEP
  • PTRACE_SYSEMU -> PT_SYSCALLEMU
  • PTRACE_SYSEMU_SINGLESTEP - not supported
  • PTRACE_LISTEN - not supported nor applicable
  • PTRACE_KILL -> PT_KILL
  • PTRACE_INTERRUPT - not supported nor applicable
  • PTRACE_ATTACH -> PT_ATTACH
  • PTRACE_SEIZE - not supported nor applicable
  • PTRACE_DETACH -> PT_DETACH

Porting FreeBSD code

Both systems, FreeBSD and NetBSD, have closely related ptrace(2) operations.

The major difference is that NetBSD currently has no way to detect a syscall entry trap (TRAP_SCE) or a syscall exit trap (TRAP_SCX).

Another difference in syntax is that FreeBSD puts event information into PT_LWPINFO, while NetBSD relies on siginfo(2) and other applicable calls like PT_GET_PROCESS_STATE.

  • ELF Auxiliary Vector is stored in the kernel on FreeBSD, and retrievable with PT_IO on NetBSD
  • PT_LWPINFO serves completely different purposes on NetBSD (listing threads) and FreeBSD (detecting the event that interrupted the tracee)
  • PT_LWPINFO pl_flags PL_FLAG_SCE - currently unsupported, as it requires the TRAP_SCE siginfo(2) value
  • PT_LWPINFO pl_flags PL_FLAG_SCX - currently unsupported, as it requires the TRAP_SCX siginfo(2) value
  • PT_LWPINFO pl_flags PL_FLAG_EXEC - TRAP_EXEC
  • PT_LWPINFO pl_flags PL_FLAG_SI - the PT_GET_SIGINFO call is always available
  • PT_LWPINFO pl_flags PL_FLAG_FORKED - TRAP_CHLD and PT_GET_PROCESS_STATE with PTRACE_FORK
  • PT_LWPINFO pl_flags PL_FLAG_CHILD - TRAP_CHLD and PT_GET_PROCESS_STATE with PTRACE_FORK

  • PT_LWPINFO pl_sigmask - PT_GET_SIGMASK
  • PT_LWPINFO pl_siglist - currently no use-case, not needed, seems Linux specific clone
  • PT_LWPINFO pl_siginfo - PT_GET_SIGINFO
  • PT_LWPINFO pl_tdname - currently not needed, this value can be retrieved with sysctl(7) if really needed
  • PT_LWPINFO pl_child_pid - TRAP_CHLD and PT_GET_PROCESS_STATE with PTRACE_FORK
  • PT_LWPINFO pl_syscall_code - not applicable, debugger is supposed to use PT_GETREGS
  • PT_LWPINFO pl_syscall_narg - not applicable, debugger is supposed to use PT_GETREGS
  • PT_GETNUMLWPS - PT_LWPINFO and iterate over all threads or use sysctl(7)
  • PT_GETLWPLIST - PT_LWPINFO

  • PT_SETSTEP - not applicable, use PT_STEP
  • PT_CLEARSTEP - not applicable, do not use PT_STEP
  • PT_TO_SCE - not applicable, use PT_SYSCALL
  • PT_TO_SCX - not applicable, use PT_SYSCALL
  • PT_FOLLOW_FORK - PT_SET_EVENT_MASK with PTRACE_FORK
  • PT_VM_TIMESTAMP - not applicable
  • PT_VM_ENTRY - currently no use-case, there is CTL_VM (vm.proc.map) in sysctl(7)

Porting OpenBSD code

Currently OpenBSD's ptrace(2) is a subset of NetBSD's ptrace(2).

OpenBSD iterates over threads differently, with PT_GET_THREAD_FIRST and PT_GET_THREAD_NEXT. On NetBSD one must use PT_LWPINFO.
