Porting ptrace(2) software to NetBSD

Target audience: system level developers

Author: Kamil Rytarowski

uname: NetBSD 7.99.62 amd64

Date: 26th February 2017

Agenda

Overview
ptrace(2) principles
How to start to trace a process
Introspecting tracee
Breakpoints and watchpoints
Monitoring events
Managing threads
ELF Auxiliary Vector
Tracing syscalls
Changes in NetBSD 8.0 (in progress)
Porting Linux code
Porting FreeBSD code
Porting OpenBSD code

Overview

ptrace(2) - process tracing and debugging facility

With process tracing, one process (the tracer) can take control of another (the tracee) and inspect and manipulate its register context, address space and control flow.

Underneath, this interface uses the reparenting mechanism: the tracer becomes the new parent of the tracee, and userland processes mostly don't notice it.

This interface is used by the BSD operating systems and by the Linux kernel, and their implementations are relatively closely related.

Other operating systems, including Mac OS X and the SunOS family, may use different mechanisms. These are out of scope for this talk.

Historically, 4.4BSD introduced procfs-based tracing to address performance issues with the original ptrace(2) design, which was limited to peeking and poking data in sizeof(int) chunks between the tracer and its tracee. With the advent of PT_IO that is no longer true, so /proc no longer adds value beyond compatibility with existing software.

ptrace(2) first appeared in Version 7 AT&T UNIX.

Function prototypes

NetBSD

    int
    ptrace(int request, pid_t pid, void *addr, int data);

Linux

    long
    ptrace(enum __ptrace_request request,
           pid_t pid, void *addr, void *data);

FreeBSD

    int
    ptrace(int request, pid_t pid, caddr_t addr, int data);

OpenBSD

    int
    ptrace(int request, pid_t pid, caddr_t addr, int data);

Included headers

NetBSD

    #include <sys/types.h>
    #include <sys/ptrace.h>

Linux

    #include <sys/ptrace.h>

FreeBSD

    #include <sys/types.h>
    #include <sys/ptrace.h>

OpenBSD

    #include <sys/types.h>
    #include <sys/ptrace.h>

ptrace(2) principles

ptrace(2) is a signal-based monitoring mechanism. The tracer, acting as the parent, intercepts signals sent to its tracee with a wait(2)-like function or a SIGCHLD signal handler.

On NetBSD ptrace(2) is used to manage the whole process, unlike Linux where this function is used to manage each thread separately.

ptrace(2) is a nonstandard extension of a POSIX-like system.

Most of the time the tracee works normally without performance overhead, but when it receives a signal it stops.

For most ptrace(2) operations the tracee must be stopped.
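
Since every report reaches the tracer through a wait(2)-like call, a debugger's main loop usually begins by decoding the status word. The following is only a minimal sketch of that step; the enum, classify_status() and the demo helper report_for_exit_code() are hypothetical names introduced for illustration, and only the W* macros come from <sys/wait.h>.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical names, not part of any system header. */
enum report_kind {
    REPORT_STOPPED,  /* tracee stopped by a signal and can be inspected */
    REPORT_EXITED,   /* tracee called exit(3) or _exit(2) */
    REPORT_KILLED,   /* tracee was terminated by a signal */
    REPORT_OTHER
};

/* Decode a wait(2) status; *detail gets the signal number or exit code. */
enum report_kind
classify_status(int status, int *detail)
{
    assert(detail != NULL);
    if (WIFSTOPPED(status)) {
        *detail = WSTOPSIG(status);
        return REPORT_STOPPED;
    }
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return REPORT_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return REPORT_KILLED;
    }
    *detail = 0;
    return REPORT_OTHER;
}

/* Demo only: fork a child that exits with 'code' and classify the report. */
enum report_kind
report_for_exit_code(int code, int *detail)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return REPORT_OTHER;
    if (child == 0)
        _exit(code);
    if (waitpid(child, &status, 0) != child)
        return REPORT_OTHER;
    return classify_status(status, detail);
}
```

Only the REPORT_STOPPED case leaves the tracee in a state where most other ptrace(2) requests are accepted.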

Security

A traced process must have the same real UID as the tracing process, and it must not be executing a setuid or setgid executable. (If the tracing process is running as root, these restrictions do not apply.)

Three restrictions apply to all tracing processes, even those running as root.

First, no process may trace a system process.
Second, no process may trace the process running init(8).
Third, if a process has its root directory set with chroot(2), it may not trace another process unless that process' root directory is at or below the tracing process' root.

To place a trap in the tracee's program, the debugger must override the PaX MPROTECT restrictions.

    $ sysctl -d security.pax.mprotect.ptrace
    security.pax.mprotect.ptrace: When enabled, allow ptrace(2) to \
    override mprotect permissions on traced processes

Events

A debugger can receive additional events besides regularly intercepted signals:

tracee termination
LWP events (thread creation and termination)
child events (fork(2), vfork(2) and vfork(2) done)
software breakpoint
hardware assisted breakpoint or watchpoint (debug register based)
single step trap
execve(2) trap - transforming the calling process into a new program
system call trap (on syscall entry and on syscall exit)

How to start to trace a process

Good solution

Ask our parent to trace us:

    ptrace(PT_TRACE_ME, 0, NULL, 0);
    /* usually followed by execve(2)-like functions */
    execve(...); /* when a process is traced this call generates SIGTRAP */

Attach to a child or an unrelated process:

    ptrace(PT_ATTACH, pid, NULL, 0);
    /* process referenced with 'pid' generates SIGSTOP and stops */

Bad solution

Old, broken and/or unportable ways:

exect(3)

A 4.2BSD-specific solution that no longer works. It turns on single-step mode in the process. Formally marked deprecated in NetBSD 8.0.

clone(2) with flags CLONE_PTRACE and CLONE_UNTRACED

Linux specific. Neither implemented nor applicable on NetBSD.

Example 1 - PT_ATTACH

    int status;
    pid_t wpid;
    int rv;

    /* Attach to a process referenced with unique "pid" identity */
    rv = ptrace(PT_ATTACH, pid, NULL, 0);
    if (rv == -1)
        err(EXIT_FAILURE, "ptrace");

    /* Receive SIGSTOP from the new child (after reparenting process) */
    wpid = waitpid(pid, &status, 0);
    if (wpid == -1)
        err(EXIT_FAILURE, "waitpid");
    if (wpid != pid)
        errx(EXIT_FAILURE, "unexpected process has been reported");

    /* Sanity check the SIGSTOP event */
    if (!WIFSTOPPED(status))
        errx(EXIT_FAILURE, "child not stopped");
    if (WSTOPSIG(status) != SIGSTOP)
        errx(EXIT_FAILURE, "child not stopped with SIGSTOP");

Example 2 - PT_TRACE_ME (1/2)

    int status;
    pid_t wpid;
    int rv;
    pid_t child;

    child = fork();
    if (child < 0)
        err(EXIT_FAILURE, "fork");
    if (child == 0) {
        /* Let our parent trace us */
        rv = ptrace(PT_TRACE_ME, 0, NULL, 0);
        if (rv == -1)
            err(EXIT_FAILURE, "ptrace");

        /*
         * Call an exec()-like function; when traced it raises SIGTRAP
         * (execve(2) always triggers a SIGTRAP in this scenario unless
         * PT_SYSCALL is in use, which blacklists TRAP_EXEC events)
         */
        execlp("/bin/echo", "/bin/echo", (char *)NULL);

        /* Only reached if the exec failed */
        err(EXIT_FAILURE, "execlp");
    }

Example 2 - PT_TRACE_ME (2/2)

    /* PT_TRACE_ME does not generate any signal to its parent */

    /* Receive SIGTRAP from the child after the exec() call */
    wpid = waitpid(child, &status, 0);
    if (wpid == -1)
        err(EXIT_FAILURE, "waitpid");
    if (wpid != child)
        errx(EXIT_FAILURE, "unexpected process has been reported");

    /* Sanity check the SIGTRAP event */
    if (!WIFSTOPPED(status))
        errx(EXIT_FAILURE, "child not stopped");
    if (WSTOPSIG(status) != SIGTRAP)
        errx(EXIT_FAILURE, "child not stopped with SIGTRAP");
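
The two halves of Example 2 stop at the exec SIGTRAP. As a rough sketch of what typically follows, the function below also resumes the tracee with PT_CONTINUE and collects its exit code. run_traced() is a hypothetical helper, not part of any API, and the exit codes 126 and 127 are arbitrary markers chosen here. On NetBSD, passing (void *)1 as addr means "continue from where the tracee stopped" and data 0 means "deliver no signal".

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: run 'file' (looked up in PATH) as a tracee,
 * resume it after the exec SIGTRAP and return its exit code.
 * Returns -1 if any step fails.
 */
int
run_traced(const char *file)
{
    int status;
    pid_t child;

    assert(file != NULL);
    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
            _exit(126);
        execlp(file, file, (char *)NULL);
        _exit(127);     /* only reached if the exec failed */
    }

    /* First report: the SIGTRAP stop raised by the successful exec */
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;      /* child exited early (126/127) or odd stop */

    /* Resume the tracee where it stopped, without delivering a signal */
    if (ptrace(PT_CONTINUE, child, (void *)1, 0) == -1) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }

    /* Second report: termination of the tracee */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real debugger would loop over waitpid(2) reports instead of expecting exactly two, since the tracee may stop again for any signal it receives.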

Introspecting tracee

Old solution

PT_READ_I, PT_READ_D - These requests read a single int of data from the traced process' address space.
PT_WRITE_I, PT_WRITE_D - These requests parallel PT_READ_I and PT_READ_D, except that they write rather than read.

Read example:

    errno = 0;
    var_value = ptrace(PT_READ_D, child, var_address, 0 /* unused */);
    /* check "errno != 0" */

Write example:

    int rv;
    rv = ptrace(PT_WRITE_D, child, var_address, var_value);
    /* check if rv == -1 */

This API is slow by design and should not be used in new code, except to transfer a single integer between the tracer and the tracee.

Today there is no difference between data and instruction space operations, so PT_READ_I and PT_READ_D are exactly the same, as are PT_WRITE_I and PT_WRITE_D.

New solution PT_IO

This request is a more general interface that can be used instead of PT_READ_D, PT_WRITE_D, PT_READ_I, and PT_WRITE_I. The I/O request is encoded in a struct ptrace_io_desc defined as:

    struct ptrace_io_desc {
        int     piod_op;
        void    *piod_offs;
        void    *piod_addr;
        size_t  piod_len;
    };

where piod_offs is the offset within the traced process where the I/O operation should take place, piod_addr is the buffer in the tracing process, and piod_len is the length of the I/O request. The piod_op field specifies which type of I/O operation to perform.

Possible values to read and write regular data from tracee are:

PIOD_READ_D (replaces PT_READ_D)
PIOD_WRITE_D (replaces PT_WRITE_D)
PIOD_READ_I (replaces PT_READ_I)
PIOD_WRITE_I (replaces PT_WRITE_I)
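
A bulk read through PT_IO could look roughly like the sketch below. tracee_read() is a hypothetical helper name. The #if guard is only there so the file still builds on systems without PT_IO (for example Linux), where the function reports ENOSYS. The sketch assumes the kernel updates piod_len with the number of bytes actually transferred, and it passes 0 as the data argument.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy 'len' bytes from address 'remote' in the
 * stopped tracee 'pid' into the local buffer 'buf'.
 * Returns the number of bytes transferred, or -1 with errno set.
 */
ssize_t
tracee_read(pid_t pid, void *remote, void *buf, size_t len)
{
    assert(buf != NULL);
#if defined(PT_IO)
    struct ptrace_io_desc io;

    io.piod_op = PIOD_READ_D;   /* read from the tracee's data space */
    io.piod_offs = remote;      /* address inside the tracee */
    io.piod_addr = buf;         /* buffer inside the tracer */
    io.piod_len = len;

    if (ptrace(PT_IO, pid, (void *)&io, 0) == -1)
        return -1;

    /* Assumption: piod_len now holds the byte count actually copied */
    return (ssize_t)io.piod_len;
#else
    (void)pid;
    (void)remote;
    (void)len;
    errno = ENOSYS;             /* no PT_IO on this system */
    return -1;
#endif
}
```

Compared with the PT_READ_D loop this needs one system call per buffer rather than one per int.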

Breakpoints and watchpoints

Breakpoints

Software breakpoints are set with PIOD_WRITE_I or PT_WRITE_I operations by modifying the instruction segment of the tracee.

Hardware breakpoints are set in the machine-dependent debug registers with PT_SETDBREGS.
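
For the software variant, a debugger typically saves the original instruction byte and overwrites it with a trap opcode. The sketch below assumes an x86 tracee, where the one-byte int3 opcode is 0xCC; other CPUs need a different opcode and width. text_byte_io(), breakpoint_insert() and breakpoint_remove() are hypothetical helpers, and the #if guard only keeps the file buildable on systems without PT_IO.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BREAKPOINT_OPCODE 0xCC  /* x86 int3; an assumption of this sketch */

/* Transfer one byte of the tracee's instruction space via PT_IO. */
static int
text_byte_io(pid_t pid, void *remote, unsigned char *byte, int do_write)
{
#if defined(PT_IO)
    struct ptrace_io_desc io;

    io.piod_op = do_write ? PIOD_WRITE_I : PIOD_READ_I;
    io.piod_offs = remote;
    io.piod_addr = byte;
    io.piod_len = 1;
    return ptrace(PT_IO, pid, (void *)&io, 0) == -1 ? -1 : 0;
#else
    (void)pid; (void)remote; (void)byte; (void)do_write;
    errno = ENOSYS;             /* no PT_IO on this system */
    return -1;
#endif
}

/* Save the original byte at 'remote' in *saved, then plant the trap. */
int
breakpoint_insert(pid_t pid, void *remote, unsigned char *saved)
{
    unsigned char trap = BREAKPOINT_OPCODE;

    assert(saved != NULL);
    if (text_byte_io(pid, remote, saved, 0) == -1)
        return -1;
    return text_byte_io(pid, remote, &trap, 1);
}

/* Put the saved byte back, removing the breakpoint. */
int
breakpoint_remove(pid_t pid, void *remote, unsigned char saved)
{
    return text_byte_io(pid, remote, &saved, 1);
}
```

On x86 the program counter points just past the int3 when the trap is reported, so a debugger also has to rewind it by one with PT_GETREGS and PT_SETREGS before resuming; that machine-dependent step is left out here.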

Watchpoints

Hardware watchpoints are set in the machine-dependent debug registers with PT_SETDBREGS.

Monitoring events

Debugger Monitor

A debugger can catch and differentiate the following events:

Process software breakpoint - SIGTRAP with si_code TRAP_BRKPT
Process child trap (fork(2), vfork(2), vfork(2) done) - SIGTRAP with si_code TRAP_CHLD
Process hardware debug register trap - SIGTRAP with si_code TRAP_DBREG
Process exec trap - SIGTRAP with si_code TRAP_EXEC
Process LWP trap (thread creation and termination) - SIGTRAP with si_code TRAP_LWP
Process trace trap (single step trap) - SIGTRAP with si_code TRAP_TRACE
System call entry - SIGTRAP (planned si_code TRAP_SCE)
System call exit - SIGTRAP (planned si_code TRAP_SCX)

Additionally a tracer can catch a signal emitted to the tracee and the process termination with its exit status.

Debugger Monitor - siginfo(2) (1/2)

To find out which event was caught, the tracer on NetBSD must first call a wait(2)-like function.

Then, if the process was stopped with SIGTRAP, the PT_GET_SIGINFO call is the appropriate way to detect the exact event that caused the stop.

The PT_SET_SIGINFO request can be used to specify signal information emitted to the tracee. This signal information is specified in struct ptrace_siginfo defined as:

    typedef struct ptrace_siginfo {
        siginfo_t       psi_siginfo;
        lwpid_t         psi_lwpid;
    } ptrace_siginfo_t;

where psi_siginfo is the signal information structure and psi_lwpid is the LWP the signal is addressed to. The value 0 means the whole process (the signal is routed to all LWPs).

A pointer to this structure is passed in addr. The data argument should be set to sizeof(struct ptrace_siginfo).

Debugger Monitor - siginfo(2) (2/2)

In order to pass a faked signal to the tracee, the signal type must match the signal passed to the process with PT_CONTINUE or PT_DETACH.

The PT_GET_SIGINFO request can be used to determine signal information that was received by a debugger. The information is read into the struct ptrace_siginfo pointed to by addr. The data argument should be set to sizeof(struct ptrace_siginfo).
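
Fetching the details of a stop with PT_GET_SIGINFO might be sketched as follows. stop_details() is a hypothetical helper. The #if guard is an addition so the file still builds where PT_GET_SIGINFO does not exist; there the function simply reports ENOSYS.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: for a tracee that a wait(2)-like call reported
 * as stopped, fetch the signal number, its si_code and the LWP ID.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
stop_details(pid_t pid, int *signo, int *code, int *lwp)
{
    assert(signo != NULL && code != NULL && lwp != NULL);
#if defined(PT_GET_SIGINFO)
    struct ptrace_siginfo psi;

    if (ptrace(PT_GET_SIGINFO, pid, &psi, (int)sizeof(psi)) == -1)
        return -1;

    *signo = psi.psi_siginfo.si_signo;
    *code = psi.psi_siginfo.si_code;  /* e.g. TRAP_EXEC for SIGTRAP */
    *lwp = (int)psi.psi_lwpid;        /* 0 means the whole process */
    return 0;
#else
    (void)pid;
    errno = ENOSYS;                   /* no PT_GET_SIGINFO here */
    return -1;
#endif
}
```

A tracer would call this right after waitpid(2) reports WIFSTOPPED with SIGTRAP and then dispatch on the returned si_code.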

Debugger Monitor - EVENT_MASK (1/3)

The PT_SET_EVENT_MASK request can be used to specify which events in the traced process should be reported to the tracing process. These events are specified in a struct ptrace_event defined as:

    typedef struct ptrace_event {
        int     pe_set_event;
    } ptrace_event_t;

Where pe_set_event is the set of events to be reported. This set is formed by OR'ing together the following values:

PTRACE_FORK - Report fork(2).
PTRACE_VFORK - Report vfork(2).
PTRACE_VFORK_DONE - Report parent resumed after vfork(2).
PTRACE_LWP_CREATE - Report thread birth.
PTRACE_LWP_EXIT - Report thread termination.

Debugger Monitor - EVENT_MASK (2/3)

The fork(2) and vfork(2) events can also be triggered by similar operations, like clone(2) or posix_spawn(3). PTRACE_FORK means that the process gives birth to a child without waiting for its termination or execve(2) operation. If enabled, the child is also traced by the debugger and SIGTRAP is generated twice, first for the parent and then for the child. PTRACE_VFORK is the same as PTRACE_FORK, but the parent blocks after giving birth to the child. PTRACE_VFORK_DONE can be used to report unblocking of the parent.

A pointer to struct ptrace_event is passed in addr. The data argument should be set to sizeof(struct ptrace_event).

Debugger Monitor - EVENT_MASK (3/3)

PT_GET_EVENT_MASK can be used to determine which events in the traced process will be reported. The information is read into the struct ptrace_event pointed to by addr. The data argument should be set to sizeof(struct ptrace_event).

PT_GET_PROCESS_STATE reads the state information associated with the event that stopped the traced process. The information is reported in a struct ptrace_state defined as:

    typedef struct ptrace_state {
        int     pe_report_event;
        pid_t   pe_other_pid;
    } ptrace_state_t;

A pointer to this structure is passed in addr. The data argument should be set to sizeof(struct ptrace_state).
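
Putting the event mask and the process state together, following forks could be sketched as below. enable_fork_events() and forked_peer() are hypothetical helpers. The sketch assumes that pe_other_pid carries the PID of the other process involved in the fork, and the #if guard (NetBSD with the new PTRACE_VFORK flag) is only there so the file builds elsewhere.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>

/* Hypothetical helper: request fork(2)/vfork(2) reports for a tracee. */
int
enable_fork_events(pid_t pid)
{
#if defined(__NetBSD__) && defined(PTRACE_VFORK)
    ptrace_event_t ev;

    ev.pe_set_event = PTRACE_FORK | PTRACE_VFORK | PTRACE_VFORK_DONE;
    return ptrace(PT_SET_EVENT_MASK, pid, &ev, (int)sizeof(ev));
#else
    (void)pid;
    errno = ENOSYS;     /* interface not available on this system */
    return -1;
#endif
}

/*
 * Hypothetical helper: after a SIGTRAP stop with si_code TRAP_CHLD,
 * return the PID of the other process involved, or -1 with errno set.
 */
pid_t
forked_peer(pid_t pid)
{
#if defined(__NetBSD__) && defined(PTRACE_VFORK)
    ptrace_state_t st;

    if (ptrace(PT_GET_PROCESS_STATE, pid, &st, (int)sizeof(st)) == -1)
        return -1;
    if ((st.pe_report_event & (PTRACE_FORK | PTRACE_VFORK)) == 0) {
        errno = EINVAL; /* the stop was not a fork-like event */
        return -1;
    }
    return st.pe_other_pid;
#else
    (void)pid;
    errno = ENOSYS;     /* interface not available on this system */
    return -1;
#endif
}
```

Because SIGTRAP is generated for both the parent and the child, a tracer has to wait for and resume both processes after such an event.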

Managing threads

Listing threads

The PT_LWPINFO call can be used to list the threads (LWPs) in a process.

PT_LWPINFO returns information about a thread from the list of threads for the process specified in the pid argument. The addr argument should contain a struct ptrace_lwpinfo defined as:

    struct ptrace_lwpinfo {
        lwpid_t pl_lwpid;
        int pl_event;
    };

where pl_lwpid contains a thread LWP ID. Information is returned for the thread following the one with the specified ID in the process thread list, or for the first thread if pl_lwpid is 0. Upon return pl_lwpid contains the LWP ID of the thread that was found, or 0 if there is no thread after the one whose LWP ID was supplied in the call. pl_event contains the event that stopped the thread. Possible values are:

PL_EVENT_NONE
PL_EVENT_SIGNAL
PL_EVENT_SUSPENDED

The data argument should contain sizeof(struct ptrace_lwpinfo).
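
The iteration protocol described above (start with pl_lwpid set to 0, stop when 0 comes back) can be sketched as a simple loop. count_threads() is a hypothetical helper. The guard restricts the real code to NetBSD, because PT_LWPINFO has different semantics on FreeBSD.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>

/*
 * Hypothetical helper: count the LWPs of the stopped tracee 'pid'.
 * Returns the number of threads, or -1 with errno set.
 */
int
count_threads(pid_t pid)
{
#if defined(__NetBSD__) && defined(PT_LWPINFO)
    struct ptrace_lwpinfo info;
    int n = 0;

    info.pl_lwpid = 0;          /* 0 asks for the first thread */
    info.pl_event = 0;
    for (;;) {
        if (ptrace(PT_LWPINFO, pid, &info, (int)sizeof(info)) == -1)
            return -1;
        if (info.pl_lwpid == 0) /* no thread after the previous one */
            break;
        /* info.pl_lwpid and info.pl_event describe one thread here */
        n++;
    }
    return n;
#else
    (void)pid;
    errno = ENOSYS;             /* NetBSD-style PT_LWPINFO missing */
    return -1;
#endif
}
```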

Listing threads nits

The PT_LWPINFO call on FreeBSD and NetBSD must not be used interchangeably. The FreeBSD call is used to retrieve the event that stopped the process, while the NetBSD version is used to retrieve the list of threads within a process.

The pl_event field in struct ptrace_lwpinfo must not be used to detect if the whole process received a signal or other type of an event (process termination).

Where in FreeBSD PT_LWPINFO shows which event stopped a child, use PT_GET_SIGINFO on NetBSD instead.

To retrieve the thread that was signaled one should use PT_GET_SIGINFO and read the psi_lwpid member of struct ptrace_siginfo.

Note that PT_LWPINFO is not guaranteed to return the threads of a process in any specific order, such as with thread identities (lwpid_t) increasing from the lowest to the highest.

Resume and suspend a thread

To allow a thread to execute one can use PT_RESUME.

To prevent a thread from execution a debugger can call PT_SUSPEND.

Traditionally PT_CONTINUE, PT_SYSCALL and PT_STEP could be used to continue only a specific thread; however, new code should use the PT_RESUME and PT_SUSPEND operations, as they don't prevent us from emitting a signal to a tracee.

To retrieve the list of suspended threads in a process the PT_LWPINFO call should be used.

ELF Auxiliary Vector

The ELF Auxiliary vector of a tracee can be retrieved with PT_IO and option PIOD_READ_AUXV.
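
A raw read of the vector might look like the sketch below. tracee_read_auxv() is a hypothetical helper. The sketch assumes that piod_offs is an offset into the auxiliary vector (0 for its start) and that piod_len is updated with the number of bytes copied. Decoding the returned bytes into ELF auxiliary entries is left to the caller.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy up to 'len' raw bytes of the tracee's ELF
 * auxiliary vector into 'buf'.
 * Returns the number of bytes copied, or -1 with errno set.
 */
ssize_t
tracee_read_auxv(pid_t pid, void *buf, size_t len)
{
    assert(buf != NULL);
#if defined(PT_IO) && defined(PIOD_READ_AUXV)
    struct ptrace_io_desc io;

    io.piod_op = PIOD_READ_AUXV;
    io.piod_offs = 0;           /* assumed: offset within the vector */
    io.piod_addr = buf;
    io.piod_len = len;

    if (ptrace(PT_IO, pid, (void *)&io, 0) == -1)
        return -1;
    return (ssize_t)io.piod_len;
#else
    (void)pid;
    (void)len;
    errno = ENOSYS;             /* no PIOD_READ_AUXV on this system */
    return -1;
#endif
}
```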

Tracing syscalls

There are PT_SYSCALL and PT_SYSCALLEMU operations, both need more work and tests.

Changes in NetBSD 8.0 (in progress)

ptrace(2) news in NetBSD 8.0

The following features have been prepared in the upcoming NetBSD 8.0 release:

The pl_event member of struct ptrace_lwpinfo can take the new value PL_EVENT_SIGNAL
PT_SET_EVENT_MASK in pe_set_event of struct ptrace_event has new members: PTRACE_VFORK, PTRACE_VFORK_DONE, PTRACE_LWP_CREATE, PTRACE_LWP_EXIT
PT_SET_SIGINFO and PT_GET_SIGINFO
PT_SET_SIGMASK and PT_GET_SIGMASK
PT_GETDBREGS and PT_SETDBREGS on amd64 and i386
PT_RESUME and PT_SUSPEND
new siginfo(2) codes: TRAP_EXEC, TRAP_CHLD, TRAP_DBREG, TRAP_LWP

Porting Linux code

Porting Linux ptrace(2) code is complicated because ptrace(2) calls control individual threads on Linux but a whole process on NetBSD.

A debugger for Linux has to synchronize threads (stopping them, terminating them, etc.) in its own code; on NetBSD there is no need for this, as a process is a single entity from the ptrace(2) point of view.

Some of the Linux ptrace(2) calls have no direct or indirect equivalents; however, these calls are not really applicable on NetBSD.

The only feature really missing on NetBSD is concurrent PT_SYSCALL and PT_STEP execution.

PTRACE_TRACEME -> PT_TRACE_ME
PTRACE_PEEKTEXT and PTRACE_PEEKDATA -> PT_READ_I and PT_READ_D
PTRACE_PEEKUSER - there is no direct equivalent on NetBSD, user data on NetBSD may consist of mcontext (general purpose registers, floating point registers and Thread Local Storage) and debug registers
PTRACE_POKETEXT, PTRACE_POKEDATA -> PT_WRITE_I and PT_WRITE_D
PTRACE_GETREGS -> PT_GETREGS
PTRACE_GETFPREGS -> PT_GETFPREGS
PTRACE_GETREGSET - there is no direct equivalent on NetBSD, use appropriate PT_GET*REGS
PTRACE_SETREGS -> PT_SETREGS
PTRACE_SETFPREGS -> PT_SETFPREGS
PTRACE_SETREGSET - there is no direct equivalent on NetBSD, use appropriate PT_SET*REGS

PTRACE_GETSIGINFO -> PT_GET_SIGINFO
PTRACE_SETSIGINFO -> PT_SET_SIGINFO
PTRACE_PEEKSIGINFO - there is no equivalent on NetBSD, not really applicable
PTRACE_GETSIGMASK -> PT_GET_SIGMASK
PTRACE_SETSIGMASK -> PT_SET_SIGMASK
PTRACE_SETOPTIONS -> PT_SET_EVENT_MASK
PTRACE_SETOPTIONS option PTRACE_O_EXITKILL - there is no direct equivalent on NetBSD, a tracer should just call PT_KILL before termination
PTRACE_SETOPTIONS option PTRACE_O_TRACECLONE - PTRACE_FORK and/or PTRACE_VFORK, clone(2) works like fork(2) (parent not waiting on child) or vfork(2) (parent waiting on child)
PTRACE_SETOPTIONS option PTRACE_O_TRACEEXEC - nothing is needed, on NetBSD SIGTRAP is always triggered unless traced with PT_SYSCALL

PTRACE_SETOPTIONS option PTRACE_O_TRACEEXIT - not applicable on NetBSD, if really needed there is sysctl(7) proc.$PID.stopexit
PTRACE_SETOPTIONS option PTRACE_O_TRACEFORK - PTRACE_FORK
PTRACE_SETOPTIONS option PTRACE_O_TRACESYSGOOD - not applicable on NetBSD, to trace syscalls use PT_SYSCALL
PTRACE_SETOPTIONS option PTRACE_O_TRACEVFORK - PTRACE_VFORK
PTRACE_SETOPTIONS option PTRACE_O_TRACEVFORKDONE - PTRACE_VFORK_DONE
PTRACE_GETEVENTMSG -> mostly PT_GET_PROCESS_STATE
PTRACE_CONT -> PT_CONTINUE
PTRACE_SYSCALL -> PT_SYSCALL

PTRACE_SINGLESTEP -> PT_STEP
PTRACE_SYSEMU -> PT_SYSCALLEMU
PTRACE_SYSEMU_SINGLESTEP - not supported
PTRACE_LISTEN - not supported nor applicable
PTRACE_KILL -> PT_KILL
PTRACE_INTERRUPT - not supported nor applicable
PTRACE_ATTACH -> PT_ATTACH
PTRACE_SEIZE - not supported nor applicable
PTRACE_DETACH -> PT_DETACH

Porting FreeBSD code

Both systems, FreeBSD and NetBSD, have closely related ptrace(2) operations.

The major difference is that there is currently no way on NetBSD to detect the syscall entry (TRAP_SCE) and syscall exit (TRAP_SCX) traps.

Another difference is that FreeBSD puts event information into PT_LWPINFO, while NetBSD relies on siginfo(2) and other applicable calls like PT_GET_PROCESS_STATE.

ELF Auxiliary Vector is stored in the kernel on FreeBSD, and retrievable with PT_IO on NetBSD
PT_LWPINFO serves completely different purposes on NetBSD (list threads) and FreeBSD (detect the event that interrupted the tracee)
PT_LWPINFO pl_flags PL_FLAG_SCE - currently unsupported, as it requires the TRAP_SCE siginfo(2) value
PT_LWPINFO pl_flags PL_FLAG_SCX - currently unsupported, as it requires the TRAP_SCX siginfo(2) value
PT_LWPINFO pl_flags PL_FLAG_EXEC - TRAP_EXEC
PT_LWPINFO pl_flags PL_FLAG_SI - the PT_GET_SIGINFO call is always available
PT_LWPINFO pl_flags PL_FLAG_FORKED - TRAP_CHLD and PT_GET_PROCESS_STATE with PTRACE_FORK
PT_LWPINFO pl_flags PL_FLAG_CHILD - TRAP_CHLD and PT_GET_PROCESS_STATE with PTRACE_FORK

PT_LWPINFO pl_sigmask - PT_GET_SIGMASK
PT_LWPINFO pl_siglist - currently no use-case, not needed, seems Linux specific clone
PT_LWPINFO pl_siginfo - PT_GET_SIGINFO
PT_LWPINFO pl_tdname - currently not needed, this value can be retrieved with sysctl(7) if really needed
PT_LWPINFO pl_child_pid - TRAP_CHLD and PT_GET_PROCESS_STATE with PTRACE_FORK
PT_LWPINFO pl_syscall_code - not applicable, debugger is supposed to use PT_GETREGS
PT_LWPINFO pl_syscall_narg - not applicable, debugger is supposed to use PT_GETREGS
PT_GETNUMLWPS - PT_LWPINFO and iterate over all threads or use sysctl(7)
PT_GETLWPLIST - PT_LWPINFO

PT_SETSTEP - not applicable, use PT_STEP
PT_CLEARSTEP - not applicable, simply do not issue PT_STEP
PT_TO_SCE - not applicable, use PT_SYSCALL
PT_TO_SCX - not applicable, use PT_SYSCALL
PT_FOLLOW_FORK - PT_SET_EVENT_MASK with PTRACE_FORK
PT_VM_TIMESTAMP - not applicable
PT_VM_ENTRY - currently no use-case, there is CTL_VM (vm.proc.map) in sysctl(7)

Porting OpenBSD code

Currently OpenBSD ptrace(2) is a subset of NetBSD's ptrace(2).

OpenBSD has different semantics for iterating over threads: PT_GET_THREAD_FIRST with PT_GET_THREAD_NEXT. On NetBSD one must use PT_LWPINFO.
