Target audience: system-level developers
Author: Kamil Rytarowski
uname: NetBSD 7.99.62 amd64
Date: 26th February 2017
ptrace(2) - process tracing and debugging facility
Using process tracing, one process (the tracer) can take control of another (the tracee): inspect and manipulate its register context, address space and control flow.
Underneath, this interface uses a reparenting mechanism: the tracer becomes the new parent and the tracee becomes its child. Userland processes mostly don't notice this.
This interface is available in the BSD operating systems and in Linux, and the implementations are relatively closely related.
Other operating systems may use different mechanisms; this includes macOS, the SunOS family and others. These are out of scope for this talk.
4.4BSD introduced tracing through procfs to address performance issues of the original ptrace(2) design. In the past, ptrace(2) was limited to peeking and poking data in sizeof(int) chunks between the tracer and its tracee. With the advent of PT_IO this is no longer true, so /proc no longer adds value beyond compatibility with existing software.
First appeared in Version 7 AT&T UNIX.
NetBSD
    int
    ptrace(int request, pid_t pid, void *addr, int data);
Linux
    long
    ptrace(enum __ptrace_request request,
        pid_t pid, void *addr, void *data);
FreeBSD
    int
    ptrace(int request, pid_t pid, caddr_t addr, int data);
OpenBSD
    int
    ptrace(int request, pid_t pid, caddr_t addr, int data);
NetBSD
    #include <sys/types.h>
    #include <sys/ptrace.h>
Linux
    #include <sys/ptrace.h>
FreeBSD
    #include <sys/types.h>
    #include <sys/ptrace.h>
OpenBSD
    #include <sys/types.h>
    #include <sys/ptrace.h>
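The prototypes above differ in the types of addr and data, and the continue request also differs in meaning. A minimal sketch of hiding one such difference behind a helper, assuming glibc's variadic ptrace() on Linux and the four-argument BSD form elsewhere; continue_tracee() is a hypothetical name. On the BSDs an addr of (void *)1 means "resume where the process stopped", while Linux ignores addr for PTRACE_CONT and takes the signal in data.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <stddef.h>

/*
 * Hypothetical helper: resume the stopped tracee 'pid', delivering signal
 * 'sig' (0 delivers none). Returns 0 on success, -1 with errno set on error.
 */
static int
continue_tracee(pid_t pid, int sig)
{
#if defined(__linux__)
	/* Linux: addr is ignored; the signal travels in 'data' as a pointer. */
	return ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig) == -1 ? -1 : 0;
#else
	/* BSD: addr == (void *)1 resumes at the current program counter. */
	return ptrace(PT_CONTINUE, pid, (void *)1, sig);
#endif
}
```

For example, continue_tracee(child, 0) would resume a stopped tracee without delivering a signal, while passing the signal that stopped it forwards that signal instead.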
ptrace(2) is a signal-based monitoring mechanism. The parent (tracer) intercepts its tracee's signals with a wait(2)-like function or a SIGCHLD signal handler.
On NetBSD, ptrace(2) is used to manage the whole process, unlike Linux, where it manages each thread separately.
ptrace(2) is a nonstandard extension found on POSIX-like systems.
Most of the time the tracee runs normally, without performance overhead, but when it receives a signal it stops.
For most ptrace(2) operations the tracee must be stopped.
The traced process must have the same real UID as the tracing process, and it must not be executing a setuid or setgid executable. (If the tracing process is running as root, these restrictions do not apply.)
Three restrictions apply to all tracing processes, even those running as root.
To insert a trap into the tracee's program, the debugger must override the PaX MPROTECT restrictions. This is permitted by the following sysctl:
    $ sysctl -d security.pax.mprotect.ptrace
    security.pax.mprotect.ptrace: When enabled, allow ptrace(2) to override mprotect permissions on traced processes
A debugger can receive additional events - besides regularly intercepted signals:
Ask our parent to trace us:
    ptrace(PT_TRACE_ME, 0, NULL, 0);
    /* usually followed by execve(2)-like functions */
    execve(...); /* when a process is traced this call generates SIGTRAP */
Attach to child or unrelated process:
    ptrace(PT_ATTACH, pid, NULL, 0);
    /* process referenced with 'pid' generates SIGSTOP and stops */
Old, broken and/or unportable ways:
exect(3)
4.2BSD-specific solution that turns on single-step mode in the process. It no longer works and is formally marked deprecated in NetBSD 8.0.
clone(2) with flags CLONE_PTRACE and CLONE_UNTRACED
Linux-specific. Neither implemented nor applicable on NetBSD.
    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <sys/wait.h>
    #include <err.h>
    #include <signal.h>
    #include <stdlib.h>

    /* Illustrative wrapper: attach to 'pid' and consume its SIGSTOP. */
    static void
    attach(pid_t pid)
    {
        int status;
        pid_t wpid;
        int rv;

        /* Attach to a process referenced with unique "pid" identity */
        rv = ptrace(PT_ATTACH, pid, NULL, 0);
        if (rv == -1)
            err(EXIT_FAILURE, "ptrace");

        /* Receive SIGSTOP from the new child (after reparenting process) */
        wpid = waitpid(pid, &status, 0);
        if (wpid == -1)
            err(EXIT_FAILURE, "waitpid");
        if (wpid != pid)
            errx(EXIT_FAILURE, "unexpected process has been reported");

        /* Sanity check the SIGSTOP event */
        if (!WIFSTOPPED(status))
            errx(EXIT_FAILURE, "child not stopped");
        if (WSTOPSIG(status) != SIGSTOP)
            errx(EXIT_FAILURE, "child not stopped with SIGSTOP");
    }
    int status;
    pid_t wpid;
    int rv;
    pid_t child;

    child = fork();
    if (child < 0)
        err(EXIT_FAILURE, "fork");
    if (child == 0) {
        /* Let our parent trace us */
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
            _exit(EXIT_FAILURE);

        /*
         * call exec()-like functions, when traced it will raise SIGTRAP
         * (execve(2) always triggers a SIGTRAP in this scenario unless
         * PT_SYSCALL is in use, which blacklists TRAP_EXEC events)
         */
        execlp("/bin/echo", "/bin/echo", (char *)NULL);
        _exit(EXIT_FAILURE); /* reached only if execlp() failed */
    }
    /* PT_TRACE_ME does not generate any signal to its parent */

    /* Receive SIGTRAP from the child after the exec() call */
    wpid = waitpid(child, &status, 0);
    if (wpid == -1)
        err(EXIT_FAILURE, "waitpid");
    if (wpid != child)
        errx(EXIT_FAILURE, "unexpected process has been reported");

    /* Sanity check the SIGTRAP event */
    if (!WIFSTOPPED(status))
        errx(EXIT_FAILURE, "child not stopped");
    if (WSTOPSIG(status) != SIGTRAP)
        errx(EXIT_FAILURE, "child not stopped with SIGTRAP");
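Read example:
    errno = 0;
    var_value = ptrace(PT_READ_D, child, var_address, 0 /* unused */);
    /* -1 is a valid word, so only errno distinguishes an error */
    if (var_value == -1 && errno != 0)
        err(EXIT_FAILURE, "ptrace(PT_READ_D)");
Write example:
    int rv;
    rv = ptrace(PT_WRITE_D, child, var_address, var_value);
    /* PT_WRITE_D returns 0 on success and -1 on error */
    if (rv == -1)
        err(EXIT_FAILURE, "ptrace(PT_WRITE_D)");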
This API is inefficient by design (one system call per int-sized word) and should not be used in new code unless only a single integer needs to be transferred between the tracer and the tracee.
Today there is no difference between instruction and data space operations, so PT_READ_I and PT_READ_D, like PT_WRITE_I and PT_WRITE_D, behave identically.
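To make the cost concrete: copying len bytes through PT_READ_D takes one system call per int-sized word, plus care with the final partial word. A minimal sketch of that loop, where a hypothetical peek_fn callback stands in for ptrace(PT_READ_D, pid, addr, 0) so the loop can be tried without a tracee; read_words() and peek_local() are illustrative names, not part of any API. As with PT_READ_D itself, a word of -1 is only treated as an error when errno was set.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ptrace(PT_READ_D, pid, addr, 0). */
typedef int (*peek_fn)(void *ctx, const char *addr);

/*
 * Copy 'len' bytes starting at remote address 'raddr' into 'out', one
 * int-sized word per peek call. Returns the number of calls, or -1.
 */
static long
read_words(peek_fn peek, void *ctx, const char *raddr, void *out, size_t len)
{
	unsigned char *dst = out;
	size_t done = 0;
	long calls = 0;

	while (done < len) {
		size_t left = len - done;
		size_t chunk = left < sizeof(int) ? left : sizeof(int);
		int word;

		errno = 0;
		word = peek(ctx, raddr + done);
		calls++;
		if (word == -1 && errno != 0)
			return -1;
		/* The word holds the bytes in memory order; keep only 'chunk'. */
		memcpy(dst + done, &word, chunk);
		done += chunk;
	}
	return calls;
}

/* Example peek over the tracer's own memory, for trying the loop out. */
static int
peek_local(void *ctx, const char *addr)
{
	int word;

	(void)ctx;
	memcpy(&word, addr, sizeof(word));
	return word;
}
```

Reading 10 bytes with a 4-byte int takes three peek calls here, whereas a single PT_IO request moves the same data in one call.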
The PT_IO request is a more general interface that can be used instead of PT_READ_D, PT_WRITE_D, PT_READ_I and PT_WRITE_I. The I/O request is encoded in a struct ptrace_io_desc, defined as:
    struct ptrace_io_desc {
        int piod_op;
        void *piod_offs;
        void *piod_addr;
        size_t piod_len;
    };
where piod_offs is the offset within the traced process where the I/O operation should take place, piod_addr is the buffer in the tracing process, and piod_len is the length of the I/O request. The piod_op field specifies which type of I/O operation to perform.
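A minimal sketch of reading tracee memory with a single PT_IO request, assuming NetBSD's <sys/ptrace.h> and assuming, as in NetBSD's implementation, that the kernel updates piod_len on return with the number of bytes actually transferred, so a short read (for example at the end of a mapping) shows up in the return value. read_tracee() is a hypothetical name; on other systems this sketch simply fails with ENOSYS.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy up to 'len' bytes from address 'remote' in the
 * stopped tracee 'pid' into 'local'. Returns the number of bytes actually
 * transferred, or -1 on error.
 */
static ssize_t
read_tracee(pid_t pid, void *remote, void *local, size_t len)
{
#ifdef __NetBSD__
	struct ptrace_io_desc io;

	io.piod_op = PIOD_READ_D;	/* PIOD_READ_I behaves the same today */
	io.piod_offs = remote;		/* address in the tracee */
	io.piod_addr = local;		/* buffer in the tracer */
	io.piod_len = len;

	if (ptrace(PT_IO, pid, &io, 0) == -1)
		return -1;
	return (ssize_t)io.piod_len;	/* updated by the kernel */
#else
	/* PT_IO and struct ptrace_io_desc are not available here. */
	(void)pid;
	(void)remote;
	(void)local;
	(void)len;
	errno = ENOSYS;
	return -1;
#endif
}
```

Writing works the same way with PIOD_WRITE_D (or PIOD_WRITE_I), with piod_addr pointing at the tracer's data to copy into the tracee.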
Possible values to read and write regular data from tracee are:
Software breakpoints are set with PIOD_WRITE_I or PT_WRITE_I operations by modifying the instruction segment of the tracee.
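A minimal sketch of the usual read-modify-write sequence, assuming an amd64 tracee, where the one-byte int3 instruction (0xCC) raises SIGTRAP when executed. The patch is applied to the word's bytes in memory order, so the result does not depend on endianness. set_breakpoint_byte() and insert_breakpoint() are hypothetical names; only the latter needs NetBSD.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>
#include <string.h>

#define X86_INT3 0xCC	/* amd64 breakpoint opcode (assumption: amd64 tracee) */

/* Return 'word' with its first byte in memory order replaced by int3. */
static int
set_breakpoint_byte(int word)
{
	unsigned char bytes[sizeof(int)];

	memcpy(bytes, &word, sizeof(word));
	bytes[0] = X86_INT3;
	memcpy(&word, bytes, sizeof(word));
	return word;
}

#ifdef __NetBSD__
/*
 * Plant int3 at 'addr' in the stopped tracee. The original word is saved
 * in '*saved' so the instruction can be restored later with PT_WRITE_I.
 */
static int
insert_breakpoint(pid_t pid, void *addr, int *saved)
{
	int word;

	errno = 0;
	word = ptrace(PT_READ_I, pid, addr, 0);
	if (word == -1 && errno != 0)
		return -1;
	*saved = word;
	return ptrace(PT_WRITE_I, pid, addr, set_breakpoint_byte(word));
}
#endif
```

To remove the breakpoint, write the saved word back with PT_WRITE_I. On amd64 the program counter points just past the int3 byte after the trap, so a debugger typically also moves it back by one (with PT_GETREGS and PT_SETREGS) before resuming.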
Hardware breakpoints are set in Machine Dependent Debug Registers with PT_SETDBREGS.
Hardware watchpoints are set in Machine Dependent Debug Registers with PT_SETDBREGS.
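The register layout passed to PT_SETDBREGS is machine dependent and is not shown here. As an amd64 illustration only: the x86 control register DR7 says, for each of the four address slots DR0 to DR3, whether it is enabled, which access triggers it and how many bytes it covers. A minimal sketch of encoding that value, following the DR7 field layout in the Intel SDM; dr7_enable() and the enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* DR7 R/W field values (Intel SDM): which access triggers the slot. */
enum dr_cond { DR_EXEC = 0x0, DR_WRITE = 0x1, DR_IO = 0x2, DR_RW = 0x3 };
/* DR7 LEN field values: size of the watched area (8 bytes: 64-bit only). */
enum dr_len { DR_LEN1 = 0x0, DR_LEN2 = 0x1, DR_LEN8 = 0x2, DR_LEN4 = 0x3 };

/*
 * Return 'dr7' with slot 'slot' (0..3) locally enabled for the given
 * condition and length. The watched address itself goes into DR<slot>.
 */
static uint64_t
dr7_enable(uint64_t dr7, unsigned slot, enum dr_cond cond, enum dr_len len)
{
	unsigned shift = 16 + 4 * slot;	/* R/Wn at bit 16+4n, LENn at 18+4n */

	assert(slot < 4);
	dr7 &= ~((uint64_t)0xF << shift);	/* clear old R/W and LEN */
	dr7 |= ((uint64_t)cond | ((uint64_t)len << 2)) << shift;
	dr7 |= (uint64_t)1 << (2 * slot);	/* Ln: local enable bit */
	return dr7;
}
```

For example, a 4-byte write watchpoint in slot 0 gives DR7 = 0xD0001, with the watched address placed in DR0; an execution breakpoint uses DR_EXEC together with DR_LEN1.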
A debugger can catch and differentiate the following events:
Additionally a tracer can catch a signal emitted to the tracee and the process termination with its exit status.
On NetBSD, to determine which event was caught, the tracer first calls a wait(2)-like function.
If the process was stopped with SIGTRAP, the appropriate way to detect the exact event that caused the stop is the PT_GET_SIGINFO call.
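A minimal sketch of the first step of that dispatch, decoding the status returned by waitpid(2) with the standard POSIX macros; only a stop with SIGTRAP needs the follow-up PT_GET_SIGINFO call. classify_status() and enum stop_kind are hypothetical names.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>

/* Hypothetical classification of a status filled in by waitpid(2). */
enum stop_kind { EV_EXITED, EV_KILLED, EV_TRAP, EV_SIGNAL, EV_OTHER };

/* Classify 'status'; '*value' gets the exit code or the signal number. */
static enum stop_kind
classify_status(int status, int *value)
{
	if (WIFEXITED(status)) {	/* tracee terminated normally */
		*value = WEXITSTATUS(status);
		return EV_EXITED;
	}
	if (WIFSIGNALED(status)) {	/* tracee killed by a signal */
		*value = WTERMSIG(status);
		return EV_KILLED;
	}
	if (WIFSTOPPED(status)) {	/* tracee stopped by a signal */
		*value = WSTOPSIG(status);
		/* Only a SIGTRAP stop needs PT_GET_SIGINFO to find the cause. */
		return *value == SIGTRAP ? EV_TRAP : EV_SIGNAL;
	}
	*value = 0;
	return EV_OTHER;
}
```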
The PT_SET_SIGINFO request can be used to specify signal information emitted to the tracee. This signal information is specified in struct ptrace_siginfo defined as:
    typedef struct ptrace_siginfo {
        siginfo_t psi_siginfo;
        lwpid_t psi_lwpid;
    } ptrace_siginfo_t;
Here psi_siginfo is set to the signal information structure, and the psi_lwpid field identifies the LWP (thread) the signal is routed to. A value of 0 means the whole process (the signal is routed to all LWPs).
A pointer to this structure is passed in addr. The data argument should be set to sizeof(struct ptrace_siginfo).
To pass a faked signal to the tracee, the signal type must match the signal passed to the process with PT_CONTINUE or PT_DETACH.
The PT_GET_SIGINFO request can be used to determine signal information that was received by a debugger. The information is read into the struct ptrace_siginfo pointed to by addr. The data argument should be set to sizeof(struct ptrace_siginfo).
The PT_SET_EVENT_MASK request can be used to specify which events in the traced process should be reported to the tracing process. These events are specified in a struct ptrace_event defined as:
    typedef struct ptrace_event {
        int pe_set_event;
    } ptrace_event_t;
Where pe_set_event is the set of events to be reported. This set is formed by OR'ing together the following values:
The fork(2) and vfork(2) events can also be triggered by similar operations, such as clone(2) or posix_spawn(3). The PTRACE_FORK event reports that the process created a child without waiting for the child's termination or execve(2) call. If enabled, the child is also traced by the debugger and SIGTRAP is generated twice, first for the parent and then for the child. The PTRACE_VFORK event is the same as PTRACE_FORK, but the parent blocks after creating the child. The PTRACE_VFORK_DONE event can be used to report the unblocking of the parent.
A pointer to this structure is passed in addr. The data argument should be set to sizeof(struct ptrace_event).
PT_GET_EVENT_MASK can be used to determine which events in the traced process will be reported. The information is read into the struct ptrace_event pointed to by addr. The data argument should be set to sizeof(struct ptrace_event).
PT_GET_PROCESS_STATE reads the state information associated with the event that stopped the traced process. The information is reported in a struct ptrace_state defined as:
    typedef struct ptrace_state {
        int pe_report_event;
        pid_t pe_other_pid;
    } ptrace_state_t;
A pointer to this structure is passed in addr. The data argument should be set to sizeof(struct ptrace_state).
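A minimal sketch of tracking fork(2), assuming NetBSD 8.0 headers: read the current mask, OR in PTRACE_FORK, write it back, and after a later SIGTRAP stop use PT_GET_PROCESS_STATE to check pe_report_event and fetch pe_other_pid. trace_forks() and forked_child() are hypothetical names; on other systems both fail with ENOSYS.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <errno.h>

/* Hypothetical helper: also report fork(2) events of 'pid'. */
static int
trace_forks(pid_t pid)
{
#ifdef __NetBSD__
	ptrace_event_t pe;

	if (ptrace(PT_GET_EVENT_MASK, pid, &pe, sizeof(pe)) == -1)
		return -1;
	pe.pe_set_event |= PTRACE_FORK;	/* keep the events already set */
	return ptrace(PT_SET_EVENT_MASK, pid, &pe, sizeof(pe));
#else
	(void)pid;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical helper, called after a SIGTRAP stop: return the PID of the
 * new child if the stop was a fork event, 0 for another event, -1 on error.
 */
static pid_t
forked_child(pid_t pid)
{
#ifdef __NetBSD__
	ptrace_state_t ps;

	if (ptrace(PT_GET_PROCESS_STATE, pid, &ps, sizeof(ps)) == -1)
		return -1;
	return ps.pe_report_event == PTRACE_FORK ? ps.pe_other_pid : 0;
#else
	(void)pid;
	errno = ENOSYS;
	return -1;
#endif
}
```

Called on the parent's fork stop, forked_child() yields the new child's PID; since the child is traced as well, the debugger then also waits for the child's own SIGTRAP.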
The PT_LWPINFO call can be used to list the threads (LWPs) in a process.
PT_LWPINFO returns information about a thread from the list of threads for the process specified in the pid argument. The addr argument should contain a struct ptrace_lwpinfo defined as:
    struct ptrace_lwpinfo {
        lwpid_t pl_lwpid;
        int pl_event;
    };
where pl_lwpid contains a thread LWP ID. Information is returned for the thread following the one with the specified ID in the process thread list, or for the first thread if pl_lwpid is 0. Upon return pl_lwpid contains the LWP ID of the thread that was found, or 0 if there is no thread after the one whose LWP ID was supplied in the call. pl_event contains the event that stopped the thread. Possible values are:
The data argument should contain sizeof(struct ptrace_lwpinfo).
The PT_LWPINFO call on FreeBSD and NetBSD must not be used interchangeably. The FreeBSD call is used to retrieve the event that stopped the process, while the NetBSD version is used to retrieve the list of threads within a process.
The pl_event field in struct ptrace_lwpinfo must not be used to detect whether the whole process received a signal or another type of event (such as process termination).
Where FreeBSD uses PT_LWPINFO to report which event stopped a child, NetBSD uses PT_GET_SIGINFO instead.
To retrieve the thread that was signaled, use PT_GET_SIGINFO and read the psi_lwpid member of struct ptrace_siginfo.
Note that PT_LWPINFO is not guaranteed to return the threads of a process in any particular order; in particular, their identities (lwpid_t) need not increase from the lowest to the highest.
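A minimal sketch of walking the thread list as described above: start with pl_lwpid set to 0, repeat until 0 comes back, then sort, since the order is not guaranteed. The step is a callback so the loop can be tried without a tracee: next_lwp_netbsd() is the NetBSD version and next_lwp_fake() walks a fixed array. All names are hypothetical.

```c
#include <sys/types.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef __NetBSD__
#include <sys/ptrace.h>
#endif

/* Returns the LWP after 'prev' (0 = first), 0 when done, -1 on error. */
typedef long (*next_lwp_fn)(void *ctx, long prev);

static int
cmp_long(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;

	return (x > y) - (x < y);
}

/* Collect up to 'max' LWP IDs into 'ids' in ascending order; -1 on error. */
static long
list_lwps(next_lwp_fn next, void *ctx, long *ids, size_t max)
{
	size_t n = 0;
	long lwp = 0;

	while (n < max && (lwp = next(ctx, lwp)) > 0)
		ids[n++] = lwp;
	if (lwp == -1)
		return -1;
	qsort(ids, n, sizeof(ids[0]), cmp_long);	/* kernel order is arbitrary */
	return (long)n;
}

#ifdef __NetBSD__
/* PT_LWPINFO step; 'ctx' points at the traced pid. */
static long
next_lwp_netbsd(void *ctx, long prev)
{
	struct ptrace_lwpinfo pl;

	pl.pl_lwpid = (lwpid_t)prev;
	if (ptrace(PT_LWPINFO, *(pid_t *)ctx, &pl, sizeof(pl)) == -1)
		return -1;
	return pl.pl_lwpid;	/* 0 once past the last thread */
}
#endif

/* Demo step over a fixed, deliberately unsorted array of IDs. */
struct fake_threads {
	const long *ids;
	size_t n;
};

static long
next_lwp_fake(void *ctx, long prev)
{
	const struct fake_threads *ft = ctx;
	size_t i;

	if (prev == 0)
		return ft->n > 0 ? ft->ids[0] : 0;
	for (i = 0; i + 1 < ft->n; i++)
		if (ft->ids[i] == prev)
			return ft->ids[i + 1];
	return 0;
}
```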
To allow a thread to execute one can use PT_RESUME.
To prevent a thread from execution a debugger can call PT_SUSPEND.
Traditionally, PT_CONTINUE, PT_SYSCALL and PT_STEP can be used to continue only a specific thread; however, new code should use the PT_RESUME and PT_SUSPEND operations, as they do not prevent emitting a signal to the tracee.
To retrieve the list of suspended threads in a process the PT_LWPINFO call should be used.
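A minimal sketch of letting only one thread run, assuming, as in NetBSD's ptrace(2) manual page, that PT_SUSPEND takes the LWP ID in data and ignores addr. Every LWP except the chosen one is suspended; the process is then resumed as usual, and PT_RESUME undoes the suspension later. The action is a callback so the selection can be tried without a tracee; suspend_others(), suspend_lwp_netbsd() and record_lwp() are hypothetical names.

```c
#include <sys/types.h>
#include <stddef.h>
#ifdef __NetBSD__
#include <sys/ptrace.h>
#endif

/* Action applied to one LWP; returns 0 on success, -1 on error. */
typedef int (*lwp_action_fn)(void *ctx, long lwp);

/* Apply 'suspend' to every LWP in 'ids' except 'keep'. */
static int
suspend_others(lwp_action_fn suspend, void *ctx, const long *ids, size_t n,
    long keep)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ids[i] == keep)
			continue;	/* the one thread left runnable */
		if (suspend(ctx, ids[i]) == -1)
			return -1;
	}
	return 0;
}

#ifdef __NetBSD__
/* PT_SUSPEND step; 'ctx' points at the traced pid. Undo with PT_RESUME. */
static int
suspend_lwp_netbsd(void *ctx, long lwp)
{
	return ptrace(PT_SUSPEND, *(pid_t *)ctx, NULL, (int)lwp);
}
#endif

/* Demo action that only records which LWPs would be suspended. */
struct recorder {
	long seen[16];
	size_t n;
};

static int
record_lwp(void *ctx, long lwp)
{
	struct recorder *r = ctx;

	if (r->n == sizeof(r->seen) / sizeof(r->seen[0]))
		return -1;
	r->seen[r->n++] = lwp;
	return 0;
}
```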
The ELF Auxiliary vector of a tracee can be retrieved with PT_IO and option PIOD_READ_AUXV.
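A minimal sketch of reading the vector and looking up one entry, assuming the LP64 layout used on amd64 (a long type followed by a long value, terminated by type 0, AT_NULL) and assuming that piod_offs = 0 selects the start of the vector. The type numbers come from the System V ABI (AT_PAGESZ is 6); find_aux(), read_auxv() and struct aux_entry are hypothetical names, and only read_auxv() needs NetBSD.

```c
#include <sys/types.h>
#include <stddef.h>
#ifdef __NetBSD__
#include <sys/ptrace.h>
#endif

#define AUX_NULL	0	/* AT_NULL: end of vector (System V ABI) */
#define AUX_PAGESZ	6	/* AT_PAGESZ: system page size */

/* Assumed LP64 entry layout: one type word followed by one value word. */
struct aux_entry {
	long type;
	long value;
};

/* Store the value of 'type' in '*value' and return 0, or -1 if absent. */
static int
find_aux(const struct aux_entry *aux, size_t n, long type, long *value)
{
	size_t i;

	for (i = 0; i < n && aux[i].type != AUX_NULL; i++) {
		if (aux[i].type == type) {
			*value = aux[i].value;
			return 0;
		}
	}
	return -1;
}

#ifdef __NetBSD__
/* Copy up to 'n' entries of the tracee's auxv; returns entries read or -1. */
static long
read_auxv(pid_t pid, struct aux_entry *aux, size_t n)
{
	struct ptrace_io_desc io;

	io.piod_op = PIOD_READ_AUXV;
	io.piod_offs = 0;	/* assumed: offset 0 = start of the vector */
	io.piod_addr = aux;
	io.piod_len = n * sizeof(*aux);
	if (ptrace(PT_IO, pid, &io, 0) == -1)
		return -1;
	return (long)(io.piod_len / sizeof(*aux));
}
#endif
```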
The PT_SYSCALL and PT_SYSCALLEMU operations also exist, but both need more work and tests.
The following features have been prepared for the upcoming NetBSD 8.0 release:
Porting Linux ptrace(2) code is complicated, as ptrace(2) calls control individual threads on Linux and a whole process on NetBSD.
A debugger for Linux has to synchronize threads in its own code (stopping them, terminating them, etc.); on NetBSD there is no need for this, as a process is a single entity from the ptrace(2) point of view.
Some Linux ptrace(2) calls have no direct or indirect equivalents on NetBSD; however, these calls are not really applicable there.
The only feature really missing on NetBSD is concurrent PT_SYSCALL and PT_STEP execution.
Both systems, FreeBSD and NetBSD, have closely related ptrace(2) operations.
The major difference is that there is no way on NetBSD to detect syscall entry (TRAP_SCE) and exit trap (TRAP_SCX).
Another difference in syntax is that FreeBSD puts event information into PT_LWPINFO, while NetBSD relies on siginfo(2) and other applicable calls like PT_GET_PROCESS_STATE.
Currently OpenBSD ptrace(2) is a subset of NetBSD's ptrace(2).
This system has different semantics to iterate over threads: PT_GET_THREAD_FIRST with PT_GET_THREAD_NEXT. On NetBSD one must use PT_LWPINFO.