Approaching the end of work on ptrace(2)
This is one of my last reports on enhancements on ptrace(2)
and the surrounding code.
This month I complete a set of older pending tasks.
I've documented dlinfo(3)
and added missing cross-reference from the dlfcn(3)
API.
dlinfo(3)
is a Solaris-style and first appeared in NetBSD 5.1.
Today this API is also implemented in Linux and FreeBSD, making it the right portable tool for certain operations.
Today we support only the RTLD_DI_LINKMAP
operation out of few functionalities.
RTLD_DI_LINKMAP
translates the dlopen(3)
opaque handler to struct link_map
.
This is the exact functionality needed in the LLVM sanitizers. So far we have been using manual extraction of the struct's field with offset over an opaque pointer. Unfortunately the existing algorithm was prone to internal modifications of the ELF loader and it used to break few times a year.
#define _GET_LINK_MAP_BY_DLOPEN_HANDLE(handle, shift) \ ((link_map *)((handle) == nullptr ? nullptr : ((char *)(handle) + (shift)))) #if defined(__x86_64__) #define GET_LINK_MAP_BY_DLOPEN_HANDLE(handle) \ _GET_LINK_MAP_BY_DLOPEN_HANDLE(handle, 264) #elif defined(__i386__) #define GET_LINK_MAP_BY_DLOPEN_HANDLE(handle) \ _GET_LINK_MAP_BY_DLOPEN_HANDLE(handle, 136) #endif
I was pondering how to implement a dedicated interface until I found this interface in rumpkernels! Antti Kantee, a NetBSD developer and author of rumpkernels, implemented this feature but left it undocumented. I quickly scanned this interface, picked the man page from FreeBSD, adapted for NetBSD and pushed to LLVM. The FreeBSD Operating system support in sanitizers also suffered from the offset struct changes and with collaboration with me from the NetBSD side, we switched the appropriate code to a dynamic value. This also improves compatibility with all NetBSD ports that could potentially support the LLVM sanitizers, as previously the offsets were calculated only for x86 platforms (i386, amd64).
ktruss(1)
enhancements
I've switched ktruss(1)
to ptrace descriptive operation names.
Previously ptrace(2)
requests were encoded with a magic value like 0x10 or 0x14.
This style of encoding fields is not human-friendly and required checking appropriate system headers.
Now the output looks like this (reduced to ptrace syscalls only):
3902 1 t_ptrace_waitpid ptrace(PT_ATTACH, 0xfd7, 0, 0) = 0 3902 1 t_ptrace_waitpid ptrace(PT_GET_SIGINFO, 0xfd7, 0x7f7fff51a4f0, 0x88) = 0 3902 1 t_ptrace_waitpid ptrace(PT_LWPINFO, 0xfd7, 0x7f7fff51a4c8, 0x8) = 0 3902 1 t_ptrace_waitpid ptrace(PT_LWPINFO, 0xfd7, 0x7f7fff51a4c8, 0x8) = 0 3902 1 t_ptrace_waitpid ptrace(PT_CONTINUE, 0xfd7, 0x1, 0) = 0 3902 1 t_ptrace_waitpid ptrace(PT_GET_SIGINFO, 0xfd7, 0x7f7fff51a4f0, 0x88) = 0 3902 1 t_ptrace_waitpid ptrace(PT_LWPINFO, 0xfd7, 0x7f7fff51a4c8, 0x8) = 0 3902 1 t_ptrace_waitpid ptrace(PT_LWPINFO, 0xfd7, 0x7f7fff51a4c8, 0x8) = 0 3902 1 t_ptrace_waitpid ptrace(PT_LWPINFO, 0xfd7, 0x7f7fff51a4c8, 0x8) = 0 3902 1 t_ptrace_waitpid ptrace(PT_LWPINFO, 0xfd7, 0x7f7fff51a4c8, 0x8) = 0 3902 1 t_ptrace_waitpid ptrace(PT_LWPINFO, 0xfd7, 0x7f7fff51a4c8, 0x8) = 0 3902 1 t_ptrace_waitpid ptrace(PT_CONTINUE, 0xfd7, 0x1, 0x9) = 0 5449 1 t_ptrace_waitpid ptrace(PT_TRACE_ME, 0, 0, 0) = 0 6086 1 t_ptrace_waitpid ptrace(PT_GET_SIGINFO, 0x1549, 0x7f7fffa7c230, 0x88) = 0 6086 1 t_ptrace_waitpid ptrace(PT_IO, 0x1549, 0x7f7fffa7c210, 0x20) = 0
I have adjusted the error return value of pthread_sigmask(3)
whenever libpthread
is either not loaded or initialized.
Now instead of returning -1
, return errno
on error.
This caught up after the fix in libpthread
by Andrew Doran in 2008 in lib/libpthread/pthread_misc.c
r.1.9.
It remains an open question whether this function shall be used without linked
in the POSIX thread library.
This incompatibility was detected by Bruno Haible (GNU) and documented in gnulib in commit
pthread_sigmask: Avoid test failure on NetBSD 8.0.
r. 4d16a83b0c1fcb6c.
compat_netbsd32(8)
There was a longstanding issue with incompatibility between few syscalls in the native NetBSD ABI and 32-bit compat one. I've addressed this with the following change:
commit 2e2cec309dca0021e364d52597388665c500d66e Author: kamilDate: Sat Jan 18 07:33:24 2020 +0000 Catch up after getpid/getgid/getuid changes in native ABI in 2008 getpid(), getuid() and getgid() used to call respectively sys_getpid(), sys_getuid() and sys_getgid(). In the BSD4.3 compat mode there was a fallback to call sys_getpid_with_ppid() and related functions. In 2008 the compat ifdef was removed in sys/kern/syscalls.master r. 1.216. For purity reasons we probably shall restore the NetBSD original behavior and implement BSD4.3 one as a compat module, however it is not worth the complexity. Align the netbsd32 compat ABI to native ABI and call functions that return two integers as in BSD4.3.
I have finished importing the OpenBSD test (regress/sys/ptrace/ptrace.c
) for unaligned program counter register.
For a long time it was cryptic what the original intention was to test,
whether it was some concept of what values should be rejected in the API for purity reasons,
whether we were testing SIGILL
, whether there was a kernel problem or something else.
Finally, I reached the original committer (Miod Vallat) and he pointed out to me that there was a sparc/OpenBSD bug that could break the kernel.
https://marc.info/?l=openbsd-bugs&m=107558043319084&w=2
Instead of hardcoding magic program counter register on a per-cpu basis, I added tests using 3 types of them for all CPUs. At the end of the day these tests crashed the NetBSD kernel for at least a single port.
ptrace(2)
ATF tests (FreeBSD)
Scanning the ptrace(2)
scenarios in FreeBSD, I decided to expand the ATF framework for
unrelated tracer variation (as opposed to real parent = debugger) for tests: fork1-16
,
vfork1-16
, posix_spawn1-16
.
New tests were also introduced for:
posix_spawn_detach_spawner
, fork_detach_forker
, vfork_detach_vforkerdone
,
posix_spawn_kill_spawner
, fork_kill_forker
, vfork_kill_vforker
, vfork_kill_vforkerdone
.
libpthread(3)
enhancements
I have added missing sanity checks in the libpthread(3)
API functions.
This means that now on entry to calls like pthread_mutex_lock(3)
we will first check whether the passed object
is a valid and alive mutex.
I have retired conditional (preprocessor variable) checks in API families (rwlock, spin-lock) as they were always enabled.
This behavior matches the Linux implementation and is assumed in LLVM sanitizers.
I also found out that these semantics are expected by third-party code, especially WolfSSL.
Our current sanity checking version of pthread_equal(3)
found at least one abuse of the interface in the NSPR package
and its downstream users: Firefox, Thunderbird and Seamonkey.
For the time being a workaround has been applied and the bug is scheduled for proper investigation and fix.
While working on this. I found that the jemalloc (system's malloc) initialization is misused and we initialize a jemalloc's mutex with uninitialized data. A workaround has been applied by Christos Zoulas but we are still looking for proper initialization model of our libc, libpthread and malloc code. There is a problem with mutual dependencies between the components and we are looking for a clean solution.
env(1)
and putenv(3)
The NetBSD project maintains a LLVM buildbot in the LLVM buildfarm.
We build and execute tests for virtually every LLVM project which we find important
(llvm
, clang
, lldb
, lld
, compiler-rt
,
polly
, openmp
, libc++
, libc++abi
etc.)
A frequent issue is that LLVM developers use constructs that are portable only to a selection of OSs that are worth supporting in the tests.
One such incompatibility introduced from time to time is env(1)
with the -u
option.
This option is an extension to POSIX, but supported by FreeBSD, Linux and Darwin. It unsets a variable in the environment.
It was a constant cost to see this regressing over and over again on the NetBSD buildbot and
I decided to just implement it natively.
While there, I have implemented another popular option: -0
.
This switch ends each output line with NUL
, not newline.
While investigating the env(1)
differences between Operating Systems,
I have learnt that putenv(3)
changed its behavior.
This function call originally allocated its content internally, however it was changed in future versions as requested by POSIX
to not allocate memory.
I have audited the base system to see whether we obey these semantics everywhere and detected a use-after-free bug in the PAM
(Pluggable Authentication Modules framework)!
I have fixed the problem with a potential security implication.
commit 5c19ff198d69f48222ab4907235addbfa60d4e2a Author: kamilDate: Sat Feb 8 13:44:35 2020 +0000 Avoid use-after-free bug in PAM environment Traditional BSD putenv(3) was creating an internal copy of the passed argument. Unfortunately this was causing memory leaks and was changed by POSIX to not allocate. Adapt the putenv(3) usage to modern POSIX (and NetBSD) semantics. diff --git a/usr.bin/login/login_pam.c b/usr.bin/login/login_pam.c index 66303006b514..7d877ebeb1c0 100644 --- a/usr.bin/login/login_pam.c +++ b/usr.bin/login/login_pam.c @@ -1,6 +1,6 @@ -/* $NetBSD: login_pam.c,v 1.25 2015/10/29 11:31:52 shm Exp $ */ +/* $NetBSD: login_pam.c,v 1.26 2020/02/08 13:44:35 kamil Exp $ */ /*- * Copyright (c) 1980, 1987, 1988, 1991, 1993, 1994 * The Regents of the University of California. All rights reserved. * @@ -37,11 +37,11 @@ __COPYRIGHT("@(#) Copyright (c) 1980, 1987, 1988, 1991, 1993, 1994\ #ifndef lint #if 0 static char sccsid[] = "@(#)login.c 8.4 (Berkeley) 4/2/94"; #endif -__RCSID("$NetBSD: login_pam.c,v 1.25 2015/10/29 11:31:52 shm Exp $"); +__RCSID("$NetBSD: login_pam.c,v 1.26 2020/02/08 13:44:35 kamil Exp $"); #endif /* not lint */ /* * login [ name ] * login -h hostname (for telnetd, etc.) @@ -600,12 +600,12 @@ skip_auth: */ if ((pamenv = pam_getenvlist(pamh)) != NULL) { char **envitem; for (envitem = pamenv; *envitem; envitem++) { - putenv(*envitem); - free(*envitem); + if (putenv(*envitem) == -1) + free(*envitem); } free(pamenv); }
I have fixed LLVM sanitizers after recent basesystem changes and they are again functional.
There was a need to adapt the code for urio(4)
device driver removal
(USB driver for the Diamond Multimedia Rio500 MP3 player).
With the clang version 9 we were also required to adapt paths for installation of the sanitizers.
I have fixed a race in the ptrace(2)
ATF test resume1
.
There was a bug when two events were triggered concurrently from two competing threads.
This bug appeared after recent changes by Andrew Doran who refactored the internals and so the race was now more frequent
than before.
I am working with students and volunteers who are learning how to use sanitizers in the NetBSD context.
Over the past month we fixed MKLIBCSANITIZER
build with recent GCC 8.x and addressed a number of Undefined Behavior reports in the system.
I have helped to upstream chunks of the NetBSD code to GDB and GCC. I'm working on upstreaming NVMM support to qemu. The patchset is still in review waiting for more feedback before the final merge.
I have imported realpah(1)
utility from FreeBSD as it is used sometimes by existing scripts (even if there are more portable alternatives).
Port remaining ptrace(2)
test scenarios from Linux and FreeBSD to ATF and ensure that they are properly operational.
The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can: