Stabilization of the ptrace(2) threads continued
I have introduced changes to make debuggers more reliable in threaded scenarios. Additionally, I have revamped the micro-UBSan runtime for newer Clang (version 10git). I have received the OK from core@ to switch our iconv(3) to the POSIX-conformant iconv(3), and I have adapted pkgsrc to the newer API wherever the breakage was known and fixable. This month I also continued looking for a way out of the impasse in LLD that blocks adding NetBSD support.
I have simplified the struct proc and removed the p_oppid field that stored the numeric process id of the original parent (forker). This field is not needed as it duplicates p_opptr (current real parent pointer) that is already safe to use. So far this has not proven to be unsafe.
I have refactored the signal code making it more verbose to reflect the actual needs of the kernel signal code.
I have fixed a nasty bug in the function that is called when a thread returns from the kernel to userland. There was a tiny time window in which, in certain scenarios, a thread was never stopped on process suspension but was resumed instead. This caused waitpid(2) polling to never return success, because a process with a running thread can never be reported as stopped.
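For readers unfamiliar with the waitpid(2) side of this, the following is a minimal, portable sketch of a parent waiting for a child to be reported as stopped. It does not use ptrace(2) and is not taken from the NetBSD ATF tests; the helper name wait_for_child_stop() is hypothetical. It only illustrates the kind of stop notification that a debugger would wait on indefinitely if one thread kept running.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that stops itself and wait until the
 * stop is reported. Returns the stop signal on success, -1 on error.
 */
int
wait_for_child_stop(void)
{
	int status, sig;
	pid_t child;

	child = fork();
	if (child == -1)
		return -1;

	if (child == 0) {
		/* Child: stop the whole process, exit once resumed. */
		raise(SIGSTOP);
		_exit(0);
	}

	/* Only the parent reaches this point. */
	assert(child > 0);

	/* WUNTRACED asks waitpid(2) to report stopped children too. */
	if (waitpid(child, &status, WUNTRACED) != child)
		return -1;
	if (!WIFSTOPPED(status))
		return -1;
	sig = WSTOPSIG(status);

	/* Resume the child and reap it so that no zombie is left behind. */
	kill(child, SIGCONT);
	waitpid(child, &status, 0);

	return sig;
}
```

Calling wait_for_child_stop() should return SIGSTOP on a correctly working kernel. In the buggy scenario described above, the equivalent wait in a debugger would never complete.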
There was a race bug that could cause a nested thread termination call, triggering a panic.
With the above changes I was able to reliably run all ATF tests for LWP events (threading events). I have also bumped the threading tests to actually execute 100 concurrent threads, as the higher number can more easily trigger anomalies. In my observations all tests are now rock solid.
There are no longer any ptrace(2) tests in ATF marked as flaky or disabled. The two main offenders, vfork(2) events and threading events, are now solid.
Michal Gorny detected another source of thread instability with an LLDB regression test. It was related to spawning a massive number of concurrent threads. I have helped Michal to address this problem and squash the bug.
All of the above changes have now been pulled up to NetBSD-9 for the future 9.0 release.
At the time of writing, there are 4 failing LLDB threading tests and a few more related to debug registers. Both failure types are under investigation. They could be bugs in the NetBSD support in LLDB, but it is also possible that something needs to be fixed at the kernel level.
The project is still not 100% accomplished, but we are now very close to finishing everything in the domain of threads. I could torture the NetBSD kernel for a few hours with a massive number of threads and events without a single crash or failure. On the other hand, there are still likely some suspicious corner cases that need proper investigation. There are also some suspicious crash reports from syzkaller, the kernel fuzzer. Those still need to be promptly checked.
I have attempted to change our original plan with LLD: instead of mutating the LLD behavior on a per-target basis, I wrote a dedicated LLD wrapper that tunes LLD for NetBSD. My patch is still in review. As an improvement over the previous ones, it wasn't immediately rejected... https://reviews.llvm.org/D69755.
I have upstreamed chunks of code with the following commits:
I have switched the iconv(3) function prototype to the POSIX-conformant form. The history of this function is documented in iconv(3) as follows:
STANDARDS
iconv_open(), iconv_close(), and iconv() conform to IEEE Std 1003.1-2001 ("POSIX.1"). Historically, the definition of iconv has not been consistent across operating systems. This is due to an unfortunate historical mistake, documented in this e-mail: https://www5.opengroup.org/sophocles2/show_mail.tpl?&source=L&listname=austin-group-l&id=7404. The standards page for the header file <iconv.h> defined the second argument of iconv() as char **, but the standards page for the iconv() implementation defined it as const char **. The standards committee later chose to change the function definition to follow the header file definition (without const), even though the version with const is arguably more correct. NetBSD used initially the const form. It was decided to reject the committee's regression and become (technically) incompatible. This decision was changed in NetBSD 10 and the iconv() prototype was synchronized with the standard.
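As a rough illustration of what the prototype change means for callers, here is a minimal sketch that converts an ISO-8859-1 string to UTF-8 against the POSIX prototype, where the second argument is char **. The helper name latin1_to_utf8() is hypothetical and the code is not taken from pkgsrc. Code written for the older const char ** form typically needs only the pointer type (or cast) adjusted, as in the marked line.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to UTF-8.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 on failure.
 */
size_t
latin1_to_utf8(const char *src, char *dst, size_t dstsize)
{
	iconv_t cd;
	char *in, *out;
	size_t inleft, outleft, rc;

	assert(src != NULL && dst != NULL);
	if (dstsize == 0)
		return (size_t)-1;

	cd = iconv_open("UTF-8", "ISO-8859-1");
	if (cd == (iconv_t)-1)
		return (size_t)-1;

	/*
	 * POSIX form: the input pointer is char *, not const char *.
	 * The cast drops const; iconv() does not modify the input bytes.
	 */
	in = (char *)src;
	inleft = strlen(src);
	out = dst;
	outleft = dstsize - 1;	/* reserve room for the terminating NUL */

	rc = iconv(cd, &in, &inleft, &out, &outleft);
	iconv_close(cd);
	if (rc == (size_t)-1)
		return (size_t)-1;

	*out = '\0';
	return (size_t)(out - dst);
}
```

With the older const form, the same call would pass a const char ** and need no cast. With the POSIX form, passing a const char ** triggers an incompatible pointer type diagnostic, which is the typical build issue that had to be fixed in packages.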
Meanwhile I fixed what was known to be affected in pkgsrc. Unfortunately Qt4/KDE4 had several build issues, and this motivated me to fix its users for the new function prototype through upgrades to the Qt5/KDE5 stack. Many dead packages without an upgrade path were dropped from pkgsrc.
As there is a new Clang upgrade coming, I have implemented handlers for new UBSan reports: function_type_mismatch_v1() and implicit_conversion(). The first one is a new ABI for function_type_mismatch() and the second one is completely new.
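To give an idea of what the implicit conversion check is about, here is a small sketch of code that, to my understanding, Clang's -fsanitize=implicit-conversion check group reports at run time. The conversion is well-defined C, so the program behaves the same with or without the sanitizer; the report only points out that the value changed. The function name truncate_to_byte() is hypothetical and the example is not part of the micro-UBSan sources.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical example: store an int in an unsigned char without a cast.
 * The result is the value reduced modulo 2^CHAR_BIT.
 */
unsigned char
truncate_to_byte(int value)
{
	/* The numbers used with this sketch assume 8-bit bytes. */
	assert(CHAR_BIT == 8);

	/* Implicit int -> unsigned char conversion; lossy for value > 255. */
	unsigned char byte = value;

	return byte;
}
```

For example, truncate_to_byte(300) yields 44 (300 modulo 256), and this silent change of value is what the sanitizer flags. Writing the conversion with an explicit cast, (unsigned char)value, states the intent and should silence the report.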
I took part in the GSoC Mentor Summit in Munich and presented a talk titled "NetBSD version 9. What's new in store?".
Support Michal Gorny in reaching the milestone of passing all threading and debug register tests in LLDB.
The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can: