NetBSD Virtual Machine Monitor

Presenter Notes


bhyvecon Ottawa 2019 - The BSD Hypervisor Conference

Speaker: Kamil Rytarowski (not the NVMM author!)

E-mail: kamil@netbsd.org

Date: May 14th, 2019

Location: Ottawa, Canada


Bio

Kamil Rytarowski (born 1987)

Krakow, Poland

NetBSD user since 6.1.

NetBSD Foundation member since 2015.

Work areas: kernel, userland, pkgsrc.

Interest: NetBSD on desktop and in particular NetBSD as a workstation.

Current activity in 3rd party software:

  • LLVM committer.
  • GDB & binutils committer.
  • NetBSD maintenance in qemu.


Topics

  • Overview of NVMM
  • The libnvmm(3) usage
  • xhyve/NVMM research idea
  • qemu/NVMM


Overview


NetBSD Virtual Machine Monitor


Author: Maxime Villard (maxv @ NetBSD org)

Maxime is an x86 and RISC-V maintainer in NetBSD.

Maxime is the author of notable projects such as Kernel ASan and Kernel ASLR (and many others).


NetBSD Virtual Machine Monitor

Quick characteristics:

  • Comprehensive virtualization solution for NetBSD
  • Loadable kernel module
  • Supports 64-bit x86 hosts, both AMD (SVM) and Intel (VMX)
  • Fully MP-safe, fine-grained locking, up to 128 VMs with 256 VCPUs & 128 GB RAM each
  • Offers a C API through libnvmm(3)
  • Instruction simulator runs in userspace
  • Needs a recent NetBSD snapshot and a modified Qemu (pkgsrc-wip/qemu-nvmm)


NetBSD Virtual Machine Monitor

NVMM is divided into two general parts:

  • kernel part - device driver and ioctl(2) interface, hardware specific backend
  • userland part - libnvmm, (mostly) machine independent API

I will use the following definitions:

  • NVMM backend - the kernel part of NVMM
  • NVMM frontend - the userland part of NVMM
  • emulator - software that provides the emulated devices and the user-facing controls for creating and running virtual machines


NetBSD Virtual Machine Monitor

NVMM does not implement an emulator.

NVMM is meant to be used as a backend by existing emulators such as Qemu, VirtualBox, the Android Studio emulator or xhyve.

This makes NVMM similar to:

  • HAXM (NetBSD, Linux, Windows, macOS)
  • KVM (Linux)
  • HVF (macOS)
  • WHPX (Windows)


NVMM backend


NVMM backend

    /usr/src/sys/dev/nvmm
    |-- Makefile
    |-- files.nvmm
    |-- nvmm.c
    |-- nvmm.h
    |-- nvmm_internal.h
    |-- nvmm_ioctl.h
    `-- x86
        |-- Makefile
        |-- nvmm_x86.c
        |-- nvmm_x86.h
        |-- nvmm_x86_svm.c
        |-- nvmm_x86_svmfunc.S
        |-- nvmm_x86_vmx.c
        `-- nvmm_x86_vmxfunc.S

    1 directory, 13 files


NVMM backend

    $ wc -l * x86/*
           1 CVS
          13 Makefile
          14 files.nvmm
        1151 nvmm.c
         129 nvmm.h
         121 nvmm_internal.h
         148 nvmm_ioctl.h
           2 x86
           1 x86/CVS
           7 x86/Makefile
         332 x86/nvmm_x86.c
         269 x86/nvmm_x86.h
        2371 x86/nvmm_x86_svm.c
         200 x86/nvmm_x86_svmfunc.S
        3157 x86/nvmm_x86_vmx.c
         260 x86/nvmm_x86_vmxfunc.S
        8176 total


NVMM frontend


NVMM frontend

    /usr/src/lib/libnvmm
    |-- Makefile
    |-- libnvmm.3
    |-- libnvmm.c
    |-- libnvmm_x86.c
    |-- nvmm.h
    `-- shlib_version

    0 directories, 6 files


NVMM frontend

    $ wc -l *
           1 CVS
          15 Makefile
         635 libnvmm.3
         537 libnvmm.c
        3218 libnvmm_x86.c
         107 nvmm.h
           5 shlib_version
        4518 total


libnvmm(3) vs a pure ioctl(2) API?

Pros:

  • Reduction of attack surface
  • Easier integration with existing emulators (in C/C++)
  • Easier process of development and testing
  • Easier to perform fuzzing of NVMM components running in userland
  • Crashes are not fatal to host


libnvmm(3) vs a pure ioctl(2) API?

Cons:

  • Kernel and userland must be kept in sync (on the other hand, the ioctl(2) interface can change transparently to users of the library)
  • Performance overhead: an extra context switch to userland for operations that could have been handled entirely in the kernel (on the other hand, some operations can be handled entirely in userland without entering the kernel)


NVMM vs others

From an end-user perspective NVMM is the same in spirit as WHPX, HVF, HAXM, KVM (we are closer to WHPX than KVM).

All of these hardware acceleration APIs could be abstracted with a single API (example: https://github.com/StrikerX3/virt86 ).

In the end we build the same simulators on top of different hypervisors. The basic differences between these solutions are as follows:

  • KVM: rich user base and codebase, with many open-source tools and built-in paravirt drivers
  • HVF: closed-source solution that pushes raw CPU capabilities directly to userspace
  • HAXM: open-source, targets a single CPU architecture but multiple kernels
  • WHPX: closed-source solution that is similar to NVMM, with a high-level API


libnvmm usage


Creating NVMM machine

    #include <nvmm.h>

    /* mach is an opaque structure */
    struct nvmm_machine mach;

    /* create the machine */
    nvmm_machine_create(&mach);

    /* create VCPU0 */
    nvmm_cpuid_t vcpu = 0;
    nvmm_vcpu_create(&mach, vcpu);

The virtual machine is tied to the process that created it and is destroyed automatically when that process exits, whether normally or abnormally.


VCPU state accessors

    struct nvmm_x64_state state;
    nvmm_cpuid_t vcpu = 0;

    /* get general purpose registers */
    nvmm_vcpu_getstate(&mach, vcpu, &state, NVMM_X64_STATE_GPRS);

    /* modify the state */
    state.gprs[NVMM_X64_GPR_RBX] = 0xabcdef;

    /* set general purpose registers */
    nvmm_vcpu_setstate(&mach, vcpu, &state, NVMM_X64_STATE_GPRS);

A recent commit explains how the performance of these calls was optimized by reducing the number of syscalls:

https://mail-index.netbsd.org/source-changes/2019/04/28/msg105538.html
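
The last argument of the accessors above selects which group of VCPU state is transferred. The get/modify/set pattern can be tried out with a small self-contained toy model. This is only a sketch and not libnvmm: every `sk_` name, the flag values and the layout of the state structure are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state groups, loosely modelled on the NVMM_X64_STATE_* flags. */
#define SK_STATE_GPRS 0x1u
#define SK_STATE_CRS  0x2u

/* Hypothetical VCPU state: two independent groups of registers. */
struct sk_state {
    uint64_t gprs[16];
    uint64_t crs[5];
};

/*
 * Copy only the groups named in flags from src to dst.
 * Used in both directions: VCPU -> caller (get) and caller -> VCPU (set).
 */
static void
sk_copy_state(struct sk_state *dst, const struct sk_state *src, unsigned flags)
{
    if (flags & SK_STATE_GPRS)
        memcpy(dst->gprs, src->gprs, sizeof(dst->gprs));
    if (flags & SK_STATE_CRS)
        memcpy(dst->crs, src->crs, sizeof(dst->crs));
}
```

Fetching with `SK_STATE_GPRS` only leaves the control-register part of the caller's copy untouched, and writing back with the same flag cannot clobber the VCPU's control registers with stale values.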


NVMM memory mapping

There are two mappings:

  • guest virtual addresses (GVA) to guest physical addresses (GPA)
  • host virtual addresses (HVA) to guest physical addresses (GPA)

[Figure: memory mapping]

This lets an emulator access the guest's memory as a regular buffer.
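
The HVA to GPA association can be pictured as a table of ranges that the emulator consults to turn a guest physical address into a host pointer. The following is a minimal sketch of that bookkeeping in plain C. It is not how NVMM implements it, and all `map_` names and the fixed table size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; NOT the libnvmm API. */
typedef uint64_t map_gpaddr_t;

struct map_entry {
    map_gpaddr_t gpa;   /* first guest physical address of the range */
    size_t size;        /* length of the range in bytes */
    uint8_t *hva;       /* host buffer backing the range */
};

#define MAP_MAX_ENTRIES 8

struct map_machine {
    struct map_entry maps[MAP_MAX_ENTRIES];
    size_t nmaps;
};

/* Record that [gpa, gpa + size) is backed by the host buffer hva. */
static int
map_gpa_map(struct map_machine *m, uint8_t *hva, map_gpaddr_t gpa, size_t size)
{
    if (m->nmaps == MAP_MAX_ENTRIES || size == 0)
        return -1;
    m->maps[m->nmaps++] = (struct map_entry){ gpa, size, hva };
    return 0;
}

/* Translate a guest physical address to a host pointer, or NULL if unmapped. */
static uint8_t *
map_gpa_to_hva(const struct map_machine *m, map_gpaddr_t gpa)
{
    for (size_t i = 0; i < m->nmaps; i++) {
        const struct map_entry *e = &m->maps[i];
        if (gpa >= e->gpa && gpa - e->gpa < e->size)
            return e->hva + (gpa - e->gpa);
    }
    return NULL;
}
```

With a 4096-byte buffer mapped at GPA 0x3000, `map_gpa_to_hva(&m, 0x3010)` yields a pointer 0x10 bytes into the buffer, while an address outside the range yields NULL.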


NVMM memory mapping

    #include <sys/mman.h>
    #include <err.h>
    #include <stdlib.h>

    size_t size = PAGE_SIZE;

    /* allocate a host buffer and check that the allocation succeeded */
    void *buf = mmap(NULL, size, PROT_READ|PROT_WRITE,
                     MAP_ANON|MAP_PRIVATE, -1, 0);
    if (buf == MAP_FAILED)
        err(EXIT_FAILURE, "mmap");
    uintptr_t hva = (uintptr_t)buf;
    gpaddr_t gpa = 0x3000;

    /* prepare the buffer for mapping into GPA */
    nvmm_hva_map(&mach, hva, size);

    /* map HVA->GPA */
    nvmm_gpa_map(&mach, hva, gpa, size, PROT_READ|PROT_WRITE);


NVMM main program loop

    struct nvmm_exit exit;
    nvmm_cpuid_t vcpu = 0;

    while (1) {
        nvmm_vcpu_run(&mach, vcpu, &exit);
        switch (exit.reason) {
        case NVMM_EXIT_NONE:
            break; /* nothing to do */
        /* ... cases for the several other exit reasons ... */
        }
    }

NVMM_EXIT_NONE can happen for various reasons, such as a signal delivered to the emulator.

At this point we are able to write a bare-bones emulator that performs calculations on registers inside a virtual machine.
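
The dispatch pattern of the loop above can be exercised without NVMM by using a mock VCPU that replays a scripted list of exit reasons. The sketch below assumes nothing about the real library: all `loop_` names are hypothetical, and `LOOP_EXIT_HALT` is an invented terminator so that the example stops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit reasons, loosely modelled on the NVMM_EXIT_* names. */
enum loop_exit_reason {
    LOOP_EXIT_NONE,
    LOOP_EXIT_IO,
    LOOP_EXIT_MEM,
    LOOP_EXIT_HALT
};

/* Mock VCPU: replays a scripted sequence of exits instead of running a guest. */
struct loop_vcpu {
    const enum loop_exit_reason *script;
    size_t len, pos;
};

/* Counters recording how each exit was dispatched. */
struct loop_stats { unsigned none, io, mem; };

/* Stand-in for "run the guest until it exits": returns the next scripted reason. */
static enum loop_exit_reason
loop_vcpu_run(struct loop_vcpu *v)
{
    if (v->pos >= v->len)
        return LOOP_EXIT_HALT;
    return v->script[v->pos++];
}

/* Run until the script is exhausted, dispatching on the exit reason. */
static struct loop_stats
loop_run(struct loop_vcpu *v)
{
    struct loop_stats st = { 0, 0, 0 };

    for (;;) {
        switch (loop_vcpu_run(v)) {
        case LOOP_EXIT_NONE: st.none++; break; /* nothing to do, re-enter */
        case LOOP_EXIT_IO:   st.io++;   break; /* would go to the I/O path */
        case LOOP_EXIT_MEM:  st.mem++;  break; /* would go to the memory path */
        case LOOP_EXIT_HALT: return st;
        }
    }
}
```

Feeding the loop a script of one NONE, two IO and one MEM exit returns counters of 1, 2 and 1, which is an easy way to unit-test dispatch logic separately from the hypervisor.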


IO and MEM callbacks

    static void
    nvmm_io_callback(struct nvmm_io *io)
    {
        /* handle unhandled I/O access, route to emulated devices */
    }

    static void
    nvmm_mem_callback(struct nvmm_mem *mem)
    {
        /* handle unhandled memory access, route to emulated devices */
    }

    static const struct nvmm_callbacks nvmm_callbacks = {
        .io = nvmm_io_callback,
        .mem = nvmm_mem_callback
    };
    /* Register unhandled operations for I/O and MEM assist */
    nvmm_callbacks_register(&nvmm_callbacks);
    /* Callbacks are global per application instance */

Update: with the most recent API changes, callbacks are no longer global per application.
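
To make the role of such a callback concrete, here is a self-contained sketch of an I/O handler that routes port accesses to a single emulated device. It only mirrors the registration pattern shown above. The `demo_` structures, the port number and the "reads return 0xff" policy are all assumptions for illustration, not the libnvmm definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one port I/O access; not the libnvmm struct. */
struct demo_io {
    uint16_t port;
    bool in;        /* true: guest reads from the port; false: guest writes */
    uint8_t data;
};

struct demo_callbacks {
    void (*io)(struct demo_io *);
};

#define DEMO_SERIAL_PORT 0x3f8

static struct demo_callbacks demo_cbs;  /* registered handlers, initially empty */
static uint8_t demo_serial_last;        /* last byte written to the toy serial port */

/* Route an I/O access to the (single) emulated device. */
static void
demo_io_callback(struct demo_io *io)
{
    if (!io->in && io->port == DEMO_SERIAL_PORT)
        demo_serial_last = io->data;    /* guest wrote a byte to the device */
    else if (io->in)
        io->data = 0xff;                /* reads are not emulated here */
}

/* Remember the handlers supplied by the emulator. */
static void
demo_callbacks_register(const struct demo_callbacks *cbs)
{
    demo_cbs = *cbs;
}

/* What an assist step would do once the access is decoded: call the handler. */
static int
demo_assist_io(struct demo_io *io)
{
    if (demo_cbs.io == NULL)
        return -1;
    demo_cbs.io(io);
    return 0;
}
```

Before registration `demo_assist_io()` fails with -1; afterwards a write of 'A' to port 0x3f8 ends up in `demo_serial_last`.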


MEM and I/O assist

    struct nvmm_exit exit;
    nvmm_cpuid_t vcpu = 0;

    while (1) {
        nvmm_vcpu_run(&mach, vcpu, &exit);
        switch (exit.reason) {
        case NVMM_EXIT_NONE:
            break; /* nothing to do */
        case NVMM_EXIT_IO:
            nvmm_assist_io(&mach, vcpu, &exit);
            break;
        case NVMM_EXIT_MEM:
            nvmm_assist_mem(&mach, vcpu, &exit);
            break;
        /* ... cases for the several other exit reasons ... */
        }
    }


libnvmm(3)

The API is documented in the man pages, which also point to an elaborate example that runs a toy kernel in a toy emulator.

    FILES
     https://www.netbsd.org/~maxv/nvmm/nvmm-demo.zip
           Functional example (demonstrator).  Contains a virtualizer that
           uses the libnvmm API, and a small kernel that exercises this
           virtualizer.
     src/sys/dev/nvmm/
           Source code of the kernel NVMM driver.
     src/lib/libnvmm/
           Source code of the libnvmm library.

-- libnvmm(3)


xhyve/NVMM


xhyve/NVMM (research idea)

Idea:

  • BSD licensed emulator that runs NetBSD, Linux and Windows on top of NVMM?

Possible research:

  • xhyve, a lightweight OS X virtualization solution
  • https://github.com/machyve/xhyve

Why xhyve?

  • The work to make bhyve standalone and portable outside FreeBSD has already been done
  • Reworks bhyve and puts device drivers such as clocks into the userspace application (NVMM as of now does not implement emulated clocks in the kernel)
  • Strips unneeded or unportable parts (bhyvectl, FreeBSD direct booting code...)


bhyve model

[Figure: bhyve model]


xhyve model

[Figure: xhyve model]


Unwanted parts in xhyve

[Figure: unwanted parts in xhyve]


Unwanted parts in xhyve

The FreeBSD booting code (userboot.so) has to be cross-built from a modified FreeBSD distribution, on a system running the FreeBSD kernel. This is too much of a burden for a realistic solution.

Linux booting is different because Linux uses a well-defined boot protocol. The existing code is standalone and requires neither up-to-date Linux kernel headers, nor definitions of filesystem internals, nor pregenerated files.

There is a similar direct booting issue in VMD for OpenBSD. If VMD is ever ported to NetBSD, or direct OpenBSD booting is ever supported in an existing simulator, it will be done with BIOS/EFI booting.

Direct booting of NetBSD is easier to achieve, as we have direct access to up-to-date system headers. On the other hand, booting is usually not a bottleneck, and it is probably better to invest in proper MULTIBOOT/EFI/BIOS support.


xhyve/NVMM idea

[Figure: xhyve/NVMM idea]


xhyve/NVMM

The vmm_ops methods map cleanly onto the libnvmm(3) API.

    struct vmm_ops vmm_ops_nvmm = {
        vmx_init,
        vmx_cleanup,
        vmx_vm_init,          /* nvmm_machine_create()... */
        vmx_vcpu_init,        /* nvmm_vcpu_create() */
        vmx_run,              /* nvmm_vcpu_run() */
        vmx_vm_cleanup,
        vmx_vcpu_cleanup,
        vmx_getreg,           /* nvmm_vcpu_getstate() */
        vmx_setreg,           /* nvmm_vcpu_setstate() */
        vmx_getdesc,          /* nvmm_vcpu_getstate() */
        vmx_setdesc,          /* nvmm_vcpu_setstate() */
        vmx_getcap,
        vmx_setcap,
        vmx_vlapic_init,
        vmx_vlapic_cleanup,
        vmx_vcpu_interrupt
        /* Planned new methods, for code executed with VM context:
         *  vmm_mem_init
         *  vmm_mem_alloc: valloc() + nvmm_hva_map() + nvmm_gpa_map()
         *  vmm_mem_free : nvmm_gpa_unmap() + nvmm_hva_unmap()        */
    };
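
The reason an ops table makes such a port tractable is that the rest of the emulator only ever calls through function pointers, so a backend can be swapped by filling the table differently. The sketch below shows the idea with a much-reduced table and a toy backend. The `mini_` names, the two-member table and the in-memory register file are assumptions for illustration and do not reproduce the real vmm_ops or any libnvmm call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much-reduced ops table; the real vmm_ops has more members. */
struct mini_vmm_ops {
    int (*getreg)(void *vcpu, int reg, uint64_t *val);
    int (*setreg)(void *vcpu, int reg, uint64_t val);
};

#define MINI_NREGS 16

/* Toy backend state standing in for a VCPU register file. */
struct mini_vcpu {
    uint64_t regs[MINI_NREGS];
};

/* Read one register; a real backend would fetch the VCPU state first. */
static int
mini_getreg(void *arg, int reg, uint64_t *val)
{
    struct mini_vcpu *v = arg;

    if (reg < 0 || reg >= MINI_NREGS)
        return -1;
    *val = v->regs[reg];
    return 0;
}

/* Write one register; a real backend would push the VCPU state afterwards. */
static int
mini_setreg(void *arg, int reg, uint64_t val)
{
    struct mini_vcpu *v = arg;

    if (reg < 0 || reg >= MINI_NREGS)
        return -1;
    v->regs[reg] = val;
    return 0;
}

/* The emulator core sees only this table, never the backend functions. */
static const struct mini_vmm_ops mini_ops_toy = {
    .getreg = mini_getreg,
    .setreg = mini_setreg,
};
```

Callers use `mini_ops_toy.setreg(&v, 3, 0xabcdef)` and `mini_ops_toy.getreg(&v, 3, &out)` without knowing which backend sits behind the pointers.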


xhyve/NVMM

Current status:

  • Waiting for the APIs to stabilize (there are still changes planned before the netbsd-9 branch is created).
  • Waiting for fixes to the NVMM/kernel bugs detected so far.

Issues with xhyve:

  • The xhyve repository is no longer actively developed.
  • xhyve/NVMM will most likely become a new, distinct fork/project.

More ideas:

  • Port xhyve (or rather its fork) to HAXM?


qemu/NVMM DEMO


NetBSD Virtual Machine Monitor

More projects for NVMM:

  • Port VirtualBox on top of NVMM (VirtualBox already has functional WHPX backend support)
  • Nested virtualization support (useful for fuzzing and development)
  • aarch64 support
  • Write or port emulators dedicated to a particular game or application (mostly targeting Windows games)
  • Embedded and server solutions competing with KVM-based emulation


NetBSD Virtual Machine Monitor


Project's home page:

  • https://m00nbsd.net/4e0798b7f2620c965d0dd9d6a7a2f296.html

The NetBSD Foundation blog post:

  • https://blog.netbsd.org/tnf/entry/from_zero_to_nvmm
