QEMU Deprivileging / dm_restrict

Revision 1

1 Basics

Status: Tech Preview
Architecture(s): x86
Component(s): toolstack

2 Overview

By default, the QEMU device model runs in domain 0. An attacker who gains control of a QEMU process could therefore easily take control of the whole system.

dm_restrict is a set of mechanisms that restrict what QEMU can do while running in domain 0. It consists of two halves:

  1. Mechanisms that restrict QEMU so that it can only affect its own domain.
  2. Mechanisms that restrict QEMU’s ability to interact with domain 0.

3 User details

3.1 Getting the right versions of software

Linux: 4.11+

QEMU: 3.0+ (or the version that comes with Xen 4.12+)
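One way to confirm that dom0 meets the Linux 4.11+ requirement is to compare the numeric major and minor version reported by `uname -r`. The Python sketch below is illustrative only: `kernel_version()` and `kernel_ok()` are hypothetical helpers, not part of the Xen tools. It compares integers rather than strings so that, for example, 4.9 is correctly treated as older than 4.11.

```python
import platform
import re

# Minimum Linux kernel version listed above for dm_restrict.
MIN_LINUX = (4, 11)


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.19.0-18-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release: str) -> bool:
    """Return True if the release meets the 4.11+ requirement (numeric compare)."""
    return kernel_version(release) >= MIN_LINUX


if __name__ == "__main__":
    # On Linux, platform.release() returns the same string as `uname -r`.
    release = platform.release()
    print(release, "OK" if kernel_ok(release) else "too old for dm_restrict")
```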

3.2 Setting up a group and userid range

For maximum security, libxl needs to run the device model for each domain under a user ID (UID) corresponding to its domain ID. There are 32752 possible domain IDs, so libxl needs 32752 UIDs set aside for it. Setting up a group for all device models to run under is also recommended.

The simplest and most effective way to do this is to allocate a contiguous block of UIDs and create a single user named xen-qemuuser-range-base with the first UID in the block. For example, under Debian:

adduser --system --uid 131072 --group --no-create-home xen-qemuuser-range-base

Two comments on this method:

  1. Most modern systems have 32-bit UIDs, so in theory UIDs can go up to 2^31 (or 2^32 if UIDs are unsigned). POSIX, however, only guarantees 16-bit UIDs; UID 65535 is reserved as an invalid value, and 65534 is normally allocated to “nobody”. A range base such as 131072, which lies above 65535, therefore relies on 32-bit UIDs.
  2. Some container systems have proposed using the upper 16 bits of the UID as a container ID. Using a multiple of 2^16 for the range base (as above, where 131072 = 2 × 2^16) means such systems will interpret all of the device model UIDs as a single container ID, since the 32752 UIDs in the range fit within one 2^16 block.
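The arithmetic behind both comments can be checked directly. The sketch below assumes, as the contiguous-block scheme implies, that the device model for domain N runs as the range base plus N; `dm_uid()` and `container_id()` are hypothetical names used only for illustration, and 131072 is the base from the Debian example.

```python
# Number of usable domain IDs, as stated above (domids 0..32751).
NR_DOMIDS = 32752
# Range base used in the Debian example above: 2 * 2**16.
RANGE_BASE = 131072


def dm_uid(domid: int, base: int = RANGE_BASE) -> int:
    """Assumed mapping: the device model for `domid` runs as base + domid."""
    if not 0 <= domid < NR_DOMIDS:
        raise ValueError(f"domid {domid} out of range")
    return base + domid


def container_id(uid: int) -> int:
    """Upper 16 bits of a UID, as some container systems would interpret it."""
    return uid >> 16


if __name__ == "__main__":
    first, last = dm_uid(0), dm_uid(NR_DOMIDS - 1)
    print(f"UID range: {first}..{last}")
    # The whole range lies above the 16-bit values POSIX guarantees...
    print("needs 32-bit UIDs:", first > 0xFFFF)
    # ...and, because the base is a multiple of 2**16 and the range is shorter
    # than 2**16, every UID in it shares the same upper 16 bits.
    print("single container ID:", container_id(first) == container_id(last))
```

Run as a script, it prints the UID range 131072..163823 and confirms both properties.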

Another, less secure, option is to run all QEMU instances under the same UID. To do this, create a user named xen-qemuuser-shared; for example:

adduser --no-create-home --system xen-qemuuser-shared

A final option is to allocate one UID per VM and set it in the domain config file with the device_model_user option. For example, suppose you have a VM named c6-01. You might do the following:

adduser --system --no-create-home --group xen-qemuuser-c6-01

And then in your config file, the following line:

device_model_user="xen-qemuuser-c6-01"

If you use this method, you should also allocate one “reaper” user to be used for killing device models:

adduser --system --no-create-home --group xen-qemuuser-reaper

NOTE: When using device_model_user, it is important that EACH VM HAVE A SEPARATE UID, and that none of these UIDs map to root. xl will throw an error if a UID maps to zero, but not if multiple VMs have the same UID. Multiple VMs with the same device model UID will cause problems.
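Because xl only catches the root case, a small pre-flight check can catch shared UIDs as well. This is a hypothetical Python helper, not an xl feature: it takes a mapping from VM name to numeric device model UID (for instance, built by resolving each config's device_model_user with the standard `pwd` module) and reports UIDs that are root or shared between VMs.

```python
import pwd
from collections import defaultdict


def check_dm_users(vm_uids: dict[str, int]) -> list[str]:
    """Report problems in a VM-name -> device-model-UID mapping.

    Flags UIDs that map to root (xl already rejects these) and UIDs shared
    by more than one VM (which xl does not detect).
    """
    problems = []
    by_uid = defaultdict(list)
    for vm, uid in sorted(vm_uids.items()):
        if uid == 0:
            problems.append(f"{vm}: device model UID maps to root")
        by_uid[uid].append(vm)
    for uid, vms in sorted(by_uid.items()):
        if len(vms) > 1:
            problems.append(f"UID {uid} shared by: {', '.join(vms)}")
    return problems


def resolve(vm_users: dict[str, str]) -> dict[str, int]:
    """Map VM name -> device_model_user name -> numeric UID via the passwd database."""
    return {vm: pwd.getpwnam(user).pw_uid for vm, user in vm_users.items()}


if __name__ == "__main__":
    # Hypothetical inventory; in practice it could come from resolve() above.
    example = {"c6-01": 200001, "c6-02": 200002, "c6-03": 200002}
    for problem in check_dm_users(example):
        print(problem)
```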

It is also important that no processes run as xen-qemuuser-reaper, as any such processes will be killed when deprivileged QEMU processes are destroyed.
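To confirm that nothing runs under the reaper user, a Linux-only sketch can scan the `Uid:` line of each `/proc/<pid>/status` file. The helper name `pids_owned_by()` is hypothetical; it conservatively matches a process if any of its real, effective, saved, or filesystem UIDs equals the given UID.

```python
import os
import pwd


def pids_owned_by(uid: int) -> list[int]:
    """Return PIDs of processes with `uid` in any UID field (Linux /proc)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/status") as status:
                for line in status:
                    if line.startswith("Uid:"):
                        # Four fields: real, effective, saved set, filesystem.
                        uids = [int(field) for field in line.split()[1:5]]
                        if uid in uids:
                            pids.append(int(entry))
                        break
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited while scanning, or is hidden from us
    return sorted(pids)


if __name__ == "__main__":
    try:
        reaper_uid = pwd.getpwnam("xen-qemuuser-reaper").pw_uid
    except KeyError:
        print("user xen-qemuuser-reaper does not exist")
    else:
        busy = pids_owned_by(reaper_uid)
        print("processes running as the reaper user:", busy or "none")
```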

3.3 Domain config changes

The core domain config change is to add the following line to the domain configuration:

dm_restrict=1

This enables a number of restrictions, outlined in the ‘Technical details’ section below.

4 Technical details

See docs/design/qemu-deprivilege.md for technical details.

5 Limitations

The following features still need to be implemented:

A number of restrictions still need to be implemented. A compromised device model may be able to do the following:

Additionally, getting PCI passthrough to work securely would require a significant rework of how passthrough works at the moment. It may be implemented at some point but is not a near-term priority.

See SUPPORT.md for security support status.

6 History

Date        Revision  Version   Notes
----------  --------  --------  -----------------------
2018-09-14  1         Xen 4.12  Imported from docs/misc