Plenty of people have asked me over the years how to pass through generic PCI devices to virtual machines running on XenServer. Whilst it isn’t officially supported by Citrix, it’s nonetheless perfectly possible to do; just note that your mileage may vary, because it clearly isn’t rigorously tested with every type of device people might want to pass through (from TV cards, to storage controllers, to USB hubs…!).

The process on XenServer 7.0 differs somewhat from previous releases, in that the Dom0 control domain is now CentOS 7-based, and UEFI boot (in addition to BIOS boot) is supported. Hence, I thought it would be worth writing up the latest instructions, for those who are feeling adventurous.

Of course, XenServer officially supports pass-through of GPUs to both Windows and Linux VMs, hence this territory isn’t as uncharted as might first appear: pass-through in itself is fine. Any wrinkles are likely to come from the particular piece of hardware you’re passing through.

A Short Introduction to PCI Pass-Through

Firstly, a little primer on what we’re trying to do.

Your host will have a PCI bus, with multiple devices hosted on it, each with its own unique ID on the bus (more on that later; just remember this as “B:D.f”). In addition, each device has a vendor ID, which is globally unique, and a device ID, which is unique within that vendor; together they allow the operating system to look up the device’s human-readable name in the PCI IDs database text file on the system. For example, vendor ID 10de corresponds to NVIDIA Corporation, and NVIDIA’s device ID 11b4 corresponds to the Quadro K4200. Each device can also (optionally) carry a subsystem vendor ID and subsystem device ID, e.g. if an OEM has its own branded version of a supplier’s component.
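To see these numeric IDs next to the names, `lspci -nn` appends the class code and the vendor:device pair in square brackets. The sketch below is a hypothetical helper (`pci_ids_from_lspci_line` is not part of XenServer) that pulls the B:D.f, vendor ID and device ID out of one such line; the sample line used to try it is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: given one line of `lspci -nn` output, print
# "B:D.f vendor device". lspci -nn shows the IDs as "[vvvv:dddd]"; the
# 4-digit class code "[cccc]" has no colon, so it is not matched.
pci_ids_from_lspci_line() {
    line=$1
    # The first space-separated field is the B:D.f address.
    bdf=${line%% *}
    # Greedy ".*" makes sed pick the LAST "[xxxx:xxxx]" group on the line,
    # in case the device name itself contains brackets.
    ids=$(printf '%s\n' "$line" \
        | sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)].*/\1 \2/p')
    [ -n "$ids" ] || return 1
    printf '%s %s\n' "$bdf" "$ids"
}
```

For a whole system, something like `lspci -nn | while IFS= read -r l; do pci_ids_from_lspci_line "$l"; done` prints one line per device.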

Normally, XenServer’s control domain, Dom0, is given all PCI devices by the Xen hypervisor. Drivers in the Linux kernel running in Dom0 each bind to particular PCI device IDs, and thus make the hardware actually do something. XenServer then provides synthetic devices (emulated or para-virtualised) such as SCSI controllers and network cards to the virtual machines, passing the I/O through Dom0 and then out to the real hardware devices.

This is great, because it means the VMs never see the real hardware, and thus we can live migrate VMs around, or start them up on different physical machines, and the virtualised operating systems will be none the wiser.

If, however, we want to give a VM direct access to a piece of hardware, we need to do something different. The main reason to do so is that the hardware in question isn’t easy to virtualise, i.e. the hypervisor can’t provide a synthetic device to a VM and then somehow “share out” the real hardware between those synthetic devices. This is the case for everything from an SSL offload card to a GPU.

Aside: Virtual Functions

There are three ways of sharing out a PCI device between VMs. The first is what XenServer does for network cards and storage controllers, where a synthetic device is given to the VM, but then the I/O streams can effectively be mixed together on the real device (e.g. it doesn’t matter that traffic from multiple VMs is streamed out of the same physical network card: that’s what will end up happening at a physical switch anyway). That’s fine if it’s I/O you’re dealing with.

The second is to use software to share out the device. Effectively you have some kind of “manager” of the hardware device that is responsible for sharing it between multiple virtual machines, as is done with NVIDIA GRID GPU virtualisation, where each VM still ends up with a real slice of GPU hardware, but controlled by a process in Dom0.

The third is to virtualise at the hardware device level, and have a PCI device expose multiple virtual functions (VFs). Each VF provides some subset of the functionality of the device, isolated from other VFs at the hardware level. Several VMs can then each be given their own VF (using exactly the same mechanism as passing through an entire PCI device). A couple of examples are certain Intel network cards, and AMD’s MxGPU technology.
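On a Linux system whose kernel and device driver support SR-IOV, the standard sysfs attributes `sriov_totalvfs` and `sriov_numvfs` in a device’s directory report how many VFs it can expose and how many are currently enabled. The helper below is a hypothetical, read-only sketch; how a given card’s VFs get enabled is driver-specific, so check your hardware’s documentation. `SYSFS_ROOT` is an illustrative override so it can be tried against a copy of the tree.

```shell
#!/bin/sh
# Hypothetical helper: report SR-IOV virtual function counts for the
# device at B:D.f $1 (PCI domain 0000 assumed), using the standard Linux
# sysfs attributes. SYSFS_ROOT defaults to /sys; it exists only for testing.
sriov_info() {
    dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/0000:$1"
    if [ ! -r "$dev/sriov_totalvfs" ]; then
        echo "$1: no SR-IOV capability exposed"
        return 1
    fi
    total=$(cat "$dev/sriov_totalvfs")
    enabled=$(cat "$dev/sriov_numvfs")
    echo "$1: $enabled of $total VFs enabled"
}
```

Each enabled VF shows up in lspci with its own B:D.f, and that address is what you hide and assign in the steps below.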

OK, So How Do I Pass-Through a Device?

Step 1

Firstly, we have to stop any driver in Dom0 claiming the device. To do that, we first need the ID of the device we want to pass through. We’ll use B:D.f (bus, device, function) numbering to specify it.
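In this notation the bus is 8 bits (two hex digits), the device 5 bits (00 to 1f) and the function 3 bits (0 to 7). A hypothetical validator, handy before pasting an address into the boot line or an xe command, might look like this; it deliberately accepts only the short lowercase form that lspci prints, without a 0000: domain prefix.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 is a well-formed short B:D.f address
# such as 04:00.0. Bus: 00-ff, device: 00-1f, function: 0-7 (lowercase hex).
is_valid_bdf() {
    case $1 in
        [0-9a-f][0-9a-f]:[01][0-9a-f].[0-7]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_valid_bdf 04:00.0` succeeds, while `04:20.0` (device out of range) and `04:00.8` (function out of range) are rejected.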

Running lspci will tell you what’s in your system:

davidcot@helical:~$ lspci
00:00.0 Host bridge: Intel Corporation 82X38/X48 Express DRAM Controller
00:01.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Primary PCI Express Bridge
00:06.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Secondary PCI Express Bridge
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation G86 [Quadro NVS 290] (rev a1)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express (rev 02)
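Rather than reading the whole list by eye, you can filter it. A minimal sketch, using a hypothetical `find_bdf` helper that reads lspci output on standard input and prints the B:D.f of every line matching a case-insensitive pattern:

```shell
#!/bin/sh
# Hypothetical helper: read `lspci` output on stdin and print the B:D.f
# (first field) of each line matching the case-insensitive pattern $1.
find_bdf() {
    grep -i -- "$1" | cut -d ' ' -f 1
}
```

On the host listed above, `lspci | find_bdf 'ethernet controller'` prints 04:00.0.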

Once you’ve found the device you’re interested in, say 04:00.0 for my network card, we tell Dom0 to stop its normal drivers from binding to it, by adding an option to the Dom0 boot line as follows:

/opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(04:00.0)"

(What this does is edit /boot/grub/grub.cfg for you, or if you’re booting using UEFI, /boot/efi/EFI/xenserver/grub.cfg instead!)
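The Linux xen-pciback driver accepts several devices in a single hide option by concatenating the parenthesised addresses, e.g. (04:00.0)(01:00.0). To be safe, pass every device you want hidden in one option. A hypothetical helper to build it:

```shell
#!/bin/sh
# Hypothetical helper: build one xen-pciback.hide option covering every
# B:D.f given as an argument, e.g. "xen-pciback.hide=(04:00.0)(01:00.0)".
pciback_hide_arg() {
    [ "$#" -gt 0 ] || return 1
    arg="xen-pciback.hide="
    for bdf in "$@"; do
        arg="$arg($bdf)"
    done
    printf '%s\n' "$arg"
}
```

Usage: `/opt/xensource/libexec/xen-cmdline --set-dom0 "$(pciback_hide_arg 04:00.0 01:00.0)"`, after which `grep pciback` on the relevant grub.cfg should show the new option.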

Step 2

Reboot! At the moment, a driver in Dom0 probably still has hold of your device, hence you need to reboot the host to get it relinquished.
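After the reboot it is worth confirming that the device is now held by pciback rather than its usual driver. On Linux, the driver symlink in a device’s sysfs directory names the bound driver, and the mainline xen-pciback module registers itself as pciback. A sketch, with `SYSFS_ROOT` as an illustrative override for testing:

```shell
#!/bin/sh
# Hypothetical helper: print the name of the driver bound to B:D.f $1
# (PCI domain 0000 assumed), or "none" if no driver has claimed it.
# SYSFS_ROOT defaults to /sys; it exists only so this can be tested.
bound_driver() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/0000:$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../drivers/<name>; keep just <name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

In Dom0, `bound_driver 04:00.0` should print pciback; if it prints the device’s normal driver instead, the hide option did not take effect.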

Step 3

The easy bit: tell the toolstack to assign the PCI device to the VM. Run:

xe vm-list

And note the UUID of the VM you’re interested in, then:

xe vm-param-set other-config:pci=0/0000:<B:D.f> uuid=<vm uuid>

Where, of course, <B:D.f> is the ID of the device you found in step 1 (like 04:00.0), and <vm uuid> corresponds to the VM you care about.
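If you’d rather look the VM up by name than copy UUIDs around, the two xe calls can be wrapped together. This is a hypothetical helper: it relies on `xe vm-list ... --minimal` printing matching UUIDs as one comma-separated line, refuses to continue if the name is missing or ambiguous, and handles a single device only.

```shell
#!/bin/sh
# Hypothetical helper: assign host PCI device $2 (short B:D.f, e.g. 04:00.0)
# to the VM whose name-label is $1, using other-config:pci as in step 3.
# Only a single device is handled.
pci_assign_to_vm() {
    vm_name=$1
    bdf=$2
    # --minimal prints the matching UUIDs as one comma-separated line.
    vm_uuid=$(xe vm-list name-label="$vm_name" params=uuid --minimal) || return 1
    case $vm_uuid in
        "") echo "no VM named '$vm_name'" >&2; return 1 ;;
        *,*) echo "more than one VM named '$vm_name'" >&2; return 1 ;;
    esac
    xe vm-param-set other-config:pci="0/0000:$bdf" uuid="$vm_uuid"
}
```

Afterwards, `xe vm-param-get uuid=<vm uuid> param-name=other-config param-key=pci` should echo the value back, and `xe vm-param-remove uuid=<vm uuid> param-name=other-config param-key=pci` undoes the assignment.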

Step 4

Start your VM. At this point if you run lspci (or equivalent) within the VM, you should now see the device. However, that doesn’t mean it will spring into life, because…

Step 5

Install a device driver for the piece of hardware you passed through. The operating system within the VM may already ship with a suitable device driver, but if not, you’ll need to get the appropriate one from the device manufacturer. This will normally be the standard Linux/Windows/other driver that you would use on a physical system; the exception is a virtual function, for which you’re likely to need a special VF driver.
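For a Linux guest, steps 4 and 5 can both be checked from the command line. The device may well appear at a different B:D.f inside the VM than on the host, because the guest has its own virtual PCI bus, so it is safer to find it by vendor:device ID with `lspci -nn`, then ask `lspci -k` which kernel driver has bound to it. The helpers below are a hypothetical sketch that parse that output; the sample lines used to try them are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nn` output on stdin and succeed if a
# device with vendor:device ID $1 (e.g. 10de:11b4) is present.
has_pci_id() {
    grep -qF -- "[$1]"
}

# Hypothetical helper: read `lspci -k` output for one device on stdin and
# print the "Kernel driver in use" value, or "none" if no driver is bound.
driver_in_use() {
    drv=$(sed -n 's/^[[:space:]]*Kernel driver in use: *//p' | head -n 1)
    printf '%s\n' "${drv:-none}"
}
```

Inside the VM, `lspci -nn | has_pci_id 10de:11b4` (substituting your device’s IDs) tells you whether the device is visible, and `lspci -k -s <B:D.f> | driver_in_use`, using the guest-side address, should print the driver’s name once step 5 is done.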

Health Warnings

As indicated above, pass-through has advantages and disadvantages. You’ll get direct access to the hardware (and hence, for some functions, higher performance), but you’ll forgo luxuries such as the ability to live migrate the virtual machine around (there’s state now sitting on real hardware, versus virtual devices), and the ability to use high availability for that VM (because HA doesn’t take into account how many free PCI devices of the right sort you have in your resource pool).

In addition, not all PCI devices take well to being passed through, and not all servers like doing so (e.g. if you’re extending the PCI bus in a blade system to an expansion module, this can sometimes cause problems). Your mileage may therefore vary.

If you do get stuck, head over to the XenServer discussion forums and people will try to help out, but just note that Citrix doesn’t officially support generic PCI pass-through, hence you’re in the hands of the (very knowledgeable) community.

Conclusion

Hopefully this has helped clear up how pass-through is done on XenServer 7.0; do comment and let us know how you’re using pass-through in your environment, so that we can learn what people want to do, and think about what to officially support on XenServer in the future!
