Whole-system virtualization allows us to package software systems inside self-contained virtual machines, each running its own operating system along with a collection of applications and services. Presenting "virtual hardware" lets machine deployment be managed flexibly in software, and the strong resource isolation between co-located virtual machines is what enables modern cloud computing platforms. However, these benefits do not come entirely for free: the additional level of indirection can lead to increased runtime overheads and reduced performance.
The past decade has seen huge advances in tackling this problem. CPU virtualization overheads due to dynamic binary rewriting, and memory virtualization overheads due to shadow page tables, were reduced by a combination of clever algorithms and hardware assistance. A key remaining problem has been I/O virtualization, most notably the challenges introduced by high-speed networks running at 10Gb/s or 40Gb/s. The following paper shows how to enable a virtual machine to attain "bare metal" performance from high-speed network interface cards (NICs).
Their starting point is to use direct device assignment (sometimes referred to as "PCI pass-through"). The idea here is to dedicate a NIC to a virtual machine, and allow it to access the device registers directly. This means the device driver running in the virtual machine will be able to program DMA transfers to send and receive packets, just like it would if running on a real machine. Configuring an IOMMU to disallow transfers to or from non-owned physical memory ensures security. Getting the virtualization layer out of the way for data transfer is a big win, and goes a long way to reducing the performance overheads. But it turns out there is another, rather surprising, problem: interrupt processing.
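To make the mechanism concrete, here is a minimal C sketch of what device assignment amounts to, assuming a hypervisor with hypothetical helpers such as iommu_map_page() and pin_guest_page() (none of these are a real kernel or hypervisor API): the guest's pages are mapped into the assigned NIC's IOMMU domain so the device can DMA only into memory the guest owns, and the device registers are then exposed to the guest for direct programming.

```c
/*
 * A minimal sketch (not the paper's code) of direct device assignment:
 * the hypervisor maps the guest's pages into the IOMMU domain of the
 * assigned NIC, so DMA from that device can only reach memory the guest
 * owns, and then exposes the NIC's registers to the guest's driver.
 * All types and helpers below are illustrative placeholders.
 */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

struct nic;                               /* opaque device handle        */
struct iommu_domain;                      /* opaque IOMMU domain handle  */
struct guest { uint64_t mem_size; };      /* simplified guest descriptor */

struct iommu_domain *iommu_domain_for_device(struct nic *dev);
uint64_t pin_guest_page(struct guest *g, uint64_t gpa);
int iommu_map_page(struct iommu_domain *d, uint64_t iova,
                   uint64_t hpa, size_t len);
void map_device_registers_into_guest(struct guest *g, struct nic *dev);

int assign_nic_to_guest(struct nic *dev, struct guest *g)
{
    struct iommu_domain *dom = iommu_domain_for_device(dev);

    /* Map every guest-physical page into the device's IOMMU domain. */
    for (uint64_t gpa = 0; gpa < g->mem_size; gpa += PAGE_SIZE) {
        uint64_t hpa = pin_guest_page(g, gpa);   /* host-physical address */

        /* The IOMMU translates the device's DMA addresses (here: guest-
         * physical) to host-physical; anything unmapped simply faults,
         * so the NIC cannot touch other guests or the host. */
        if (iommu_map_page(dom, /*iova=*/gpa, hpa, PAGE_SIZE) != 0)
            return -1;
    }

    /* Expose the NIC's register region so the guest driver can program
     * DMA rings directly, with no hypervisor on the data path. */
    map_device_registers_into_guest(g, dev);
    return 0;
}
```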
As device speeds increase, the number of packets per second increases too, leading to a high rate of device interrupts. There are various techniques to ameliorate high interrupt load, such as using larger packets or batching interrupt generation, but these come with potentially undesirable side effects, and even then only partly mitigate the problem. For example, one of the experiments in the paper shows that even with adaptive interrupt batching, a NIC can easily generate many tens of thousands of interrupts per second. This is already a problem for regular machines, but it is even worse for virtual machines, which, as it turns out, incur far higher interrupt-processing costs.
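A quick back-of-the-envelope calculation (using assumed, illustrative numbers rather than figures from the paper) shows why coalescing only partly helps: even generous batching leaves an interrupt rate in the tens of thousands per second.

```c
/* Back-of-the-envelope interrupt-rate estimate for a 10Gb/s NIC.
 * Frame size and coalescing factor are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const double link_bps    = 10e9;    /* 10 Gb/s link                  */
    const double frame_bytes = 1500.0;  /* assumed MTU-sized frames      */
    const double coalesce    = 10.0;    /* assumed frames per interrupt  */

    double frames_per_sec     = link_bps / (frame_bytes * 8.0);
    double interrupts_per_sec = frames_per_sec / coalesce;

    /* Roughly 833,000 frames/s and 83,000 interrupts/s. */
    printf("~%.0f frames/s, ~%.0f interrupts/s after coalescing\n",
           frames_per_sec, interrupts_per_sec);
    return 0;
}
```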
As the authors explain, the standard practice for handling a device interrupt that occurs while running in a virtual machine requires two "exits": context switches from guest-operating mode into host-operating mode. The overhead from these exits (and their matched re-entries into guest-operating mode) can lead to performance degradation of 30%–50% for network-intensive workloads. This observation leads directly to the authors' idea: Exitless Interrupts (ELI).
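To see where the two exits come from, the following commented sketch walks through the conventional path for a single NIC interrupt arriving while the guest is running; the function names are hypothetical placeholders, but the sequence of mode switches is the one described above.

```c
/* Schematic of the conventional (pre-ELI) handling of one device interrupt
 * that fires while a guest is executing. All names are illustrative. */
struct vm;

void exit_to_host(struct vm *g);
void host_ack_physical_interrupt(void);
void inject_virtual_interrupt(struct vm *g);
void resume_guest(struct vm *g);
void complete_virtual_eoi(struct vm *g);

void baseline_interrupt_path(struct vm *guest)
{
    exit_to_host(guest);              /* exit #1: the physical interrupt
                                         forces a switch to host mode     */
    host_ack_physical_interrupt();    /* host acknowledges the hardware   */
    inject_virtual_interrupt(guest);  /* queue a virtual interrupt        */
    resume_guest(guest);              /* re-enter guest mode              */

    /* The guest's driver now services the virtual interrupt and signals
     * completion (EOI); that privileged operation traps again. */
    exit_to_host(guest);              /* exit #2: caused by the guest EOI */
    complete_virtual_eoi(guest);
    resume_guest(guest);              /* second re-entry into guest mode  */
}
```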
Their solution comes in two parts. The first technique is to configure the system to deliver all interrupts directly to the virtual machine running in guest-operating mode, but to arrange for those not intended for its direct consumption to immediately exit back to the host. The second technique is to allow the guest operating system running in the virtual machine to acknowledge directly to the hardware that it has handled the interrupt. There is a lot of cleverness required to make this work safely and transparently; read the paper for more details! But the key takeaway is that these two techniques, in combination, allow a virtual machine to attain close to 100% of bare-metal performance when using a 10Gb/s NIC.
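The contrast with the baseline path above can be sketched as follows. This is only a schematic of the idea as described here, not the paper's actual mechanism; the vector check, the forced exit, and the direct acknowledgement are all hypothetical placeholder routines.

```c
/* Sketch of exitless delivery: the assigned NIC's interrupts are handled
 * entirely in guest mode, everything else bounces straight to the host.
 * All names are illustrative, not the paper's implementation. */
#define ASSIGNED_NIC_VECTOR 0x40   /* example vector for the assigned NIC */

struct vm;

void run_guest_handler(struct vm *g, int vector);
void guest_eoi_direct_to_hardware(void);
void force_exit_to_host(struct vm *g, int vector);

void deliver_interrupt_in_guest_mode(struct vm *guest, int vector)
{
    if (vector == ASSIGNED_NIC_VECTOR) {
        /* The guest's own driver runs without ever leaving guest mode... */
        run_guest_handler(guest, vector);
        /* ...and acknowledges completion straight to the hardware, so the
         * EOI does not trap either: no exits on the common path. */
        guest_eoi_direct_to_hardware();
    } else {
        /* Interrupts not meant for the guest exit back to the host
         * immediately, which handles them as usual. */
        force_exit_to_host(guest, vector);
    }
}
```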
The ELI idea is a neat one, and is applicable to more than just network interface cards. With some tweaks and additional features, the general approach should be able to scale to multiple cores, multiple devices, and multiple virtual machines. And if so, perhaps we can finally say that virtualization performance is a solved problem ... for now.