

Paravirtualization in a Client Hypervisor Environment

Posted by Nils Nieuwejaar on January 25th, 2010 in Client Hypervisor, Virtualization

Client Virtualization In Depth: An ongoing series exploring the technology behind the next generation desktop.

My name is Nils Nieuwejaar. I’ve been a member of the engineering team at Virtual Computer since a little before we opened our doors two years ago. In this forum, I’ll be writing about a variety of technical issues, covering virtualization in general and NxTop in particular.

Our primary focus at Virtual Computer is on solving a host of management problems for our customers. While we make extensive use of virtualization to solve these problems, we don’t usually think of ourselves as a virtualization company in the traditional sense. More to the point, we don’t necessarily expect our customers to have a deep familiarity with virtualization technology.

My goal here is not to make a virtualization expert out of anybody, but simply to give our users enough information to understand how the different components in the full NxTop system work. This knowledge will help them get the best performance out of the system, and help them diagnose any problems that may arise in their environments. I’ll be covering topics that future customers ask the sales team about, that current customers ask the support team about, and that the support team asks the engineering team about.

Fully Virtualized I/O

When we discuss NxTop with customers, they frequently ask how I/O works in a virtualized system like NxTop. The answer to this simple question turns out to be somewhat complicated.

I’ll start by talking about fully virtualized I/O, which may be the simplest case to understand and the most complex to implement. When a typical Windows application wants to read data from disk, it makes a system call into the operating system, and the operating system in turn calls into the device driver controlling that disk.

When running on bare metal (i.e., on a real machine instead of a virtual machine), the disk driver builds up a command structure describing the I/O operation it wants to perform (read/write, sector ID, size, etc.), and then writes a “start I/O” command to a special address in the computer’s memory. The disk hardware is notified when that special address is written to. The disk controller reads the command structure from the computer’s memory, triggers the hardware to carry out the operation it describes, and notifies the OS when the operation is complete. The OS then returns from the system call, and the application proceeds.
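To make that sequence concrete, here is a toy Python model of it. It is a sketch, not real driver code: the `Command` layout, the `DOORBELL_ADDR` value, and the use of a single dictionary to stand in for memory are all invented for illustration, and a real controller’s registers and descriptors look quite different.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512            # assumed sector size in bytes
DOORBELL_ADDR = 0xFEB00000   # hypothetical "start I/O" address


@dataclass
class Command:
    """Hypothetical command structure the disk driver builds in memory."""
    op: str            # "read" or "write"
    sector: int        # first sector to transfer
    count: int         # number of sectors
    buffer: bytearray  # OS memory the data moves to (read) or from (write)


class Machine:
    """Toy bare-metal machine: memory plus a disk controller watching one address."""

    def __init__(self, num_sectors: int):
        self.memory = {}                                # address -> stored object
        self.media = bytearray(num_sectors * SECTOR_SIZE)
        self.completions = []                           # stands in for interrupts

    def write(self, addr: int, value) -> None:
        self.memory[addr] = value
        if addr == DOORBELL_ADDR:                       # the disk hardware notices
            self._start_io(value)

    def _start_io(self, cmd_addr: int) -> None:
        cmd = self.memory[cmd_addr]                     # controller fetches the command
        start = cmd.sector * SECTOR_SIZE
        end = start + cmd.count * SECTOR_SIZE
        if cmd.op == "read":
            cmd.buffer[:] = self.media[start:end]       # transfer into OS memory
        else:
            self.media[start:end] = cmd.buffer
        self.completions.append(cmd_addr)               # "operation complete"


def driver_read(machine: Machine, cmd_addr: int, sector: int, count: int) -> bytes:
    """The driver's side: build the command structure, then ring the doorbell."""
    buf = bytearray(count * SECTOR_SIZE)
    machine.write(cmd_addr, Command("read", sector, count, buf))
    machine.write(DOORBELL_ADDR, cmd_addr)
    return bytes(buf)


if __name__ == "__main__":
    m = Machine(num_sectors=200)
    m.media[100 * SECTOR_SIZE] = 0xAB
    print(hex(driver_read(m, cmd_addr=0x1000, sector=100, count=1)[0]))  # 0xab
```

The driver never calls the controller directly; it only writes to memory, and the write to the special address is what sets the hardware in motion.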

Bare-Metal I/O Model

The illustration above shows the different components you will find on a typical bare-metal system. There are applications running on top of an operating system, and the operating system interacts directly with physical hardware.

The picture below shows a typical simple virtual platform:

Fully Virtualized I/O Model

Again we have applications running on an operating system, but in this case the operating system is running on top of a hypervisor instead of bare metal. A hypervisor is similar to an operating system, but instead of hosting applications like an operating system, it actually hosts operating systems. The hypervisor manages core resources like CPU and memory, and passes guest I/O requests to a virtual hardware platform. The virtual hardware that receives the I/O requests is nothing more than a piece of software owned and operated by a host operating system. The fundamental trick of virtualization is to make the guest operating system believe that it is interacting with real hardware instead of a piece of software.

(side note: whether the hypervisor and virtual hardware should be shown inside, beside, below, or on top of the host operating system varies from system to system, and gets into the distinction between ‘Type 1’ and ‘Type 2’ hypervisors. More on that in a different post.)

In a fully virtualized system, the disk driver believes that it is controlling a real hard drive. So to read a block of data from disk, it will build up exactly the same control structure described above, and it will write the “start I/O” command to exactly the same memory address. In this system, however, it is the hypervisor, not a real disk controller, that notices the write to that address. The hypervisor notifies the virtual hardware, which then reads the control structure out of memory, decodes it, and carries out the described operation.
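The sketch below models the same read in a fully virtualized system. The point to notice is that `guest_driver_read` performs exactly the same two writes a bare-metal driver would; only what sits behind `DOORBELL_ADDR` has changed. The class names, the address, and the command layout are hypothetical, and a real hypervisor detects the access with help from the hardware (typically by arranging for it to fault into the hypervisor) rather than with an `if` statement.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512            # assumed sector size in bytes
DOORBELL_ADDR = 0xFEB00000   # hypothetical "start I/O" address, same as on bare metal


@dataclass
class Command:
    """Hypothetical control structure, identical to the bare-metal one."""
    op: str
    sector: int
    count: int
    buffer: bytearray


class VirtualDisk:
    """Software in the host that pretends to be the disk controller."""

    def __init__(self, backing: bytearray):
        self.backing = backing                          # e.g. contents of an image file

    def handle(self, guest_ram: dict, cmd_addr: int) -> None:
        cmd = guest_ram[cmd_addr]                       # read control structure from guest memory
        start = cmd.sector * SECTOR_SIZE
        end = start + cmd.count * SECTOR_SIZE
        if cmd.op == "read":
            cmd.buffer[:] = self.backing[start:end]     # copy data into guest memory
        else:
            self.backing[start:end] = cmd.buffer


class Hypervisor:
    """Intercepts guest writes to the device address and forwards them."""

    def __init__(self, device: VirtualDisk):
        self.guest_ram = {}
        self.device = device
        self.traps = 0                                  # times the guest was interrupted
        self.completions = []                           # virtual "work completed" signals

    def guest_write(self, addr: int, value) -> None:
        if addr == DOORBELL_ADDR:
            self.traps += 1                             # guest stops; hypervisor takes over
            self.device.handle(self.guest_ram, value)
            self.completions.append(value)              # deliver completion to the guest
        else:
            self.guest_ram[addr] = value                # ordinary memory write


def guest_driver_read(write, cmd_addr: int, sector: int, count: int) -> bytes:
    """Unmodified guest driver: the same steps as on bare metal."""
    buf = bytearray(count * SECTOR_SIZE)
    write(cmd_addr, Command("read", sector, count, buf))
    write(DOORBELL_ADDR, cmd_addr)
    return bytes(buf)


if __name__ == "__main__":
    disk = VirtualDisk(bytearray(8 * SECTOR_SIZE))
    disk.backing[3 * SECTOR_SIZE] = 0x5A
    hv = Hypervisor(disk)
    data = guest_driver_read(hv.guest_write, cmd_addr=0x1000, sector=3, count=1)
    print(hex(data[0]), hv.traps)  # 0x5a 1
```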

In most cases, the guest’s disk is actually just a file in the host operating system’s file system. When the guest OS wants to read block 100 from its disk, the virtualization layer instead reads block 100 from the file. (side note: This is actually a significant oversimplification. In practice, a guest’s disks are stored in one or more files, each of which has a somewhat complex internal structure. More on this in another post.)
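In the simplest possible case, a raw image file in which the guest’s disk is laid out byte for byte, the mapping from guest block to file offset is just a multiplication. The sketch below assumes that flat layout and a 512-byte block size; as noted above, real image formats have their own internal structure, so this is illustrative only.

```python
import os
import tempfile

BLOCK_SIZE = 512   # assumed block size in bytes


def read_guest_block(image_path: str, block: int) -> bytes:
    """Read one guest block, assuming block N lives at offset N * BLOCK_SIZE."""
    with open(image_path, "rb") as f:
        f.seek(block * BLOCK_SIZE)
        data = f.read(BLOCK_SIZE)
    if len(data) != BLOCK_SIZE:
        raise OSError(f"block {block} is past the end of {image_path}")
    return data


def write_guest_block(image_path: str, block: int, data: bytes) -> None:
    """Overwrite one guest block in place in the flat image file."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    with open(image_path, "r+b") as f:
        f.seek(block * BLOCK_SIZE)
        f.write(data)


if __name__ == "__main__":
    # Build a small 200-block image and use block 100, as in the text.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(200 * BLOCK_SIZE))
        path = tmp.name
    try:
        write_guest_block(path, 100, b"\x42" * BLOCK_SIZE)
        print(read_guest_block(path, 100)[:4])  # b'BBBB'
    finally:
        os.remove(path)
```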

After the virtual hardware reads the data from the host’s disk, it copies the data into the guest operating system’s memory, and sends it exactly the same ‘work completed’ signal that a real disk controller would. The guest operating system continues on its merry way, unaware that the disk request was satisfied by virtual hardware rather than physical hardware.

This sounds relatively simple, and indeed it is when reading a single disk block. However, the disk controller being emulated is typically an IDE controller, and the IDE interface includes dozens of commands, errors, and status variables. It supports both programmed I/O and DMA, handles hard drives and CD-ROMs, and so on. The guest operating system takes a high-level file operation from an application and translates it into detailed, low-level IDE commands; the virtual hardware then has to decipher those commands and transform them back into file operations on its virtual hard disk. Doing all of this work for every disk operation is time-consuming, which results in poor disk performance for applications running in the guest operating system.
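To give a feel for that low-level work, the sketch below models a tiny slice of an IDE controller emulator. It uses the legacy primary-channel ATA task-file ports and LBA28 addressing; the `IdeEmulator` class itself is hypothetical and ignores status polling, interrupts, errors, and the data transfer. Even so, a single one-sector read costs the guest six separate port writes, each of which the emulator must intercept, before any data moves (in PIO mode the data then crosses the data port one 16-bit word at a time).

```python
# Legacy primary-channel ATA task-file ports (LBA28 addressing).
SECTOR_COUNT = 0x1F2
LBA_LOW = 0x1F3
LBA_MID = 0x1F4
LBA_HIGH = 0x1F5
DRIVE_HEAD = 0x1F6
COMMAND = 0x1F7

READ_SECTORS = 0x20
WRITE_SECTORS = 0x30


class IdeEmulator:
    """Hypothetical slice of an IDE model: latch registers, decode on command."""

    def __init__(self):
        self.regs = {}
        self.exits = 0        # each port write would trap into the emulator
        self.requests = []    # decoded (op, drive, lba, count) tuples

    def port_write(self, port: int, value: int) -> None:
        self.exits += 1
        if port == COMMAND:
            self._decode(value)
        else:
            self.regs[port] = value & 0xFF

    def _decode(self, command: int) -> None:
        dh = self.regs.get(DRIVE_HEAD, 0)
        if not dh & 0x40:
            raise NotImplementedError("CHS addressing is not modeled here")
        # Reassemble the 28-bit block address scattered across four registers.
        lba = (
            self.regs.get(LBA_LOW, 0)
            | self.regs.get(LBA_MID, 0) << 8
            | self.regs.get(LBA_HIGH, 0) << 16
            | (dh & 0x0F) << 24
        )
        count = self.regs.get(SECTOR_COUNT, 0) or 256   # 0 means 256 sectors
        drive = (dh >> 4) & 1                            # 0 = master, 1 = slave
        if command == READ_SECTORS:
            op = "read"
        elif command == WRITE_SECTORS:
            op = "write"
        else:
            raise NotImplementedError(f"command {command:#x} is not modeled")
        self.requests.append((op, drive, lba, count))


def guest_read_lba28(ide: IdeEmulator, lba: int, count: int) -> None:
    """What a guest IDE driver does to issue one read: six port writes."""
    ide.port_write(SECTOR_COUNT, count & 0xFF)
    ide.port_write(LBA_LOW, lba & 0xFF)
    ide.port_write(LBA_MID, (lba >> 8) & 0xFF)
    ide.port_write(LBA_HIGH, (lba >> 16) & 0xFF)
    ide.port_write(DRIVE_HEAD, 0xE0 | ((lba >> 24) & 0x0F))  # LBA mode, master
    ide.port_write(COMMAND, READ_SECTORS)


if __name__ == "__main__":
    ide = IdeEmulator()
    guest_read_lba28(ide, lba=100, count=1)
    print(ide.requests, ide.exits)  # [('read', 0, 100, 1)] 6
```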

Paravirtualized I/O

To help avoid the performance problems that come with operating on emulated hardware, we use a different I/O model for performance-sensitive devices such as disk and network. This model is referred to as paravirtualization as opposed to the full virtualization we’ve already discussed.

In paravirtualized I/O, the device driver running in the guest operating system understands that it is running on virtualized hardware. Instead of attempting to talk to a physical disk, a paravirtualized (PV) disk driver in the guest communicates directly with a partner driver running in the host OS. As illustrated below, the PV driver bypasses the virtual hardware model, avoiding all of the expensive encoding and decoding steps.

Paravirtualized I/O Model

A couple of notes on terminology: since there is no disk device per se here, just two cooperating drivers, we typically don’t talk about ‘PV devices’; instead, we mostly talk about PV drivers. If you have ever used VMware Tools, VirtualBox’s Guest Additions, or Microsoft’s Hyper-V Integration Services in a virtual machine, you have been using some type of PV driver.

Each PV driver is actually split into two parts: a frontend and a backend. The frontend driver runs within the guest. It plugs into the guest operating system’s driver stack in essentially the same way a physical device driver would, so the rest of the operating system interacts with it just as it would with a physical device. The backend driver runs in the host operating system. It receives I/O requests from the frontend driver and executes them. These I/O requests arrive from the guest fully formed, in a commonly agreed-upon format, so there is no expensive decoding and translation step as there is with emulated devices.
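One common way to connect a frontend and a backend is a ring of request descriptors in memory shared by the guest and the host; Xen’s split block drivers work this way, for example. The sketch below models that idea in Python. The `BlkRequest` format, the queue depth, and all class names are hypothetical and are not a description of the NxTop drivers. Note that the frontend hands over a complete request in one step and can batch several requests behind a single notification.

```python
from collections import deque
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed sector size in bytes


@dataclass
class BlkRequest:
    """Hypothetical request format both halves agree on."""
    req_id: int
    op: str            # "read" or "write"
    sector: int
    count: int
    data: bytes = b""


@dataclass
class BlkResponse:
    req_id: int
    status: int        # 0 = success, 1 = error
    data: bytes = b""


class SharedRing:
    """Stands in for memory shared by the guest and the host."""

    def __init__(self, depth: int):
        self.depth = depth
        self.requests = deque()
        self.responses = deque()
        self.notifications = 0    # guest-to-host event signals

    def notify_backend(self) -> None:
        self.notifications += 1


class Frontend:
    """Runs in the guest; looks like an ordinary disk driver to the OS above it."""

    def __init__(self, ring: SharedRing):
        self.ring = ring
        self.next_id = 0

    def submit(self, op: str, sector: int, count: int, data: bytes = b"") -> int:
        if len(self.ring.requests) >= self.ring.depth:
            raise BufferError("ring full")
        req = BlkRequest(self.next_id, op, sector, count, data)
        self.ring.requests.append(req)     # one fully formed request, no register encoding
        self.next_id += 1
        return req.req_id

    def kick(self) -> None:
        self.ring.notify_backend()         # one notification for the whole batch


class Backend:
    """Runs in the host; executes requests against the guest's storage."""

    def __init__(self, ring: SharedRing, storage: bytearray):
        self.ring = ring
        self.storage = storage

    def process(self) -> None:
        while self.ring.requests:
            req = self.ring.requests.popleft()
            start = req.sector * SECTOR_SIZE
            end = start + req.count * SECTOR_SIZE
            if end > len(self.storage) or (req.op == "write" and len(req.data) != end - start):
                resp = BlkResponse(req.req_id, 1)                   # reject malformed request
            elif req.op == "read":
                resp = BlkResponse(req.req_id, 0, bytes(self.storage[start:end]))
            else:
                self.storage[start:end] = req.data
                resp = BlkResponse(req.req_id, 0)
            self.ring.responses.append(resp)


if __name__ == "__main__":
    ring = SharedRing(depth=32)
    front = Frontend(ring)
    back = Backend(ring, bytearray(1024 * SECTOR_SIZE))
    front.submit("write", 100, 1, b"\x07" * SECTOR_SIZE)
    front.submit("read", 100, 1)
    front.kick()
    back.process()
    print([r.data[:1] for r in ring.responses], ring.notifications)  # [b'', b'\x07'] 1
```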

Beyond communicating at a higher level, PV drivers generally offer more opportunities to improve performance. Because we write both halves, we can change buffer sizes, queue depths, algorithms, or features at any time. And since the I/O model is simpler, it is easier to identify and fix any performance or correctness problems. For a single point of comparison: in NxTop Engine, basic IDE emulation takes four times as much code as the backend disk driver, and runs at a fraction of the speed.
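Even though the two halves are written together, a guest and its host can end up running different versions of them (a guest image might carry older drivers than the host, for instance), so paired PV drivers often agree on settings when they connect; Xen, for example, exchanges such options through XenStore. The sketch below shows the general idea with a hypothetical `Capabilities` record and made-up feature names: each side advertises its limits and features, and the connection uses the smaller limits and only the features both sides support. None of these names come from NxTop.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capabilities:
    """What one side of a PV driver pair advertises (hypothetical fields)."""
    max_queue_depth: int
    max_request_sectors: int
    features: frozenset = field(default_factory=frozenset)


def negotiate(front: Capabilities, back: Capabilities) -> Capabilities:
    """Settle on settings both sides support: the smaller limits, shared features."""
    return Capabilities(
        max_queue_depth=min(front.max_queue_depth, back.max_queue_depth),
        max_request_sectors=min(front.max_request_sectors, back.max_request_sectors),
        features=front.features & back.features,
    )


if __name__ == "__main__":
    old_backend = Capabilities(max_queue_depth=32, max_request_sectors=128,
                               features=frozenset({"flush"}))
    new_frontend = Capabilities(max_queue_depth=64, max_request_sectors=256,
                                features=frozenset({"flush", "discard"}))
    agreed = negotiate(new_frontend, old_backend)
    print(agreed.max_queue_depth, sorted(agreed.features))  # 32 ['flush']
```

With this approach, a newer backend can offer a deeper queue or a new feature, while an older frontend simply keeps using what it already understands.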

Since the frontend and backend drivers are generally written in conjunction with one another, there is no ambiguity about the expected behavior. By contrast, when writing software that emulates a particular hardware device, the documentation of the device’s behavior may be unavailable or incomplete. The implementer of the virtual hardware may then be reluctant to support the most complicated, highest-performing mechanisms, forcing the guest’s device driver to fall back to older, simpler, better-documented ones.

The most significant drawback of the PV I/O model is that you need new drivers for each operating system you want to use as a guest. The devices provided by a fully virtualized system tend to be common enough that drivers will be available for nearly every OS. For PV I/O, you will always need to write new drivers. The NxTop platform includes PV drivers for disk, network, mouse, and USB for Windows XP, Vista, and Windows 7, which covers nearly all of our customers’ needs.

The screenshot below shows the Windows 7 Device Manager in a guest running on NxTop Engine, with the NxTop PV drivers installed.

Windows 7 Device Manager with NxTop

You may have noticed that prior to this I never mentioned whether I was describing the NxTop Engine (i.e., the client) or NxTop Center (i.e., the server). In fact, this discussion applies equally well to both components of the NxTop system. On the server, we make use of Microsoft’s Hyper-V Integration Services when the IT admin is managing and publishing a NxTop image. On the client, we make use of our own PV drivers when the end user is running the published image.

Finally, I should mention that there is another interesting type of I/O in a virtual environment: passthrough. We don’t currently make use of this in NxTop Engine, but I’ll talk more about it in future posts.
