
Virtual Computer Blog

Paravirtualization in a Client Hypervisor Environment

January 25th, 2010 by Nils Nieuwejaar

Client Virtualization In Depth: An ongoing series exploring the technology behind the next generation desktop.

My name is Nils Nieuwejaar. I’ve been a member of the engineering team at Virtual Computer since a little before we opened our doors two years ago. In this forum, I’ll be writing about a variety of technical issues, covering virtualization in general and NxTop in particular.

Our primary focus at Virtual Computer is on solving a host of management problems for our customers. While we make extensive use of virtualization to solve these problems, we don’t usually think of ourselves as a virtualization company in the traditional sense. More to the point, we don’t necessarily expect our customers to have a deep familiarity with virtualization technology.

My goal here is not to make a virtualization expert out of anybody, but simply to give our users enough information to understand how the different components in the full NxTop system work. This knowledge will help them get the best performance out of the system, and help them diagnose any problems that may arise in their environments. I’ll be covering topics that future customers ask the sales team about, that current customers ask the support team about, and that the support team asks the engineering team about.

Fully Virtualized I/O

Customers frequently ask how I/O works in virtualized systems like NxTop. The answer to this seemingly simple question turns out to be somewhat complicated.

I’ll start by talking about fully virtualized I/O, which may be the simplest case to understand and the most complex to implement. When a typical Windows application wants to read data from disk, it makes a system call into the operating system, and the operating system in turn makes a call into the device driver controlling that disk.

When running on bare metal (i.e., on a real machine instead of a virtual machine), the disk driver builds up a command structure describing the I/O operation it wants to perform (read/write, sector ID, size, etc.), and then writes a “start I/O” command to a special address in the computer’s memory. The disk hardware is notified when that special address is written to. The disk controller reads the command structure from the computer’s memory, triggers the hardware to carry out the operation it describes, and notifies the OS when the operation is complete. The OS then returns from the system call, and the application proceeds.
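The bare-metal sequence described above can be sketched in a few lines of Python. This is only an illustration: the `DiskCommand` fields, the `doorbell` method standing in for the write to the special address, and the 512-byte sector size are all assumptions made for the sketch, not the layout of any real disk controller.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed sector size for this sketch


@dataclass
class DiskCommand:
    """Hypothetical command structure the driver builds in memory."""
    op: str             # "read" or "write"
    sector: int         # starting sector ID
    count: int          # number of sectors to transfer
    buffer: bytearray   # memory the data is read into or written from
    done: bool = False  # set by the controller on completion


class DiskController:
    """Stand-in for the disk hardware; the media is just a byte array."""

    def __init__(self, media: bytes):
        self.media = bytearray(media)

    def doorbell(self, cmd: DiskCommand) -> None:
        # Models the hardware being notified of the "start I/O" write:
        # it reads the command structure and carries out the operation.
        start = cmd.sector * SECTOR_SIZE
        end = start + cmd.count * SECTOR_SIZE
        if end > len(self.media):
            raise ValueError("request runs past the end of the media")
        if cmd.op == "read":
            cmd.buffer[:] = self.media[start:end]
        elif cmd.op == "write":
            if len(cmd.buffer) != end - start:
                raise ValueError("buffer size does not match sector count")
            self.media[start:end] = cmd.buffer
        else:
            raise ValueError(f"unknown operation: {cmd.op}")
        cmd.done = True  # models the completion notification to the OS


def driver_read(controller: DiskController, sector: int, count: int) -> bytes:
    """What the disk driver does for a read: build the command, ring the bell."""
    cmd = DiskCommand("read", sector, count, bytearray(count * SECTOR_SIZE))
    controller.doorbell(cmd)
    if not cmd.done:
        raise RuntimeError("controller did not complete the request")
    return bytes(cmd.buffer)
```

On real hardware the completion notice arrives asynchronously as an interrupt; the synchronous call here only keeps the sketch short.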

Bare-Metal I/O Model

The illustration above shows the different components you will find on a typical bare metal system. There are applications running on top of an operating system, and the operating system interacts directly with physical hardware.

The picture below shows a typical simple virtual platform:

Fully Virtualized I/O Model

Again we have applications running on an operating system, but in this case the operating system is running on top of a hypervisor instead of bare metal. A hypervisor is similar to an operating system, but instead of hosting applications like an operating system, it actually hosts operating systems. The hypervisor manages core resources like CPU and memory, and passes guest I/O requests to a virtual hardware platform. The virtual hardware that receives the I/O requests is nothing more than a piece of software owned and operated by a host operating system. The fundamental trick of virtualization is to make the guest operating system believe that it is interacting with real hardware instead of a piece of software.

(Side note: whether the hypervisor and virtual hardware should be shown inside, beside, below, or on top of the host operating system varies from system to system, and gets into the distinction between ‘Type 1’ and ‘Type 2’ hypervisors. More on that in a different post.)

In a fully virtualized system, the disk driver believes that it is controlling a real hard drive. So to read a block of data from disk, it will build up exactly the same control structure described above, and it will write the ‘start I/O’ command to exactly the same memory address. In this system, however, it is the hypervisor, rather than a real disk controller, that notices the write to that address. The hypervisor notifies the virtual hardware, which then reads the control structure out of memory, decodes it, and carries out the described operation.

In most cases, the guest’s disk is actually just a file in the host operating system’s file system. When the guest OS wants to read block 100 from its disk, the virtualization layer instead reads block 100 from the file. (side note: This is actually a significant oversimplification. In practice, a guest’s disks are stored in one or more files, each of which has a somewhat complex internal structure. More on this in another post.)
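In the simplified flat-file case, mapping a guest block to the backing file is just an offset calculation. The sketch below assumes a raw image with 512-byte blocks and no internal structure; the function names and the file name `guest.img` are hypothetical, and real disk image formats are more involved, as the side note says.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size for this sketch


def read_guest_block(image_path: str, block: int,
                     block_size: int = BLOCK_SIZE) -> bytes:
    """Read one guest disk block from a flat image file on the host."""
    with open(image_path, "rb") as image:
        # Block N of the guest disk lives at byte offset N * block_size.
        image.seek(block * block_size)
        data = image.read(block_size)
    if len(data) != block_size:
        raise ValueError(f"block {block} is past the end of the image")
    return data


def demo() -> bytes:
    """Build a 128-block image where each block is filled with its own
    index, then read block 100 back out of the file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "guest.img")
        with open(path, "wb") as image:
            for index in range(128):
                image.write(bytes([index]) * BLOCK_SIZE)
        return read_guest_block(path, 100)
```

Calling `demo()` returns 512 bytes that all have the value 100, because block 100 was filled with its own index.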

After the software layer reads the data from disk, it copies the data into the guest operating system’s memory, and sends it exactly the same ‘work completed’ signal that a real disk controller would. The guest operating system continues on its merry way, unaware that the disk request was satisfied by virtual hardware rather than physical.
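Put together, the fully virtualized read path looks roughly like the sketch below. Everything specific in it is an assumption for illustration: the addresses, the packed command layout, and the class names are invented, and the real IDE interface is far more elaborate. The point is only that the guest driver performs the same two writes it would on bare metal, while the hypervisor intercepts the second one and hands it to software.

```python
import struct

SECTOR_SIZE = 512
DOORBELL_ADDR = 0x3000  # hypothetical "start I/O" address
COMMAND_ADDR = 0x1000   # hypothetical location of the command structure
OP_READ = 1
# Hypothetical layout: op (u8), sector (u32), count (u16), destination (u32)
COMMAND_FORMAT = "<BIHI"


class VirtualDisk:
    """Virtual hardware: decodes the guest's command and serves it
    from a backing store owned by the host."""

    def __init__(self, backing: bytes):
        self.backing = backing

    def handle(self, memory: bytearray) -> None:
        op, sector, count, dest = struct.unpack_from(
            COMMAND_FORMAT, memory, COMMAND_ADDR)
        if op != OP_READ:
            raise NotImplementedError("only reads are modelled here")
        start = sector * SECTOR_SIZE
        data = self.backing[start:start + count * SECTOR_SIZE]
        # Copy the data into guest memory, as a real controller's DMA would.
        memory[dest:dest + len(data)] = data


class Hypervisor:
    def __init__(self, memory_size: int, disk: VirtualDisk):
        self.memory = bytearray(memory_size)  # the guest's memory
        self.disk = disk
        self.completed = 0  # number of "work completed" signals delivered

    def guest_write(self, addr: int, value: bytes) -> None:
        if addr == DOORBELL_ADDR:
            # Trapped: route the request to the virtual hardware.
            self.disk.handle(self.memory)
            self.completed += 1
        else:
            self.memory[addr:addr + len(value)] = value


def guest_driver_read(hv: Hypervisor, sector: int, count: int,
                      dest: int) -> bytes:
    """The unmodified guest driver: build the command, then start I/O."""
    command = struct.pack(COMMAND_FORMAT, OP_READ, sector, count, dest)
    hv.guest_write(COMMAND_ADDR, command)
    hv.guest_write(DOORBELL_ADDR, b"\x01")
    return bytes(hv.memory[dest:dest + count * SECTOR_SIZE])
```

Even in this toy version, the virtual disk has to unpack a binary structure that the guest driver just finished packing, which is the encode-then-decode overhead that grows with the complexity of the emulated interface.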

This sounds relatively simple, and indeed it is when reading a single disk block. However, the IDE interface includes dozens of commands, errors, and status variables. It includes programmed I/O and DMA, supports hard drives and CDROMs, and so on. The guest operating system takes a high-level file operation from an application, translates it into very detailed, low-level IDE commands, and then the virtual hardware has to decipher those low-level IDE commands and transform them into file operations on its virtual hard disk. Doing all of this work for every disk operation can be time consuming, which results in poor disk performance for applications running in the guest operating system.

Paravirtualized I/O

To help avoid the performance problems that come with operating on emulated hardware, we use a different I/O model for performance-sensitive devices such as disk and network. This model is referred to as paravirtualization as opposed to the full virtualization we’ve already discussed.

In paravirtualized I/O, the device driver running in the guest operating system understands that it is running on virtualized hardware. Instead of attempting to talk to a physical disk, a paravirtualized (PV) disk driver in the guest will communicate directly with a partner device driver running in the host OS. As illustrated below, the PV driver bypasses the virtual hardware model, avoiding all of the expensive encoding and decoding steps.

Paravirtualized I/O Model

Just a couple of notes on terminology: since there is no disk device per se here – just two cooperating drivers, we typically don’t talk about ‘PV devices’. Instead we mostly talk about PV drivers. If you have ever used VMware’s Tools, VirtualBox’s Guest Additions, or Microsoft’s Hyper-V Integration Services in a virtual machine, you have been using some type of PV drivers.

Each driver actually has two parts: a front end and a back end. The frontend driver runs within the guest. It plugs into the guest operating system’s driver stack in essentially the same way a physical driver would, so the rest of the operating system interacts with it just like a physical device. The backend driver runs in the host operating system. It receives I/O requests from the frontend driver, and executes them. These I/O requests arrive from the guest fully formed in a commonly agreed upon format. There is no expensive decoding/translating process as with emulated devices.
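A minimal sketch of the frontend/backend split, assuming a pair of in-process queues in place of whatever transport the two drivers actually share. The request and response types, the class names, and the direct call that stands in for notifying the host are all hypothetical; this is not the NxTop protocol, only an illustration of requests arriving "fully formed" so the backend has nothing to decode.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BlockRequest:
    """High-level request in a format both drivers agree on."""
    op: str          # "read" or "write"
    block: int
    data: bytes = b""


@dataclass
class BlockResponse:
    ok: bool
    data: bytes = b""


class BackendDriver:
    """Runs in the host: executes requests exactly as they arrive."""

    def __init__(self, block_size: int = 512):
        self.block_size = block_size
        self.blocks = {}  # block number -> contents

    def process(self, requests: deque, responses: deque) -> None:
        while requests:
            req = requests.popleft()
            if req.op == "write":
                self.blocks[req.block] = req.data
                responses.append(BlockResponse(True))
            elif req.op == "read":
                # Blocks never written read back as zeros.
                data = self.blocks.get(req.block, bytes(self.block_size))
                responses.append(BlockResponse(True, data))
            else:
                responses.append(BlockResponse(False))


class FrontendDriver:
    """Runs in the guest: looks like a disk driver to the guest OS."""

    def __init__(self, backend: BackendDriver):
        self.requests = deque()
        self.responses = deque()
        self.backend = backend

    def _submit(self, req: BlockRequest) -> BlockResponse:
        self.requests.append(req)
        # Stands in for notifying the host that work is queued.
        self.backend.process(self.requests, self.responses)
        return self.responses.popleft()

    def read(self, block: int) -> bytes:
        return self._submit(BlockRequest("read", block)).data

    def write(self, block: int, data: bytes) -> bool:
        return self._submit(BlockRequest("write", block, data)).ok
```

For example, after `frontend = FrontendDriver(BackendDriver())`, calling `frontend.write(7, payload)` and then `frontend.read(7)` returns the same payload, with no register-level command ever being built or parsed.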

In addition to communicating at a higher level, PV drivers generally offer additional opportunities for improved performance. We can change buffer sizes, queue depths, algorithms, or features at any time. Since the I/O model is simpler, it is easier to identify and fix any performance or correctness problems. For a single point of comparison: in NxTop Engine, basic IDE emulation takes 4 times as much code as the backend disk driver, and runs at a fraction of the speed.

Since the frontend and backend drivers are generally written in conjunction with one another, there is no ambiguity about the expected behavior. When writing software that emulates a particular hardware device, the documentation of the device’s behavior may be unavailable or incomplete. This may cause the implementor of the virtual hardware to be reluctant to support the highest performing and most complicated mechanisms, instead forcing the device driver to fall back to older, simpler, and better documented mechanisms.

The most significant drawback of the PV I/O model is that you need new drivers for each operating system you want to use as a guest. The devices provided by a fully virtualized system tend to be common enough that drivers will be available for nearly every OS. For PV I/O, you will always need to write new drivers. The NxTop platform includes PV drivers for disk, network, mouse, and USB for Windows XP, Vista, and Windows 7, which covers nearly all of our customers’ needs.

The screenshot below shows a picture of the Windows 7 Device Manager, with the NxTop PV drivers installed on the NxTop Engine.

Windows 7 Device Manager with NxTop

You may have noticed that prior to this I never mentioned whether I was describing the NxTop Engine (i.e., the client) or NxTop Center (i.e., the server). In fact, this discussion applies equally well to both components of the NxTop system. On the server, we make use of Microsoft’s Hyper-V Integration Services when the IT admin is managing and publishing a NxTop image. On the client, we make use of our own PV drivers when the end user is running the published image.

Finally, I should mention that there is another interesting type of I/O in a virtual environment: passthrough. We don’t currently make use of this in NxTop Engine, but I’ll talk more about it in future posts.

Posted in Client Hypervisor, Virtualization | No Comments

One Big Thing We ‘Got Right’ With Hardware Compatibility

January 22nd, 2010 by Doug Lane

In my last post, I got on a bit of a roll about how the various industry players are approaching client hypervisor hardware compatibility (or not, as the case may be). Now that I have that out of my system, I thought I would begin to describe some of the things I believe we did right with our approach for NxTop. I’ll start with a big one:

NxTop is compatible with, but does not require, Intel vPro and VT-d.

Several of the other client hypervisor products in the works are being centered on Intel vPro. This makes sense on one level, since vPro is at its core a management and control point that is independent of the operating system. A client hypervisor is a very logical extension of that. The rub is that there are many corporate PCs with years of life remaining in them that are not vPro enabled. There are also many enterprises ordering large volumes of PCs who do not want to pay a premium for vPro-enabled PCs for all classes of users. When you are dealing with hundreds of thousands of PCs, any incremental cost per unit adds up very quickly, so this is a real consideration in today’s budget-conscious times.

One of the major stumbling blocks of server-based desktop virtualization is that while it offers significant management and security benefits, it can generally only be deployed for a subset of an organization’s users.  Limiting client hypervisor compatibility to the highest end of the corporate PC market and not providing backwards compatibility with existing business class PCs would impose the same limitations on the adoption of client-side virtualization.  For obvious reasons, we did not want to see that happen.

The primary aspect of vPro that is relevant to client virtualization is Intel Virtualization Technology for Directed I/O (VT-d). VT-d extends the base Intel virtualization extensions for the x86 architecture that exist in most business class machines today (VT-x) to include an input/output memory management unit (IOMMU). The IOMMU makes it possible to securely assign physical hardware components directly to specific virtual machines. This has many practical uses (particularly in overcoming performance challenges in areas such as graphics), but it has a major downside in that it requires hardware-specific drivers in each virtual machine and makes supporting the full array of graphics cards quite challenging. Stay tuned for more on this in a future post.

With NxTop, we have achieved a very high level of performance without reliance on an IOMMU. This enables NxTop Engine to run on any platform with VT-x. So when we go into an enterprise where they are currently buying the latest Dell E-series PCs but they have a bunch of older D630s (usually with a mix of Intel and NVIDIA graphics chips), it’s never a problem to get started. We are not talking to the client about a utopian management model in the future when all of their current PCs are in the graveyard. We are saying, “Hey, let’s get started TODAY.”

With this as a backdrop, I do not in any way want to leave the impression that we don’t see value in both vPro and VT-d/IOMMU functionality.  Continuing innovation from the processor manufacturers will only expand the set of management and performance features we can offer as part of NxTop, and we are embracing this innovation with open arms.  However, we don’t think client virtualization can take off without support for a wide range of PC platforms both new and existing.

Posted in Client Hypervisor, Virtualization | 1 Comment

The Client Hypervisor Hardware Compatibility Challenge

January 6th, 2010 by Doug Lane

It was good fun winding down 2009 with a spirited debate over on BrianMadden.com about the future prospects of type-1 client hypervisor technology. A topic that came up that I feel warrants a bit more commentary is hardware compatibility for client hypervisors.

As I look back in the rearview mirror at two-plus years of talking with IT professionals and industry analysts about client hypervisor technology, hardware compatibility is by far the topic that has generated the most questions. It isn’t all that surprising, since it is in fact one of the biggest challenges that comes with bringing bare-metal hypervisor technology from the datacenter to the PC. There is critical functionality on end-user PCs that server hypervisors never needed to deal with, such as high performance graphics, wireless, power management, USB peripherals, laptop lid closure events, etc.

If I look outward at the rest of the industry, I see two reactions to this challenge. The reaction among other desktop virtualization startups was to punt on the hypervisor. Implementing a bare-metal client hypervisor is hard, especially for a startup with a finite set of resources. It is much easier to focus efforts elsewhere and in full hand-waving mode say, “Type-1 client hypervisors will eventually be a ubiquitous commodity.” As it happens, this is probably true. However, it is not the case today, it definitely won’t be the case in 2010, and who can say with 100 percent certainty when (or if) it will ever be the case? I’d probably bet a hundred bucks on it but certainly not millions in venture capital. For startups, there is certainly risk in spreading yourself too thin, but there is greater risk in not being in control of your own destiny. There is further risk in not being in control of the end-user experience, since that will ultimately make or break the success of client-side virtualization more than any other factor.

The incumbent desktop virtualization players followed a different approach. They are tackling the client hypervisor but zeroing in on a very small hardware compatibility list. They already have a captive audience (and revenue stream) with their server virtualization and server-hosted desktop virtualization products, so the bar for them is simply to show enough forward progress with client hypervisors to freeze their customers. This worked for a while in 2009, but we saw the freeze start to thaw when the major virtualization players laid a collective client hypervisor goose egg at VMworld 2009. VMware shops that had told us “looks great, but we’re gonna wait to see CVP” came calling again in September. We are seeing the thaw turn into a full melt now that the “late 2009” client hypervisor target the big guys communicated has come and gone. This won’t last forever, but we are certainly not going to let the window of opportunity we have in early 2010 pass without capitalizing.

I won’t claim that we have achieved universal hardware compatibility with every PC on the planet, but I do believe that through a combination of a superior client hypervisor architecture, hard work, and close collaboration with key PC manufacturers, we are the undisputed leader in client hypervisor hardware compatibility. In my next post, I will provide a detailed explanation of why this is the case.

Posted in Client Hypervisor | 1 Comment