[PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net
mailing list: kvm-devel@lists.sourceforge.net
(http://lists.sourceforge.net/lists/listinfo/kvm-devel)
The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture. The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace. Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.
Using this driver, one can start multiple virtual machines on a host.
Each virtual machine is a process on the host; a virtual cpu is a thread in
that process. kill(1), nice(1), top(1) work as expected. In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode. Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm). Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.
The driver supports i386 and x86_64 hosts and guests. All combinations are
allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae
and non-pae paging modes are supported.
SMP hosts and UP guests are supported. At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.
Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch. We plan to address this in two ways:
- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables
Currently a virtual desktop is responsive but consumes a lot of CPU. Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization. Linux/X is slower, probably due
to X being in a separate process.
In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.
Caveats (akpm: might no longer be true):
- The Windows install currently bluescreens due to a problem with the
virtual APIC. We are working on a fix. A temporary workaround is to
use an existing image or install through qemu
- Windows 64-bit does not work. That's also true for qemu, so it's
probably a problem with the device model.
[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
/******************************************************************************
 * x86_emulate.h
 *
 * Generic x86 (32-bit and 64-bit) instruction decoder and emulator.
 *
 * Copyright (c) 2005 Keir Fraser
 *
 * From: xen-unstable 10676:af9809f51f81a3c43f276f00c81a52ef558afda4
 */

#ifndef __X86_EMULATE_H__
#define __X86_EMULATE_H__

struct x86_emulate_ctxt;

/*
 * x86_emulate_ops:
 *
 * These operations represent the instruction emulator's interface to memory.
 * There are two categories of operation: those that act on ordinary memory
 * regions (*_std), and those that act on memory regions known to require
 * special treatment or emulation (*_emulated).
 *
 * The emulator assumes that an instruction accesses only one 'emulated memory'
 * location, that this location is the given linear faulting address (cr2), and
 * that this is one of the instruction's data operands. Instruction fetches and
 * stack operations are assumed never to access emulated memory. The emulator
 * automatically deduces which operand of a string-move operation is accessing
 * emulated memory, and assumes that the other operand accesses normal memory.
 *
 * NOTES:
 *  1. The emulator isn't very smart about emulated vs. standard memory.
 *     'Emulated memory' access addresses should be checked for sanity.
 *     'Normal memory' accesses may fault, and the caller must arrange to
 *     detect and handle reentrancy into the emulator via recursive faults.
 *     Accesses may be unaligned and may cross page boundaries.
 *  2. If the access fails (cannot emulate, or a standard access faults) then
 *     it is up to the memop to propagate the fault to the guest VM via
 *     some out-of-band mechanism, unknown to the emulator. The memop signals
 *     failure by returning X86EMUL_PROPAGATE_FAULT to the emulator, which will
 *     then immediately bail.
 *  3. Valid access sizes are 1, 2, 4 and 8 bytes. On x86/32 systems only
 *     cmpxchg8b_emulated need support 8-byte accesses.
 *  4. The emulator cannot handle 64-bit mode emulation on an x86/32 system.
 */
/* Access completed successfully: continue emulation as normal. */
#define X86EMUL_CONTINUE        0
/* Access is unhandleable: bail from emulation and return error to caller. */
#define X86EMUL_UNHANDLEABLE    1
/* Terminate emulation but return success to the caller. */
#define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */
#define X86EMUL_RETRY_INSTR     2 /* retry the instruction for some reason */
#define X86EMUL_CMPXCHG_FAILED  2 /* cmpxchg did not see expected value */
struct x86_emulate_ops {
	/*
	 * read_std: Read bytes of standard (non-emulated/special) memory.
	 *           Used for instruction fetch, stack operations, and others.
	 *  @addr:  [IN ] Linear address from which to read.
	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
	 *  @bytes: [IN ] Number of bytes to read from memory.
	 */
	int (*read_std)(unsigned long addr,
			unsigned long *val,
			unsigned int bytes, struct x86_emulate_ctxt *ctxt);

	/*
	 * write_std: Write bytes of standard (non-emulated/special) memory.
	 *            Used for stack operations, and others.
	 *  @addr:  [IN ] Linear address to which to write.
	 *  @val:   [IN ] Value to write to memory (low-order bytes used as
	 *                required).
	 *  @bytes: [IN ] Number of bytes to write to memory.
	 */
	int (*write_std)(unsigned long addr,
			 unsigned long val,
			 unsigned int bytes, struct x86_emulate_ctxt *ctxt);

	/*
	 * read_emulated: Read bytes from emulated/special memory area.
	 *  @addr:  [IN ] Linear address from which to read.
	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
	 *  @bytes: [IN ] Number of bytes to read from memory.
	 */
	int (*read_emulated)(unsigned long addr,
			     unsigned long *val,
			     unsigned int bytes,
			     struct x86_emulate_ctxt *ctxt);

	/*
	 * write_emulated: Write bytes to emulated/special memory area.
	 *  @addr:  [IN ] Linear address to which to write.
	 *  @val:   [IN ] Value to write to memory (low-order bytes used as
	 *                required).
	 *  @bytes: [IN ] Number of bytes to write to memory.
	 */
	int (*write_emulated)(unsigned long addr,
			      unsigned long val,
			      unsigned int bytes,
			      struct x86_emulate_ctxt *ctxt);

	/*
	 * cmpxchg_emulated: Emulate an atomic (LOCKed) CMPXCHG operation on an
	 *                   emulated/special memory area.
	 *  @addr:  [IN ] Linear address to access.
	 *  @old:   [IN ] Value expected to be current at @addr.
	 *  @new:   [IN ] Value to write to @addr.
	 *  @bytes: [IN ] Number of bytes to access using CMPXCHG.
	 */
	int (*cmpxchg_emulated)(unsigned long addr,
				unsigned long old,
				unsigned long new,
				unsigned int bytes,
				struct x86_emulate_ctxt *ctxt);

	/*
	 * cmpxchg8b_emulated: Emulate an atomic (LOCKed) CMPXCHG8B operation on an
	 *                     emulated/special memory area.
	 *  @addr:  [IN ] Linear address to access.
	 *  @old:   [IN ] Value expected to be current at @addr.
	 *  @new:   [IN ] Value to write to @addr.
	 * NOTES:
	 *  1. This function is only ever called when emulating a real CMPXCHG8B.
	 *  2. This function is *never* called on x86/64 systems.
	 *  3. Not defining this function (i.e., specifying NULL) is equivalent
	 *     to defining a function that always returns X86EMUL_UNHANDLEABLE.
	 */
	int (*cmpxchg8b_emulated)(unsigned long addr,
				  unsigned long old_lo,
				  unsigned long old_hi,
				  unsigned long new_lo,
				  unsigned long new_hi,
				  struct x86_emulate_ctxt *ctxt);
};

struct cpu_user_regs;

struct x86_emulate_ctxt {
	/* Register state before/after emulation. */
	struct kvm_vcpu *vcpu;
	unsigned long eflags;

	/* Linear faulting address (if emulating a page-faulting instruction). */
	unsigned long cr2;

	/* Emulated execution mode, represented by an X86EMUL_MODE value. */
	int mode;

	unsigned long cs_base;
	unsigned long ds_base;
	unsigned long es_base;
	unsigned long ss_base;
	unsigned long gs_base;
	unsigned long fs_base;
};

/* Execution mode, passed to the emulator. */
#define X86EMUL_MODE_REAL     0 /* Real mode. */
#define X86EMUL_MODE_PROT16   2 /* 16-bit protected mode. */
#define X86EMUL_MODE_PROT32   4 /* 32-bit protected mode. */
#define X86EMUL_MODE_PROT64   8 /* 64-bit (long) mode. */

/* Host execution mode. */
#if defined(__i386__)
#define X86EMUL_MODE_HOST X86EMUL_MODE_PROT32
#elif defined(CONFIG_X86_64)
#define X86EMUL_MODE_HOST X86EMUL_MODE_PROT64
#endif

/*
 * x86_emulate_memop: Emulate an instruction that faulted attempting to
 *                    read/write a 'special' memory area.
 * Returns -1 on failure, 0 on success.
 */
int x86_emulate_memop(struct x86_emulate_ctxt *ctxt,
		      struct x86_emulate_ops *ops);

/*
 * Given the 'reg' portion of a ModRM byte, and a register block, return a
 * pointer into the block that addresses the relevant register.
 * @highbyte_regs specifies whether to decode AH,CH,DH,BH.
 */
void *decode_register(u8 modrm_reg, unsigned long *regs,
		      int highbyte_regs);

#endif /* __X86_EMULATE_H__ */
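
The memop contract described in the x86_emulate_ops comment can be illustrated with a small standalone sketch. It is not the KVM implementation: the context fields `mem`, `mem_size` and `fault_pending`, the helpers `size_ok` and `in_range`, and the `sketch_*` names are all hypothetical, and the real context carries a `struct kvm_vcpu` pointer instead of a flat buffer. Only two of the six callbacks are shown. The sketch assumes guest memory is a flat little-endian byte array and that a simple flag is enough to stand in for the out-of-band fault mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes, with the same values as in the header above. */
#define X86EMUL_CONTINUE        0
#define X86EMUL_UNHANDLEABLE    1
#define X86EMUL_PROPAGATE_FAULT 2

/* Hypothetical stand-in for the real context; only what the sketch needs. */
struct x86_emulate_ctxt {
	unsigned char *mem;      /* assumed flat backing store for guest memory */
	unsigned long mem_size;  /* size of that store in bytes */
	int fault_pending;       /* assumed out-of-band fault record */
};

/* Subset of x86_emulate_ops: one *_std and one *_emulated callback. */
struct x86_emulate_ops {
	int (*read_std)(unsigned long addr, unsigned long *val,
			unsigned int bytes, struct x86_emulate_ctxt *ctxt);
	int (*write_emulated)(unsigned long addr, unsigned long val,
			      unsigned int bytes, struct x86_emulate_ctxt *ctxt);
};

/* Access sizes of 1, 2, 4 or 8 bytes, limited by the host word size. */
static int size_ok(unsigned int bytes)
{
	return (bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8) &&
	       bytes <= sizeof(unsigned long);
}

/* Overflow-safe check that [addr, addr + bytes) lies inside the store. */
static int in_range(const struct x86_emulate_ctxt *ctxt,
		    unsigned long addr, unsigned int bytes)
{
	return addr <= ctxt->mem_size && bytes <= ctxt->mem_size - addr;
}

static int sketch_read_std(unsigned long addr, unsigned long *val,
			   unsigned int bytes, struct x86_emulate_ctxt *ctxt)
{
	unsigned long v = 0;
	unsigned int i;

	if (!size_ok(bytes))
		return X86EMUL_UNHANDLEABLE;
	if (!in_range(ctxt, addr, bytes)) {
		ctxt->fault_pending = 1;  /* record the fault out of band */
		return X86EMUL_PROPAGATE_FAULT;
	}
	/* Assemble little-endian bytes; the result is zero-extended. */
	for (i = 0; i < bytes; i++)
		v |= (unsigned long)ctxt->mem[addr + i] << (8 * i);
	*val = v;
	return X86EMUL_CONTINUE;
}

static int sketch_write_emulated(unsigned long addr, unsigned long val,
				 unsigned int bytes,
				 struct x86_emulate_ctxt *ctxt)
{
	unsigned int i;

	if (!size_ok(bytes))
		return X86EMUL_UNHANDLEABLE;
	if (!in_range(ctxt, addr, bytes)) {
		ctxt->fault_pending = 1;
		return X86EMUL_PROPAGATE_FAULT;
	}
	/* Only the low-order 'bytes' bytes of val are stored. */
	for (i = 0; i < bytes; i++)
		ctxt->mem[addr + i] = (unsigned char)(val >> (8 * i));
	return X86EMUL_CONTINUE;
}

/* Callback table of the kind a caller would hand to the emulator. */
static const struct x86_emulate_ops sketch_ops = {
	.read_std       = sketch_read_std,
	.write_emulated = sketch_write_emulated,
};
```

A caller would fill in a context, pass it together with a table like `sketch_ops` to the emulator, and check its own fault record whenever a callback reported X86EMUL_PROPAGATE_FAULT.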
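
The decode_register comment can be read as the following sketch. It is one plausible reading of the header comment, not a copy of the emulator source. It assumes the register block is indexed in ModRM encoding order (AX, CX, DX, BX, SP, BP, SI, DI), that encodings 4 to 7 select AH, CH, DH and BH when `highbyte_regs` is set, and that the host is little-endian (as x86 is) so the high byte of a 16-bit register sits one byte above the start of its slot. The name `sketch_decode_register` and the `u8` typedef are stand-ins introduced for illustration.

```c
#include <assert.h>

typedef unsigned char u8;  /* stand-in for the kernel's u8 type */

/*
 * Return a pointer to the register selected by the 'reg' field of a
 * ModRM byte. With highbyte_regs set, encodings 4..7 address the second
 * byte of registers 0..3 (AH, CH, DH, BH) instead of SP, BP, SI, DI.
 */
static void *sketch_decode_register(u8 modrm_reg, unsigned long *regs,
				    int highbyte_regs)
{
	void *p = &regs[modrm_reg];

	if (highbyte_regs && modrm_reg >= 4 && modrm_reg < 8)
		p = (unsigned char *)&regs[modrm_reg & 3] + 1;
	return p;
}
```

With `regs[0]` holding 0x1234, decoding encoding 4 with `highbyte_regs` set yields a pointer to the byte 0x12 (AH) on a little-endian host, while the same encoding with `highbyte_regs` clear yields a pointer to `regs[4]`.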