debuggers.hg
changeset 6993:750ad97f37b0
Split up docs.

Signed-off-by: Robb Romans <3r@us.ibm.com>
--- a/docs/Makefile Tue Sep 20 09:08:26 2005 +0000
+++ b/docs/Makefile Tue Sep 20 09:17:33 2005 +0000
@@ -12,7 +12,7 @@ DOXYGEN := doxygen

 pkgdocdir := /usr/share/doc/xen

-DOC_TEX := $(wildcard src/*.tex)
+DOC_TEX := src/user.tex src/interface.tex
 DOC_PS := $(patsubst src/%.tex,ps/%.ps,$(DOC_TEX))
 DOC_PDF := $(patsubst src/%.tex,pdf/%.pdf,$(DOC_TEX))
 DOC_HTML := $(patsubst src/%.tex,html/%/index.html,$(DOC_TEX))
2.1 --- a/docs/src/interface.tex Tue Sep 20 09:08:26 2005 +0000 2.2 +++ b/docs/src/interface.tex Tue Sep 20 09:17:33 2005 +0000 2.3 @@ -87,1084 +87,23 @@ itself, allows the Xen framework to sepa 2.4 mechanism and policy within the system. 2.5 2.6 2.7 - 2.8 -\chapter{Virtual Architecture} 2.9 - 2.10 -On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It 2.11 -has full access to the physical memory available in the system and is 2.12 -responsible for allocating portions of it to the domains. Guest 2.13 -operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as 2.14 -they see fit. Segmentation is used to prevent the guest OS from 2.15 -accessing the portion of the address space that is reserved for 2.16 -Xen. We expect most guest operating systems will use ring 1 for their 2.17 -own operation and place applications in ring 3. 2.18 - 2.19 -In this chapter we consider the basic virtual architecture provided 2.20 -by Xen: the basic CPU state, exception and interrupt handling, and 2.21 -time. Other aspects such as memory and device access are discussed 2.22 -in later chapters. 2.23 - 2.24 -\section{CPU state} 2.25 - 2.26 -All privileged state must be handled by Xen. The guest OS has no 2.27 -direct access to CR3 and is not permitted to update privileged bits in 2.28 -EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen; 2.29 -these are analogous to system calls but occur from ring 1 to ring 0. 2.30 - 2.31 -A list of all hypercalls is given in Appendix~\ref{a:hypercalls}. 2.32 - 2.33 - 2.34 - 2.35 -\section{Exceptions} 2.36 - 2.37 -A virtual IDT is provided --- a domain can submit a table of trap 2.38 -handlers to Xen via the {\tt set\_trap\_table()} hypercall. Most trap 2.39 -handlers are identical to native x86 handlers, although the page-fault 2.40 -handler is somewhat different. 2.41 - 2.42 - 2.43 -\section{Interrupts and events} 2.44 - 2.45 -Interrupts are virtualized by mapping them to \emph{events}, which are 2.46 -delivered asynchronously to the target domain using a callback 2.47 -supplied via the {\tt set\_callbacks()} hypercall. A guest OS can map 2.48 -these events onto its standard interrupt dispatch mechanisms. Xen is 2.49 -responsible for determining the target domain that will handle each 2.50 -physical interrupt source. For more details on the binding of event 2.51 -sources to events, see Chapter~\ref{c:devices}. 2.52 - 2.53 - 2.54 - 2.55 -\section{Time} 2.56 - 2.57 -Guest operating systems need to be aware of the passage of both real 2.58 -(or wallclock) time and their own `virtual time' (the time for 2.59 -which they have been executing). Furthermore, Xen has a notion of 2.60 -time which is used for scheduling. The following notions of 2.61 -time are provided: 2.62 - 2.63 -\begin{description} 2.64 -\item[Cycle counter time.] 2.65 - 2.66 -This provides a fine-grained time reference. The cycle counter time is 2.67 -used to accurately extrapolate the other time references. On SMP machines 2.68 -it is currently assumed that the cycle counter time is synchronized between 2.69 -CPUs. The current x86-based implementation achieves this within inter-CPU 2.70 -communication latencies. 2.71 - 2.72 -\item[System time.] 2.73 - 2.74 -This is a 64-bit counter which holds the number of nanoseconds that 2.75 -have elapsed since system boot. 2.76 - 2.77 - 2.78 -\item[Wall clock time.] 2.79 - 2.80 -This is the time of day in a Unix-style {\tt struct timeval} (seconds 2.81 -and microseconds since 1 January 1970, adjusted by leap seconds). 
An 2.82 -NTP client hosted by {\it domain 0} can keep this value accurate. 2.83 - 2.84 - 2.85 -\item[Domain virtual time.] 2.86 - 2.87 -This progresses at the same pace as system time, but only while a 2.88 -domain is executing --- it stops while a domain is de-scheduled. 2.89 -Therefore the share of the CPU that a domain receives is indicated by 2.90 -the rate at which its virtual time increases. 2.91 - 2.92 -\end{description} 2.93 - 2.94 - 2.95 -Xen exports timestamps for system time and wall-clock time to guest 2.96 -operating systems through a shared page of memory. Xen also provides 2.97 -the cycle counter time at the instant the timestamps were calculated, 2.98 -and the CPU frequency in Hertz. This allows the guest to extrapolate 2.99 -system and wall-clock times accurately based on the current cycle 2.100 -counter time. 2.101 - 2.102 -Since all time stamps need to be updated and read \emph{atomically} 2.103 -two version numbers are also stored in the shared info page. The 2.104 -first is incremented prior to an update, while the second is only 2.105 -incremented afterwards. Thus a guest can be sure that it read a consistent 2.106 -state by checking the two version numbers are equal. 2.107 - 2.108 -Xen includes a periodic ticker which sends a timer event to the 2.109 -currently executing domain every 10ms. The Xen scheduler also sends a 2.110 -timer event whenever a domain is scheduled; this allows the guest OS 2.111 -to adjust for the time that has passed while it has been inactive. In 2.112 -addition, Xen allows each domain to request that they receive a timer 2.113 -event sent at a specified system time by using the {\tt 2.114 -set\_timer\_op()} hypercall. Guest OSes may use this timer to 2.115 -implement timeout values when they block. 2.116 - 2.117 - 2.118 - 2.119 -%% % akw: demoting this to a section -- not sure if there is any point 2.120 -%% % though, maybe just remove it. 2.121 - 2.122 -\section{Xen CPU Scheduling} 2.123 - 2.124 -Xen offers a uniform API for CPU schedulers. It is possible to choose 2.125 -from a number of schedulers at boot and it should be easy to add more. 2.126 -The BVT, Atropos and Round Robin schedulers are part of the normal 2.127 -Xen distribution. BVT provides proportional fair shares of the CPU to 2.128 -the running domains. Atropos can be used to reserve absolute shares 2.129 -of the CPU for each domain. Round-robin is provided as an example of 2.130 -Xen's internal scheduler API. 2.131 - 2.132 -\paragraph*{Note: SMP host support} 2.133 -Xen has always supported SMP host systems. Domains are statically assigned to 2.134 -CPUs, either at creation time or when manually pinning to a particular CPU. 2.135 -The current schedulers then run locally on each CPU to decide which of the 2.136 -assigned domains should be run there. The user-level control software 2.137 -can be used to perform coarse-grain load-balancing between CPUs. 2.138 +%% chapter Virtual Architecture moved to architecture.tex 2.139 +\include{src/interface/architecture} 2.140 2.141 - 2.142 -%% More information on the characteristics and use of these schedulers is 2.143 -%% available in {\tt Sched-HOWTO.txt}. 2.144 - 2.145 - 2.146 -\section{Privileged operations} 2.147 - 2.148 -Xen exports an extended interface to privileged domains (viz.\ {\it 2.149 - Domain 0}). This allows such domains to build and boot other domains 2.150 -on the server, and provides control interfaces for managing 2.151 -scheduling, memory, networking, and block devices. 
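The two-version protocol used for the exported time stamps (described in the removed Time section above) is normally consumed with a retry loop on the guest side. A minimal sketch in C, using hypothetical field names such as time_version1 rather than the real shared-info layout:

    #include <stdint.h>

    /* Hypothetical layout; the real shared info page is defined by Xen. */
    struct shared_time {
        volatile uint32_t time_version1;   /* incremented before an update */
        volatile uint32_t time_version2;   /* incremented after an update  */
        volatile uint64_t system_time_ns;  /* ns since boot at tsc_stamp   */
        volatile uint64_t tsc_stamp;       /* cycle counter at that instant */
        volatile uint32_t cpu_khz;         /* advertised CPU frequency      */
    };

    /* Read a consistent snapshot: retry while an update is in progress. */
    void read_time_snapshot(const struct shared_time *st,
                            uint64_t *sys_ns, uint64_t *tsc)
    {
        uint32_t v1, v2;
        do {
            v1      = st->time_version1;
            *sys_ns = st->system_time_ns;
            *tsc    = st->tsc_stamp;
            v2      = st->time_version2;
            /* Real code also needs read barriers between these loads. */
        } while (v1 != v2);
    }

The guest can then extrapolate current system and wall-clock time from such a snapshot by scaling the cycles elapsed since tsc_stamp by the advertised CPU frequency, as the text above describes.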
2.152 - 2.153 - 2.154 -\chapter{Memory} 2.155 -\label{c:memory} 2.156 - 2.157 -Xen is responsible for managing the allocation of physical memory to 2.158 -domains, and for ensuring safe use of the paging and segmentation 2.159 -hardware. 2.160 - 2.161 - 2.162 -\section{Memory Allocation} 2.163 - 2.164 - 2.165 -Xen resides within a small fixed portion of physical memory; it also 2.166 -reserves the top 64MB of every virtual address space. The remaining 2.167 -physical memory is available for allocation to domains at a page 2.168 -granularity. Xen tracks the ownership and use of each page, which 2.169 -allows it to enforce secure partitioning between domains. 2.170 - 2.171 -Each domain has a maximum and current physical memory allocation. 2.172 -A guest OS may run a `balloon driver' to dynamically adjust its 2.173 -current memory allocation up to its limit. 2.174 - 2.175 - 2.176 -%% XXX SMH: I use machine and physical in the next section (which 2.177 -%% is kinda required for consistency with code); wonder if this 2.178 -%% section should use same terms? 2.179 -%% 2.180 -%% Probably. 2.181 -%% 2.182 -%% Merging this and below section at some point prob makes sense. 2.183 - 2.184 -\section{Pseudo-Physical Memory} 2.185 - 2.186 -Since physical memory is allocated and freed on a page granularity, 2.187 -there is no guarantee that a domain will receive a contiguous stretch 2.188 -of physical memory. However most operating systems do not have good 2.189 -support for operating in a fragmented physical address space. To aid 2.190 -porting such operating systems to run on top of Xen, we make a 2.191 -distinction between \emph{machine memory} and \emph{pseudo-physical 2.192 -memory}. 2.193 - 2.194 -Put simply, machine memory refers to the entire amount of memory 2.195 -installed in the machine, including that reserved by Xen, in use by 2.196 -various domains, or currently unallocated. We consider machine memory 2.197 -to comprise a set of 4K \emph{machine page frames} numbered 2.198 -consecutively starting from 0. Machine frame numbers mean the same 2.199 -within Xen or any domain. 2.200 - 2.201 -Pseudo-physical memory, on the other hand, is a per-domain 2.202 -abstraction. It allows a guest operating system to consider its memory 2.203 -allocation to consist of a contiguous range of physical page frames 2.204 -starting at physical frame 0, despite the fact that the underlying 2.205 -machine page frames may be sparsely allocated and in any order. 2.206 - 2.207 -To achieve this, Xen maintains a globally readable {\it 2.208 -machine-to-physical} table which records the mapping from machine page 2.209 -frames to pseudo-physical ones. In addition, each domain is supplied 2.210 -with a {\it physical-to-machine} table which performs the inverse 2.211 -mapping. Clearly the machine-to-physical table has size proportional 2.212 -to the amount of RAM installed in the machine, while each 2.213 -physical-to-machine table has size proportional to the memory 2.214 -allocation of the given domain. 2.215 - 2.216 -Architecture dependent code in guest operating systems can then use 2.217 -the two tables to provide the abstraction of pseudo-physical 2.218 -memory. In general, only certain specialized parts of the operating 2.219 -system (such as page table management) needs to understand the 2.220 -difference between machine and pseudo-physical addresses. 
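The machine-to-physical and physical-to-machine tables described above are usually wrapped in a pair of trivial helpers in the architecture-dependent part of a guest port. A sketch with placeholder table names (how the tables are actually named and mapped into the guest is defined by Xen and the port, not by this fragment):

    /* machine_to_phys: global, read-only table indexed by machine frame number.
     * phys_to_machine: per-domain table indexed by pseudo-physical frame number.
     * Both names are placeholders for this sketch. */
    extern const unsigned long *machine_to_phys;
    extern unsigned long *phys_to_machine;

    /* Pseudo-physical frame -> machine frame (used when building PTEs). */
    static inline unsigned long pfn_to_mfn(unsigned long pfn)
    {
        return phys_to_machine[pfn];
    }

    /* Machine frame -> pseudo-physical frame (used when reading PTEs back). */
    static inline unsigned long mfn_to_pfn(unsigned long mfn)
    {
        return machine_to_phys[mfn];
    }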
2.221 - 2.222 -\section{Page Table Updates} 2.223 - 2.224 -In the default mode of operation, Xen enforces read-only access to 2.225 -page tables and requires guest operating systems to explicitly request 2.226 -any modifications. Xen validates all such requests and only applies 2.227 -updates that it deems safe. This is necessary to prevent domains from 2.228 -adding arbitrary mappings to their page tables. 2.229 - 2.230 -To aid validation, Xen associates a type and reference count with each 2.231 -memory page. A page has one of the following 2.232 -mutually-exclusive types at any point in time: page directory ({\sf 2.233 -PD}), page table ({\sf PT}), local descriptor table ({\sf LDT}), 2.234 -global descriptor table ({\sf GDT}), or writable ({\sf RW}). Note that 2.235 -a guest OS may always create readable mappings of its own memory 2.236 -regardless of its current type. 2.237 -%%% XXX: possibly explain more about ref count 'lifecyle' here? 2.238 -This mechanism is used to 2.239 -maintain the invariants required for safety; for example, a domain 2.240 -cannot have a writable mapping to any part of a page table as this 2.241 -would require the page concerned to simultaneously be of types {\sf 2.242 - PT} and {\sf RW}. 2.243 - 2.244 - 2.245 -%\section{Writable Page Tables} 2.246 - 2.247 -Xen also provides an alternative mode of operation in which guests be 2.248 -have the illusion that their page tables are directly writable. Of 2.249 -course this is not really the case, since Xen must still validate 2.250 -modifications to ensure secure partitioning. To this end, Xen traps 2.251 -any write attempt to a memory page of type {\sf PT} (i.e., that is 2.252 -currently part of a page table). If such an access occurs, Xen 2.253 -temporarily allows write access to that page while at the same time 2.254 -{\em disconnecting} it from the page table that is currently in 2.255 -use. This allows the guest to safely make updates to the page because 2.256 -the newly-updated entries cannot be used by the MMU until Xen 2.257 -revalidates and reconnects the page. 2.258 -Reconnection occurs automatically in a number of situations: for 2.259 -example, when the guest modifies a different page-table page, when the 2.260 -domain is preempted, or whenever the guest uses Xen's explicit 2.261 -page-table update interfaces. 2.262 - 2.263 -Finally, Xen also supports a form of \emph{shadow page tables} in 2.264 -which the guest OS uses a independent copy of page tables which are 2.265 -unknown to the hardware (i.e.\ which are never pointed to by {\tt 2.266 -cr3}). Instead Xen propagates changes made to the guest's tables to the 2.267 -real ones, and vice versa. This is useful for logging page writes 2.268 -(e.g.\ for live migration or checkpoint). A full version of the shadow 2.269 -page tables also allows guest OS porting with less effort. 2.270 - 2.271 -\section{Segment Descriptor Tables} 2.272 +%% chapter Memory moved to memory.tex 2.273 +\include{src/interface/memory} 2.274 2.275 -On boot a guest is supplied with a default GDT, which does not reside 2.276 -within its own memory allocation. If the guest wishes to use other 2.277 -than the default `flat' ring-1 and ring-3 segments that this GDT 2.278 -provides, it must register a custom GDT and/or LDT with Xen, 2.279 -allocated from its own memory. Note that a number of GDT 2.280 -entries are reserved by Xen -- any custom GDT must also include 2.281 -sufficient space for these entries. 
2.282 - 2.283 -For example, the following hypercall is used to specify a new GDT: 2.284 - 2.285 -\begin{quote} 2.286 -int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em entries}) 2.287 - 2.288 -{\em frame\_list}: An array of up to 16 machine page frames within 2.289 -which the GDT resides. Any frame registered as a GDT frame may only 2.290 -be mapped read-only within the guest's address space (e.g., no 2.291 -writable mappings, no use as a page-table page, and so on). 2.292 - 2.293 -{\em entries}: The number of descriptor-entry slots in the GDT. Note 2.294 -that the table must be large enough to contain Xen's reserved entries; 2.295 -thus we must have `{\em entries $>$ LAST\_RESERVED\_GDT\_ENTRY}\ '. 2.296 -Note also that, after registering the GDT, slots {\em FIRST\_} through 2.297 -{\em LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest and 2.298 -may be overwritten by Xen. 2.299 -\end{quote} 2.300 - 2.301 -The LDT is updated via the generic MMU update mechanism (i.e., via 2.302 -the {\tt mmu\_update()} hypercall. 2.303 - 2.304 -\section{Start of Day} 2.305 - 2.306 -The start-of-day environment for guest operating systems is rather 2.307 -different to that provided by the underlying hardware. In particular, 2.308 -the processor is already executing in protected mode with paging 2.309 -enabled. 2.310 - 2.311 -{\it Domain 0} is created and booted by Xen itself. For all subsequent 2.312 -domains, the analogue of the boot-loader is the {\it domain builder}, 2.313 -user-space software running in {\it domain 0}. The domain builder 2.314 -is responsible for building the initial page tables for a domain 2.315 -and loading its kernel image at the appropriate virtual address. 2.316 - 2.317 - 2.318 - 2.319 -\chapter{Devices} 2.320 -\label{c:devices} 2.321 - 2.322 -Devices such as network and disk are exported to guests using a 2.323 -split device driver. The device driver domain, which accesses the 2.324 -physical device directly also runs a {\em backend} driver, serving 2.325 -requests to that device from guests. Each guest will use a simple 2.326 -{\em frontend} driver, to access the backend. Communication between these 2.327 -domains is composed of two parts: First, data is placed onto a shared 2.328 -memory page between the domains. Second, an event channel between the 2.329 -two domains is used to pass notification that data is outstanding. 2.330 -This separation of notification from data transfer allows message 2.331 -batching, and results in very efficient device access. 2.332 - 2.333 -Event channels are used extensively in device virtualization; each 2.334 -domain has a number of end-points or \emph{ports} each of which 2.335 -may be bound to one of the following \emph{event sources}: 2.336 -\begin{itemize} 2.337 - \item a physical interrupt from a real device, 2.338 - \item a virtual interrupt (callback) from Xen, or 2.339 - \item a signal from another domain 2.340 -\end{itemize} 2.341 - 2.342 -Events are lightweight and do not carry much information beyond 2.343 -the source of the notification. Hence when performing bulk data 2.344 -transfer, events are typically used as synchronization primitives 2.345 -over a shared memory transport. Event channels are managed via 2.346 -the {\tt event\_channel\_op()} hypercall; for more details see 2.347 -Section~\ref{s:idc}. 2.348 - 2.349 -This chapter focuses on some individual device interfaces 2.350 -available to Xen guests. 
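The split-driver arrangement described above, request data on a shared page plus a notification over an event channel, reduces in practice to a producer/consumer ring. The sketch below uses invented names (struct ring, notify_via_evtchn) and is not the actual Xen ring protocol; it only illustrates how queuing several requests before sending a single event gives the message batching mentioned above:

    #include <stdint.h>

    #define RING_SIZE 32                     /* power of two, cheap masking */

    struct request { uint64_t id; uint64_t data; };

    /* Shared between frontend and backend; indices only ever increase. */
    struct ring {
        volatile uint32_t req_prod;          /* written by the frontend */
        volatile uint32_t req_cons;          /* written by the backend  */
        struct request req[RING_SIZE];
    };

    /* Placeholder for an event-channel send on the bound port. */
    void notify_via_evtchn(int port);

    /* Queue a batch of requests, then notify once.  Assumes the caller
     * has already checked there is room for n entries on the ring. */
    void submit_batch(struct ring *r, int port,
                      const struct request *batch, unsigned int n)
    {
        uint32_t prod = r->req_prod;
        for (unsigned int i = 0; i < n; i++)
            r->req[(prod + i) % RING_SIZE] = batch[i];
        /* A real implementation needs a write barrier here so the backend
         * never sees the new producer index before the request contents. */
        r->req_prod = prod + n;
        notify_via_evtchn(port);             /* one event for the whole batch */
    }

The backend consumes entries by advancing req_cons and can likewise batch its responses before notifying the frontend.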
2.351 - 2.352 -\section{Network I/O} 2.353 - 2.354 -Virtual network device services are provided by shared memory 2.355 -communication with a backend domain. From the point of view of 2.356 -other domains, the backend may be viewed as a virtual ethernet switch 2.357 -element with each domain having one or more virtual network interfaces 2.358 -connected to it. 2.359 - 2.360 -\subsection{Backend Packet Handling} 2.361 - 2.362 -The backend driver is responsible for a variety of actions relating to 2.363 -the transmission and reception of packets from the physical device. 2.364 -With regard to transmission, the backend performs these key actions: 2.365 - 2.366 -\begin{itemize} 2.367 -\item {\bf Validation:} To ensure that domains do not attempt to 2.368 - generate invalid (e.g. spoofed) traffic, the backend driver may 2.369 - validate headers ensuring that source MAC and IP addresses match the 2.370 - interface that they have been sent from. 2.371 - 2.372 - Validation functions can be configured using standard firewall rules 2.373 - ({\small{\tt iptables}} in the case of Linux). 2.374 - 2.375 -\item {\bf Scheduling:} Since a number of domains can share a single 2.376 - physical network interface, the backend must mediate access when 2.377 - several domains each have packets queued for transmission. This 2.378 - general scheduling function subsumes basic shaping or rate-limiting 2.379 - schemes. 2.380 - 2.381 -\item {\bf Logging and Accounting:} The backend domain can be 2.382 - configured with classifier rules that control how packets are 2.383 - accounted or logged. For example, log messages might be generated 2.384 - whenever a domain attempts to send a TCP packet containing a SYN. 2.385 -\end{itemize} 2.386 - 2.387 -On receipt of incoming packets, the backend acts as a simple 2.388 -demultiplexer: Packets are passed to the appropriate virtual 2.389 -interface after any necessary logging and accounting have been carried 2.390 -out. 2.391 - 2.392 -\subsection{Data Transfer} 2.393 - 2.394 -Each virtual interface uses two ``descriptor rings'', one for transmit, 2.395 -the other for receive. Each descriptor identifies a block of contiguous 2.396 -physical memory allocated to the domain. 2.397 - 2.398 -The transmit ring carries packets to transmit from the guest to the 2.399 -backend domain. The return path of the transmit ring carries messages 2.400 -indicating that the contents have been physically transmitted and the 2.401 -backend no longer requires the associated pages of memory. 2.402 +%% chapter Devices moved to devices.tex 2.403 +\include{src/interface/devices} 2.404 2.405 -To receive packets, the guest places descriptors of unused pages on 2.406 -the receive ring. The backend will return received packets by 2.407 -exchanging these pages in the domain's memory with new pages 2.408 -containing the received data, and passing back descriptors regarding 2.409 -the new packets on the ring. This zero-copy approach allows the 2.410 -backend to maintain a pool of free pages to receive packets into, and 2.411 -then deliver them to appropriate domains after examining their 2.412 -headers. 2.413 - 2.414 -% 2.415 -%Real physical addresses are used throughout, with the domain performing 2.416 -%translation from pseudo-physical addresses if that is necessary. 2.417 - 2.418 -If a domain does not keep its receive ring stocked with empty buffers then 2.419 -packets destined to it may be dropped. 
This provides some defence against 2.420 -receive livelock problems because an overload domain will cease to receive 2.421 -further data. Similarly, on the transmit path, it provides the application 2.422 -with feedback on the rate at which packets are able to leave the system. 2.423 - 2.424 - 2.425 -Flow control on rings is achieved by including a pair of producer 2.426 -indexes on the shared ring page. Each side will maintain a private 2.427 -consumer index indicating the next outstanding message. In this 2.428 -manner, the domains cooperate to divide the ring into two message 2.429 -lists, one in each direction. Notification is decoupled from the 2.430 -immediate placement of new messages on the ring; the event channel 2.431 -will be used to generate notification when {\em either} a certain 2.432 -number of outstanding messages are queued, {\em or} a specified number 2.433 -of nanoseconds have elapsed since the oldest message was placed on the 2.434 -ring. 2.435 - 2.436 -% Not sure if my version is any better -- here is what was here before: 2.437 -%% Synchronization between the backend domain and the guest is achieved using 2.438 -%% counters held in shared memory that is accessible to both. Each ring has 2.439 -%% associated producer and consumer indices indicating the area in the ring 2.440 -%% that holds descriptors that contain data. After receiving {\it n} packets 2.441 -%% or {\t nanoseconds} after receiving the first packet, the hypervisor sends 2.442 -%% an event to the domain. 2.443 - 2.444 -\section{Block I/O} 2.445 - 2.446 -All guest OS disk access goes through the virtual block device VBD 2.447 -interface. This interface allows domains access to portions of block 2.448 -storage devices visible to the the block backend device. The VBD 2.449 -interface is a split driver, similar to the network interface 2.450 -described above. A single shared memory ring is used between the 2.451 -frontend and backend drivers, across which read and write messages are 2.452 -sent. 2.453 - 2.454 -Any block device accessible to the backend domain, including 2.455 -network-based block (iSCSI, *NBD, etc), loopback and LVM/MD devices, 2.456 -can be exported as a VBD. Each VBD is mapped to a device node in the 2.457 -guest, specified in the guest's startup configuration. 2.458 - 2.459 -Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since 2.460 -similar functionality can be achieved using the more complete LVM 2.461 -system, which is already in widespread use. 2.462 - 2.463 -\subsection{Data Transfer} 2.464 - 2.465 -The single ring between the guest and the block backend supports three 2.466 -messages: 2.467 - 2.468 -\begin{description} 2.469 -\item [{\small {\tt PROBE}}:] Return a list of the VBDs available to this guest 2.470 - from the backend. The request includes a descriptor of a free page 2.471 - into which the reply will be written by the backend. 2.472 - 2.473 -\item [{\small {\tt READ}}:] Read data from the specified block device. The 2.474 - front end identifies the device and location to read from and 2.475 - attaches pages for the data to be copied to (typically via DMA from 2.476 - the device). The backend acknowledges completed read requests as 2.477 - they finish. 2.478 - 2.479 -\item [{\small {\tt WRITE}}:] Write data to the specified block device. This 2.480 - functions essentially as {\small {\tt READ}}, except that the data moves to 2.481 - the device instead of from it. 2.482 -\end{description} 2.483 - 2.484 -% um... 
some old text 2.485 -%% In overview, the same style of descriptor-ring that is used for 2.486 -%% network packets is used here. Each domain has one ring that carries 2.487 -%% operation requests to the hypervisor and carries the results back 2.488 -%% again. 2.489 - 2.490 -%% Rather than copying data, the backend simply maps the domain's buffers 2.491 -%% in order to enable direct DMA to them. The act of mapping the buffers 2.492 -%% also increases the reference counts of the underlying pages, so that 2.493 -%% the unprivileged domain cannot try to return them to the hypervisor, 2.494 -%% install them as page tables, or any other unsafe behaviour. 2.495 -%% %block API here 2.496 - 2.497 - 2.498 -\chapter{Further Information} 2.499 - 2.500 - 2.501 -If you have questions that are not answered by this manual, the 2.502 -sources of information listed below may be of interest to you. Note 2.503 -that bug reports, suggestions and contributions related to the 2.504 -software (or the documentation) should be sent to the Xen developers' 2.505 -mailing list (address below). 2.506 - 2.507 -\section{Other documentation} 2.508 - 2.509 -If you are mainly interested in using (rather than developing for) 2.510 -Xen, the {\em Xen Users' Manual} is distributed in the {\tt docs/} 2.511 -directory of the Xen source distribution. 2.512 - 2.513 -% Various HOWTOs are also available in {\tt docs/HOWTOS}. 2.514 - 2.515 -\section{Online references} 2.516 - 2.517 -The official Xen web site is found at: 2.518 -\begin{quote} 2.519 -{\tt http://www.cl.cam.ac.uk/Research/SRG/netos/xen/} 2.520 -\end{quote} 2.521 - 2.522 -This contains links to the latest versions of all on-line 2.523 -documentation. 2.524 - 2.525 -\section{Mailing lists} 2.526 - 2.527 -There are currently four official Xen mailing lists: 2.528 - 2.529 -\begin{description} 2.530 -\item[xen-devel@lists.xensource.com] Used for development 2.531 -discussions and bug reports. Subscribe at: \\ 2.532 -{\small {\tt http://lists.xensource.com/xen-devel}} 2.533 -\item[xen-users@lists.xensource.com] Used for installation and usage 2.534 -discussions and requests for help. Subscribe at: \\ 2.535 -{\small {\tt http://lists.xensource.com/xen-users}} 2.536 -\item[xen-announce@lists.xensource.com] Used for announcements only. 2.537 -Subscribe at: \\ 2.538 -{\small {\tt http://lists.xensource.com/xen-announce}} 2.539 -\item[xen-changelog@lists.xensource.com] Changelog feed 2.540 -from the unstable and 2.0 trees - developer oriented. Subscribe at: \\ 2.541 -{\small {\tt http://lists.xensource.com/xen-changelog}} 2.542 -\end{description} 2.543 - 2.544 -Of these, xen-devel is the most active. 2.545 - 2.546 - 2.547 +%% chapter Further Information moved to further_info.tex 2.548 +\include{src/interface/further_info} 2.549 2.550 2.551 \appendix 2.552 2.553 -%\newcommand{\hypercall}[1]{\vspace{5mm}{\large\sf #1}} 2.554 - 2.555 - 2.556 - 2.557 - 2.558 - 2.559 -\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}} 2.560 - 2.561 - 2.562 - 2.563 - 2.564 - 2.565 - 2.566 -\chapter{Xen Hypercalls} 2.567 -\label{a:hypercalls} 2.568 - 2.569 -Hypercalls represent the procedural interface to Xen; this appendix 2.570 -categorizes and describes the current set of hypercalls. 2.571 - 2.572 -\section{Invoking Hypercalls} 2.573 - 2.574 -Hypercalls are invoked in a manner analogous to system calls in a 2.575 -conventional operating system; a software interrupt is issued which 2.576 -vectors to an entry point within Xen. 
On x86\_32 machines the 2.577 -instruction required is {\tt int \$82}; the (real) IDT is setup so 2.578 -that this may only be issued from within ring 1. The particular 2.579 -hypercall to be invoked is contained in {\tt EAX} --- a list 2.580 -mapping these values to symbolic hypercall names can be found 2.581 -in {\tt xen/include/public/xen.h}. 2.582 - 2.583 -On some occasions a set of hypercalls will be required to carry 2.584 -out a higher-level function; a good example is when a guest 2.585 -operating wishes to context switch to a new process which 2.586 -requires updating various privileged CPU state. As an optimization 2.587 -for these cases, there is a generic mechanism to issue a set of 2.588 -hypercalls as a batch: 2.589 - 2.590 -\begin{quote} 2.591 -\hypercall{multicall(void *call\_list, int nr\_calls)} 2.592 - 2.593 -Execute a series of hypervisor calls; {\tt nr\_calls} is the length of 2.594 -the array of {\tt multicall\_entry\_t} structures pointed to be {\tt 2.595 -call\_list}. Each entry contains the hypercall operation code followed 2.596 -by up to 7 word-sized arguments. 2.597 -\end{quote} 2.598 - 2.599 -Note that multicalls are provided purely as an optimization; there is 2.600 -no requirement to use them when first porting a guest operating 2.601 -system. 2.602 - 2.603 - 2.604 -\section{Virtual CPU Setup} 2.605 - 2.606 -At start of day, a guest operating system needs to setup the virtual 2.607 -CPU it is executing on. This includes installing vectors for the 2.608 -virtual IDT so that the guest OS can handle interrupts, page faults, 2.609 -etc. However the very first thing a guest OS must setup is a pair 2.610 -of hypervisor callbacks: these are the entry points which Xen will 2.611 -use when it wishes to notify the guest OS of an occurrence. 2.612 - 2.613 -\begin{quote} 2.614 -\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long 2.615 - event\_address, unsigned long failsafe\_selector, unsigned long 2.616 - failsafe\_address) } 2.617 - 2.618 -Register the normal (``event'') and failsafe callbacks for 2.619 -event processing. In each case the code segment selector and 2.620 -address within that segment are provided. The selectors must 2.621 -have RPL 1; in XenLinux we simply use the kernel's CS for both 2.622 -{\tt event\_selector} and {\tt failsafe\_selector}. 2.623 - 2.624 -The value {\tt event\_address} specifies the address of the guest OSes 2.625 -event handling and dispatch routine; the {\tt failsafe\_address} 2.626 -specifies a separate entry point which is used only if a fault occurs 2.627 -when Xen attempts to use the normal callback. 2.628 -\end{quote} 2.629 - 2.630 - 2.631 -After installing the hypervisor callbacks, the guest OS can 2.632 -install a `virtual IDT' by using the following hypercall: 2.633 - 2.634 -\begin{quote} 2.635 -\hypercall{set\_trap\_table(trap\_info\_t *table)} 2.636 - 2.637 -Install one or more entries into the per-domain 2.638 -trap handler table (essentially a software version of the IDT). 2.639 -Each entry in the array pointed to by {\tt table} includes the 2.640 -exception vector number with the corresponding segment selector 2.641 -and entry point. Most guest OSes can use the same handlers on 2.642 -Xen as when running on the real hardware; an exception is the 2.643 -page fault handler (exception vector 14) where a modified 2.644 -stack-frame layout is used. 
2.645 - 2.646 - 2.647 -\end{quote} 2.648 - 2.649 - 2.650 - 2.651 -\section{Scheduling and Timer} 2.652 - 2.653 -Domains are preemptively scheduled by Xen according to the 2.654 -parameters installed by domain 0 (see Section~\ref{s:dom0ops}). 2.655 -In addition, however, a domain may choose to explicitly 2.656 -control certain behavior with the following hypercall: 2.657 - 2.658 -\begin{quote} 2.659 -\hypercall{sched\_op(unsigned long op)} 2.660 - 2.661 -Request scheduling operation from hypervisor. The options are: {\it 2.662 -yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the 2.663 -calling domain runnable but may cause a reschedule if other domains 2.664 -are runnable. {\it block} removes the calling domain from the run 2.665 -queue and cause is to sleeps until an event is delivered to it. {\it 2.666 -shutdown} is used to end the domain's execution; the caller can 2.667 -additionally specify whether the domain should reboot, halt or 2.668 -suspend. 2.669 -\end{quote} 2.670 - 2.671 -To aid the implementation of a process scheduler within a guest OS, 2.672 -Xen provides a virtual programmable timer: 2.673 - 2.674 -\begin{quote} 2.675 -\hypercall{set\_timer\_op(uint64\_t timeout)} 2.676 - 2.677 -Request a timer event to be sent at the specified system time (time 2.678 -in nanoseconds since system boot). The hypercall actually passes the 2.679 -64-bit timeout value as a pair of 32-bit values. 2.680 - 2.681 -\end{quote} 2.682 - 2.683 -Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} 2.684 -allows block-with-timeout semantics. 2.685 - 2.686 - 2.687 -\section{Page Table Management} 2.688 - 2.689 -Since guest operating systems have read-only access to their page 2.690 -tables, Xen must be involved when making any changes. The following 2.691 -multi-purpose hypercall can be used to modify page-table entries, 2.692 -update the machine-to-physical mapping table, flush the TLB, install 2.693 -a new page-table base pointer, and more. 2.694 - 2.695 -\begin{quote} 2.696 -\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} 2.697 - 2.698 -Update the page table for the domain; a set of {\tt count} updates are 2.699 -submitted for processing in a batch, with {\tt success\_count} being 2.700 -updated to report the number of successful updates. 2.701 - 2.702 -Each element of {\tt req[]} contains a pointer (address) and value; 2.703 -the least significant 2-bits of the pointer are used to distinguish 2.704 -the type of update requested as follows: 2.705 -\begin{description} 2.706 - 2.707 -\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or 2.708 -page table entry to the associated value; Xen will check that the 2.709 -update is safe, as described in Chapter~\ref{c:memory}. 2.710 - 2.711 -\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the 2.712 - machine-to-physical table. The calling domain must own the machine 2.713 - page in question (or be privileged). 2.714 - 2.715 -\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations. 2.716 -The set of additional MMU operations is considerable, and includes 2.717 -updating {\tt cr3} (or just re-installing it for a TLB flush), 2.718 -flushing the cache, installing a new LDT, or pinning \& unpinning 2.719 -page-table pages (to ensure their reference count doesn't drop to zero 2.720 -which would require a revalidation of all entries). 2.721 - 2.722 -Further extended commands are used to deal with granting and 2.723 -acquiring page ownership; see Section~\ref{s:idc}. 
2.724 - 2.725 - 2.726 -\end{description} 2.727 - 2.728 -More details on the precise format of all commands can be 2.729 -found in {\tt xen/include/public/xen.h}. 2.730 - 2.731 - 2.732 -\end{quote} 2.733 - 2.734 -Explicitly updating batches of page table entries is extremely 2.735 -efficient, but can require a number of alterations to the guest 2.736 -OS. Using the writable page table mode (Chapter~\ref{c:memory}) is 2.737 -recommended for new OS ports. 2.738 - 2.739 -Regardless of which page table update mode is being used, however, 2.740 -there are some occasions (notably handling a demand page fault) where 2.741 -a guest OS will wish to modify exactly one PTE rather than a 2.742 -batch. This is catered for by the following: 2.743 - 2.744 -\begin{quote} 2.745 -\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long 2.746 -val, \\ unsigned long flags)} 2.747 - 2.748 -Update the currently installed PTE for the page {\tt page\_nr} to 2.749 -{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification 2.750 -is safe before applying it. The {\tt flags} determine which kind 2.751 -of TLB flush, if any, should follow the update. 2.752 - 2.753 -\end{quote} 2.754 - 2.755 -Finally, sufficiently privileged domains may occasionally wish to manipulate 2.756 -the pages of others: 2.757 -\begin{quote} 2.758 - 2.759 -\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr, 2.760 -unsigned long val, unsigned long flags, uint16\_t domid)} 2.761 - 2.762 -Identical to {\tt update\_va\_mapping()} save that the pages being 2.763 -mapped must belong to the domain {\tt domid}. 2.764 - 2.765 -\end{quote} 2.766 - 2.767 -This privileged operation is currently used by backend virtual device 2.768 -drivers to safely map pages containing I/O data. 2.769 - 2.770 - 2.771 - 2.772 -\section{Segmentation Support} 2.773 - 2.774 -Xen allows guest OSes to install a custom GDT if they require it; 2.775 -this is context switched transparently whenever a domain is 2.776 -[de]scheduled. The following hypercall is effectively a 2.777 -`safe' version of {\tt lgdt}: 2.778 - 2.779 -\begin{quote} 2.780 -\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} 2.781 - 2.782 -Install a global descriptor table for a domain; {\tt frame\_list} is 2.783 -an array of up to 16 machine page frames within which the GDT resides, 2.784 -with {\tt entries} being the actual number of descriptor-entry 2.785 -slots. All page frames must be mapped read-only within the guest's 2.786 -address space, and the table must be large enough to contain Xen's 2.787 -reserved entries (see {\tt xen/include/public/arch-x86\_32.h}). 2.788 - 2.789 -\end{quote} 2.790 - 2.791 -Many guest OSes will also wish to install LDTs; this is achieved by 2.792 -using {\tt mmu\_update()} with an extended command, passing the 2.793 -linear address of the LDT base along with the number of entries. No 2.794 -special safety checks are required; Xen needs to perform this task 2.795 -simply since {\tt lldt} requires CPL 0. 2.796 - 2.797 - 2.798 -Xen also allows guest operating systems to update just an 2.799 -individual segment descriptor in the GDT or LDT: 2.800 - 2.801 -\begin{quote} 2.802 -\hypercall{update\_descriptor(unsigned long ma, unsigned long word1, 2.803 -unsigned long word2)} 2.804 - 2.805 -Update the GDT/LDT entry at machine address {\tt ma}; the new 2.806 -8-byte descriptor is stored in {\tt word1} and {\tt word2}. 2.807 -Xen performs a number of checks to ensure the descriptor is 2.808 -valid. 
2.809 - 2.810 -\end{quote} 2.811 - 2.812 -Guest OSes can use the above in place of context switching entire 2.813 -LDTs (or the GDT) when the number of changing descriptors is small. 2.814 - 2.815 -\section{Context Switching} 2.816 - 2.817 -When a guest OS wishes to context switch between two processes, 2.818 -it can use the page table and segmentation hypercalls described 2.819 -above to perform the the bulk of the privileged work. In addition, 2.820 -however, it will need to invoke Xen to switch the kernel (ring 1) 2.821 -stack pointer: 2.822 - 2.823 -\begin{quote} 2.824 -\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} 2.825 - 2.826 -Request kernel stack switch from hypervisor; {\tt ss} is the new 2.827 -stack segment, which {\tt esp} is the new stack pointer. 2.828 - 2.829 -\end{quote} 2.830 - 2.831 -A final useful hypercall for context switching allows ``lazy'' 2.832 -save and restore of floating point state: 2.833 - 2.834 -\begin{quote} 2.835 -\hypercall{fpu\_taskswitch(void)} 2.836 - 2.837 -This call instructs Xen to set the {\tt TS} bit in the {\tt cr0} 2.838 -control register; this means that the next attempt to use floating 2.839 -point will cause a trap which the guest OS can trap. Typically it will 2.840 -then save/restore the FP state, and clear the {\tt TS} bit. 2.841 -\end{quote} 2.842 - 2.843 -This is provided as an optimization only; guest OSes can also choose 2.844 -to save and restore FP state on all context switches for simplicity. 2.845 - 2.846 - 2.847 -\section{Physical Memory Management} 2.848 - 2.849 -As mentioned previously, each domain has a maximum and current 2.850 -memory allocation. The maximum allocation, set at domain creation 2.851 -time, cannot be modified. However a domain can choose to reduce 2.852 -and subsequently grow its current allocation by using the 2.853 -following call: 2.854 - 2.855 -\begin{quote} 2.856 -\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list, 2.857 - unsigned long nr\_extents, unsigned int extent\_order)} 2.858 - 2.859 -Increase or decrease current memory allocation (as determined by 2.860 -the value of {\tt op}). Each invocation provides a list of 2.861 -extents each of which is $2^s$ pages in size, 2.862 -where $s$ is the value of {\tt extent\_order}. 2.863 - 2.864 -\end{quote} 2.865 - 2.866 -In addition to simply reducing or increasing the current memory 2.867 -allocation via a `balloon driver', this call is also useful for 2.868 -obtaining contiguous regions of machine memory when required (e.g. 2.869 -for certain PCI devices, or if using superpages). 2.870 - 2.871 - 2.872 -\section{Inter-Domain Communication} 2.873 -\label{s:idc} 2.874 - 2.875 -Xen provides a simple asynchronous notification mechanism via 2.876 -\emph{event channels}. Each domain has a set of end-points (or 2.877 -\emph{ports}) which may be bound to an event source (e.g. a physical 2.878 -IRQ, a virtual IRQ, or an port in another domain). When a pair of 2.879 -end-points in two different domains are bound together, then a `send' 2.880 -operation on one will cause an event to be received by the destination 2.881 -domain. 
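On the receiving side, the event callback registered with set_callbacks() typically scans a bitmap of pending ports and dispatches each one to a registered handler. A schematic version with invented data-structure names (the real pending and mask bitmaps live in the shared info page):

    #include <stdint.h>

    #define NR_PORTS 1024

    typedef void (*evtchn_handler_t)(unsigned int port);

    /* Placeholders: in reality the pending bits are set by Xen in shared memory. */
    extern volatile uint32_t evtchn_pending[NR_PORTS / 32];
    extern evtchn_handler_t  handlers[NR_PORTS];

    /* Called from the event callback registered with set_callbacks(). */
    void do_pending_events(void)
    {
        for (unsigned int word = 0; word < NR_PORTS / 32; word++) {
            uint32_t bits = evtchn_pending[word];
            while (bits != 0) {
                unsigned int bit  = (unsigned int)__builtin_ctz(bits);
                unsigned int port = word * 32 + bit;
                bits &= bits - 1;            /* clear lowest set bit */
                /* A real guest would also clear the pending bit atomically. */
                if (handlers[port])
                    handlers[port](port);
            }
        }
    }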
2.882 - 2.883 -The control and use of event channels involves the following hypercall: 2.884 - 2.885 -\begin{quote} 2.886 -\hypercall{event\_channel\_op(evtchn\_op\_t *op)} 2.887 - 2.888 -Inter-domain event-channel management; {\tt op} is a discriminated 2.889 -union which allows the following 7 operations: 2.890 - 2.891 -\begin{description} 2.892 - 2.893 -\item[\it alloc\_unbound:] allocate a free (unbound) local 2.894 - port and prepare for connection from a specified domain. 2.895 -\item[\it bind\_virq:] bind a local port to a virtual 2.896 -IRQ; any particular VIRQ can be bound to at most one port per domain. 2.897 -\item[\it bind\_pirq:] bind a local port to a physical IRQ; 2.898 -once more, a given pIRQ can be bound to at most one port per 2.899 -domain. Furthermore the calling domain must be sufficiently 2.900 -privileged. 2.901 -\item[\it bind\_interdomain:] construct an interdomain event 2.902 -channel; in general, the target domain must have previously allocated 2.903 -an unbound port for this channel, although this can be bypassed by 2.904 -privileged domains during domain setup. 2.905 -\item[\it close:] close an interdomain event channel. 2.906 -\item[\it send:] send an event to the remote end of a 2.907 -interdomain event channel. 2.908 -\item[\it status:] determine the current status of a local port. 2.909 -\end{description} 2.910 - 2.911 -For more details see 2.912 -{\tt xen/include/public/event\_channel.h}. 2.913 - 2.914 -\end{quote} 2.915 - 2.916 -Event channels are the fundamental communication primitive between 2.917 -Xen domains and seamlessly support SMP. However they provide little 2.918 -bandwidth for communication {\sl per se}, and hence are typically 2.919 -married with a piece of shared memory to produce effective and 2.920 -high-performance inter-domain communication. 2.921 - 2.922 -Safe sharing of memory pages between guest OSes is carried out by 2.923 -granting access on a per page basis to individual domains. This is 2.924 -achieved by using the {\tt grant\_table\_op()} hypercall. 2.925 - 2.926 -\begin{quote} 2.927 -\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)} 2.928 - 2.929 -Grant or remove access to a particular page to a particular domain. 2.930 - 2.931 -\end{quote} 2.932 - 2.933 -This is not currently widely in use by guest operating systems, but 2.934 -we intend to integrate support more fully in the near future. 2.935 - 2.936 -\section{PCI Configuration} 2.937 - 2.938 -Domains with physical device access (i.e.\ driver domains) receive 2.939 -limited access to certain PCI devices (bus address space and 2.940 -interrupts). However many guest operating systems attempt to 2.941 -determine the PCI configuration by directly access the PCI BIOS, 2.942 -which cannot be allowed for safety. 2.943 - 2.944 -Instead, Xen provides the following hypercall: 2.945 - 2.946 -\begin{quote} 2.947 -\hypercall{physdev\_op(void *physdev\_op)} 2.948 - 2.949 -Perform a PCI configuration option; depending on the value 2.950 -of {\tt physdev\_op} this can be a PCI config read, a PCI config 2.951 -write, or a small number of other queries. 2.952 - 2.953 -\end{quote} 2.954 - 2.955 - 2.956 -For examples of using {\tt physdev\_op()}, see the 2.957 -Xen-specific PCI code in the linux sparse tree. 2.958 - 2.959 -\section{Administrative Operations} 2.960 -\label{s:dom0ops} 2.961 - 2.962 -A large number of control operations are available to a sufficiently 2.963 -privileged domain (typically domain 0). 
These allow the creation and 2.964 -management of new domains, for example. A complete list is given 2.965 -below: for more details on any or all of these, please see 2.966 -{\tt xen/include/public/dom0\_ops.h} 2.967 - 2.968 - 2.969 -\begin{quote} 2.970 -\hypercall{dom0\_op(dom0\_op\_t *op)} 2.971 - 2.972 -Administrative domain operations for domain management. The options are: 2.973 - 2.974 -\begin{description} 2.975 -\item [\it DOM0\_CREATEDOMAIN:] create a new domain 2.976 - 2.977 -\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run 2.978 -queue. 2.979 - 2.980 -\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable 2.981 - once again. 2.982 - 2.983 -\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated 2.984 -with a domain 2.985 - 2.986 -\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain 2.987 - 2.988 -\item [\it DOM0\_SCHEDCTL:] 2.989 - 2.990 -\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain 2.991 - 2.992 -\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain 2.993 - 2.994 -\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain 2.995 - 2.996 -\item [\it DOM0\_GETPAGEFRAMEINFO:] 2.997 - 2.998 -\item [\it DOM0\_GETPAGEFRAMEINFO2:] 2.999 - 2.1000 -\item [\it DOM0\_IOPL:] set I/O privilege level 2.1001 - 2.1002 -\item [\it DOM0\_MSR:] read or write model specific registers 2.1003 - 2.1004 -\item [\it DOM0\_DEBUG:] interactively invoke the debugger 2.1005 - 2.1006 -\item [\it DOM0\_SETTIME:] set system time 2.1007 - 2.1008 -\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring 2.1009 - 2.1010 -\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU 2.1011 - 2.1012 -\item [\it DOM0\_GETTBUFS:] get information about the size and location of 2.1013 - the trace buffers (only on trace-buffer enabled builds) 2.1014 - 2.1015 -\item [\it DOM0\_PHYSINFO:] get information about the host machine 2.1016 - 2.1017 -\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions 2.1018 - 2.1019 -\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler 2.1020 - 2.1021 -\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes 2.1022 - 2.1023 -\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain 2.1024 - 2.1025 -\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain 2.1026 - 2.1027 -\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options 2.1028 -\end{description} 2.1029 -\end{quote} 2.1030 - 2.1031 -Most of the above are best understood by looking at the code 2.1032 -implementing them (in {\tt xen/common/dom0\_ops.c}) and in 2.1033 -the user-space tools that use them (mostly in {\tt tools/libxc}). 2.1034 - 2.1035 -\section{Debugging Hypercalls} 2.1036 - 2.1037 -A few additional hypercalls are mainly useful for debugging: 2.1038 - 2.1039 -\begin{quote} 2.1040 -\hypercall{console\_io(int cmd, int count, char *str)} 2.1041 - 2.1042 -Use Xen to interact with the console; operations are: 2.1043 - 2.1044 -{\it CONSOLEIO\_write}: Output count characters from buffer str. 2.1045 - 2.1046 -{\it CONSOLEIO\_read}: Input at most count characters into buffer str. 
2.1047 -\end{quote} 2.1048 - 2.1049 -A pair of hypercalls allows access to the underlying debug registers: 2.1050 -\begin{quote} 2.1051 -\hypercall{set\_debugreg(int reg, unsigned long value)} 2.1052 - 2.1053 -Set debug register {\tt reg} to {\tt value} 2.1054 - 2.1055 -\hypercall{get\_debugreg(int reg)} 2.1056 - 2.1057 -Return the contents of the debug register {\tt reg} 2.1058 -\end{quote} 2.1059 - 2.1060 -And finally: 2.1061 -\begin{quote} 2.1062 -\hypercall{xen\_version(int cmd)} 2.1063 - 2.1064 -Request Xen version number. 2.1065 -\end{quote} 2.1066 - 2.1067 -This is useful to ensure that user-space tools are in sync 2.1068 -with the underlying hypervisor. 2.1069 - 2.1070 -\section{Deprecated Hypercalls} 2.1071 - 2.1072 -Xen is under constant development and refinement; as such there 2.1073 -are plans to improve the way in which various pieces of functionality 2.1074 -are exposed to guest OSes. 2.1075 - 2.1076 -\begin{quote} 2.1077 -\hypercall{vm\_assist(unsigned int cmd, unsigned int type)} 2.1078 - 2.1079 -Toggle various memory management modes (in particular wrritable page 2.1080 -tables and superpage support). 2.1081 - 2.1082 -\end{quote} 2.1083 - 2.1084 -This is likely to be replaced with mode values in the shared 2.1085 -information page since this is more resilient for resumption 2.1086 -after migration or checkpoint. 2.1087 - 2.1088 - 2.1089 - 2.1090 - 2.1091 - 2.1092 - 2.1093 +%% chapter hypercalls moved to hypercalls.tex 2.1094 +\include{src/interface/hypercalls} 2.1095 2.1096 2.1097 %% 2.1098 @@ -1173,279 +112,9 @@ after migration or checkpoint. 2.1099 %% new scheduler... not clear how many of them there are... 2.1100 %% 2.1101 2.1102 -\begin{comment} 2.1103 - 2.1104 -\chapter{Scheduling API} 2.1105 - 2.1106 -The scheduling API is used by both the schedulers described above and should 2.1107 -also be used by any new schedulers. It provides a generic interface and also 2.1108 -implements much of the ``boilerplate'' code. 2.1109 - 2.1110 -Schedulers conforming to this API are described by the following 2.1111 -structure: 2.1112 - 2.1113 -\begin{verbatim} 2.1114 -struct scheduler 2.1115 -{ 2.1116 - char *name; /* full name for this scheduler */ 2.1117 - char *opt_name; /* option name for this scheduler */ 2.1118 - unsigned int sched_id; /* ID for this scheduler */ 2.1119 - 2.1120 - int (*init_scheduler) (); 2.1121 - int (*alloc_task) (struct task_struct *); 2.1122 - void (*add_task) (struct task_struct *); 2.1123 - void (*free_task) (struct task_struct *); 2.1124 - void (*rem_task) (struct task_struct *); 2.1125 - void (*wake_up) (struct task_struct *); 2.1126 - void (*do_block) (struct task_struct *); 2.1127 - task_slice_t (*do_schedule) (s_time_t); 2.1128 - int (*control) (struct sched_ctl_cmd *); 2.1129 - int (*adjdom) (struct task_struct *, 2.1130 - struct sched_adjdom_cmd *); 2.1131 - s32 (*reschedule) (struct task_struct *); 2.1132 - void (*dump_settings) (void); 2.1133 - void (*dump_cpu_state) (int); 2.1134 - void (*dump_runq_el) (struct task_struct *); 2.1135 -}; 2.1136 -\end{verbatim} 2.1137 - 2.1138 -The only method that {\em must} be implemented is 2.1139 -{\tt do\_schedule()}. However, if there is not some implementation for the 2.1140 -{\tt wake\_up()} method then waking tasks will not get put on the runqueue! 2.1141 - 2.1142 -The fields of the above structure are described in more detail below. 2.1143 - 2.1144 -\subsubsection{name} 2.1145 - 2.1146 -The name field should point to a descriptive ASCII string. 
2.1147 - 2.1148 -\subsubsection{opt\_name} 2.1149 - 2.1150 -This field is the value of the {\tt sched=} boot-time option that will select 2.1151 -this scheduler. 2.1152 - 2.1153 -\subsubsection{sched\_id} 2.1154 - 2.1155 -This is an integer that uniquely identifies this scheduler. There should be a 2.1156 -macro corrsponding to this scheduler ID in {\tt <xen/sched-if.h>}. 2.1157 - 2.1158 -\subsubsection{init\_scheduler} 2.1159 - 2.1160 -\paragraph*{Purpose} 2.1161 - 2.1162 -This is a function for performing any scheduler-specific initialisation. For 2.1163 -instance, it might allocate memory for per-CPU scheduler data and initialise it 2.1164 -appropriately. 2.1165 - 2.1166 -\paragraph*{Call environment} 2.1167 - 2.1168 -This function is called after the initialisation performed by the generic 2.1169 -layer. The function is called exactly once, for the scheduler that has been 2.1170 -selected. 2.1171 - 2.1172 -\paragraph*{Return values} 2.1173 - 2.1174 -This should return negative on failure --- this will cause an 2.1175 -immediate panic and the system will fail to boot. 2.1176 - 2.1177 -\subsubsection{alloc\_task} 2.1178 - 2.1179 -\paragraph*{Purpose} 2.1180 -Called when a {\tt task\_struct} is allocated by the generic scheduler 2.1181 -layer. A particular scheduler implementation may use this method to 2.1182 -allocate per-task data for this task. It may use the {\tt 2.1183 -sched\_priv} pointer in the {\tt task\_struct} to point to this data. 2.1184 - 2.1185 -\paragraph*{Call environment} 2.1186 -The generic layer guarantees that the {\tt sched\_priv} field will 2.1187 -remain intact from the time this method is called until the task is 2.1188 -deallocated (so long as the scheduler implementation does not change 2.1189 -it explicitly!). 2.1190 - 2.1191 -\paragraph*{Return values} 2.1192 -Negative on failure. 2.1193 - 2.1194 -\subsubsection{add\_task} 2.1195 - 2.1196 -\paragraph*{Purpose} 2.1197 - 2.1198 -Called when a task is initially added by the generic layer. 2.1199 - 2.1200 -\paragraph*{Call environment} 2.1201 - 2.1202 -The fields in the {\tt task\_struct} are now filled out and available for use. 2.1203 -Schedulers should implement appropriate initialisation of any per-task private 2.1204 -information in this method. 2.1205 - 2.1206 -\subsubsection{free\_task} 2.1207 - 2.1208 -\paragraph*{Purpose} 2.1209 - 2.1210 -Schedulers should free the space used by any associated private data 2.1211 -structures. 2.1212 - 2.1213 -\paragraph*{Call environment} 2.1214 - 2.1215 -This is called when a {\tt task\_struct} is about to be deallocated. 2.1216 -The generic layer will have done generic task removal operations and 2.1217 -(if implemented) called the scheduler's {\tt rem\_task} method before 2.1218 -this method is called. 2.1219 - 2.1220 -\subsubsection{rem\_task} 2.1221 - 2.1222 -\paragraph*{Purpose} 2.1223 - 2.1224 -This is called when a task is being removed from scheduling (but is 2.1225 -not yet being freed). 2.1226 - 2.1227 -\subsubsection{wake\_up} 2.1228 - 2.1229 -\paragraph*{Purpose} 2.1230 - 2.1231 -Called when a task is woken up, this method should put the task on the runqueue 2.1232 -(or do the scheduler-specific equivalent action). 2.1233 - 2.1234 -\paragraph*{Call environment} 2.1235 - 2.1236 -The task is already set to state RUNNING. 2.1237 - 2.1238 -\subsubsection{do\_block} 2.1239 - 2.1240 -\paragraph*{Purpose} 2.1241 - 2.1242 -This function is called when a task is blocked. This function should 2.1243 -not remove the task from the runqueue. 
2.1244 - 2.1245 -\paragraph*{Call environment} 2.1246 - 2.1247 -The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to 2.1248 -TASK\_INTERRUPTIBLE on entry to this method. A call to the {\tt 2.1249 - do\_schedule} method will be made after this method returns, in 2.1250 -order to select the next task to run. 2.1251 - 2.1252 -\subsubsection{do\_schedule} 2.1253 - 2.1254 -This method must be implemented. 2.1255 - 2.1256 -\paragraph*{Purpose} 2.1257 - 2.1258 -The method is called each time a new task must be chosen for scheduling on the 2.1259 -current CPU. The current time as passed as the single argument (the current 2.1260 -task can be found using the {\tt current} macro). 2.1261 - 2.1262 -This method should select the next task to run on this CPU and set it's minimum 2.1263 -time to run as well as returning the data described below. 2.1264 - 2.1265 -This method should also take the appropriate action if the previous 2.1266 -task has blocked, e.g. removing it from the runqueue. 2.1267 - 2.1268 -\paragraph*{Call environment} 2.1269 - 2.1270 -The other fields in the {\tt task\_struct} are updated by the generic layer, 2.1271 -which also performs all Xen-specific tasks and performs the actual task switch 2.1272 -(unless the previous task has been chosen again). 2.1273 - 2.1274 -This method is called with the {\tt schedule\_lock} held for the current CPU 2.1275 -and local interrupts disabled. 2.1276 - 2.1277 -\paragraph*{Return values} 2.1278 - 2.1279 -Must return a {\tt struct task\_slice} describing what task to run and how long 2.1280 -for (at maximum). 2.1281 - 2.1282 -\subsubsection{control} 2.1283 - 2.1284 -\paragraph*{Purpose} 2.1285 - 2.1286 -This method is called for global scheduler control operations. It takes a 2.1287 -pointer to a {\tt struct sched\_ctl\_cmd}, which it should either 2.1288 -source data from or populate with data, depending on the value of the 2.1289 -{\tt direction} field. 2.1290 - 2.1291 -\paragraph*{Call environment} 2.1292 - 2.1293 -The generic layer guarantees that when this method is called, the 2.1294 -caller selected the correct scheduler ID, hence the scheduler's 2.1295 -implementation does not need to sanity-check these parts of the call. 2.1296 - 2.1297 -\paragraph*{Return values} 2.1298 - 2.1299 -This function should return the value to be passed back to user space, hence it 2.1300 -should either be 0 or an appropriate errno value. 2.1301 - 2.1302 -\subsubsection{sched\_adjdom} 2.1303 - 2.1304 -\paragraph*{Purpose} 2.1305 - 2.1306 -This method is called to adjust the scheduling parameters of a particular 2.1307 -domain, or to query their current values. The function should check 2.1308 -the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in 2.1309 -order to determine which of these operations is being performed. 2.1310 - 2.1311 -\paragraph*{Call environment} 2.1312 - 2.1313 -The generic layer guarantees that the caller has specified the correct 2.1314 -control interface version and scheduler ID and that the supplied {\tt 2.1315 -task\_struct} will not be deallocated during the call (hence it is not 2.1316 -necessary to {\tt get\_task\_struct}). 2.1317 - 2.1318 -\paragraph*{Return values} 2.1319 - 2.1320 -This function should return the value to be passed back to user space, hence it 2.1321 -should either be 0 or an appropriate errno value. 
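The do_schedule() contract described above (return the task to run next together with a maximum slice) can be illustrated with simplified stand-in types; these are not the real task_struct or task_slice_t definitions from the Xen tree:

    typedef long long s_time_t;                   /* system time in ns */

    struct task_struct { int domid; int runnable; void *sched_priv; };

    typedef struct {
        struct task_struct *task;                 /* task to run next     */
        s_time_t            time;                 /* maximum slice, in ns */
    } task_slice_t;

    #define NR_TASKS 4
    static struct task_struct tasks[NR_TASKS];
    static unsigned int next_idx;

    /* Toy round-robin do_schedule(): pick the next runnable task and hand
     * it a fixed 10ms slice.  A real scheduler would also handle the
     * previously running task blocking, per-CPU runqueues, and so on. */
    task_slice_t toy_do_schedule(s_time_t now)
    {
        task_slice_t ret = { 0, 10000000LL };     /* no task, 10ms */
        for (unsigned int i = 0; i < NR_TASKS; i++) {
            unsigned int idx = (next_idx + i) % NR_TASKS;
            if (tasks[idx].runnable) {
                ret.task = &tasks[idx];
                next_idx = (idx + 1) % NR_TASKS;
                break;
            }
        }
        (void)now;                                /* unused in this toy version */
        return ret;
    }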
2.1322 - 2.1323 -\subsubsection{reschedule} 2.1324 - 2.1325 -\paragraph*{Purpose} 2.1326 - 2.1327 -This method is called to determine if a reschedule is required as a result of a 2.1328 -particular task. 2.1329 - 2.1330 -\paragraph*{Call environment} 2.1331 -The generic layer will cause a reschedule if the current domain is the idle 2.1332 -task or it has exceeded its minimum time slice before a reschedule. The 2.1333 -generic layer guarantees that the task passed is not currently running but is 2.1334 -on the runqueue. 2.1335 - 2.1336 -\paragraph*{Return values} 2.1337 - 2.1338 -Should return a mask of CPUs to cause a reschedule on. 2.1339 - 2.1340 -\subsubsection{dump\_settings} 2.1341 - 2.1342 -\paragraph*{Purpose} 2.1343 - 2.1344 -If implemented, this should dump any private global settings for this 2.1345 -scheduler to the console. 2.1346 - 2.1347 -\paragraph*{Call environment} 2.1348 - 2.1349 -This function is called with interrupts enabled. 2.1350 - 2.1351 -\subsubsection{dump\_cpu\_state} 2.1352 - 2.1353 -\paragraph*{Purpose} 2.1354 - 2.1355 -This method should dump any private settings for the specified CPU. 2.1356 - 2.1357 -\paragraph*{Call environment} 2.1358 - 2.1359 -This function is called with interrupts disabled and the {\tt schedule\_lock} 2.1360 -for the specified CPU held. 2.1361 - 2.1362 -\subsubsection{dump\_runq\_el} 2.1363 - 2.1364 -\paragraph*{Purpose} 2.1365 - 2.1366 -This method should dump any private settings for the specified task. 2.1367 - 2.1368 -\paragraph*{Call environment} 2.1369 - 2.1370 -This function is called with interrupts disabled and the {\tt schedule\_lock} 2.1371 -for the task's CPU held. 2.1372 - 2.1373 -\end{comment} 2.1374 - 2.1375 +%% \include{src/interface/scheduling} 2.1376 +%% scheduling information moved to scheduling.tex 2.1377 +%% still commented out 2.1378 2.1379 2.1380 2.1381 @@ -1457,74 +126,9 @@ for the task's CPU held. 2.1382 %% (and/or kip's stuff?) and write about that instead? 2.1383 %% 2.1384 2.1385 -\begin{comment} 2.1386 - 2.1387 -\chapter{Debugging} 2.1388 - 2.1389 -Xen provides tools for debugging both Xen and guest OSes. Currently, the 2.1390 -Pervasive Debugger provides a GDB stub, which provides facilities for symbolic 2.1391 -debugging of Xen itself and of OS kernels running on top of Xen. The Trace 2.1392 -Buffer provides a lightweight means to log data about Xen's internal state and 2.1393 -behaviour at runtime, for later analysis. 2.1394 - 2.1395 -\section{Pervasive Debugger} 2.1396 - 2.1397 -Information on using the pervasive debugger is available in pdb.txt. 2.1398 - 2.1399 - 2.1400 -\section{Trace Buffer} 2.1401 - 2.1402 -The trace buffer provides a means to observe Xen's operation from domain 0. 2.1403 -Trace events, inserted at key points in Xen's code, record data that can be 2.1404 -read by the {\tt xentrace} tool. Recording these events has a low overhead 2.1405 -and hence the trace buffer may be useful for debugging timing-sensitive 2.1406 -behaviours. 2.1407 - 2.1408 -\subsection{Internal API} 2.1409 - 2.1410 -To use the trace buffer functionality from within Xen, you must {\tt \#include 2.1411 -<xen/trace.h>}, which contains definitions related to the trace buffer. Trace 2.1412 -events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1, 2.1413 -2, 3, 4 or 5) macros. These all take an event number, plus {\tt x} additional 2.1414 -(32-bit) data as their arguments. 
For trace buffer-enabled builds of Xen these 2.1415 -will insert the event ID and data into the trace buffer, along with the current 2.1416 -value of the CPU cycle-counter. For builds without the trace buffer enabled, 2.1417 -the macros expand to no-ops and thus can be left in place without incurring 2.1418 -overheads. 2.1419 - 2.1420 -\subsection{Trace-enabled builds} 2.1421 - 2.1422 -By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG} 2.1423 -is not defined). It can be enabled separately by defining {\tt TRACE\_BUFFER}, 2.1424 -either in {\tt <xen/config.h>} or on the gcc command line. 2.1425 - 2.1426 -The size (in pages) of the per-CPU trace buffers can be specified using the 2.1427 -{\tt tbuf\_size=n } boot parameter to Xen. If the size is set to 0, the trace 2.1428 -buffers will be disabled. 2.1429 - 2.1430 -\subsection{Dumping trace data} 2.1431 - 2.1432 -When running a trace buffer build of Xen, trace data are written continuously 2.1433 -into the buffer data areas, with newer data overwriting older data. This data 2.1434 -can be captured using the {\tt xentrace} program in domain 0. 2.1435 - 2.1436 -The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace 2.1437 -buffers into its address space. It then periodically polls all the buffers for 2.1438 -new data, dumping out any new records from each buffer in turn. As a result, 2.1439 -for machines with multiple (logical) CPUs, the trace buffer output will not be 2.1440 -in overall chronological order. 2.1441 - 2.1442 -The output from {\tt xentrace} can be post-processed using {\tt 2.1443 -xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and 2.1444 -{\tt xentrace\_format} (used to pretty-print trace data). For the predefined 2.1445 -trace points, there is an example format file in {\tt tools/xentrace/formats }. 2.1446 - 2.1447 -For more information, see the manual pages for {\tt xentrace}, {\tt 2.1448 -xentrace\_format} and {\tt xentrace\_cpusplit}. 2.1449 - 2.1450 -\end{comment} 2.1451 - 2.1452 - 2.1453 +%% \include{src/interface/debugging} 2.1454 +%% debugging information moved to debugging.tex 2.1455 +%% still commented out 2.1456 2.1457 2.1458 \end{document}
3.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 3.2 +++ b/docs/src/interface/architecture.tex Tue Sep 20 09:17:33 2005 +0000 3.3 @@ -0,0 +1,140 @@ 3.4 +\chapter{Virtual Architecture} 3.5 + 3.6 +On a Xen-based system, the hypervisor itself runs in {\it ring 0}. It 3.7 +has full access to the physical memory available in the system and is 3.8 +responsible for allocating portions of it to the domains. Guest 3.9 +operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as 3.10 +they see fit. Segmentation is used to prevent the guest OS from 3.11 +accessing the portion of the address space that is reserved for Xen. 3.12 +We expect most guest operating systems will use ring 1 for their own 3.13 +operation and place applications in ring 3. 3.14 + 3.15 +In this chapter we consider the basic virtual architecture provided by 3.16 +Xen: the basic CPU state, exception and interrupt handling, and time. 3.17 +Other aspects such as memory and device access are discussed in later 3.18 +chapters. 3.19 + 3.20 + 3.21 +\section{CPU state} 3.22 + 3.23 +All privileged state must be handled by Xen. The guest OS has no 3.24 +direct access to CR3 and is not permitted to update privileged bits in 3.25 +EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen; 3.26 +these are analogous to system calls but occur from ring 1 to ring 0. 3.27 + 3.28 +A list of all hypercalls is given in Appendix~\ref{a:hypercalls}. 3.29 + 3.30 + 3.31 +\section{Exceptions} 3.32 + 3.33 +A virtual IDT is provided --- a domain can submit a table of trap 3.34 +handlers to Xen via the {\tt set\_trap\_table()} hypercall. Most trap 3.35 +handlers are identical to native x86 handlers, although the page-fault 3.36 +handler is somewhat different. 3.37 + 3.38 + 3.39 +\section{Interrupts and events} 3.40 + 3.41 +Interrupts are virtualized by mapping them to \emph{events}, which are 3.42 +delivered asynchronously to the target domain using a callback 3.43 +supplied via the {\tt set\_callbacks()} hypercall. A guest OS can map 3.44 +these events onto its standard interrupt dispatch mechanisms. Xen is 3.45 +responsible for determining the target domain that will handle each 3.46 +physical interrupt source. For more details on the binding of event 3.47 +sources to events, see Chapter~\ref{c:devices}. 3.48 + 3.49 + 3.50 +\section{Time} 3.51 + 3.52 +Guest operating systems need to be aware of the passage of both real 3.53 +(or wallclock) time and their own `virtual time' (the time for which 3.54 +they have been executing). Furthermore, Xen has a notion of time which 3.55 +is used for scheduling. The following notions of time are provided: 3.56 + 3.57 +\begin{description} 3.58 +\item[Cycle counter time.] 3.59 + 3.60 + This provides a fine-grained time reference. The cycle counter time 3.61 + is used to accurately extrapolate the other time references. On SMP 3.62 + machines it is currently assumed that the cycle counter time is 3.63 + synchronized between CPUs. The current x86-based implementation 3.64 + achieves this within inter-CPU communication latencies. 3.65 + 3.66 +\item[System time.] 3.67 + 3.68 + This is a 64-bit counter which holds the number of nanoseconds that 3.69 + have elapsed since system boot. 3.70 + 3.71 +\item[Wall clock time.] 3.72 + 3.73 + This is the time of day in a Unix-style {\tt struct timeval} 3.74 + (seconds and microseconds since 1 January 1970, adjusted by leap 3.75 + seconds). An NTP client hosted by {\it domain 0} can keep this 3.76 + value accurate. 3.77 + 3.78 +\item[Domain virtual time.] 
3.79 + 3.80 + This progresses at the same pace as system time, but only while a 3.81 + domain is executing --- it stops while a domain is de-scheduled. 3.82 + Therefore the share of the CPU that a domain receives is indicated 3.83 + by the rate at which its virtual time increases. 3.84 + 3.85 +\end{description} 3.86 + 3.87 + 3.88 +Xen exports timestamps for system time and wall-clock time to guest 3.89 +operating systems through a shared page of memory. Xen also provides 3.90 +the cycle counter time at the instant the timestamps were calculated, 3.91 +and the CPU frequency in Hertz. This allows the guest to extrapolate 3.92 +system and wall-clock times accurately based on the current cycle 3.93 +counter time. 3.94 + 3.95 +Since all time stamps need to be updated and read \emph{atomically} 3.96 +two version numbers are also stored in the shared info page. The first 3.97 +is incremented prior to an update, while the second is only 3.98 +incremented afterwards. Thus a guest can be sure that it read a 3.99 +consistent state by checking the two version numbers are equal. 3.100 + 3.101 +Xen includes a periodic ticker which sends a timer event to the 3.102 +currently executing domain every 10ms. The Xen scheduler also sends a 3.103 +timer event whenever a domain is scheduled; this allows the guest OS 3.104 +to adjust for the time that has passed while it has been inactive. In 3.105 +addition, Xen allows each domain to request that they receive a timer 3.106 +event sent at a specified system time by using the {\tt 3.107 + set\_timer\_op()} hypercall. Guest OSes may use this timer to 3.108 +implement timeout values when they block. 3.109 + 3.110 + 3.111 + 3.112 +%% % akw: demoting this to a section -- not sure if there is any point 3.113 +%% % though, maybe just remove it. 3.114 + 3.115 +\section{Xen CPU Scheduling} 3.116 + 3.117 +Xen offers a uniform API for CPU schedulers. It is possible to choose 3.118 +from a number of schedulers at boot and it should be easy to add more. 3.119 +The BVT, Atropos and Round Robin schedulers are part of the normal Xen 3.120 +distribution. BVT provides proportional fair shares of the CPU to the 3.121 +running domains. Atropos can be used to reserve absolute shares of 3.122 +the CPU for each domain. Round-robin is provided as an example of 3.123 +Xen's internal scheduler API. 3.124 + 3.125 +\paragraph*{Note: SMP host support} 3.126 +Xen has always supported SMP host systems. Domains are statically 3.127 +assigned to CPUs, either at creation time or when manually pinning to 3.128 +a particular CPU. The current schedulers then run locally on each CPU 3.129 +to decide which of the assigned domains should be run there. The 3.130 +user-level control software can be used to perform coarse-grain 3.131 +load-balancing between CPUs. 3.132 + 3.133 + 3.134 +%% More information on the characteristics and use of these schedulers 3.135 +%% is available in {\tt Sched-HOWTO.txt}. 3.136 + 3.137 + 3.138 +\section{Privileged operations} 3.139 + 3.140 +Xen exports an extended interface to privileged domains (viz.\ {\it 3.141 + Domain 0}). This allows such domains to build and boot other domains 3.142 +on the server, and provides control interfaces for managing 3.143 +scheduling, memory, networking, and block devices.
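The consistent-read protocol for the shared timestamps described above can be made concrete with a short sketch. This is a minimal illustration only: the structure, field names and barrier below are assumptions for the example, not the actual shared-info layout, which is defined in Xen's public headers.

\begin{verbatim}
/* Illustrative sketch of reading the shared timestamps consistently.
 * Field names are hypothetical.  Xen bumps version_pre before an
 * update and version_post afterwards, so the two differ only while
 * an update is in flight. */
typedef unsigned long long u64;

struct time_info {
    unsigned long version_pre;    /* incremented before an update  */
    unsigned long version_post;   /* incremented after the update  */
    u64           system_time;    /* ns since boot at last update  */
    u64           tsc_timestamp;  /* cycle counter at last update  */
    unsigned long cpu_freq;       /* CPU frequency in Hz           */
};

static void read_time(struct time_info *t, u64 *sys_time, u64 *tsc)
{
    unsigned long post;
    do {
        post      = t->version_post;   /* snapshot the `after' counter  */
        __sync_synchronize();          /* read data after the snapshot  */
        *sys_time = t->system_time;
        *tsc      = t->tsc_timestamp;
        __sync_synchronize();          /* read `before' counter last    */
    } while (post != t->version_pre);  /* retry if an update intervened */
}
\end{verbatim}

Given a consistent pair, the guest extrapolates the current system time as the sampled system time plus the cycles elapsed since {\tt tsc\_timestamp}, scaled by the advertised CPU frequency.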
4.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 4.2 +++ b/docs/src/interface/debugging.tex Tue Sep 20 09:17:33 2005 +0000 4.3 @@ -0,0 +1,62 @@ 4.4 +\chapter{Debugging} 4.5 + 4.6 +Xen provides tools for debugging both Xen and guest OSes. Currently, the 4.7 +Pervasive Debugger provides a GDB stub, which provides facilities for symbolic 4.8 +debugging of Xen itself and of OS kernels running on top of Xen. The Trace 4.9 +Buffer provides a lightweight means to log data about Xen's internal state and 4.10 +behaviour at runtime, for later analysis. 4.11 + 4.12 +\section{Pervasive Debugger} 4.13 + 4.14 +Information on using the pervasive debugger is available in pdb.txt. 4.15 + 4.16 + 4.17 +\section{Trace Buffer} 4.18 + 4.19 +The trace buffer provides a means to observe Xen's operation from domain 0. 4.20 +Trace events, inserted at key points in Xen's code, record data that can be 4.21 +read by the {\tt xentrace} tool. Recording these events has a low overhead 4.22 +and hence the trace buffer may be useful for debugging timing-sensitive 4.23 +behaviours. 4.24 + 4.25 +\subsection{Internal API} 4.26 + 4.27 +To use the trace buffer functionality from within Xen, you must {\tt \#include 4.28 +<xen/trace.h>}, which contains definitions related to the trace buffer. Trace 4.29 +events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1, 4.30 +2, 3, 4 or 5) macros. These all take an event number, plus {\tt x} additional 4.31 +(32-bit) data as their arguments. For trace buffer-enabled builds of Xen these 4.32 +will insert the event ID and data into the trace buffer, along with the current 4.33 +value of the CPU cycle-counter. For builds without the trace buffer enabled, 4.34 +the macros expand to no-ops and thus can be left in place without incurring 4.35 +overheads. 4.36 + 4.37 +\subsection{Trace-enabled builds} 4.38 + 4.39 +By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG} 4.40 +is not defined). It can be enabled separately by defining {\tt TRACE\_BUFFER}, 4.41 +either in {\tt <xen/config.h>} or on the gcc command line. 4.42 + 4.43 +The size (in pages) of the per-CPU trace buffers can be specified using the 4.44 +{\tt tbuf\_size=n } boot parameter to Xen. If the size is set to 0, the trace 4.45 +buffers will be disabled. 4.46 + 4.47 +\subsection{Dumping trace data} 4.48 + 4.49 +When running a trace buffer build of Xen, trace data are written continuously 4.50 +into the buffer data areas, with newer data overwriting older data. This data 4.51 +can be captured using the {\tt xentrace} program in domain 0. 4.52 + 4.53 +The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace 4.54 +buffers into its address space. It then periodically polls all the buffers for 4.55 +new data, dumping out any new records from each buffer in turn. As a result, 4.56 +for machines with multiple (logical) CPUs, the trace buffer output will not be 4.57 +in overall chronological order. 4.58 + 4.59 +The output from {\tt xentrace} can be post-processed using {\tt 4.60 +xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and 4.61 +{\tt xentrace\_format} (used to pretty-print trace data). For the predefined 4.62 +trace points, there is an example format file in {\tt tools/xentrace/formats }. 4.63 + 4.64 +For more information, see the manual pages for {\tt xentrace}, {\tt 4.65 +xentrace\_format} and {\tt xentrace\_cpusplit}.
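As a concrete illustration of the internal API above, the fragment below sketches how a trace point might be added inside Xen. The event number and the surrounding function are invented for the example; only the {\tt TRACE\_xD} macro family and the {\tt <xen/trace.h>} header come from the text above.

\begin{verbatim}
#include <xen/trace.h>

/* Hypothetical event ID for this example; real event numbers are
 * allocated by the code base, not defined here. */
#define TRC_EXAMPLE_SCHED_IN  0x00010001

/* Record which domain was switched onto which CPU.  TRACE_2D stores
 * the event ID, the two 32-bit data words and the current cycle
 * counter; in builds without the trace buffer it expands to a no-op. */
static inline void trace_schedule_in(unsigned int domid, unsigned int cpu)
{
    TRACE_2D(TRC_EXAMPLE_SCHED_IN, domid, cpu);
}
\end{verbatim}

Capturing such a record at runtime then only requires booting a trace-enabled Xen with a non-zero {\tt tbuf\_size=} parameter and running {\tt xentrace} in domain 0 as described above.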
5.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 5.2 +++ b/docs/src/interface/devices.tex Tue Sep 20 09:17:33 2005 +0000 5.3 @@ -0,0 +1,178 @@ 5.4 +\chapter{Devices} 5.5 +\label{c:devices} 5.6 + 5.7 +Devices such as network and disk are exported to guests using a split 5.8 +device driver. The device driver domain, which accesses the physical 5.9 +device directly also runs a \emph{backend} driver, serving requests to 5.10 +that device from guests. Each guest will use a simple \emph{frontend} 5.11 +driver, to access the backend. Communication between these domains is 5.12 +composed of two parts: First, data is placed onto a shared memory page 5.13 +between the domains. Second, an event channel between the two domains 5.14 +is used to pass notification that data is outstanding. This 5.15 +separation of notification from data transfer allows message batching, 5.16 +and results in very efficient device access. 5.17 + 5.18 +Event channels are used extensively in device virtualization; each 5.19 +domain has a number of end-points or \emph{ports} each of which may be 5.20 +bound to one of the following \emph{event sources}: 5.21 +\begin{itemize} 5.22 + \item a physical interrupt from a real device, 5.23 + \item a virtual interrupt (callback) from Xen, or 5.24 + \item a signal from another domain 5.25 +\end{itemize} 5.26 + 5.27 +Events are lightweight and do not carry much information beyond the 5.28 +source of the notification. Hence when performing bulk data transfer, 5.29 +events are typically used as synchronization primitives over a shared 5.30 +memory transport. Event channels are managed via the {\tt 5.31 + event\_channel\_op()} hypercall; for more details see 5.32 +Section~\ref{s:idc}. 5.33 + 5.34 +This chapter focuses on some individual device interfaces available to 5.35 +Xen guests. 5.36 + 5.37 + 5.38 +\section{Network I/O} 5.39 + 5.40 +Virtual network device services are provided by shared memory 5.41 +communication with a backend domain. From the point of view of other 5.42 +domains, the backend may be viewed as a virtual ethernet switch 5.43 +element with each domain having one or more virtual network interfaces 5.44 +connected to it. 5.45 + 5.46 +\subsection{Backend Packet Handling} 5.47 + 5.48 +The backend driver is responsible for a variety of actions relating to 5.49 +the transmission and reception of packets from the physical device. 5.50 +With regard to transmission, the backend performs these key actions: 5.51 + 5.52 +\begin{itemize} 5.53 +\item {\bf Validation:} To ensure that domains do not attempt to 5.54 + generate invalid (e.g. spoofed) traffic, the backend driver may 5.55 + validate headers ensuring that source MAC and IP addresses match the 5.56 + interface that they have been sent from. 5.57 + 5.58 + Validation functions can be configured using standard firewall rules 5.59 + ({\small{\tt iptables}} in the case of Linux). 5.60 + 5.61 +\item {\bf Scheduling:} Since a number of domains can share a single 5.62 + physical network interface, the backend must mediate access when 5.63 + several domains each have packets queued for transmission. This 5.64 + general scheduling function subsumes basic shaping or rate-limiting 5.65 + schemes. 5.66 + 5.67 +\item {\bf Logging and Accounting:} The backend domain can be 5.68 + configured with classifier rules that control how packets are 5.69 + accounted or logged. For example, log messages might be generated 5.70 + whenever a domain attempts to send a TCP packet containing a SYN. 
5.71 +\end{itemize} 5.72 + 5.73 +On receipt of incoming packets, the backend acts as a simple 5.74 +demultiplexer: Packets are passed to the appropriate virtual interface 5.75 +after any necessary logging and accounting have been carried out. 5.76 + 5.77 +\subsection{Data Transfer} 5.78 + 5.79 +Each virtual interface uses two ``descriptor rings'', one for 5.80 +transmit, the other for receive. Each descriptor identifies a block 5.81 +of contiguous physical memory allocated to the domain. 5.82 + 5.83 +The transmit ring carries packets to transmit from the guest to the 5.84 +backend domain. The return path of the transmit ring carries messages 5.85 +indicating that the contents have been physically transmitted and the 5.86 +backend no longer requires the associated pages of memory. 5.87 + 5.88 +To receive packets, the guest places descriptors of unused pages on 5.89 +the receive ring. The backend will return received packets by 5.90 +exchanging these pages in the domain's memory with new pages 5.91 +containing the received data, and passing back descriptors regarding 5.92 +the new packets on the ring. This zero-copy approach allows the 5.93 +backend to maintain a pool of free pages to receive packets into, and 5.94 +then deliver them to appropriate domains after examining their 5.95 +headers. 5.96 + 5.97 +% Real physical addresses are used throughout, with the domain 5.98 +% performing translation from pseudo-physical addresses if that is 5.99 +% necessary. 5.100 + 5.101 +If a domain does not keep its receive ring stocked with empty buffers 5.102 +then packets destined for it may be dropped. This provides some 5.103 +defence against receive livelock problems because an overloaded domain 5.104 +will cease to receive further data. Similarly, on the transmit path, 5.105 +it provides the application with feedback on the rate at which packets 5.106 +are able to leave the system. 5.107 + 5.108 +Flow control on rings is achieved by including a pair of producer 5.109 +indices on the shared ring page. Each side will maintain a private 5.110 +consumer index indicating the next outstanding message. In this 5.111 +manner, the domains cooperate to divide the ring into two message 5.112 +lists, one in each direction. Notification is decoupled from the 5.113 +immediate placement of new messages on the ring; the event channel 5.114 +will be used to generate notification when {\em either} a certain 5.115 +number of outstanding messages are queued, {\em or} a specified number 5.116 +of nanoseconds have elapsed since the oldest message was placed on the 5.117 +ring. 5.118 + 5.119 +%% Not sure if my version is any better -- here is what was here 5.120 +%% before: Synchronization between the backend domain and the guest is 5.121 +%% achieved using counters held in shared memory that is accessible to 5.122 +%% both. Each ring has associated producer and consumer indices 5.123 +%% indicating the area in the ring that holds descriptors that contain 5.124 +%% data. After receiving {\it n} packets or {\t nanoseconds} after 5.125 +%% receiving the first packet, the hypervisor sends an event to the 5.126 +%% domain. 5.127 + 5.128 + 5.129 +\section{Block I/O} 5.130 + 5.131 +All guest OS disk access goes through the virtual block device VBD 5.132 +interface. This interface allows domains access to portions of block 5.133 +storage devices visible to the block backend device. The VBD 5.134 +interface is a split driver, similar to the network interface 5.135 +described above.
A single shared memory ring is used between the 5.136 +frontend and backend drivers, across which read and write messages are 5.137 +sent. 5.138 + 5.139 +Any block device accessible to the backend domain, including 5.140 +network-based block (iSCSI, *NBD, etc), loopback and LVM/MD devices, 5.141 +can be exported as a VBD. Each VBD is mapped to a device node in the 5.142 +guest, specified in the guest's startup configuration. 5.143 + 5.144 +Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since 5.145 +similar functionality can be achieved using the more complete LVM 5.146 +system, which is already in widespread use. 5.147 + 5.148 +\subsection{Data Transfer} 5.149 + 5.150 +The single ring between the guest and the block backend supports three 5.151 +messages: 5.152 + 5.153 +\begin{description} 5.154 +\item [{\small {\tt PROBE}}:] Return a list of the VBDs available to 5.155 + this guest from the backend. The request includes a descriptor of a 5.156 + free page into which the reply will be written by the backend. 5.157 + 5.158 +\item [{\small {\tt READ}}:] Read data from the specified block 5.159 + device. The front end identifies the device and location to read 5.160 + from and attaches pages for the data to be copied to (typically via 5.161 + DMA from the device). The backend acknowledges completed read 5.162 + requests as they finish. 5.163 + 5.164 +\item [{\small {\tt WRITE}}:] Write data to the specified block 5.165 + device. This functions essentially as {\small {\tt READ}}, except 5.166 + that the data moves to the device instead of from it. 5.167 +\end{description} 5.168 + 5.169 +%% um... some old text: In overview, the same style of descriptor-ring 5.170 +%% that is used for network packets is used here. Each domain has one 5.171 +%% ring that carries operation requests to the hypervisor and carries 5.172 +%% the results back again. 5.173 + 5.174 +%% Rather than copying data, the backend simply maps the domain's 5.175 +%% buffers in order to enable direct DMA to them. The act of mapping 5.176 +%% the buffers also increases the reference counts of the underlying 5.177 +%% pages, so that the unprivileged domain cannot try to return them to 5.178 +%% the hypervisor, install them as page tables, or any other unsafe 5.179 +%% behaviour. 5.180 +%% 5.181 +%% % block API here
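The flow-control scheme shared by the network and block rings above can be summarised with a small sketch. The layout below is purely illustrative (the real ring structures, request formats and notification hold-off logic are richer); it shows only a shared producer index paired with a private consumer index, which is the essence of the design.

\begin{verbatim}
#define RING_SIZE   256                  /* assumed power of two      */
#define RING_IDX(i) ((i) & (RING_SIZE - 1))

struct desc { unsigned long frame; unsigned long id; };

struct shared_ring {                     /* lives in the shared page  */
    unsigned int prod;                   /* written by the producer   */
    struct desc  ring[RING_SIZE];
};

/* Producer: place a descriptor, then publish it by advancing the
 * shared producer index.  Notifying the peer over the event channel
 * is a separate decision, which is what allows batching.  (Checks
 * for a full ring are omitted for brevity.) */
static void ring_put(struct shared_ring *r, struct desc d)
{
    r->ring[RING_IDX(r->prod)] = d;
    __sync_synchronize();                /* descriptor before index   */
    r->prod++;
}

/* Consumer: each domain keeps its consumer index privately and
 * drains everything outstanding when a notification arrives. */
static void ring_drain(struct shared_ring *r, unsigned int *cons,
                       void (*handle)(struct desc))
{
    while (*cons != r->prod)
        handle(r->ring[RING_IDX((*cons)++)]);
}
\end{verbatim}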
6.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 6.2 +++ b/docs/src/interface/further_info.tex Tue Sep 20 09:17:33 2005 +0000 6.3 @@ -0,0 +1,49 @@ 6.4 +\chapter{Further Information} 6.5 + 6.6 +If you have questions that are not answered by this manual, the 6.7 +sources of information listed below may be of interest to you. Note 6.8 +that bug reports, suggestions and contributions related to the 6.9 +software (or the documentation) should be sent to the Xen developers' 6.10 +mailing list (address below). 6.11 + 6.12 + 6.13 +\section{Other documentation} 6.14 + 6.15 +If you are mainly interested in using (rather than developing for) 6.16 +Xen, the \emph{Xen Users' Manual} is distributed in the {\tt docs/} 6.17 +directory of the Xen source distribution. 6.18 + 6.19 +% Various HOWTOs are also available in {\tt docs/HOWTOS}. 6.20 + 6.21 + 6.22 +\section{Online references} 6.23 + 6.24 +The official Xen web site is found at: 6.25 +\begin{quote} 6.26 +{\tt http://www.cl.cam.ac.uk/Research/SRG/netos/xen/} 6.27 +\end{quote} 6.28 + 6.29 +This contains links to the latest versions of all on-line 6.30 +documentation. 6.31 + 6.32 + 6.33 +\section{Mailing lists} 6.34 + 6.35 +There are currently four official Xen mailing lists: 6.36 + 6.37 +\begin{description} 6.38 +\item[xen-devel@lists.xensource.com] Used for development 6.39 + discussions and bug reports. Subscribe at: \\ 6.40 + {\small {\tt http://lists.xensource.com/xen-devel}} 6.41 +\item[xen-users@lists.xensource.com] Used for installation and usage 6.42 + discussions and requests for help. Subscribe at: \\ 6.43 + {\small {\tt http://lists.xensource.com/xen-users}} 6.44 +\item[xen-announce@lists.xensource.com] Used for announcements only. 6.45 + Subscribe at: \\ 6.46 + {\small {\tt http://lists.xensource.com/xen-announce}} 6.47 +\item[xen-changelog@lists.xensource.com] Changelog feed 6.48 + from the unstable and 2.0 trees - developer oriented. Subscribe at: \\ 6.49 + {\small {\tt http://lists.xensource.com/xen-changelog}} 6.50 +\end{description} 6.51 + 6.52 +Of these, xen-devel is the most active.
7.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 7.2 +++ b/docs/src/interface/hypercalls.tex Tue Sep 20 09:17:33 2005 +0000 7.3 @@ -0,0 +1,524 @@ 7.4 + 7.5 +\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}} 7.6 + 7.7 +\chapter{Xen Hypercalls} 7.8 +\label{a:hypercalls} 7.9 + 7.10 +Hypercalls represent the procedural interface to Xen; this appendix 7.11 +categorizes and describes the current set of hypercalls. 7.12 + 7.13 +\section{Invoking Hypercalls} 7.14 + 7.15 +Hypercalls are invoked in a manner analogous to system calls in a 7.16 +conventional operating system; a software interrupt is issued which 7.17 +vectors to an entry point within Xen. On x86\_32 machines the 7.18 +instruction required is {\tt int \$82}; the (real) IDT is set up so 7.19 +that this may only be issued from within ring 1. The particular 7.20 +hypercall to be invoked is contained in {\tt EAX} --- a list 7.21 +mapping these values to symbolic hypercall names can be found 7.22 +in {\tt xen/include/public/xen.h}. 7.23 + 7.24 +On some occasions a set of hypercalls will be required to carry 7.25 +out a higher-level function; a good example is when a guest 7.26 +operating system wishes to context switch to a new process, which 7.27 +requires updating various privileged CPU state. As an optimization 7.28 +for these cases, there is a generic mechanism to issue a set of 7.29 +hypercalls as a batch: 7.30 + 7.31 +\begin{quote} 7.32 +\hypercall{multicall(void *call\_list, int nr\_calls)} 7.33 + 7.34 +Execute a series of hypervisor calls; {\tt nr\_calls} is the length of 7.35 +the array of {\tt multicall\_entry\_t} structures pointed to by {\tt 7.36 +call\_list}. Each entry contains the hypercall operation code followed 7.37 +by up to 7 word-sized arguments. 7.38 +\end{quote} 7.39 + 7.40 +Note that multicalls are provided purely as an optimization; there is 7.41 +no requirement to use them when first porting a guest operating 7.42 +system. 7.43 + 7.44 + 7.45 +\section{Virtual CPU Setup} 7.46 + 7.47 +At start of day, a guest operating system needs to set up the virtual 7.48 +CPU it is executing on. This includes installing vectors for the 7.49 +virtual IDT so that the guest OS can handle interrupts, page faults, 7.50 +etc. However, the very first thing a guest OS must set up is a pair 7.51 +of hypervisor callbacks: these are the entry points which Xen will 7.52 +use when it wishes to notify the guest OS of an occurrence. 7.53 + 7.54 +\begin{quote} 7.55 +\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long 7.56 + event\_address, unsigned long failsafe\_selector, unsigned long 7.57 + failsafe\_address) } 7.58 + 7.59 +Register the normal (``event'') and failsafe callbacks for 7.60 +event processing. In each case the code segment selector and 7.61 +address within that segment are provided. The selectors must 7.62 +have RPL 1; in XenLinux we simply use the kernel's CS for both 7.63 +{\tt event\_selector} and {\tt failsafe\_selector}. 7.64 + 7.65 +The value {\tt event\_address} specifies the address of the guest OS's 7.66 +event handling and dispatch routine; the {\tt failsafe\_address} 7.67 +specifies a separate entry point which is used only if a fault occurs 7.68 +when Xen attempts to use the normal callback.
7.69 +\end{quote} 7.70 + 7.71 + 7.72 +After installing the hypervisor callbacks, the guest OS can 7.73 +install a `virtual IDT' by using the following hypercall: 7.74 + 7.75 +\begin{quote} 7.76 +\hypercall{set\_trap\_table(trap\_info\_t *table)} 7.77 + 7.78 +Install one or more entries into the per-domain 7.79 +trap handler table (essentially a software version of the IDT). 7.80 +Each entry in the array pointed to by {\tt table} includes the 7.81 +exception vector number with the corresponding segment selector 7.82 +and entry point. Most guest OSes can use the same handlers on 7.83 +Xen as when running on the real hardware; an exception is the 7.84 +page fault handler (exception vector 14) where a modified 7.85 +stack-frame layout is used. 7.86 + 7.87 + 7.88 +\end{quote} 7.89 + 7.90 + 7.91 + 7.92 +\section{Scheduling and Timer} 7.93 + 7.94 +Domains are preemptively scheduled by Xen according to the 7.95 +parameters installed by domain 0 (see Section~\ref{s:dom0ops}). 7.96 +In addition, however, a domain may choose to explicitly 7.97 +control certain behavior with the following hypercall: 7.98 + 7.99 +\begin{quote} 7.100 +\hypercall{sched\_op(unsigned long op)} 7.101 + 7.102 +Request a scheduling operation from the hypervisor. The options are: {\it 7.103 +yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the 7.104 +calling domain runnable but may cause a reschedule if other domains 7.105 +are runnable. {\it block} removes the calling domain from the run 7.106 +queue and causes it to sleep until an event is delivered to it. {\it 7.107 +shutdown} is used to end the domain's execution; the caller can 7.108 +additionally specify whether the domain should reboot, halt or 7.109 +suspend. 7.110 +\end{quote} 7.111 + 7.112 +To aid the implementation of a process scheduler within a guest OS, 7.113 +Xen provides a virtual programmable timer: 7.114 + 7.115 +\begin{quote} 7.116 +\hypercall{set\_timer\_op(uint64\_t timeout)} 7.117 + 7.118 +Request a timer event to be sent at the specified system time (time 7.119 +in nanoseconds since system boot). The hypercall actually passes the 7.120 +64-bit timeout value as a pair of 32-bit values. 7.121 + 7.122 +\end{quote} 7.123 + 7.124 +Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} 7.125 +allows block-with-timeout semantics. 7.126 + 7.127 + 7.128 +\section{Page Table Management} 7.129 + 7.130 +Since guest operating systems have read-only access to their page 7.131 +tables, Xen must be involved when making any changes. The following 7.132 +multi-purpose hypercall can be used to modify page-table entries, 7.133 +update the machine-to-physical mapping table, flush the TLB, install 7.134 +a new page-table base pointer, and more. 7.135 + 7.136 +\begin{quote} 7.137 +\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} 7.138 + 7.139 +Update the page table for the domain; a set of {\tt count} updates are 7.140 +submitted for processing in a batch, with {\tt success\_count} being 7.141 +updated to report the number of successful updates. 7.142 + 7.143 +Each element of {\tt req[]} contains a pointer (address) and value; 7.144 +the two least significant bits of the pointer are used to distinguish 7.145 +the type of update requested as follows: 7.146 +\begin{description} 7.147 + 7.148 +\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or 7.149 +page table entry to the associated value; Xen will check that the 7.150 +update is safe, as described in Chapter~\ref{c:memory}.
7.151 + 7.152 +\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the 7.153 + machine-to-physical table. The calling domain must own the machine 7.154 + page in question (or be privileged). 7.155 + 7.156 +\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations. 7.157 +The set of additional MMU operations is considerable, and includes 7.158 +updating {\tt cr3} (or just re-installing it for a TLB flush), 7.159 +flushing the cache, installing a new LDT, or pinning \& unpinning 7.160 +page-table pages (to ensure their reference count doesn't drop to zero 7.161 +which would require a revalidation of all entries). 7.162 + 7.163 +Further extended commands are used to deal with granting and 7.164 +acquiring page ownership; see Section~\ref{s:idc}. 7.165 + 7.166 + 7.167 +\end{description} 7.168 + 7.169 +More details on the precise format of all commands can be 7.170 +found in {\tt xen/include/public/xen.h}. 7.171 + 7.172 + 7.173 +\end{quote} 7.174 + 7.175 +Explicitly updating batches of page table entries is extremely 7.176 +efficient, but can require a number of alterations to the guest 7.177 +OS. Using the writable page table mode (Chapter~\ref{c:memory}) is 7.178 +recommended for new OS ports. 7.179 + 7.180 +Regardless of which page table update mode is being used, however, 7.181 +there are some occasions (notably handling a demand page fault) where 7.182 +a guest OS will wish to modify exactly one PTE rather than a 7.183 +batch. This is catered for by the following: 7.184 + 7.185 +\begin{quote} 7.186 +\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long 7.187 +val, \\ unsigned long flags)} 7.188 + 7.189 +Update the currently installed PTE for the page {\tt page\_nr} to 7.190 +{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification 7.191 +is safe before applying it. The {\tt flags} determine which kind 7.192 +of TLB flush, if any, should follow the update. 7.193 + 7.194 +\end{quote} 7.195 + 7.196 +Finally, sufficiently privileged domains may occasionally wish to manipulate 7.197 +the pages of others: 7.198 +\begin{quote} 7.199 + 7.200 +\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr, 7.201 +unsigned long val, unsigned long flags, uint16\_t domid)} 7.202 + 7.203 +Identical to {\tt update\_va\_mapping()} save that the pages being 7.204 +mapped must belong to the domain {\tt domid}. 7.205 + 7.206 +\end{quote} 7.207 + 7.208 +This privileged operation is currently used by backend virtual device 7.209 +drivers to safely map pages containing I/O data. 7.210 + 7.211 + 7.212 + 7.213 +\section{Segmentation Support} 7.214 + 7.215 +Xen allows guest OSes to install a custom GDT if they require it; 7.216 +this is context switched transparently whenever a domain is 7.217 +[de]scheduled. The following hypercall is effectively a 7.218 +`safe' version of {\tt lgdt}: 7.219 + 7.220 +\begin{quote} 7.221 +\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} 7.222 + 7.223 +Install a global descriptor table for a domain; {\tt frame\_list} is 7.224 +an array of up to 16 machine page frames within which the GDT resides, 7.225 +with {\tt entries} being the actual number of descriptor-entry 7.226 +slots. All page frames must be mapped read-only within the guest's 7.227 +address space, and the table must be large enough to contain Xen's 7.228 +reserved entries (see {\tt xen/include/public/arch-x86\_32.h}). 
7.229 + 7.230 +\end{quote} 7.231 + 7.232 +Many guest OSes will also wish to install LDTs; this is achieved by 7.233 +using {\tt mmu\_update()} with an extended command, passing the 7.234 +linear address of the LDT base along with the number of entries. No 7.235 +special safety checks are required; Xen needs to perform this task 7.236 +only because {\tt lldt} requires CPL 0. 7.237 + 7.238 + 7.239 +Xen also allows guest operating systems to update just an 7.240 +individual segment descriptor in the GDT or LDT: 7.241 + 7.242 +\begin{quote} 7.243 +\hypercall{update\_descriptor(unsigned long ma, unsigned long word1, 7.244 +unsigned long word2)} 7.245 + 7.246 +Update the GDT/LDT entry at machine address {\tt ma}; the new 7.247 +8-byte descriptor is stored in {\tt word1} and {\tt word2}. 7.248 +Xen performs a number of checks to ensure the descriptor is 7.249 +valid. 7.250 + 7.251 +\end{quote} 7.252 + 7.253 +Guest OSes can use the above in place of context switching entire 7.254 +LDTs (or the GDT) when the number of changing descriptors is small. 7.255 + 7.256 +\section{Context Switching} 7.257 + 7.258 +When a guest OS wishes to context switch between two processes, 7.259 +it can use the page table and segmentation hypercalls described 7.260 +above to perform the bulk of the privileged work. In addition, 7.261 +however, it will need to invoke Xen to switch the kernel (ring 1) 7.262 +stack pointer: 7.263 + 7.264 +\begin{quote} 7.265 +\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} 7.266 + 7.267 +Request a kernel stack switch from the hypervisor; {\tt ss} is the new 7.268 +stack segment, while {\tt esp} is the new stack pointer. 7.269 + 7.270 +\end{quote} 7.271 + 7.272 +A final useful hypercall for context switching allows ``lazy'' 7.273 +save and restore of floating point state: 7.274 + 7.275 +\begin{quote} 7.276 +\hypercall{fpu\_taskswitch(void)} 7.277 + 7.278 +This call instructs Xen to set the {\tt TS} bit in the {\tt cr0} 7.279 +control register; this means that the next attempt to use floating 7.280 +point will cause a trap which the guest OS can catch. Typically it will 7.281 +then save/restore the FP state, and clear the {\tt TS} bit. 7.282 +\end{quote} 7.283 + 7.284 +This is provided as an optimization only; guest OSes can also choose 7.285 +to save and restore FP state on all context switches for simplicity. 7.286 + 7.287 + 7.288 +\section{Physical Memory Management} 7.289 + 7.290 +As mentioned previously, each domain has a maximum and current 7.291 +memory allocation. The maximum allocation, set at domain creation 7.292 +time, cannot be modified. However, a domain can choose to reduce 7.293 +and subsequently grow its current allocation by using the 7.294 +following call: 7.295 + 7.296 +\begin{quote} 7.297 +\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list, 7.298 + unsigned long nr\_extents, unsigned int extent\_order)} 7.299 + 7.300 +Increase or decrease current memory allocation (as determined by 7.301 +the value of {\tt op}). Each invocation provides a list of 7.302 +extents, each of which is $2^s$ pages in size, 7.303 +where $s$ is the value of {\tt extent\_order}. 7.304 + 7.305 +\end{quote} 7.306 + 7.307 +In addition to simply reducing or increasing the current memory 7.308 +allocation via a `balloon driver', this call is also useful for 7.309 +obtaining contiguous regions of machine memory when required (e.g. 7.310 +for certain PCI devices, or if using superpages).
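To make the balloon-driver usage concrete, the sketch below shows how a guest might hand pages back to Xen with {\tt dom\_mem\_op()}. The wrapper declaration and the operation constant are assumptions for the example; only the hypercall signature is taken from the description above, and the real {\tt MEMOP\_*} values live in {\tt xen/include/public/xen.h}.

\begin{verbatim}
/* Assumed wrapper around the dom_mem_op hypercall; in practice this
 * is provided by the guest's hypercall stubs. */
extern long dom_mem_op(unsigned int op, unsigned long *extent_list,
                       unsigned long nr_extents, unsigned int extent_order);

/* Illustrative operation code only. */
#define MEMOP_decrease_reservation 1

/* Release 'count' individual 4K pages (extent_order == 0) whose
 * machine frame numbers are listed in mfn_list. */
static long balloon_give_back(unsigned long *mfn_list, unsigned long count)
{
    return dom_mem_op(MEMOP_decrease_reservation, mfn_list, count, 0);
}
\end{verbatim}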
7.311 + 7.312 + 7.313 +\section{Inter-Domain Communication} 7.314 +\label{s:idc} 7.315 + 7.316 +Xen provides a simple asynchronous notification mechanism via 7.317 +\emph{event channels}. Each domain has a set of end-points (or 7.318 +\emph{ports}) which may be bound to an event source (e.g. a physical 7.319 +IRQ, a virtual IRQ, or a port in another domain). When a pair of 7.320 +end-points in two different domains are bound together, then a `send' 7.321 +operation on one will cause an event to be received by the destination 7.322 +domain. 7.323 + 7.324 +The control and use of event channels involves the following hypercall: 7.325 + 7.326 +\begin{quote} 7.327 +\hypercall{event\_channel\_op(evtchn\_op\_t *op)} 7.328 + 7.329 +Inter-domain event-channel management; {\tt op} is a discriminated 7.330 +union which allows the following 7 operations: 7.331 + 7.332 +\begin{description} 7.333 + 7.334 +\item[\it alloc\_unbound:] allocate a free (unbound) local 7.335 + port and prepare for connection from a specified domain. 7.336 +\item[\it bind\_virq:] bind a local port to a virtual 7.337 +IRQ; any particular VIRQ can be bound to at most one port per domain. 7.338 +\item[\it bind\_pirq:] bind a local port to a physical IRQ; 7.339 +once more, a given pIRQ can be bound to at most one port per 7.340 +domain. Furthermore, the calling domain must be sufficiently 7.341 +privileged. 7.342 +\item[\it bind\_interdomain:] construct an interdomain event 7.343 +channel; in general, the target domain must have previously allocated 7.344 +an unbound port for this channel, although this can be bypassed by 7.345 +privileged domains during domain setup. 7.346 +\item[\it close:] close an interdomain event channel. 7.347 +\item[\it send:] send an event to the remote end of an 7.348 +interdomain event channel. 7.349 +\item[\it status:] determine the current status of a local port. 7.350 +\end{description} 7.351 + 7.352 +For more details see 7.353 +{\tt xen/include/public/event\_channel.h}. 7.354 + 7.355 +\end{quote} 7.356 + 7.357 +Event channels are the fundamental communication primitive between 7.358 +Xen domains and seamlessly support SMP. However, they provide little 7.359 +bandwidth for communication {\sl per se}, and hence are typically 7.360 +married with a piece of shared memory to produce effective and 7.361 +high-performance inter-domain communication. 7.362 + 7.363 +Safe sharing of memory pages between guest OSes is carried out by 7.364 +granting access on a per-page basis to individual domains. This is 7.365 +achieved by using the {\tt grant\_table\_op()} hypercall. 7.366 + 7.367 +\begin{quote} 7.368 +\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)} 7.369 + 7.370 +Grant or revoke access to a particular page for a particular domain. 7.371 + 7.372 +\end{quote} 7.373 + 7.374 +This is not currently in widespread use by guest operating systems, but 7.375 +we intend to integrate support more fully in the near future. 7.376 + 7.377 +\section{PCI Configuration} 7.378 + 7.379 +Domains with physical device access (i.e.\ driver domains) receive 7.380 +limited access to certain PCI devices (bus address space and 7.381 +interrupts). However, many guest operating systems attempt to 7.382 +determine the PCI configuration by directly accessing the PCI BIOS, 7.383 +which cannot be allowed for safety.
7.384 + 7.385 +Instead, Xen provides the following hypercall: 7.386 + 7.387 +\begin{quote} 7.388 +\hypercall{physdev\_op(void *physdev\_op)} 7.389 + 7.390 +Perform a PCI configuration option; depending on the value 7.391 +of {\tt physdev\_op} this can be a PCI config read, a PCI config 7.392 +write, or a small number of other queries. 7.393 + 7.394 +\end{quote} 7.395 + 7.396 + 7.397 +For examples of using {\tt physdev\_op()}, see the 7.398 +Xen-specific PCI code in the linux sparse tree. 7.399 + 7.400 +\section{Administrative Operations} 7.401 +\label{s:dom0ops} 7.402 + 7.403 +A large number of control operations are available to a sufficiently 7.404 +privileged domain (typically domain 0). These allow the creation and 7.405 +management of new domains, for example. A complete list is given 7.406 +below: for more details on any or all of these, please see 7.407 +{\tt xen/include/public/dom0\_ops.h} 7.408 + 7.409 + 7.410 +\begin{quote} 7.411 +\hypercall{dom0\_op(dom0\_op\_t *op)} 7.412 + 7.413 +Administrative domain operations for domain management. The options are: 7.414 + 7.415 +\begin{description} 7.416 +\item [\it DOM0\_CREATEDOMAIN:] create a new domain 7.417 + 7.418 +\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run 7.419 +queue. 7.420 + 7.421 +\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable 7.422 + once again. 7.423 + 7.424 +\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated 7.425 +with a domain 7.426 + 7.427 +\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain 7.428 + 7.429 +\item [\it DOM0\_SCHEDCTL:] 7.430 + 7.431 +\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain 7.432 + 7.433 +\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain 7.434 + 7.435 +\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain 7.436 + 7.437 +\item [\it DOM0\_GETPAGEFRAMEINFO:] 7.438 + 7.439 +\item [\it DOM0\_GETPAGEFRAMEINFO2:] 7.440 + 7.441 +\item [\it DOM0\_IOPL:] set I/O privilege level 7.442 + 7.443 +\item [\it DOM0\_MSR:] read or write model specific registers 7.444 + 7.445 +\item [\it DOM0\_DEBUG:] interactively invoke the debugger 7.446 + 7.447 +\item [\it DOM0\_SETTIME:] set system time 7.448 + 7.449 +\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring 7.450 + 7.451 +\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU 7.452 + 7.453 +\item [\it DOM0\_GETTBUFS:] get information about the size and location of 7.454 + the trace buffers (only on trace-buffer enabled builds) 7.455 + 7.456 +\item [\it DOM0\_PHYSINFO:] get information about the host machine 7.457 + 7.458 +\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions 7.459 + 7.460 +\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler 7.461 + 7.462 +\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes 7.463 + 7.464 +\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain 7.465 + 7.466 +\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain 7.467 + 7.468 +\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options 7.469 +\end{description} 7.470 +\end{quote} 7.471 + 7.472 +Most of the above are best understood by looking at the code 7.473 +implementing them (in {\tt xen/common/dom0\_ops.c}) and in 7.474 +the user-space tools that use them (mostly in {\tt tools/libxc}). 
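As an example of using this control interface, the sketch below pauses a domain with {\tt dom0\_op()}. The member names inside {\tt dom0\_op\_t} are assumptions for illustration; the authoritative definitions are in {\tt xen/include/public/dom0\_ops.h}, and in practice the wrappers in {\tt tools/libxc} are used rather than issuing the hypercall by hand.

\begin{verbatim}
/* Hedged sketch only: structure member names are guesses based on
 * the option list above; consult dom0_ops.h for the real layout. */
extern long dom0_op(dom0_op_t *op);

static long pause_domain(unsigned int domid)
{
    dom0_op_t op = { 0 };

    op.cmd = DOM0_PAUSEDOMAIN;           /* take domain off the run queue */
    op.interface_version = DOM0_INTERFACE_VERSION;
    op.u.pausedomain.domain = domid;     /* hypothetical member name      */
    return dom0_op(&op);
}
\end{verbatim}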
7.475 + 7.476 +\section{Debugging Hypercalls} 7.477 + 7.478 +A few additional hypercalls are mainly useful for debugging: 7.479 + 7.480 +\begin{quote} 7.481 +\hypercall{console\_io(int cmd, int count, char *str)} 7.482 + 7.483 +Use Xen to interact with the console; operations are: 7.484 + 7.485 +{\it CONSOLEIO\_write}: Output {\tt count} characters from buffer {\tt str}. 7.486 + 7.487 +{\it CONSOLEIO\_read}: Input at most {\tt count} characters into buffer {\tt str}. 7.488 +\end{quote} 7.489 + 7.490 +A pair of hypercalls allows access to the underlying debug registers: 7.491 +\begin{quote} 7.492 +\hypercall{set\_debugreg(int reg, unsigned long value)} 7.493 + 7.494 +Set debug register {\tt reg} to {\tt value}. 7.495 + 7.496 +\hypercall{get\_debugreg(int reg)} 7.497 + 7.498 +Return the contents of the debug register {\tt reg}. 7.499 +\end{quote} 7.500 + 7.501 +And finally: 7.502 +\begin{quote} 7.503 +\hypercall{xen\_version(int cmd)} 7.504 + 7.505 +Request Xen version number. 7.506 +\end{quote} 7.507 + 7.508 +This is useful to ensure that user-space tools are in sync 7.509 +with the underlying hypervisor. 7.510 + 7.511 +\section{Deprecated Hypercalls} 7.512 + 7.513 +Xen is under constant development and refinement; as such there 7.514 +are plans to improve the way in which various pieces of functionality 7.515 +are exposed to guest OSes. 7.516 + 7.517 +\begin{quote} 7.518 +\hypercall{vm\_assist(unsigned int cmd, unsigned int type)} 7.519 + 7.520 +Toggle various memory management modes (in particular writable page 7.521 +tables and superpage support). 7.522 + 7.523 +\end{quote} 7.524 + 7.525 +This is likely to be replaced with mode values in the shared 7.526 +information page since this is more resilient for resumption 7.527 +after migration or checkpoint.
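Tying the appendix together, the sketch below shows how a guest might issue one of these hypercalls directly, following the calling convention given at the start of this chapter (call number in {\tt EAX}, software interrupt into Xen). The interrupt vector, the use of {\tt EBX} for the first argument and the constant value are assumptions for illustration; the definitive numbers are in {\tt xen/include/public/xen.h} and the guest's own hypercall stubs.

\begin{verbatim}
/* Illustrative single-argument hypercall stub for x86_32.  The 0x82
 * vector and EBX argument register follow the XenLinux port of this
 * era, but should be treated as assumptions here. */
static inline long hypercall1(unsigned long call_no, unsigned long arg1)
{
    long result;
    __asm__ __volatile__ ( "int $0x82"
                           : "=a" (result)
                           : "a" (call_no), "b" (arg1)
                           : "memory" );
    return result;
}

/* Example: ask Xen for its version number (constant value is
 * illustrative; see xen.h for the real __HYPERVISOR_* numbering). */
#define __HYPERVISOR_xen_version 17

static long query_xen_version(void)
{
    return hypercall1(__HYPERVISOR_xen_version, 0);
}
\end{verbatim}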
8.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 8.2 +++ b/docs/src/interface/memory.tex Tue Sep 20 09:17:33 2005 +0000 8.3 @@ -0,0 +1,162 @@ 8.4 +\chapter{Memory} 8.5 +\label{c:memory} 8.6 + 8.7 +Xen is responsible for managing the allocation of physical memory to 8.8 +domains, and for ensuring safe use of the paging and segmentation 8.9 +hardware. 8.10 + 8.11 + 8.12 +\section{Memory Allocation} 8.13 + 8.14 +Xen resides within a small fixed portion of physical memory; it also 8.15 +reserves the top 64MB of every virtual address space. The remaining 8.16 +physical memory is available for allocation to domains at a page 8.17 +granularity. Xen tracks the ownership and use of each page, which 8.18 +allows it to enforce secure partitioning between domains. 8.19 + 8.20 +Each domain has a maximum and current physical memory allocation. A 8.21 +guest OS may run a `balloon driver' to dynamically adjust its current 8.22 +memory allocation up to its limit. 8.23 + 8.24 + 8.25 +%% XXX SMH: I use machine and physical in the next section (which is 8.26 +%% kinda required for consistency with code); wonder if this section 8.27 +%% should use same terms? 8.28 +%% 8.29 +%% Probably. 8.30 +%% 8.31 +%% Merging this and below section at some point prob makes sense. 8.32 + 8.33 +\section{Pseudo-Physical Memory} 8.34 + 8.35 +Since physical memory is allocated and freed on a page granularity, 8.36 +there is no guarantee that a domain will receive a contiguous stretch 8.37 +of physical memory. However most operating systems do not have good 8.38 +support for operating in a fragmented physical address space. To aid 8.39 +porting such operating systems to run on top of Xen, we make a 8.40 +distinction between \emph{machine memory} and \emph{pseudo-physical 8.41 + memory}. 8.42 + 8.43 +Put simply, machine memory refers to the entire amount of memory 8.44 +installed in the machine, including that reserved by Xen, in use by 8.45 +various domains, or currently unallocated. We consider machine memory 8.46 +to comprise a set of 4K \emph{machine page frames} numbered 8.47 +consecutively starting from 0. Machine frame numbers mean the same 8.48 +within Xen or any domain. 8.49 + 8.50 +Pseudo-physical memory, on the other hand, is a per-domain 8.51 +abstraction. It allows a guest operating system to consider its memory 8.52 +allocation to consist of a contiguous range of physical page frames 8.53 +starting at physical frame 0, despite the fact that the underlying 8.54 +machine page frames may be sparsely allocated and in any order. 8.55 + 8.56 +To achieve this, Xen maintains a globally readable {\it 8.57 + machine-to-physical} table which records the mapping from machine 8.58 +page frames to pseudo-physical ones. In addition, each domain is 8.59 +supplied with a {\it physical-to-machine} table which performs the 8.60 +inverse mapping. Clearly the machine-to-physical table has size 8.61 +proportional to the amount of RAM installed in the machine, while each 8.62 +physical-to-machine table has size proportional to the memory 8.63 +allocation of the given domain. 8.64 + 8.65 +Architecture dependent code in guest operating systems can then use 8.66 +the two tables to provide the abstraction of pseudo-physical memory. 8.67 +In general, only certain specialized parts of the operating system 8.68 +(such as page table management) needs to understand the difference 8.69 +between machine and pseudo-physical addresses. 
8.70 + 8.71 + 8.72 +\section{Page Table Updates} 8.73 + 8.74 +In the default mode of operation, Xen enforces read-only access to 8.75 +page tables and requires guest operating systems to explicitly request 8.76 +any modifications. Xen validates all such requests and only applies 8.77 +updates that it deems safe. This is necessary to prevent domains from 8.78 +adding arbitrary mappings to their page tables. 8.79 + 8.80 +To aid validation, Xen associates a type and reference count with each 8.81 +memory page. A page has one of the following mutually-exclusive types 8.82 +at any point in time: page directory ({\sf PD}), page table ({\sf 8.83 + PT}), local descriptor table ({\sf LDT}), global descriptor table 8.84 +({\sf GDT}), or writable ({\sf RW}). Note that a guest OS may always 8.85 +create readable mappings of its own memory regardless of its current 8.86 +type. 8.87 + 8.88 +%%% XXX: possibly explain more about ref count 'lifecycle' here? 8.89 +This mechanism is used to maintain the invariants required for safety; 8.90 +for example, a domain cannot have a writable mapping to any part of a 8.91 +page table as this would require the page concerned to simultaneously 8.92 +be of types {\sf PT} and {\sf RW}. 8.93 + 8.94 + 8.95 +% \section{Writable Page Tables} 8.96 + 8.97 +Xen also provides an alternative mode of operation in which guests 8.98 +have the illusion that their page tables are directly writable. Of 8.99 +course, this is not really the case, since Xen must still validate 8.100 +modifications to ensure secure partitioning. To this end, Xen traps 8.101 +any write attempt to a memory page of type {\sf PT} (i.e., that is 8.102 +currently part of a page table). If such an access occurs, Xen 8.103 +temporarily allows write access to that page while at the same time 8.104 +\emph{disconnecting} it from the page table that is currently in use. 8.105 +This allows the guest to safely make updates to the page because the 8.106 +newly-updated entries cannot be used by the MMU until Xen revalidates 8.107 +and reconnects the page. Reconnection occurs automatically in a 8.108 +number of situations: for example, when the guest modifies a different 8.109 +page-table page, when the domain is preempted, or whenever the guest 8.110 +uses Xen's explicit page-table update interfaces. 8.111 + 8.112 +Finally, Xen also supports a form of \emph{shadow page tables} in 8.113 +which the guest OS uses an independent copy of page tables which are 8.114 +unknown to the hardware (i.e.\ which are never pointed to by {\tt 8.115 + cr3}). Instead, Xen propagates changes made to the guest's tables to 8.116 +the real ones, and vice versa. This is useful for logging page writes 8.117 +(e.g.\ for live migration or checkpoint). A full version of the shadow 8.118 +page tables also allows guest OS porting with less effort. 8.119 + 8.120 + 8.121 +\section{Segment Descriptor Tables} 8.122 + 8.123 +On boot, a guest is supplied with a default GDT, which does not reside 8.124 +within its own memory allocation. If the guest wishes to use other 8.125 +than the default `flat' ring-1 and ring-3 segments that this GDT 8.126 +provides, it must register a custom GDT and/or LDT with Xen, allocated 8.127 +from its own memory. Note that a number of GDT entries are reserved by 8.128 +Xen -- any custom GDT must also include sufficient space for these 8.129 +entries.
8.130 + 8.131 +For example, the following hypercall is used to specify a new GDT: 8.132 + 8.133 +\begin{quote} 8.134 + int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em 8.135 + entries}) 8.136 + 8.137 + \emph{frame\_list}: An array of up to 16 machine page frames within 8.138 + which the GDT resides. Any frame registered as a GDT frame may only 8.139 + be mapped read-only within the guest's address space (e.g., no 8.140 + writable mappings, no use as a page-table page, and so on). 8.141 + 8.142 + \emph{entries}: The number of descriptor-entry slots in the GDT. 8.143 + Note that the table must be large enough to contain Xen's reserved 8.144 + entries; thus we must have `{\em entries $>$ 8.145 + LAST\_RESERVED\_GDT\_ENTRY}\ '. Note also that, after registering 8.146 + the GDT, slots \emph{FIRST\_} through 8.147 + \emph{LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest 8.148 + and may be overwritten by Xen. 8.149 +\end{quote} 8.150 + 8.151 +The LDT is updated via the generic MMU update mechanism (i.e., via the 8.152 +{\tt mmu\_update()} hypercall). 8.153 + 8.154 +\section{Start of Day} 8.155 + 8.156 +The start-of-day environment for guest operating systems is rather 8.157 +different to that provided by the underlying hardware. In particular, 8.158 +the processor is already executing in protected mode with paging 8.159 +enabled. 8.160 + 8.161 +{\it Domain 0} is created and booted by Xen itself. For all subsequent 8.162 +domains, the analogue of the boot-loader is the {\it domain builder}, 8.163 +user-space software running in {\it domain 0}. The domain builder is 8.164 +responsible for building the initial page tables for a domain and 8.165 +loading its kernel image at the appropriate virtual address.
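A short sketch may help to clarify how guest code uses the two translation tables introduced earlier in this chapter. The symbol names below follow the XenLinux port but should be treated as illustrative; the essential point is simply that page-table entries must contain machine frame numbers, so the guest translates pseudo-physical frames before installing them.

\begin{verbatim}
/* Globally readable machine-to-physical table, and the per-domain
 * physical-to-machine table supplied to the guest (names are
 * illustrative). */
extern unsigned long *machine_to_phys_mapping;
extern unsigned long *phys_to_machine_mapping;

static inline unsigned long pfn_to_mfn(unsigned long pfn)
{
    return phys_to_machine_mapping[pfn];   /* pseudo-physical -> machine */
}

static inline unsigned long mfn_to_pfn(unsigned long mfn)
{
    return machine_to_phys_mapping[mfn];   /* machine -> pseudo-physical */
}

/* Example: build a PTE for a pseudo-physical frame; the MMU (and
 * Xen's validation) sees only the machine frame number. */
static inline unsigned long make_pte(unsigned long pfn, unsigned long flags)
{
    return (pfn_to_mfn(pfn) << 12) | flags;   /* 4K pages assumed */
}
\end{verbatim}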
9.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 9.2 +++ b/docs/src/interface/scheduling.tex Tue Sep 20 09:17:33 2005 +0000 9.3 @@ -0,0 +1,268 @@ 9.4 +\chapter{Scheduling API} 9.5 + 9.6 +The scheduling API is used by both the schedulers described above and should 9.7 +also be used by any new schedulers. It provides a generic interface and also 9.8 +implements much of the ``boilerplate'' code. 9.9 + 9.10 +Schedulers conforming to this API are described by the following 9.11 +structure: 9.12 + 9.13 +\begin{verbatim} 9.14 +struct scheduler 9.15 +{ 9.16 + char *name; /* full name for this scheduler */ 9.17 + char *opt_name; /* option name for this scheduler */ 9.18 + unsigned int sched_id; /* ID for this scheduler */ 9.19 + 9.20 + int (*init_scheduler) (); 9.21 + int (*alloc_task) (struct task_struct *); 9.22 + void (*add_task) (struct task_struct *); 9.23 + void (*free_task) (struct task_struct *); 9.24 + void (*rem_task) (struct task_struct *); 9.25 + void (*wake_up) (struct task_struct *); 9.26 + void (*do_block) (struct task_struct *); 9.27 + task_slice_t (*do_schedule) (s_time_t); 9.28 + int (*control) (struct sched_ctl_cmd *); 9.29 + int (*adjdom) (struct task_struct *, 9.30 + struct sched_adjdom_cmd *); 9.31 + s32 (*reschedule) (struct task_struct *); 9.32 + void (*dump_settings) (void); 9.33 + void (*dump_cpu_state) (int); 9.34 + void (*dump_runq_el) (struct task_struct *); 9.35 +}; 9.36 +\end{verbatim} 9.37 + 9.38 +The only method that {\em must} be implemented is 9.39 +{\tt do\_schedule()}. However, if the 9.40 +{\tt wake\_up()} method is not implemented then woken tasks will never be placed on the runqueue! 9.41 + 9.42 +The fields of the above structure are described in more detail below. A minimal example scheduler conforming to this interface is sketched at the end of this chapter. 9.43 + 9.44 +\subsubsection{name} 9.45 + 9.46 +The name field should point to a descriptive ASCII string. 9.47 + 9.48 +\subsubsection{opt\_name} 9.49 + 9.50 +This field is the value of the {\tt sched=} boot-time option that will select 9.51 +this scheduler. 9.52 + 9.53 +\subsubsection{sched\_id} 9.54 + 9.55 +This is an integer that uniquely identifies this scheduler. There should be a 9.56 +macro corresponding to this scheduler ID in {\tt <xen/sched-if.h>}. 9.57 + 9.58 +\subsubsection{init\_scheduler} 9.59 + 9.60 +\paragraph*{Purpose} 9.61 + 9.62 +This is a function for performing any scheduler-specific initialisation. For 9.63 +instance, it might allocate memory for per-CPU scheduler data and initialise it 9.64 +appropriately. 9.65 + 9.66 +\paragraph*{Call environment} 9.67 + 9.68 +This function is called after the initialisation performed by the generic 9.69 +layer. The function is called exactly once, for the scheduler that has been 9.70 +selected. 9.71 + 9.72 +\paragraph*{Return values} 9.73 + 9.74 +This should return negative on failure --- this will cause an 9.75 +immediate panic and the system will fail to boot. 9.76 + 9.77 +\subsubsection{alloc\_task} 9.78 + 9.79 +\paragraph*{Purpose} 9.80 +Called when a {\tt task\_struct} is allocated by the generic scheduler 9.81 +layer. A particular scheduler implementation may use this method to 9.82 +allocate per-task data for this task. It may use the {\tt 9.83 +sched\_priv} pointer in the {\tt task\_struct} to point to this data. 9.84 + 9.85 +\paragraph*{Call environment} 9.86 +The generic layer guarantees that the {\tt sched\_priv} field will 9.87 +remain intact from the time this method is called until the task is 9.88 +deallocated (so long as the scheduler implementation does not change 9.89 +it explicitly!).
9.90 + 9.91 +\paragraph*{Return values} 9.92 +Negative on failure. 9.93 + 9.94 +\subsubsection{add\_task} 9.95 + 9.96 +\paragraph*{Purpose} 9.97 + 9.98 +Called when a task is initially added by the generic layer. 9.99 + 9.100 +\paragraph*{Call environment} 9.101 + 9.102 +The fields in the {\tt task\_struct} are now filled out and available for use. 9.103 +Schedulers should implement appropriate initialisation of any per-task private 9.104 +information in this method. 9.105 + 9.106 +\subsubsection{free\_task} 9.107 + 9.108 +\paragraph*{Purpose} 9.109 + 9.110 +Schedulers should free the space used by any associated private data 9.111 +structures. 9.112 + 9.113 +\paragraph*{Call environment} 9.114 + 9.115 +This is called when a {\tt task\_struct} is about to be deallocated. 9.116 +The generic layer will have done generic task removal operations and 9.117 +(if implemented) called the scheduler's {\tt rem\_task} method before 9.118 +this method is called. 9.119 + 9.120 +\subsubsection{rem\_task} 9.121 + 9.122 +\paragraph*{Purpose} 9.123 + 9.124 +This is called when a task is being removed from scheduling (but is 9.125 +not yet being freed). 9.126 + 9.127 +\subsubsection{wake\_up} 9.128 + 9.129 +\paragraph*{Purpose} 9.130 + 9.131 +Called when a task is woken up, this method should put the task on the runqueue 9.132 +(or do the scheduler-specific equivalent action). 9.133 + 9.134 +\paragraph*{Call environment} 9.135 + 9.136 +The task is already set to state RUNNING. 9.137 + 9.138 +\subsubsection{do\_block} 9.139 + 9.140 +\paragraph*{Purpose} 9.141 + 9.142 +This function is called when a task is blocked. This function should 9.143 +not remove the task from the runqueue. 9.144 + 9.145 +\paragraph*{Call environment} 9.146 + 9.147 +The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state has been changed to 9.148 +TASK\_INTERRUPTIBLE on entry to this method. A call to the {\tt 9.149 + do\_schedule} method will be made after this method returns, in 9.150 +order to select the next task to run. 9.151 + 9.152 +\subsubsection{do\_schedule} 9.153 + 9.154 +This method must be implemented. 9.155 + 9.156 +\paragraph*{Purpose} 9.157 + 9.158 +The method is called each time a new task must be chosen for scheduling on the 9.159 +current CPU. The current time is passed as the single argument (the current 9.160 +task can be found using the {\tt current} macro). 9.161 + 9.162 +This method should select the next task to run on this CPU and set its minimum 9.163 +time to run, as well as returning the data described below. 9.164 + 9.165 +This method should also take the appropriate action if the previous 9.166 +task has blocked, e.g. removing it from the runqueue. 9.167 + 9.168 +\paragraph*{Call environment} 9.169 + 9.170 +The other fields in the {\tt task\_struct} are updated by the generic layer, 9.171 +which also performs all Xen-specific tasks and performs the actual task switch 9.172 +(unless the previous task has been chosen again). 9.173 + 9.174 +This method is called with the {\tt schedule\_lock} held for the current CPU 9.175 +and local interrupts disabled. 9.176 + 9.177 +\paragraph*{Return values} 9.178 + 9.179 +Must return a {\tt struct task\_slice} describing which task to run and the maximum 9.180 +time for which it may run. 9.181 + 9.182 +\subsubsection{control} 9.183 + 9.184 +\paragraph*{Purpose} 9.185 + 9.186 +This method is called for global scheduler control operations.
It takes a 9.187 +pointer to a {\tt struct sched\_ctl\_cmd}, which it should either 9.188 +source data from or populate with data, depending on the value of the 9.189 +{\tt direction} field. 9.190 + 9.191 +\paragraph*{Call environment} 9.192 + 9.193 +The generic layer guarantees that when this method is called, the 9.194 +caller selected the correct scheduler ID, hence the scheduler's 9.195 +implementation does not need to sanity-check these parts of the call. 9.196 + 9.197 +\paragraph*{Return values} 9.198 + 9.199 +This function should return the value to be passed back to user space, hence it 9.200 +should either be 0 or an appropriate errno value. 9.201 + 9.202 +\subsubsection{sched\_adjdom} 9.203 + 9.204 +\paragraph*{Purpose} 9.205 + 9.206 +This method is called to adjust the scheduling parameters of a particular 9.207 +domain, or to query their current values. The function should check 9.208 +the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in 9.209 +order to determine which of these operations is being performed. 9.210 + 9.211 +\paragraph*{Call environment} 9.212 + 9.213 +The generic layer guarantees that the caller has specified the correct 9.214 +control interface version and scheduler ID and that the supplied {\tt 9.215 +task\_struct} will not be deallocated during the call (hence it is not 9.216 +necessary to {\tt get\_task\_struct}). 9.217 + 9.218 +\paragraph*{Return values} 9.219 + 9.220 +This function should return the value to be passed back to user space, hence it 9.221 +should either be 0 or an appropriate errno value. 9.222 + 9.223 +\subsubsection{reschedule} 9.224 + 9.225 +\paragraph*{Purpose} 9.226 + 9.227 +This method is called to determine if a reschedule is required as a result of a 9.228 +particular task. 9.229 + 9.230 +\paragraph*{Call environment} 9.231 +The generic layer will cause a reschedule if the current domain is the idle 9.232 +task or it has exceeded its minimum time slice before a reschedule. The 9.233 +generic layer guarantees that the task passed is not currently running but is 9.234 +on the runqueue. 9.235 + 9.236 +\paragraph*{Return values} 9.237 + 9.238 +Should return a mask of CPUs to cause a reschedule on. 9.239 + 9.240 +\subsubsection{dump\_settings} 9.241 + 9.242 +\paragraph*{Purpose} 9.243 + 9.244 +If implemented, this should dump any private global settings for this 9.245 +scheduler to the console. 9.246 + 9.247 +\paragraph*{Call environment} 9.248 + 9.249 +This function is called with interrupts enabled. 9.250 + 9.251 +\subsubsection{dump\_cpu\_state} 9.252 + 9.253 +\paragraph*{Purpose} 9.254 + 9.255 +This method should dump any private settings for the specified CPU. 9.256 + 9.257 +\paragraph*{Call environment} 9.258 + 9.259 +This function is called with interrupts disabled and the {\tt schedule\_lock} 9.260 +for the specified CPU held. 9.261 + 9.262 +\subsubsection{dump\_runq\_el} 9.263 + 9.264 +\paragraph*{Purpose} 9.265 + 9.266 +This method should dump any private settings for the specified task. 9.267 + 9.268 +\paragraph*{Call environment} 9.269 + 9.270 +This function is called with interrupts disabled and the {\tt schedule\_lock} 9.271 +for the task's CPU held.
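To illustrate how the methods above fit together, the following skeleton sketches a trivial scheduler conforming to this interface. It is a sketch only: the {\tt example\_enqueue()}, {\tt example\_next()} and {\tt example\_has\_blocked()} runqueue helpers and the {\tt SCHED\_EXAMPLE} identifier are hypothetical, and the {\tt task\_slice\_t} fields (a task pointer plus a maximum run time) are assumed from the description of {\tt do\_schedule()} above.

\begin{verbatim}
/* Minimal illustrative scheduler skeleton.  The runqueue helpers
 * (example_enqueue(), example_next(), example_has_blocked()) and
 * the SCHED_EXAMPLE macro are hypothetical; task_slice_t is
 * assumed to carry the chosen task and its maximum run time. */
#include <xen/sched-if.h>

#define EXAMPLE_SLICE ((s_time_t)10000000)   /* 10ms in nanoseconds */

static task_slice_t example_do_schedule(s_time_t now)
{
    task_slice_t ret;

    /* Re-queue the previous task unless it has blocked; 'current'
     * gives the task that was running when we were invoked. */
    if (!example_has_blocked(current))
        example_enqueue(current);

    ret.task = example_next();     /* next runnable task (or idle) */
    ret.time = EXAMPLE_SLICE;      /* maximum slice before resched */
    return ret;
}

static void example_wake_up(struct task_struct *t)
{
    /* Without this, woken tasks would never reach the runqueue. */
    example_enqueue(t);
}

struct scheduler sched_example_def = {
    .name        = "Example Scheduler",
    .opt_name    = "example",           /* selected via sched=example */
    .sched_id    = SCHED_EXAMPLE,       /* hypothetical ID macro      */
    .do_schedule = example_do_schedule, /* the only mandatory method  */
    .wake_up     = example_wake_up,
};
\end{verbatim}

Only {\tt do\_schedule()} is mandatory; {\tt wake\_up()} is included here because, as noted above, woken tasks would otherwise never be placed on the runqueue.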
10.1 --- a/docs/src/user.tex Tue Sep 20 09:08:26 2005 +0000 10.2 +++ b/docs/src/user.tex Tue Sep 20 09:17:33 2005 +0000 10.3 @@ -59,1803 +59,36 @@ Contributions of material, suggestions a 10.4 \renewcommand{\floatpagefraction}{.8} 10.5 \setstretch{1.1} 10.6 10.7 + 10.8 \part{Introduction and Tutorial} 10.9 -\chapter{Introduction} 10.10 - 10.11 -Xen is a {\em paravirtualising} virtual machine monitor (VMM), or 10.12 -`hypervisor', for the x86 processor architecture. Xen can securely 10.13 -execute multiple virtual machines on a single physical system with 10.14 -close-to-native performance. The virtual machine technology 10.15 -facilitates enterprise-grade functionality, including: 10.16 - 10.17 -\begin{itemize} 10.18 -\item Virtual machines with performance close to native 10.19 - hardware. 10.20 -\item Live migration of running virtual machines between physical hosts. 10.21 -\item Excellent hardware support (supports most Linux device drivers). 10.22 -\item Sandboxed, restartable device drivers. 10.23 -\end{itemize} 10.24 - 10.25 -Paravirtualisation permits very high performance virtualisation, 10.26 -even on architectures like x86 that are traditionally 10.27 -very hard to virtualise. 10.28 -The drawback of this approach is that it requires operating systems to 10.29 -be {\em ported} to run on Xen. Porting an OS to run on Xen is similar 10.30 -to supporting a new hardware platform, however the process 10.31 -is simplified because the paravirtual machine architecture is very 10.32 -similar to the underlying native hardware. Even though operating system 10.33 -kernels must explicitly support Xen, a key feature is that user space 10.34 -applications and libraries {\em do not} require modification. 10.35 - 10.36 -Xen support is available for increasingly many operating systems: 10.37 -right now, Linux 2.4, Linux 2.6 and NetBSD are available for Xen 2.0. 10.38 -A FreeBSD port is undergoing testing and will be incorporated into the 10.39 -release soon. Other OS ports, including Plan 9, are in progress. We 10.40 -hope that that arch-xen patches will be incorporated into the 10.41 -mainstream releases of these operating systems in due course (as has 10.42 -already happened for NetBSD). 10.43 - 10.44 -Possible usage scenarios for Xen include: 10.45 -\begin{description} 10.46 -\item [Kernel development.] Test and debug kernel modifications in a 10.47 - sandboxed virtual machine --- no need for a separate test 10.48 - machine. 10.49 -\item [Multiple OS configurations.] Run multiple operating systems 10.50 - simultaneously, for instance for compatibility or QA purposes. 10.51 -\item [Server consolidation.] Move multiple servers onto a single 10.52 - physical host with performance and fault isolation provided at 10.53 - virtual machine boundaries. 10.54 -\item [Cluster computing.] Management at VM granularity provides more 10.55 - flexibility than separately managing each physical host, but 10.56 - better control and isolation than single-system image solutions, 10.57 - particularly by using live migration for load balancing. 10.58 -\item [Hardware support for custom OSes.] Allow development of new OSes 10.59 - while benefiting from the wide-ranging hardware support of 10.60 - existing OSes such as Linux. 10.61 -\end{description} 10.62 - 10.63 -\section{Structure of a Xen-Based System} 10.64 - 10.65 -A Xen system has multiple layers, the lowest and most privileged of 10.66 -which is Xen itself. 
10.67 -Xen in turn may host multiple {\em guest} operating systems, each of 10.68 -which is executed within a secure virtual machine (in Xen terminology, 10.69 -a {\em domain}). Domains are scheduled by Xen to make effective use of 10.70 -the available physical CPUs. Each guest OS manages its own 10.71 -applications, which includes responsibility for scheduling each 10.72 -application within the time allotted to the VM by Xen. 10.73 - 10.74 -The first domain, {\em domain 0}, is created automatically when the 10.75 -system boots and has special management privileges. Domain 0 builds 10.76 -other domains and manages their virtual devices. It also performs 10.77 -administrative tasks such as suspending, resuming and migrating other 10.78 -virtual machines. 10.79 - 10.80 -Within domain 0, a process called \emph{xend} runs to manage the system. 10.81 -\Xend is responsible for managing virtual machines and providing access 10.82 -to their consoles. Commands are issued to \xend over an HTTP 10.83 -interface, either from a command-line tool or from a web browser. 10.84 - 10.85 -\section{Hardware Support} 10.86 - 10.87 -Xen currently runs only on the x86 architecture, requiring a `P6' or 10.88 -newer processor (e.g. Pentium Pro, Celeron, Pentium II, Pentium III, 10.89 -Pentium IV, Xeon, AMD Athlon, AMD Duron). Multiprocessor machines are 10.90 -supported, and we also have basic support for HyperThreading (SMT), 10.91 -although this remains a topic for ongoing research. A port 10.92 -specifically for x86/64 is in progress, although Xen already runs on 10.93 -such systems in 32-bit legacy mode. In addition a port to the IA64 10.94 -architecture is approaching completion. We hope to add other 10.95 -architectures such as PPC and ARM in due course. 10.96 - 10.97 - 10.98 -Xen can currently use up to 4GB of memory. It is possible for x86 10.99 -machines to address up to 64GB of physical memory but there are no 10.100 -current plans to support these systems: The x86/64 port is the 10.101 -planned route to supporting larger memory sizes. 10.102 - 10.103 -Xen offloads most of the hardware support issues to the guest OS 10.104 -running in Domain~0. Xen itself contains only the code required to 10.105 -detect and start secondary processors, set up interrupt routing, and 10.106 -perform PCI bus enumeration. Device drivers run within a privileged 10.107 -guest OS rather than within Xen itself. This approach provides 10.108 -compatibility with the majority of device hardware supported by Linux. 10.109 -The default XenLinux build contains support for relatively modern 10.110 -server-class network and disk hardware, but you can add support for 10.111 -other hardware by configuring your XenLinux kernel in the normal way. 10.112 - 10.113 -\section{History} 10.114 - 10.115 -Xen was originally developed by the Systems Research Group at the 10.116 -University of Cambridge Computer Laboratory as part of the XenoServers 10.117 -project, funded by the UK-EPSRC. 10.118 -XenoServers aim to provide a `public infrastructure for 10.119 -global distributed computing', and Xen plays a key part in that, 10.120 -allowing us to efficiently partition a single machine to enable 10.121 -multiple independent clients to run their operating systems and 10.122 -applications in an environment providing protection, resource 10.123 -isolation and accounting. 
The project web page contains further 10.124 -information along with pointers to papers and technical reports: 10.125 -\path{http://www.cl.cam.ac.uk/xeno} 10.126 - 10.127 -Xen has since grown into a fully-fledged project in its own right, 10.128 -enabling us to investigate interesting research issues regarding the 10.129 -best techniques for virtualising resources such as the CPU, memory, 10.130 -disk and network. The project has been bolstered by support from 10.131 -Intel Research Cambridge, and HP Labs, who are now working closely 10.132 -with us. 10.133 - 10.134 -Xen was first described in a paper presented at SOSP in 10.135 -2003\footnote{\tt 10.136 -http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the first 10.137 -public release (1.0) was made that October. Since then, Xen has 10.138 -significantly matured and is now used in production scenarios on 10.139 -many sites. 10.140 - 10.141 -Xen 2.0 features greatly enhanced hardware support, configuration 10.142 -flexibility, usability and a larger complement of supported operating 10.143 -systems. This latest release takes Xen a step closer to becoming the 10.144 -definitive open source solution for virtualisation. 10.145 - 10.146 -\chapter{Installation} 10.147 - 10.148 -The Xen distribution includes three main components: Xen itself, ports 10.149 -of Linux 2.4 and 2.6 and NetBSD to run on Xen, and the user-space 10.150 -tools required to manage a Xen-based system. This chapter describes 10.151 -how to install the Xen 2.0 distribution from source. Alternatively, 10.152 -there may be pre-built packages available as part of your operating 10.153 -system distribution. 10.154 - 10.155 -\section{Prerequisites} 10.156 -\label{sec:prerequisites} 10.157 - 10.158 -The following is a full list of prerequisites. Items marked `$\dag$' 10.159 -are required by the \xend control tools, and hence required if you 10.160 -want to run more than one virtual machine; items marked `$*$' are only 10.161 -required if you wish to build from source. 10.162 -\begin{itemize} 10.163 -\item A working Linux distribution using the GRUB bootloader and 10.164 -running on a P6-class (or newer) CPU. 10.165 -\item [$\dag$] The \path{iproute2} package. 10.166 -\item [$\dag$] The Linux bridge-utils\footnote{Available from 10.167 -{\tt http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl}) 10.168 -\item [$\dag$] An installation of Twisted v1.3 or 10.169 -above\footnote{Available from {\tt 10.170 -http://www.twistedmatrix.com}}. There may be a binary package 10.171 -available for your distribution; alternatively it can be installed by 10.172 -running `{\sl make install-twisted}' in the root of the Xen source 10.173 -tree. 10.174 -\item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make). 10.175 -\item [$*$] Development installation of libcurl (e.g., libcurl-devel) 10.176 -\item [$*$] Development installation of zlib (e.g., zlib-dev). 10.177 -\item [$*$] Development installation of Python v2.2 or later (e.g., python-dev). 10.178 -\item [$*$] \LaTeX and transfig are required to build the documentation. 10.179 -\end{itemize} 10.180 - 10.181 -Once you have satisfied the relevant prerequisites, you can 10.182 -now install either a binary or source distribution of Xen. 
10.183 - 10.184 -\section{Installing from Binary Tarball} 10.185 - 10.186 -Pre-built tarballs are available for download from the Xen 10.187 -download page 10.188 -\begin{quote} 10.189 -{\tt http://xen.sf.net} 10.190 -\end{quote} 10.191 - 10.192 -Once you've downloaded the tarball, simply unpack and install: 10.193 -\begin{verbatim} 10.194 -# tar zxvf xen-2.0-install.tgz 10.195 -# cd xen-2.0-install 10.196 -# sh ./install.sh 10.197 -\end{verbatim} 10.198 - 10.199 -Once you've installed the binaries you need to configure 10.200 -your system as described in Section~\ref{s:configure}. 10.201 - 10.202 -\section{Installing from Source} 10.203 - 10.204 -This section describes how to obtain, build, and install 10.205 -Xen from source. 10.206 - 10.207 -\subsection{Obtaining the Source} 10.208 - 10.209 -The Xen source tree is available as either a compressed source tar 10.210 -ball or as a clone of our master BitKeeper repository. 10.211 - 10.212 -\begin{description} 10.213 -\item[Obtaining the Source Tarball]\mbox{} \\ 10.214 -Stable versions (and daily snapshots) of the Xen source tree are 10.215 -available as compressed tarballs from the Xen download page 10.216 -\begin{quote} 10.217 -{\tt http://xen.sf.net} 10.218 -\end{quote} 10.219 - 10.220 -\item[Using BitKeeper]\mbox{} \\ 10.221 -If you wish to install Xen from a clone of our latest BitKeeper 10.222 -repository then you will need to install the BitKeeper tools. 10.223 -Download instructions for BitKeeper can be obtained by filling out the 10.224 -form at: 10.225 - 10.226 -\begin{quote} 10.227 -{\tt http://www.bitmover.com/cgi-bin/download.cgi} 10.228 -\end{quote} 10.229 -The public master BK repository for the 2.0 release lives at: 10.230 -\begin{quote} 10.231 -{\tt bk://xen.bkbits.net/xen-2.0.bk} 10.232 -\end{quote} 10.233 -You can use BitKeeper to 10.234 -download it and keep it updated with the latest features and fixes. 10.235 - 10.236 -Change to the directory in which you want to put the source code, then 10.237 -run: 10.238 -\begin{verbatim} 10.239 -# bk clone bk://xen.bkbits.net/xen-2.0.bk 10.240 -\end{verbatim} 10.241 - 10.242 -Under your current directory, a new directory named \path{xen-2.0.bk} 10.243 -has been created, which contains all the source code for Xen, the OS 10.244 -ports, and the control tools. You can update your repository with the 10.245 -latest changes at any time by running: 10.246 -\begin{verbatim} 10.247 -# cd xen-2.0.bk # to change into the local repository 10.248 -# bk pull # to update the repository 10.249 -\end{verbatim} 10.250 -\end{description} 10.251 - 10.252 -%\section{The distribution} 10.253 -% 10.254 -%The Xen source code repository is structured as follows: 10.255 -% 10.256 -%\begin{description} 10.257 -%\item[\path{tools/}] Xen node controller daemon (Xend), command line tools, 10.258 -% control libraries 10.259 -%\item[\path{xen/}] The Xen VMM. 10.260 -%\item[\path{linux-*-xen-sparse/}] Xen support for Linux. 10.261 -%\item[\path{linux-*-patches/}] Experimental patches for Linux. 10.262 -%\item[\path{netbsd-*-xen-sparse/}] Xen support for NetBSD. 10.263 -%\item[\path{docs/}] Various documentation files for users and developers. 10.264 -%\item[\path{extras/}] Bonus extras. 
10.265 -%\end{description} 10.266 - 10.267 -\subsection{Building from Source} 10.268 - 10.269 -The top-level Xen Makefile includes a target `world' that will do the 10.270 -following: 10.271 - 10.272 -\begin{itemize} 10.273 -\item Build Xen 10.274 -\item Build the control tools, including \xend 10.275 -\item Download (if necessary) and unpack the Linux 2.6 source code, 10.276 - and patch it for use with Xen 10.277 -\item Build a Linux kernel to use in domain 0 and a smaller 10.278 - unprivileged kernel, which can optionally be used for 10.279 - unprivileged virtual machines. 10.280 -\end{itemize} 10.281 - 10.282 - 10.283 -After the build has completed you should have a top-level 10.284 -directory called \path{dist/} in which all resulting targets 10.285 -will be placed; of particular interest are the two kernels 10.286 -XenLinux kernel images, one with a `-xen0' extension 10.287 -which contains hardware device drivers and drivers for Xen's virtual 10.288 -devices, and one with a `-xenU' extension that just contains the 10.289 -virtual ones. These are found in \path{dist/install/boot/} along 10.290 -with the image for Xen itself and the configuration files used 10.291 -during the build. 10.292 10.293 -The NetBSD port can be built using: 10.294 -\begin{quote} 10.295 -\begin{verbatim} 10.296 -# make netbsd20 10.297 -\end{verbatim} 10.298 -\end{quote} 10.299 -NetBSD port is built using a snapshot of the netbsd-2-0 cvs branch. 10.300 -The snapshot is downloaded as part of the build process, if it is not 10.301 -yet present in the \path{NETBSD\_SRC\_PATH} search path. The build 10.302 -process also downloads a toolchain which includes all the tools 10.303 -necessary to build the NetBSD kernel under Linux. 10.304 - 10.305 -To customize further the set of kernels built you need to edit 10.306 -the top-level Makefile. Look for the line: 10.307 - 10.308 -\begin{quote} 10.309 -\begin{verbatim} 10.310 -KERNELS ?= mk.linux-2.6-xen0 mk.linux-2.6-xenU 10.311 -\end{verbatim} 10.312 -\end{quote} 10.313 - 10.314 -You can edit this line to include any set of operating system kernels 10.315 -which have configurations in the top-level \path{buildconfigs/} 10.316 -directory, for example \path{mk.linux-2.4-xenU} to build a Linux 2.4 10.317 -kernel containing only virtual device drivers. 10.318 - 10.319 -%% Inspect the Makefile if you want to see what goes on during a build. 10.320 -%% Building Xen and the tools is straightforward, but XenLinux is more 10.321 -%% complicated. The makefile needs a `pristine' Linux kernel tree to which 10.322 -%% it will then add the Xen architecture files. You can tell the 10.323 -%% makefile the location of the appropriate Linux compressed tar file by 10.324 -%% setting the LINUX\_SRC environment variable, e.g. \\ 10.325 -%% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by 10.326 -%% placing the tar file somewhere in the search path of {\tt 10.327 -%% LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'. If the makefile 10.328 -%% can't find a suitable kernel tar file it attempts to download it from 10.329 -%% kernel.org (this won't work if you're behind a firewall). 10.330 - 10.331 -%% After untaring the pristine kernel tree, the makefile uses the {\tt 10.332 -%% mkbuildtree} script to add the Xen patches to the kernel. 10.333 - 10.334 - 10.335 -%% The procedure is similar to build the Linux 2.4 port: \\ 10.336 -%% \verb!# LINUX_SRC=/path/to/linux2.4/source make linux24! 
10.337 - 10.338 - 10.339 -%% \framebox{\parbox{5in}{ 10.340 -%% {\bf Distro specific:} \\ 10.341 -%% {\it Gentoo} --- if not using udev (most installations, currently), you'll need 10.342 -%% to enable devfs and devfs mount at boot time in the xen0 config. 10.343 -%% }} 10.344 - 10.345 -\subsection{Custom XenLinux Builds} 10.346 - 10.347 -% If you have an SMP machine you may wish to give the {\tt '-j4'} 10.348 -% argument to make to get a parallel build. 10.349 - 10.350 -If you wish to build a customized XenLinux kernel (e.g. to support 10.351 -additional devices or enable distribution-required features), you can 10.352 -use the standard Linux configuration mechanisms, specifying that the 10.353 -architecture being built for is \path{xen}, e.g: 10.354 -\begin{quote} 10.355 -\begin{verbatim} 10.356 -# cd linux-2.6.11-xen0 10.357 -# make ARCH=xen xconfig 10.358 -# cd .. 10.359 -# make 10.360 -\end{verbatim} 10.361 -\end{quote} 10.362 - 10.363 -You can also copy an existing Linux configuration (\path{.config}) 10.364 -into \path{linux-2.6.11-xen0} and execute: 10.365 -\begin{quote} 10.366 -\begin{verbatim} 10.367 -# make ARCH=xen oldconfig 10.368 -\end{verbatim} 10.369 -\end{quote} 10.370 - 10.371 -You may be prompted with some Xen-specific options; we 10.372 -advise accepting the defaults for these options. 10.373 - 10.374 -Note that the only difference between the two types of Linux kernel 10.375 -that are built is the configuration file used for each. The "U" 10.376 -suffixed (unprivileged) versions don't contain any of the physical 10.377 -hardware device drivers, leading to a 30\% reduction in size; hence 10.378 -you may prefer these for your non-privileged domains. The `0' 10.379 -suffixed privileged versions can be used to boot the system, as well 10.380 -as in driver domains and unprivileged domains. 10.381 - 10.382 - 10.383 -\subsection{Installing the Binaries} 10.384 - 10.385 - 10.386 -The files produced by the build process are stored under the 10.387 -\path{dist/install/} directory. To install them in their default 10.388 -locations, do: 10.389 -\begin{quote} 10.390 -\begin{verbatim} 10.391 -# make install 10.392 -\end{verbatim} 10.393 -\end{quote} 10.394 - 10.395 - 10.396 -Alternatively, users with special installation requirements may wish 10.397 -to install them manually by copying the files to their appropriate 10.398 -destinations. 10.399 - 10.400 -%% Files in \path{install/boot/} include: 10.401 -%% \begin{itemize} 10.402 -%% \item \path{install/boot/xen-2.0.gz} Link to the Xen 'kernel' 10.403 -%% \item \path{install/boot/vmlinuz-2.6-xen0} Link to domain 0 XenLinux kernel 10.404 -%% \item \path{install/boot/vmlinuz-2.6-xenU} Link to unprivileged XenLinux kernel 10.405 -%% \end{itemize} 10.406 - 10.407 -The \path{dist/install/boot} directory will also contain the config files 10.408 -used for building the XenLinux kernels, and also versions of Xen and 10.409 -XenLinux kernels that contain debug symbols (\path{xen-syms-2.0.6} and 10.410 -\path{vmlinux-syms-2.6.11.11-xen0}) which are essential for interpreting crash 10.411 -dumps. Retain these files as the developers may wish to see them if 10.412 -you post on the mailing list. 10.413 - 10.414 - 10.415 - 10.416 - 10.417 - 10.418 -\section{Configuration} 10.419 -\label{s:configure} 10.420 -Once you have built and installed the Xen distribution, it is 10.421 -simple to prepare the machine for booting and running Xen. 
10.422 - 10.423 -\subsection{GRUB Configuration} 10.424 - 10.425 -An entry should be added to \path{grub.conf} (often found under 10.426 -\path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot. 10.427 -This file is sometimes called \path{menu.lst}, depending on your 10.428 -distribution. The entry should look something like the following: 10.429 - 10.430 -{\small 10.431 -\begin{verbatim} 10.432 -title Xen 2.0 / XenLinux 2.6 10.433 - kernel /boot/xen-2.0.gz dom0_mem=131072 10.434 - module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0 10.435 -\end{verbatim} 10.436 -} 10.437 +%% Chapter Introduction moved to introduction.tex 10.438 +\include{src/user/introduction} 10.439 10.440 -The kernel line tells GRUB where to find Xen itself and what boot 10.441 -parameters should be passed to it (in this case, setting domain 0's 10.442 -memory allocation in kilobytes and the settings for the serial port). For more 10.443 -details on the various Xen boot parameters see Section~\ref{s:xboot}. 10.444 - 10.445 -The module line of the configuration describes the location of the 10.446 -XenLinux kernel that Xen should start and the parameters that should 10.447 -be passed to it (these are standard Linux parameters, identifying the 10.448 -root device and specifying it be initially mounted read only and 10.449 -instructing that console output be sent to the screen). Some 10.450 -distributions such as SuSE do not require the \path{ro} parameter. 10.451 - 10.452 -%% \framebox{\parbox{5in}{ 10.453 -%% {\bf Distro specific:} \\ 10.454 -%% {\it SuSE} --- Omit the {\tt ro} option from the XenLinux kernel 10.455 -%% command line, since the partition won't be remounted rw during boot. 10.456 -%% }} 10.457 - 10.458 - 10.459 -If you want to use an initrd, just add another \path{module} line to 10.460 -the configuration, as usual: 10.461 -{\small 10.462 -\begin{verbatim} 10.463 - module /boot/my_initrd.gz 10.464 -\end{verbatim} 10.465 -} 10.466 - 10.467 -As always when installing a new kernel, it is recommended that you do 10.468 -not delete existing menu options from \path{menu.lst} --- you may want 10.469 -to boot your old Linux kernel in future, particularly if you 10.470 -have problems. 10.471 - 10.472 - 10.473 -\subsection{Serial Console (optional)} 10.474 - 10.475 -%% kernel /boot/xen-2.0.gz dom0_mem=131072 com1=115200,8n1 10.476 -%% module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro 10.477 - 10.478 - 10.479 -In order to configure Xen serial console output, it is necessary to add 10.480 -an boot option to your GRUB config; e.g. replace the above kernel line 10.481 -with: 10.482 -\begin{quote} 10.483 -{\small 10.484 -\begin{verbatim} 10.485 - kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1 10.486 -\end{verbatim}} 10.487 -\end{quote} 10.488 - 10.489 -This configures Xen to output on COM1 at 115,200 baud, 8 data bits, 10.490 -1 stop bit and no parity. Modify these parameters for your set up. 10.491 - 10.492 -One can also configure XenLinux to share the serial console; to 10.493 -achieve this append ``\path{console=ttyS0}'' to your 10.494 -module line. 10.495 - 10.496 - 10.497 -If you wish to be able to log in over the XenLinux serial console it 10.498 -is necessary to add a line into \path{/etc/inittab}, just as per 10.499 -regular Linux. Simply add the line: 10.500 -\begin{quote} 10.501 -{\small 10.502 -{\tt c:2345:respawn:/sbin/mingetty ttyS0} 10.503 -} 10.504 -\end{quote} 10.505 - 10.506 -and you should be able to log in. 
Note that to successfully log in 10.507 -as root over the serial line will require adding \path{ttyS0} to 10.508 -\path{/etc/securetty} in most modern distributions. 10.509 - 10.510 -\subsection{TLS Libraries} 10.511 - 10.512 -Users of the XenLinux 2.6 kernel should disable Thread Local Storage 10.513 -(e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before 10.514 -attempting to run with a XenLinux kernel\footnote{If you boot without first 10.515 -disabling TLS, you will get a warning message during the boot 10.516 -process. In this case, simply perform the rename after the machine is 10.517 -up and then run \texttt{/sbin/ldconfig} to make it take effect.}. You can 10.518 -always reenable it by restoring the directory to its original location 10.519 -(i.e.\ \path{mv /lib/tls.disabled /lib/tls}). 10.520 - 10.521 -The reason for this is that the current TLS implementation uses 10.522 -segmentation in a way that is not permissible under Xen. If TLS is 10.523 -not disabled, an emulation mode is used within Xen which reduces 10.524 -performance substantially. 10.525 - 10.526 -We hope that this issue can be resolved by working with Linux 10.527 -distribution vendors to implement a minor backward-compatible change 10.528 -to the TLS library. 10.529 - 10.530 -\section{Booting Xen} 10.531 - 10.532 -It should now be possible to restart the system and use Xen. Reboot 10.533 -as usual but choose the new Xen option when the Grub screen appears. 10.534 - 10.535 -What follows should look much like a conventional Linux boot. The 10.536 -first portion of the output comes from Xen itself, supplying low level 10.537 -information about itself and the machine it is running on. The 10.538 -following portion of the output comes from XenLinux. 10.539 - 10.540 -You may see some errors during the XenLinux boot. These are not 10.541 -necessarily anything to worry about --- they may result from kernel 10.542 -configuration differences between your XenLinux kernel and the one you 10.543 -usually use. 10.544 - 10.545 -When the boot completes, you should be able to log into your system as 10.546 -usual. If you are unable to log in to your system running Xen, you 10.547 -should still be able to reboot with your normal Linux kernel. 10.548 - 10.549 - 10.550 -\chapter{Starting Additional Domains} 10.551 - 10.552 -The first step in creating a new domain is to prepare a root 10.553 -filesystem for it to boot off. Typically, this might be stored in a 10.554 -normal partition, an LVM or other volume manager partition, a disk 10.555 -file or on an NFS server. A simple way to do this is simply to boot 10.556 -from your standard OS install CD and install the distribution into 10.557 -another partition on your hard drive. 10.558 - 10.559 -To start the \xend control daemon, type 10.560 -\begin{quote} 10.561 -\verb!# xend start! 10.562 -\end{quote} 10.563 -If you 10.564 -wish the daemon to start automatically, see the instructions in 10.565 -Section~\ref{s:xend}. Once the daemon is running, you can use the 10.566 -\path{xm} tool to monitor and maintain the domains running on your 10.567 -system. This chapter provides only a brief tutorial: we provide full 10.568 -details of the \path{xm} tool in the next chapter. 10.569 - 10.570 -%\section{From the web interface} 10.571 -% 10.572 -%Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv} for 10.573 -%more details) using the command: \\ 10.574 -%\verb_# xensv start_ \\ 10.575 -%This will also start Xend (see Chapter~\ref{cha:xend} for more information). 
10.576 -% 10.577 -%The domain management interface will then be available at {\tt 10.578 -%http://your\_machine:8080/}. This provides a user friendly wizard for 10.579 -%starting domains and functions for managing running domains. 10.580 -% 10.581 -%\section{From the command line} 10.582 - 10.583 - 10.584 -\section{Creating a Domain Configuration File} 10.585 +%% Chapter Installation moved to installation.tex 10.586 +\include{src/user/installation} 10.587 10.588 -Before you can start an additional domain, you must create a 10.589 -configuration file. We provide two example files which you 10.590 -can use as a starting point: 10.591 -\begin{itemize} 10.592 - \item \path{/etc/xen/xmexample1} is a simple template configuration file 10.593 - for describing a single VM. 10.594 - 10.595 - \item \path{/etc/xen/xmexample2} file is a template description that 10.596 - is intended to be reused for multiple virtual machines. Setting 10.597 - the value of the \path{vmid} variable on the \path{xm} command line 10.598 - fills in parts of this template. 10.599 -\end{itemize} 10.600 - 10.601 -Copy one of these files and edit it as appropriate. 10.602 -Typical values you may wish to edit include: 10.603 - 10.604 -\begin{quote} 10.605 -\begin{description} 10.606 -\item[kernel] Set this to the path of the kernel you compiled for use 10.607 - with Xen (e.g.\ \path{kernel = '/boot/vmlinuz-2.6-xenU'}) 10.608 -\item[memory] Set this to the size of the domain's memory in 10.609 -megabytes (e.g.\ \path{memory = 64}) 10.610 -\item[disk] Set the first entry in this list to calculate the offset 10.611 -of the domain's root partition, based on the domain ID. Set the 10.612 -second to the location of \path{/usr} if you are sharing it between 10.613 -domains (e.g.\ \path{disk = ['phy:your\_hard\_drive\%d,sda1,w' \% 10.614 -(base\_partition\_number + vmid), 'phy:your\_usr\_partition,sda6,r' ]} 10.615 -\item[dhcp] Uncomment the dhcp variable, so that the domain will 10.616 -receive its IP address from a DHCP server (e.g.\ \path{dhcp='dhcp'}) 10.617 -\end{description} 10.618 -\end{quote} 10.619 - 10.620 -You may also want to edit the {\bf vif} variable in order to choose 10.621 -the MAC address of the virtual ethernet interface yourself. For 10.622 -example: 10.623 -\begin{quote} 10.624 -\verb_vif = ['mac=00:06:AA:F6:BB:B3']_ 10.625 -\end{quote} 10.626 -If you do not set this variable, \xend will automatically generate a 10.627 -random MAC address from an unused range. 10.628 - 10.629 - 10.630 -\section{Booting the Domain} 10.631 - 10.632 -The \path{xm} tool provides a variety of commands for managing domains. 10.633 -Use the \path{create} command to start new domains. Assuming you've 10.634 -created a configuration file \path{myvmconf} based around 10.635 -\path{/etc/xen/xmexample2}, to start a domain with virtual 10.636 -machine ID~1 you should type: 10.637 - 10.638 -\begin{quote} 10.639 -\begin{verbatim} 10.640 -# xm create -c myvmconf vmid=1 10.641 -\end{verbatim} 10.642 -\end{quote} 10.643 - 10.644 - 10.645 -The \path{-c} switch causes \path{xm} to turn into the domain's 10.646 -console after creation. The \path{vmid=1} sets the \path{vmid} 10.647 -variable used in the \path{myvmconf} file. 10.648 - 10.649 - 10.650 -You should see the console boot messages from the new domain 10.651 -appearing in the terminal in which you typed the command, 10.652 -culminating in a login prompt. 
10.653 - 10.654 - 10.655 -\section{Example: ttylinux} 10.656 - 10.657 -Ttylinux is a very small Linux distribution, designed to require very 10.658 -few resources. We will use it as a concrete example of how to start a 10.659 -Xen domain. Most users will probably want to install a full-featured 10.660 -distribution once they have mastered the basics\footnote{ttylinux is 10.661 -maintained by Pascal Schmidt. You can download source packages from 10.662 -the distribution's home page: {\tt http://www.minimalinux.org/ttylinux/}}. 10.663 - 10.664 -\begin{enumerate} 10.665 -\item Download and extract the ttylinux disk image from the Files 10.666 -section of the project's SourceForge site (see 10.667 -\path{http://sf.net/projects/xen/}). 10.668 -\item Create a configuration file like the following: 10.669 -\begin{verbatim} 10.670 -kernel = "/boot/vmlinuz-2.6-xenU" 10.671 -memory = 64 10.672 -name = "ttylinux" 10.673 -nics = 1 10.674 -ip = "1.2.3.4" 10.675 -disk = ['file:/path/to/ttylinux/rootfs,sda1,w'] 10.676 -root = "/dev/sda1 ro" 10.677 -\end{verbatim} 10.678 -\item Now start the domain and connect to its console: 10.679 -\begin{verbatim} 10.680 -xm create configfile -c 10.681 -\end{verbatim} 10.682 -\item Login as root, password root. 10.683 -\end{enumerate} 10.684 - 10.685 - 10.686 -\section{Starting / Stopping Domains Automatically} 10.687 - 10.688 -It is possible to have certain domains start automatically at boot 10.689 -time and to have dom0 wait for all running domains to shutdown before 10.690 -it shuts down the system. 10.691 - 10.692 -To specify a domain is to start at boot-time, place its 10.693 -configuration file (or a link to it) under \path{/etc/xen/auto/}. 10.694 - 10.695 -A Sys-V style init script for RedHat and LSB-compliant systems is 10.696 -provided and will be automatically copied to \path{/etc/init.d/} 10.697 -during install. You can then enable it in the appropriate way for 10.698 -your distribution. 10.699 - 10.700 -For instance, on RedHat: 10.701 - 10.702 -\begin{quote} 10.703 -\verb_# chkconfig --add xendomains_ 10.704 -\end{quote} 10.705 - 10.706 -By default, this will start the boot-time domains in runlevels 3, 4 10.707 -and 5. 10.708 - 10.709 -You can also use the \path{service} command to run this script 10.710 -manually, e.g: 10.711 - 10.712 -\begin{quote} 10.713 -\verb_# service xendomains start_ 10.714 - 10.715 -Starts all the domains with config files under /etc/xen/auto/. 10.716 -\end{quote} 10.717 - 10.718 - 10.719 -\begin{quote} 10.720 -\verb_# service xendomains stop_ 10.721 - 10.722 -Shuts down ALL running Xen domains. 10.723 -\end{quote} 10.724 - 10.725 -\chapter{Domain Management Tools} 10.726 - 10.727 -The previous chapter described a simple example of how to configure 10.728 -and start a domain. This chapter summarises the tools available to 10.729 -manage running domains. 10.730 - 10.731 -\section{Command-line Management} 10.732 - 10.733 -Command line management tasks are also performed using the \path{xm} 10.734 -tool. For online help for the commands available, type: 10.735 -\begin{quote} 10.736 -\verb_# xm help_ 10.737 -\end{quote} 10.738 - 10.739 -You can also type \path{xm help $<$command$>$} for more information 10.740 -on a given command. 
10.741 - 10.742 -\subsection{Basic Management Commands} 10.743 - 10.744 -The most important \path{xm} commands are: 10.745 -\begin{quote} 10.746 -\verb_# xm list_: Lists all domains running.\\ 10.747 -\verb_# xm consoles_ : Gives information about the domain consoles.\\ 10.748 -\verb_# xm console_: Opens a console to a domain (e.g.\ 10.749 - \verb_# xm console myVM_ 10.750 -\end{quote} 10.751 - 10.752 -\subsection{\tt xm list} 10.753 - 10.754 -The output of \path{xm list} is in rows of the following format: 10.755 -\begin{center} 10.756 -{\tt name domid memory cpu state cputime console} 10.757 -\end{center} 10.758 - 10.759 -\begin{quote} 10.760 -\begin{description} 10.761 -\item[name] The descriptive name of the virtual machine. 10.762 -\item[domid] The number of the domain ID this virtual machine is running in. 10.763 -\item[memory] Memory size in megabytes. 10.764 -\item[cpu] The CPU this domain is running on. 10.765 -\item[state] Domain state consists of 5 fields: 10.766 - \begin{description} 10.767 - \item[r] running 10.768 - \item[b] blocked 10.769 - \item[p] paused 10.770 - \item[s] shutdown 10.771 - \item[c] crashed 10.772 - \end{description} 10.773 -\item[cputime] How much CPU time (in seconds) the domain has used so far. 10.774 -\item[console] TCP port accepting connections to the domain's console. 10.775 -\end{description} 10.776 -\end{quote} 10.777 - 10.778 -The \path{xm list} command also supports a long output format when the 10.779 -\path{-l} switch is used. This outputs the fulls details of the 10.780 -running domains in \xend's SXP configuration format. 10.781 - 10.782 -For example, suppose the system is running the ttylinux domain as 10.783 -described earlier. The list command should produce output somewhat 10.784 -like the following: 10.785 -\begin{verbatim} 10.786 -# xm list 10.787 -Name Id Mem(MB) CPU State Time(s) Console 10.788 -Domain-0 0 251 0 r---- 172.2 10.789 -ttylinux 5 63 0 -b--- 3.0 9605 10.790 -\end{verbatim} 10.791 - 10.792 -Here we can see the details for the ttylinux domain, as well as for 10.793 -domain 0 (which, of course, is always running). Note that the console 10.794 -port for the ttylinux domain is 9605. This can be connected to by TCP 10.795 -using a terminal program (e.g. \path{telnet} or, better, 10.796 -\path{xencons}). The simplest way to connect is to use the \path{xm console} 10.797 -command, specifying the domain name or ID. To connect to the console 10.798 -of the ttylinux domain, we could use any of the following: 10.799 -\begin{verbatim} 10.800 -# xm console ttylinux 10.801 -# xm console 5 10.802 -# xencons localhost 9605 10.803 -\end{verbatim} 10.804 - 10.805 -\section{Domain Save and Restore} 10.806 - 10.807 -The administrator of a Xen system may suspend a virtual machine's 10.808 -current state into a disk file in domain 0, allowing it to be resumed 10.809 -at a later time. 10.810 - 10.811 -The ttylinux domain described earlier can be suspended to disk using 10.812 -the command: 10.813 -\begin{verbatim} 10.814 -# xm save ttylinux ttylinux.xen 10.815 -\end{verbatim} 10.816 - 10.817 -This will stop the domain named `ttylinux' and save its current state 10.818 -into a file called \path{ttylinux.xen}. 10.819 - 10.820 -To resume execution of this domain, use the \path{xm restore} command: 10.821 -\begin{verbatim} 10.822 -# xm restore ttylinux.xen 10.823 -\end{verbatim} 10.824 - 10.825 -This will restore the state of the domain and restart it. 
The domain 10.826 -will carry on as before and the console may be reconnected using the 10.827 -\path{xm console} command, as above. 10.828 - 10.829 -\section{Live Migration} 10.830 - 10.831 -Live migration is used to transfer a domain between physical hosts 10.832 -whilst that domain continues to perform its usual activities --- from 10.833 -the user's perspective, the migration should be imperceptible. 10.834 - 10.835 -To perform a live migration, both hosts must be running Xen / \xend and 10.836 -the destination host must have sufficient resources (e.g. memory 10.837 -capacity) to accommodate the domain after the move. Furthermore we 10.838 -currently require both source and destination machines to be on the 10.839 -same L2 subnet. 10.840 - 10.841 -Currently, there is no support for providing automatic remote access 10.842 -to filesystems stored on local disk when a domain is migrated. 10.843 -Administrators should choose an appropriate storage solution 10.844 -(i.e. SAN, NAS, etc.) to ensure that domain filesystems are also 10.845 -available on their destination node. GNBD is a good method for 10.846 -exporting a volume from one machine to another. iSCSI can do a similar 10.847 -job, but is more complex to set up. 10.848 - 10.849 -When a domain migrates, it's MAC and IP address move with it, thus it 10.850 -is only possible to migrate VMs within the same layer-2 network and IP 10.851 -subnet. If the destination node is on a different subnet, the 10.852 -administrator would need to manually configure a suitable etherip or 10.853 -IP tunnel in the domain 0 of the remote node. 10.854 - 10.855 -A domain may be migrated using the \path{xm migrate} command. To 10.856 -live migrate a domain to another machine, we would use 10.857 -the command: 10.858 - 10.859 -\begin{verbatim} 10.860 -# xm migrate --live mydomain destination.ournetwork.com 10.861 -\end{verbatim} 10.862 - 10.863 -Without the \path{--live} flag, \xend simply stops the domain and 10.864 -copies the memory image over to the new node and restarts it. Since 10.865 -domains can have large allocations this can be quite time consuming, 10.866 -even on a Gigabit network. With the \path{--live} flag \xend attempts 10.867 -to keep the domain running while the migration is in progress, 10.868 -resulting in typical `downtimes' of just 60--300ms. 10.869 - 10.870 -For now it will be necessary to reconnect to the domain's console on 10.871 -the new machine using the \path{xm console} command. If a migrated 10.872 -domain has any open network connections then they will be preserved, 10.873 -so SSH connections do not have this limitation. 10.874 - 10.875 -\section{Managing Domain Memory} 10.876 - 10.877 -XenLinux domains have the ability to relinquish / reclaim machine 10.878 -memory at the request of the administrator or the user of the domain. 10.879 +%% Chapter Starting Additional Domains moved to start_addl_dom.tex 10.880 +\include{src/user/start_addl_dom} 10.881 10.882 -\subsection{Setting memory footprints from dom0} 10.883 - 10.884 -The machine administrator can request that a domain alter its memory 10.885 -footprint using the \path{xm set-mem} command. For instance, we can 10.886 -request that our example ttylinux domain reduce its memory footprint 10.887 -to 32 megabytes. 
10.888 - 10.889 -\begin{verbatim} 10.890 -# xm set-mem ttylinux 32 10.891 -\end{verbatim} 10.892 - 10.893 -We can now see the result of this in the output of \path{xm list}: 10.894 - 10.895 -\begin{verbatim} 10.896 -# xm list 10.897 -Name Id Mem(MB) CPU State Time(s) Console 10.898 -Domain-0 0 251 0 r---- 172.2 10.899 -ttylinux 5 31 0 -b--- 4.3 9605 10.900 -\end{verbatim} 10.901 - 10.902 -The domain has responded to the request by returning memory to Xen. We 10.903 -can restore the domain to its original size using the command line: 10.904 - 10.905 -\begin{verbatim} 10.906 -# xm set-mem ttylinux 64 10.907 -\end{verbatim} 10.908 - 10.909 -\subsection{Setting memory footprints from within a domain} 10.910 - 10.911 -The virtual file \path{/proc/xen/balloon} allows the owner of a 10.912 -domain to adjust their own memory footprint. Reading the file 10.913 -(e.g. \path{cat /proc/xen/balloon}) prints out the current 10.914 -memory footprint of the domain. Writing the file 10.915 -(e.g. \path{echo new\_target > /proc/xen/balloon}) requests 10.916 -that the kernel adjust the domain's memory footprint to a new value. 10.917 - 10.918 -\subsection{Setting memory limits} 10.919 - 10.920 -Xen associates a memory size limit with each domain. By default, this 10.921 -is the amount of memory the domain is originally started with, 10.922 -preventing the domain from ever growing beyond this size. To permit a 10.923 -domain to grow beyond its original allocation or to prevent a domain 10.924 -you've shrunk from reclaiming the memory it relinquished, use the 10.925 -\path{xm maxmem} command. 10.926 - 10.927 -\chapter{Domain Filesystem Storage} 10.928 - 10.929 -It is possible to directly export any Linux block device in dom0 to 10.930 -another domain, or to export filesystems / devices to virtual machines 10.931 -using standard network protocols (e.g. NBD, iSCSI, NFS, etc). This 10.932 -chapter covers some of the possibilities. 10.933 - 10.934 - 10.935 -\section{Exporting Physical Devices as VBDs} 10.936 -\label{s:exporting-physical-devices-as-vbds} 10.937 - 10.938 -One of the simplest configurations is to directly export 10.939 -individual partitions from domain 0 to other domains. To 10.940 -achieve this use the \path{phy:} specifier in your domain 10.941 -configuration file. For example a line like 10.942 -\begin{quote} 10.943 -\verb_disk = ['phy:hda3,sda1,w']_ 10.944 -\end{quote} 10.945 -specifies that the partition \path{/dev/hda3} in domain 0 10.946 -should be exported read-write to the new domain as \path{/dev/sda1}; 10.947 -one could equally well export it as \path{/dev/hda} or 10.948 -\path{/dev/sdb5} should one wish. 10.949 - 10.950 -In addition to local disks and partitions, it is possible to export 10.951 -any device that Linux considers to be ``a disk'' in the same manner. 10.952 -For example, if you have iSCSI disks or GNBD volumes imported into 10.953 -domain 0 you can export these to other domains using the \path{phy:} 10.954 -disk syntax. 
E.g.: 10.955 -\begin{quote} 10.956 -\verb_disk = ['phy:vg/lvm1,sda2,w']_ 10.957 -\end{quote} 10.958 - 10.959 - 10.960 - 10.961 -\begin{center} 10.962 -\framebox{\bf Warning: Block device sharing} 10.963 -\end{center} 10.964 -\begin{quote} 10.965 -Block devices should typically only be shared between domains in a 10.966 -read-only fashion otherwise the Linux kernel's file systems will get 10.967 -very confused as the file system structure may change underneath them 10.968 -(having the same ext3 partition mounted rw twice is a sure fire way to 10.969 -cause irreparable damage)! \Xend will attempt to prevent you from 10.970 -doing this by checking that the device is not mounted read-write in 10.971 -domain 0, and hasn't already been exported read-write to another 10.972 -domain. 10.973 -If you want read-write sharing, export the directory to other domains 10.974 -via NFS from domain0 (or use a cluster file system such as GFS or 10.975 -ocfs2). 10.976 - 10.977 -\end{quote} 10.978 - 10.979 - 10.980 -\section{Using File-backed VBDs} 10.981 - 10.982 -It is also possible to use a file in Domain 0 as the primary storage 10.983 -for a virtual machine. As well as being convenient, this also has the 10.984 -advantage that the virtual block device will be {\em sparse} --- space 10.985 -will only really be allocated as parts of the file are used. So if a 10.986 -virtual machine uses only half of its disk space then the file really 10.987 -takes up half of the size allocated. 10.988 - 10.989 -For example, to create a 2GB sparse file-backed virtual block device 10.990 -(actually only consumes 1KB of disk): 10.991 -\begin{quote} 10.992 -\verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_ 10.993 -\end{quote} 10.994 - 10.995 -Make a file system in the disk file: 10.996 -\begin{quote} 10.997 -\verb_# mkfs -t ext3 vm1disk_ 10.998 -\end{quote} 10.999 - 10.1000 -(when the tool asks for confirmation, answer `y') 10.1001 - 10.1002 -Populate the file system e.g. by copying from the current root: 10.1003 -\begin{quote} 10.1004 -\begin{verbatim} 10.1005 -# mount -o loop vm1disk /mnt 10.1006 -# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt 10.1007 -# mkdir /mnt/{proc,sys,home,tmp} 10.1008 -\end{verbatim} 10.1009 -\end{quote} 10.1010 - 10.1011 -Tailor the file system by editing \path{/etc/fstab}, 10.1012 -\path{/etc/hostname}, etc (don't forget to edit the files in the 10.1013 -mounted file system, instead of your domain 0 filesystem, e.g. you 10.1014 -would edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab} ). For 10.1015 -this example put \path{/dev/sda1} to root in fstab. 10.1016 - 10.1017 -Now unmount (this is important!): 10.1018 -\begin{quote} 10.1019 -\verb_# umount /mnt_ 10.1020 -\end{quote} 10.1021 - 10.1022 -In the configuration file set: 10.1023 -\begin{quote} 10.1024 -\verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_ 10.1025 -\end{quote} 10.1026 +%% Chapter Domain Management Tools moved to domain_mgmt.tex 10.1027 +\include{src/user/domain_mgmt} 10.1028 10.1029 -As the virtual machine writes to its `disk', the sparse file will be 10.1030 -filled in and consume more space up to the original 2GB. 10.1031 - 10.1032 -{\bf Note that file-backed VBDs may not be appropriate for backing 10.1033 -I/O-intensive domains.} File-backed VBDs are known to experience 10.1034 -substantial slowdowns under heavy I/O workloads, due to the I/O handling 10.1035 -by the loopback block device used to support file-backed VBDs in dom0. 
10.1036 -Better I/O performance can be achieved by using either LVM-backed VBDs 10.1037 -(Section~\ref{s:using-lvm-backed-vbds}) or physical devices as VBDs 10.1038 -(Section~\ref{s:exporting-physical-devices-as-vbds}). 10.1039 - 10.1040 -Linux supports a maximum of eight file-backed VBDs across all domains by 10.1041 -default. This limit can be statically increased by using the {\em 10.1042 -max\_loop} module parameter if CONFIG\_BLK\_DEV\_LOOP is compiled as a 10.1043 -module in the dom0 kernel, or by using the {\em max\_loop=n} boot option 10.1044 -if CONFIG\_BLK\_DEV\_LOOP is compiled directly into the dom0 kernel. 10.1045 - 10.1046 - 10.1047 -\section{Using LVM-backed VBDs} 10.1048 -\label{s:using-lvm-backed-vbds} 10.1049 - 10.1050 -A particularly appealing solution is to use LVM volumes 10.1051 -as backing for domain file-systems since this allows dynamic 10.1052 -growing/shrinking of volumes as well as snapshot and other 10.1053 -features. 10.1054 - 10.1055 -To initialise a partition to support LVM volumes: 10.1056 -\begin{quote} 10.1057 -\begin{verbatim} 10.1058 -# pvcreate /dev/sda10 10.1059 -\end{verbatim} 10.1060 -\end{quote} 10.1061 - 10.1062 -Create a volume group named `vg' on the physical partition: 10.1063 -\begin{quote} 10.1064 -\begin{verbatim} 10.1065 -# vgcreate vg /dev/sda10 10.1066 -\end{verbatim} 10.1067 -\end{quote} 10.1068 - 10.1069 -Create a logical volume of size 4GB named `myvmdisk1': 10.1070 -\begin{quote} 10.1071 -\begin{verbatim} 10.1072 -# lvcreate -L4096M -n myvmdisk1 vg 10.1073 -\end{verbatim} 10.1074 -\end{quote} 10.1075 - 10.1076 -You should now see that you have a \path{/dev/vg/myvmdisk1} 10.1077 -Make a filesystem, mount it and populate it, e.g.: 10.1078 -\begin{quote} 10.1079 -\begin{verbatim} 10.1080 -# mkfs -t ext3 /dev/vg/myvmdisk1 10.1081 -# mount /dev/vg/myvmdisk1 /mnt 10.1082 -# cp -ax / /mnt 10.1083 -# umount /mnt 10.1084 -\end{verbatim} 10.1085 -\end{quote} 10.1086 - 10.1087 -Now configure your VM with the following disk configuration: 10.1088 -\begin{quote} 10.1089 -\begin{verbatim} 10.1090 - disk = [ 'phy:vg/myvmdisk1,sda1,w' ] 10.1091 -\end{verbatim} 10.1092 -\end{quote} 10.1093 - 10.1094 -LVM enables you to grow the size of logical volumes, but you'll need 10.1095 -to resize the corresponding file system to make use of the new 10.1096 -space. Some file systems (e.g. ext3) now support on-line resize. See 10.1097 -the LVM manuals for more details. 10.1098 +%% Chapter Domain Filesystem Storage moved to domain_filesystem.tex 10.1099 +\include{src/user/domain_filesystem} 10.1100 10.1101 -You can also use LVM for creating copy-on-write clones of LVM 10.1102 -volumes (known as writable persistent snapshots in LVM 10.1103 -terminology). This facility is new in Linux 2.6.8, so isn't as 10.1104 -stable as one might hope. In particular, using lots of CoW LVM 10.1105 -disks consumes a lot of dom0 memory, and error conditions such as 10.1106 -running out of disk space are not handled well. Hopefully this 10.1107 -will improve in future. 10.1108 - 10.1109 -To create two copy-on-write clone of the above file system you 10.1110 -would use the following commands: 10.1111 - 10.1112 -\begin{quote} 10.1113 -\begin{verbatim} 10.1114 -# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1 10.1115 -# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1 10.1116 -\end{verbatim} 10.1117 -\end{quote} 10.1118 - 10.1119 -Each of these can grow to have 1GB of differences from the master 10.1120 -volume. 
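To keep an eye on how much of that difference space a snapshot has already
consumed, the standard LVM reporting tools can be queried from domain 0. This
is a brief aside only; the exact fields shown depend on your LVM version:
\begin{quote}
\begin{verbatim}
# lvs vg                           # Snap% column shows snapshot usage
# lvdisplay /dev/vg/myclonedisk1   # reports `Allocated to snapshot'
\end{verbatim}
\end{quote}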
You can grow the amount of space for storing the 10.1121 -differences using the lvextend command, e.g.: 10.1122 -\begin{quote} 10.1123 -\begin{verbatim} 10.1124 -# lvextend +100M /dev/vg/myclonedisk1 10.1125 -\end{verbatim} 10.1126 -\end{quote} 10.1127 - 10.1128 -Don't let the `differences volume' ever fill up otherwise LVM gets 10.1129 -rather confused. It may be possible to automate the growing 10.1130 -process by using \path{dmsetup wait} to spot the volume getting full 10.1131 -and then issue an \path{lvextend}. 10.1132 - 10.1133 -In principle, it is possible to continue writing to the volume 10.1134 -that has been cloned (the changes will not be visible to the 10.1135 -clones), but we wouldn't recommend this: have the cloned volume 10.1136 -as a `pristine' file system install that isn't mounted directly 10.1137 -by any of the virtual machines. 10.1138 - 10.1139 - 10.1140 -\section{Using NFS Root} 10.1141 - 10.1142 -First, populate a root filesystem in a directory on the server 10.1143 -machine. This can be on a distinct physical machine, or simply 10.1144 -run within a virtual machine on the same node. 10.1145 - 10.1146 -Now configure the NFS server to export this filesystem over the 10.1147 -network by adding a line to \path{/etc/exports}, for instance: 10.1148 - 10.1149 -\begin{quote} 10.1150 -\begin{small} 10.1151 -\begin{verbatim} 10.1152 -/export/vm1root 1.2.3.4/24 (rw,sync,no_root_squash) 10.1153 -\end{verbatim} 10.1154 -\end{small} 10.1155 -\end{quote} 10.1156 - 10.1157 -Finally, configure the domain to use NFS root. In addition to the 10.1158 -normal variables, you should make sure to set the following values in 10.1159 -the domain's configuration file: 10.1160 - 10.1161 -\begin{quote} 10.1162 -\begin{small} 10.1163 -\begin{verbatim} 10.1164 -root = '/dev/nfs' 10.1165 -nfs_server = '2.3.4.5' # substitute IP address of server 10.1166 -nfs_root = '/path/to/root' # path to root FS on the server 10.1167 -\end{verbatim} 10.1168 -\end{small} 10.1169 -\end{quote} 10.1170 - 10.1171 -The domain will need network access at boot time, so either statically 10.1172 -configure an IP address (Using the config variables \path{ip}, 10.1173 -\path{netmask}, \path{gateway}, \path{hostname}) or enable DHCP ( 10.1174 -\path{dhcp='dhcp'}). 10.1175 - 10.1176 -Note that the Linux NFS root implementation is known to have stability 10.1177 -problems under high load (this is not a Xen-specific problem), so this 10.1178 -configuration may not be appropriate for critical servers. 10.1179 10.1180 10.1181 \part{User Reference Documentation} 10.1182 10.1183 -\chapter{Control Software} 10.1184 - 10.1185 -The Xen control software includes the \xend node control daemon (which 10.1186 -must be running), the xm command line tools, and the prototype 10.1187 -xensv web interface. 10.1188 - 10.1189 -\section{\Xend (node control daemon)} 10.1190 -\label{s:xend} 10.1191 - 10.1192 -The Xen Daemon (\Xend) performs system management functions related to 10.1193 -virtual machines. It forms a central point of control for a machine 10.1194 -and can be controlled using an HTTP-based protocol. \Xend must be 10.1195 -running in order to start and manage virtual machines. 10.1196 - 10.1197 -\Xend must be run as root because it needs access to privileged system 10.1198 -management functions. A small set of commands may be issued on the 10.1199 -\xend command line: 10.1200 - 10.1201 -\begin{tabular}{ll} 10.1202 -\verb!# xend start! & start \xend, if not already running \\ 10.1203 -\verb!# xend stop! 
& stop \xend if already running \\ 10.1204 -\verb!# xend restart! & restart \xend if running, otherwise start it \\ 10.1205 -% \verb!# xend trace_start! & start \xend, with very detailed debug logging \\ 10.1206 -\verb!# xend status! & indicates \xend status by its return code 10.1207 -\end{tabular} 10.1208 - 10.1209 -A SysV init script called {\tt xend} is provided to start \xend at boot 10.1210 -time. {\tt make install} installs this script in {\path{/etc/init.d}. 10.1211 -To enable it, you have to make symbolic links in the appropriate 10.1212 -runlevel directories or use the {\tt chkconfig} tool, where available. 10.1213 - 10.1214 -Once \xend is running, more sophisticated administration can be done 10.1215 -using the xm tool (see Section~\ref{s:xm}) and the experimental 10.1216 -Xensv web interface (see Section~\ref{s:xensv}). 10.1217 - 10.1218 -As \xend runs, events will be logged to \path{/var/log/xend.log} and, 10.1219 -if the migration assistant daemon (\path{xfrd}) has been started, 10.1220 -\path{/var/log/xfrd.log}. These may be of use for troubleshooting 10.1221 -problems. 10.1222 - 10.1223 -\section{Xm (command line interface)} 10.1224 -\label{s:xm} 10.1225 - 10.1226 -The xm tool is the primary tool for managing Xen from the console. 10.1227 -The general format of an xm command line is: 10.1228 - 10.1229 -\begin{verbatim} 10.1230 -# xm command [switches] [arguments] [variables] 10.1231 -\end{verbatim} 10.1232 - 10.1233 -The available {\em switches} and {\em arguments} are dependent on the 10.1234 -{\em command} chosen. The {\em variables} may be set using 10.1235 -declarations of the form {\tt variable=value} and command line 10.1236 -declarations override any of the values in the configuration file 10.1237 -being used, including the standard variables described above and any 10.1238 -custom variables (for instance, the \path{xmdefconfig} file uses a 10.1239 -{\tt vmid} variable). 10.1240 - 10.1241 -The available commands are as follows: 10.1242 - 10.1243 -\begin{description} 10.1244 -\item[set-mem] Request a domain to adjust its memory footprint. 10.1245 -\item[create] Create a new domain. 10.1246 -\item[destroy] Kill a domain immediately. 10.1247 -\item[list] List running domains. 10.1248 -\item[shutdown] Ask a domain to shutdown. 10.1249 -\item[dmesg] Fetch the Xen (not Linux!) boot output. 10.1250 -\item[consoles] Lists the available consoles. 10.1251 -\item[console] Connect to the console for a domain. 10.1252 -\item[help] Get help on xm commands. 10.1253 -\item[save] Suspend a domain to disk. 10.1254 -\item[restore] Restore a domain from disk. 10.1255 -\item[pause] Pause a domain's execution. 10.1256 -\item[unpause] Unpause a domain. 10.1257 -\item[pincpu] Pin a domain to a CPU. 10.1258 -\item[bvt] Set BVT scheduler parameters for a domain. 10.1259 -\item[bvt\_ctxallow] Set the BVT context switching allowance for the system. 10.1260 -\item[atropos] Set the atropos parameters for a domain. 10.1261 -\item[rrobin] Set the round robin time slice for the system. 10.1262 -\item[info] Get information about the Xen host. 10.1263 -\item[call] Call a \xend HTTP API function directly. 
10.1264 -\end{description} 10.1265 - 10.1266 -For a detailed overview of switches, arguments and variables to each command 10.1267 -try 10.1268 -\begin{quote} 10.1269 -\begin{verbatim} 10.1270 -# xm help command 10.1271 -\end{verbatim} 10.1272 -\end{quote} 10.1273 - 10.1274 -\section{Xensv (web control interface)} 10.1275 -\label{s:xensv} 10.1276 - 10.1277 -Xensv is the experimental web control interface for managing a Xen 10.1278 -machine. It can be used to perform some (but not yet all) of the 10.1279 -management tasks that can be done using the xm tool. 10.1280 - 10.1281 -It can be started using: 10.1282 -\begin{quote} 10.1283 -\verb_# xensv start_ 10.1284 -\end{quote} 10.1285 -and stopped using: 10.1286 -\begin{quote} 10.1287 -\verb_# xensv stop_ 10.1288 -\end{quote} 10.1289 - 10.1290 -By default, Xensv will serve out the web interface on port 8080. This 10.1291 -can be changed by editing 10.1292 -\path{/usr/lib/python2.3/site-packages/xen/sv/params.py}. 10.1293 - 10.1294 -Once Xensv is running, the web interface can be used to create and 10.1295 -manage running domains. 10.1296 - 10.1297 - 10.1298 - 10.1299 - 10.1300 -\chapter{Domain Configuration} 10.1301 -\label{cha:config} 10.1302 - 10.1303 -The following contains the syntax of the domain configuration 10.1304 -files and description of how to further specify networking, 10.1305 -driver domain and general scheduling behaviour. 10.1306 - 10.1307 -\section{Configuration Files} 10.1308 -\label{s:cfiles} 10.1309 - 10.1310 -Xen configuration files contain the following standard variables. 10.1311 -Unless otherwise stated, configuration items should be enclosed in 10.1312 -quotes: see \path{/etc/xen/xmexample1} and \path{/etc/xen/xmexample2} 10.1313 -for concrete examples of the syntax. 10.1314 - 10.1315 -\begin{description} 10.1316 -\item[kernel] Path to the kernel image 10.1317 -\item[ramdisk] Path to a ramdisk image (optional). 10.1318 -% \item[builder] The name of the domain build function (e.g. {\tt'linux'} or {\tt'netbsd'}. 10.1319 -\item[memory] Memory size in megabytes. 10.1320 -\item[cpu] CPU to run this domain on, or {\tt -1} for 10.1321 - auto-allocation. 10.1322 -\item[console] Port to export the domain console on (default 9600 + domain ID). 10.1323 -\item[nics] Number of virtual network interfaces. 10.1324 -\item[vif] List of MAC addresses (random addresses are assigned if not 10.1325 - given) and bridges to use for the domain's network interfaces, e.g. 10.1326 -\begin{verbatim} 10.1327 -vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0', 10.1328 - 'bridge=xen-br1' ] 10.1329 -\end{verbatim} 10.1330 - to assign a MAC address and bridge to the first interface and assign 10.1331 - a different bridge to the second interface, leaving \xend to choose 10.1332 - the MAC address. 10.1333 -\item[disk] List of block devices to export to the domain, e.g. \\ 10.1334 - \verb_disk = [ 'phy:hda1,sda1,r' ]_ \\ 10.1335 - exports physical device \path{/dev/hda1} to the domain 10.1336 - as \path{/dev/sda1} with read-only access. Exporting a disk read-write 10.1337 - which is currently mounted is dangerous -- if you are \emph{certain} 10.1338 - you wish to do this, you can specify \path{w!} as the mode. 10.1339 -\item[dhcp] Set to {\tt 'dhcp'} if you want to use DHCP to configure 10.1340 - networking. 10.1341 -\item[netmask] Manually configured IP netmask. 10.1342 -\item[gateway] Manually configured IP gateway. 10.1343 -\item[hostname] Set the hostname for the virtual machine. 
10.1344 -\item[root] Specify the root device parameter on the kernel command 10.1345 - line. 10.1346 -\item[nfs\_server] IP address for the NFS server (if any). 10.1347 -\item[nfs\_root] Path of the root filesystem on the NFS server (if any). 10.1348 -\item[extra] Extra string to append to the kernel command line (if 10.1349 - any) 10.1350 -\item[restart] Three possible options: 10.1351 - \begin{description} 10.1352 - \item[always] Always restart the domain, no matter what 10.1353 - its exit code is. 10.1354 - \item[never] Never restart the domain. 10.1355 - \item[onreboot] Restart the domain iff it requests reboot. 10.1356 - \end{description} 10.1357 -\end{description} 10.1358 - 10.1359 -For additional flexibility, it is also possible to include Python 10.1360 -scripting commands in configuration files. An example of this is the 10.1361 -\path{xmexample2} file, which uses Python code to handle the 10.1362 -\path{vmid} variable. 10.1363 - 10.1364 - 10.1365 -%\part{Advanced Topics} 10.1366 - 10.1367 -\section{Network Configuration} 10.1368 - 10.1369 -For many users, the default installation should work `out of the box'. 10.1370 -More complicated network setups, for instance with multiple ethernet 10.1371 -interfaces and/or existing bridging setups will require some 10.1372 -special configuration. 10.1373 - 10.1374 -The purpose of this section is to describe the mechanisms provided by 10.1375 -\xend to allow a flexible configuration for Xen's virtual networking. 10.1376 - 10.1377 -\subsection{Xen virtual network topology} 10.1378 - 10.1379 -Each domain network interface is connected to a virtual network 10.1380 -interface in dom0 by a point to point link (effectively a `virtual 10.1381 -crossover cable'). These devices are named {\tt 10.1382 -vif$<$domid$>$.$<$vifid$>$} (e.g. {\tt vif1.0} for the first interface 10.1383 -in domain 1, {\tt vif3.1} for the second interface in domain 3). 10.1384 - 10.1385 -Traffic on these virtual interfaces is handled in domain 0 using 10.1386 -standard Linux mechanisms for bridging, routing, rate limiting, etc. 10.1387 -Xend calls on two shell scripts to perform initial configuration of 10.1388 -the network and configuration of new virtual interfaces. By default, 10.1389 -these scripts configure a single bridge for all the virtual 10.1390 -interfaces. Arbitrary routing / bridging configurations can be 10.1391 -configured by customising the scripts, as described in the following 10.1392 -section. 10.1393 - 10.1394 -\subsection{Xen networking scripts} 10.1395 - 10.1396 -Xen's virtual networking is configured by two shell scripts (by 10.1397 -default \path{network} and \path{vif-bridge}). These are 10.1398 -called automatically by \xend when certain events occur, with 10.1399 -arguments to the scripts providing further contextual information. 10.1400 -These scripts are found by default in \path{/etc/xen/scripts}. The 10.1401 -names and locations of the scripts can be configured in 10.1402 -\path{/etc/xen/xend-config.sxp}. 10.1403 - 10.1404 -\begin{description} 10.1405 - 10.1406 -\item[network:] This script is called whenever \xend is started or 10.1407 -stopped to respectively initialise or tear down the Xen virtual 10.1408 -network. In the default configuration initialisation creates the 10.1409 -bridge `xen-br0' and moves eth0 onto that bridge, modifying the 10.1410 -routing accordingly. When \xend exits, it deletes the Xen bridge and 10.1411 -removes eth0, restoring the normal IP and routing configuration. 
10.1412 - 10.1413 -%% In configurations where the bridge already exists, this script could 10.1414 -%% be replaced with a link to \path{/bin/true} (for instance). 10.1415 - 10.1416 -\item[vif-bridge:] This script is called for every domain virtual 10.1417 -interface and can configure firewalling rules and add the vif 10.1418 -to the appropriate bridge. By default, this adds and removes 10.1419 -VIFs on the default Xen bridge. 10.1420 - 10.1421 -\end{description} 10.1422 - 10.1423 -For more complex network setups (e.g. where routing is required or 10.1424 -integrate with existing bridges) these scripts may be replaced with 10.1425 -customised variants for your site's preferred configuration. 10.1426 - 10.1427 -%% There are two possible types of privileges: IO privileges and 10.1428 -%% administration privileges. 10.1429 - 10.1430 -\section{Driver Domain Configuration} 10.1431 - 10.1432 -I/O privileges can be assigned to allow a domain to directly access 10.1433 -PCI devices itself. This is used to support driver domains. 10.1434 - 10.1435 -Setting backend privileges is currently only supported in SXP format 10.1436 -config files. To allow a domain to function as a backend for others, 10.1437 -somewhere within the {\tt vm} element of its configuration file must 10.1438 -be a {\tt backend} element of the form {\tt (backend ({\em type}))} 10.1439 -where {\tt \em type} may be either {\tt netif} or {\tt blkif}, 10.1440 -according to the type of virtual device this domain will service. 10.1441 -%% After this domain has been built, \xend will connect all new and 10.1442 -%% existing {\em virtual} devices (of the appropriate type) to that 10.1443 -%% backend. 10.1444 - 10.1445 -Note that a block backend cannot currently import virtual block 10.1446 -devices from other domains, and a network backend cannot import 10.1447 -virtual network devices from other domains. Thus (particularly in the 10.1448 -case of block backends, which cannot import a virtual block device as 10.1449 -their root filesystem), you may need to boot a backend domain from a 10.1450 -ramdisk or a network device. 10.1451 - 10.1452 -Access to PCI devices may be configured on a per-device basis. Xen 10.1453 -will assign the minimal set of hardware privileges to a domain that 10.1454 -are required to control its devices. This can be configured in either 10.1455 -format of configuration file: 10.1456 - 10.1457 -\begin{itemize} 10.1458 -\item SXP Format: Include device elements of the form: \\ 10.1459 -\centerline{ {\tt (device (pci (bus {\em x}) (dev {\em y}) (func {\em z})))}} \\ 10.1460 - inside the top-level {\tt vm} element. Each one specifies the address 10.1461 - of a device this domain is allowed to access --- 10.1462 - the numbers {\em x},{\em y} and {\em z} may be in either decimal or 10.1463 - hexadecimal format. 10.1464 -\item Flat Format: Include a list of PCI device addresses of the 10.1465 - format: \\ 10.1466 -\centerline{{\tt pci = ['x,y,z', ...]}} \\ 10.1467 -where each element in the 10.1468 - list is a string specifying the components of the PCI device 10.1469 - address, separated by commas. The components ({\tt \em x}, {\tt \em 10.1470 - y} and {\tt \em z}) of the list may be formatted as either decimal 10.1471 - or hexadecimal. 10.1472 -\end{itemize} 10.1473 - 10.1474 -%% \section{Administration Domains} 10.1475 - 10.1476 -%% Administration privileges allow a domain to use the `dom0 10.1477 -%% operations' (so called because they are usually available only to 10.1478 -%% domain 0). 
A privileged domain can build other domains, set scheduling 10.1479 -%% parameters, etc. 10.1480 - 10.1481 -% Support for other administrative domains is not yet available... perhaps 10.1482 -% we should plumb it in some time 10.1483 - 10.1484 - 10.1485 - 10.1486 - 10.1487 - 10.1488 -\section{Scheduler Configuration} 10.1489 -\label{s:sched} 10.1490 - 10.1491 - 10.1492 -Xen offers a boot time choice between multiple schedulers. To select 10.1493 -a scheduler, pass the boot parameter {\em sched=sched\_name} to Xen, 10.1494 -substituting the appropriate scheduler name. Details of the schedulers 10.1495 -and their parameters are included below; future versions of the tools 10.1496 -will provide a higher-level interface to these tools. 10.1497 +%% Chapter Control Software moved to control_software.tex 10.1498 +\include{src/user/control_software} 10.1499 10.1500 -It is expected that system administrators configure their system to 10.1501 -use the scheduler most appropriate to their needs. Currently, the BVT 10.1502 -scheduler is the recommended choice. 10.1503 - 10.1504 -\subsection{Borrowed Virtual Time} 10.1505 - 10.1506 -{\tt sched=bvt} (the default) \\ 10.1507 - 10.1508 -BVT provides proportional fair shares of the CPU time. It has been 10.1509 -observed to penalise domains that block frequently (e.g. I/O intensive 10.1510 -domains), but this can be compensated for by using warping. 10.1511 - 10.1512 -\subsubsection{Global Parameters} 10.1513 - 10.1514 -\begin{description} 10.1515 -\item[ctx\_allow] 10.1516 - the context switch allowance is similar to the `quantum' 10.1517 - in traditional schedulers. It is the minimum time that 10.1518 - a scheduled domain will be allowed to run before being 10.1519 - pre-empted. 10.1520 -\end{description} 10.1521 - 10.1522 -\subsubsection{Per-domain parameters} 10.1523 - 10.1524 -\begin{description} 10.1525 -\item[mcuadv] 10.1526 - the MCU (Minimum Charging Unit) advance determines the 10.1527 - proportional share of the CPU that a domain receives. It 10.1528 - is set inversely proportionally to a domain's sharing weight. 10.1529 -\item[warp] 10.1530 - the amount of `virtual time' the domain is allowed to warp 10.1531 - backwards 10.1532 -\item[warpl] 10.1533 - the warp limit is the maximum time a domain can run warped for 10.1534 -\item[warpu] 10.1535 - the unwarp requirement is the minimum time a domain must 10.1536 - run unwarped for before it can warp again 10.1537 -\end{description} 10.1538 - 10.1539 -\subsection{Atropos} 10.1540 - 10.1541 -{\tt sched=atropos} \\ 10.1542 - 10.1543 -Atropos is a soft real time scheduler. It provides guarantees about 10.1544 -absolute shares of the CPU, with a facility for sharing 10.1545 -slack CPU time on a best-effort basis. It can provide timeliness 10.1546 -guarantees for latency-sensitive domains. 10.1547 - 10.1548 -Every domain has an associated period and slice. The domain should 10.1549 -receive `slice' nanoseconds every `period' nanoseconds. This allows 10.1550 -the administrator to configure both the absolute share of the CPU a 10.1551 -domain receives and the frequency with which it is scheduled. 10.1552 - 10.1553 -%% When 10.1554 -%% domains unblock, their period is reduced to the value of the latency 10.1555 -%% hint (the slice is scaled accordingly so that they still get the same 10.1556 -%% proportion of the CPU). For each subsequent period, the slice and 10.1557 -%% period times are doubled until they reach their original values. 
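As a minimal worked example of the period/slice relationship (the figures are
arbitrary, chosen purely for illustration), a domain configured with a period
of 20ms (20000000ns) and a slice of 5ms (5000000ns) receives
\begin{quote}
slice / period = 5ms / 20ms = 25\% of one CPU,
\end{quote}
and is guaranteed to be scheduled at least once in every 20ms interval.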
10.1558 - 10.1559 -Note: don't overcommit the CPU when using Atropos (i.e. don't reserve 10.1560 -more CPU than is available --- the utilisation should be kept to 10.1561 -slightly less than 100\% in order to ensure predictable behaviour). 10.1562 - 10.1563 -\subsubsection{Per-domain parameters} 10.1564 - 10.1565 -\begin{description} 10.1566 -\item[period] The regular time interval during which a domain is 10.1567 - guaranteed to receive its allocation of CPU time. 10.1568 -\item[slice] 10.1569 - The length of time per period that a domain is guaranteed to run 10.1570 - for (in the absence of voluntary yielding of the CPU). 10.1571 -\item[latency] 10.1572 - The latency hint is used to control how soon after 10.1573 - waking up a domain it should be scheduled. 10.1574 -\item[xtratime] This is a boolean flag that specifies whether a domain 10.1575 - should be allowed a share of the system slack time. 10.1576 -\end{description} 10.1577 - 10.1578 -\subsection{Round Robin} 10.1579 - 10.1580 -{\tt sched=rrobin} \\ 10.1581 - 10.1582 -The round robin scheduler is included as a simple demonstration of 10.1583 -Xen's internal scheduler API. It is not intended for production use. 10.1584 - 10.1585 -\subsubsection{Global Parameters} 10.1586 - 10.1587 -\begin{description} 10.1588 -\item[rr\_slice] 10.1589 - The maximum time each domain runs before the next 10.1590 - scheduling decision is made. 10.1591 -\end{description} 10.1592 - 10.1593 - 10.1594 - 10.1595 - 10.1596 - 10.1597 - 10.1598 - 10.1599 - 10.1600 - 10.1601 - 10.1602 - 10.1603 - 10.1604 -\chapter{Build, Boot and Debug options} 10.1605 - 10.1606 -This chapter describes the build- and boot-time options 10.1607 -which may be used to tailor your Xen system. 10.1608 - 10.1609 -\section{Xen Build Options} 10.1610 - 10.1611 -Xen provides a number of build-time options which should be 10.1612 -set as environment variables or passed on make's command-line. 10.1613 - 10.1614 -\begin{description} 10.1615 -\item[verbose=y] Enable debugging messages when Xen detects an unexpected condition. 10.1616 -Also enables console output from all domains. 10.1617 -\item[debug=y] 10.1618 -Enable debug assertions. Implies {\bf verbose=y}. 10.1619 -(Primarily useful for tracing bugs in Xen). 10.1620 -\item[debugger=y] 10.1621 -Enable the in-Xen debugger. This can be used to debug 10.1622 -Xen, guest OSes, and applications. 10.1623 -\item[perfc=y] 10.1624 -Enable performance counters for significant events 10.1625 -within Xen. The counts can be reset or displayed 10.1626 -on Xen's console via console control keys. 10.1627 -\item[trace=y] 10.1628 -Enable per-cpu trace buffers which log a range of 10.1629 -events within Xen for collection by control 10.1630 -software. 10.1631 -\end{description} 10.1632 - 10.1633 -\section{Xen Boot Options} 10.1634 -\label{s:xboot} 10.1635 - 10.1636 -These options are used to configure Xen's behaviour at runtime. They 10.1637 -should be appended to Xen's command line, either manually or by 10.1638 -editing \path{grub.conf}. 10.1639 - 10.1640 -\begin{description} 10.1641 -\item [noreboot ] 10.1642 - Don't reboot the machine automatically on errors. This is 10.1643 - useful to catch debug output if you aren't catching console messages 10.1644 - via the serial line. 10.1645 - 10.1646 -\item [nosmp ] 10.1647 - Disable SMP support. 10.1648 - This option is implied by `ignorebiostables'. 10.1649 - 10.1650 -\item [watchdog ] 10.1651 - Enable NMI watchdog which can report certain failures. 
10.1652 - 10.1653 -\item [noirqbalance ] 10.1654 - Disable software IRQ balancing and affinity. This can be used on 10.1655 - systems such as Dell 1850/2850 that have workarounds in hardware for 10.1656 - IRQ-routing issues. 10.1657 +%% Chapter Domain Configuration moved to domain_configuration.tex 10.1658 +\include{src/user/domain_configuration} 10.1659 10.1660 -\item [badpage=$<$page number$>$,$<$page number$>$, \ldots ] 10.1661 - Specify a list of pages not to be allocated for use 10.1662 - because they contain bad bytes. For example, if your 10.1663 - memory tester says that byte 0x12345678 is bad, you would 10.1664 - place `badpage=0x12345' on Xen's command line. 10.1665 - 10.1666 -\item [com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ 10.1667 - com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\ 10.1668 - Xen supports up to two 16550-compatible serial ports. 10.1669 - For example: `com1=9600, 8n1, 0x408, 5' maps COM1 to a 10.1670 - 9600-baud port, 8 data bits, no parity, 1 stop bit, 10.1671 - I/O port base 0x408, IRQ 5. 10.1672 - If some configuration options are standard (e.g., I/O base and IRQ), 10.1673 - then only a prefix of the full configuration string need be 10.1674 - specified. If the baud rate is pre-configured (e.g., by the 10.1675 - bootloader) then you can specify `auto' in place of a numeric baud 10.1676 - rate. 10.1677 - 10.1678 -\item [console=$<$specifier list$>$ ] 10.1679 - Specify the destination for Xen console I/O. 10.1680 - This is a comma-separated list of, for example: 10.1681 -\begin{description} 10.1682 - \item[vga] use VGA console and allow keyboard input 10.1683 - \item[com1] use serial port com1 10.1684 - \item[com2H] use serial port com2. Transmitted chars will 10.1685 - have the MSB set. Received chars must have 10.1686 - MSB set. 10.1687 - \item[com2L] use serial port com2. Transmitted chars will 10.1688 - have the MSB cleared. Received chars must 10.1689 - have MSB cleared. 10.1690 -\end{description} 10.1691 - The latter two examples allow a single port to be 10.1692 - shared by two subsystems (e.g. console and 10.1693 - debugger). Sharing is controlled by MSB of each 10.1694 - transmitted/received character. 10.1695 - [NB. Default for this option is `com1,vga'] 10.1696 - 10.1697 -\item [sync\_console ] 10.1698 - Force synchronous console output. This is useful if you system fails 10.1699 - unexpectedly before it has sent all available output to the 10.1700 - console. In most cases Xen will automatically enter synchronous mode 10.1701 - when an exceptional event occurs, but this option provides a manual 10.1702 - fallback. 10.1703 - 10.1704 -\item [conswitch=$<$switch-char$><$auto-switch-char$>$ ] 10.1705 - Specify how to switch serial-console input between 10.1706 - Xen and DOM0. The required sequence is CTRL-$<$switch-char$>$ 10.1707 - pressed three times. Specifying the backtick character 10.1708 - disables switching. 10.1709 - The $<$auto-switch-char$>$ specifies whether Xen should 10.1710 - auto-switch input to DOM0 when it boots --- if it is `x' 10.1711 - then auto-switching is disabled. Any other value, or 10.1712 - omitting the character, enables auto-switching. 10.1713 - [NB. default switch-char is `a'] 10.1714 - 10.1715 -\item [nmi=xxx ] 10.1716 - Specify what to do with an NMI parity or I/O error. \\ 10.1717 - `nmi=fatal': Xen prints a diagnostic and then hangs. \\ 10.1718 - `nmi=dom0': Inform DOM0 of the NMI. \\ 10.1719 - `nmi=ignore': Ignore the NMI. 10.1720 - 10.1721 -\item [mem=xxx ] 10.1722 - Set the physical RAM address limit. 
Any RAM appearing beyond this 10.1723 - physical address in the memory map will be ignored. This parameter 10.1724 - may be specified with a B, K, M or G suffix, representing bytes, 10.1725 - kilobytes, megabytes and gigabytes respectively. The 10.1726 - default unit, if no suffix is specified, is kilobytes. 10.1727 - 10.1728 -\item [dom0\_mem=xxx ] 10.1729 - Set the amount of memory to be allocated to domain0. In Xen 3.x the parameter 10.1730 - may be specified with a B, K, M or G suffix, representing bytes, 10.1731 - kilobytes, megabytes and gigabytes respectively; if no suffix is specified, 10.1732 - the parameter defaults to kilobytes. In previous versions of Xen, suffixes 10.1733 - were not supported and the value is always interpreted as kilobytes. 10.1734 - 10.1735 -\item [tbuf\_size=xxx ] 10.1736 - Set the size of the per-cpu trace buffers, in pages 10.1737 - (default 1). Note that the trace buffers are only 10.1738 - enabled in debug builds. Most users can ignore 10.1739 - this feature completely. 10.1740 - 10.1741 -\item [sched=xxx ] 10.1742 - Select the CPU scheduler Xen should use. The current 10.1743 - possibilities are `bvt' (default), `atropos' and `rrobin'. 10.1744 - For more information see Section~\ref{s:sched}. 10.1745 - 10.1746 -\item [apic\_verbosity=debug,verbose ] 10.1747 - Print more detailed information about local APIC and IOAPIC configuration. 10.1748 - 10.1749 -\item [lapic ] 10.1750 - Force use of local APIC even when left disabled by uniprocessor BIOS. 10.1751 - 10.1752 -\item [nolapic ] 10.1753 - Ignore local APIC in a uniprocessor system, even if enabled by the BIOS. 10.1754 - 10.1755 -\item [apic=bigsmp,default,es7000,summit ] 10.1756 - Specify NUMA platform. This can usually be probed automatically. 10.1757 - 10.1758 -\end{description} 10.1759 - 10.1760 -In addition, the following options may be specified on the Xen command 10.1761 -line. Since domain 0 shares responsibility for booting the platform, 10.1762 -Xen will automatically propagate these options to its command 10.1763 -line. These options are taken from Linux's command-line syntax with 10.1764 -unchanged semantics. 10.1765 - 10.1766 -\begin{description} 10.1767 -\item [acpi=off,force,strict,ht,noirq,\ldots ] 10.1768 - Modify how Xen (and domain 0) parses the BIOS ACPI tables. 10.1769 - 10.1770 -\item [acpi\_skip\_timer\_override ] 10.1771 - Instruct Xen (and domain 0) to ignore timer-interrupt override 10.1772 - instructions specified by the BIOS ACPI tables. 10.1773 - 10.1774 -\item [noapic ] 10.1775 - Instruct Xen (and domain 0) to ignore any IOAPICs that are present in 10.1776 - the system, and instead continue to use the legacy PIC. 10.1777 - 10.1778 -\end{description} 10.1779 - 10.1780 -\section{XenLinux Boot Options} 10.1781 - 10.1782 -In addition to the standard Linux kernel boot options, we support: 10.1783 -\begin{description} 10.1784 -\item[xencons=xxx ] Specify the device node to which the Xen virtual 10.1785 -console driver is attached. The following options are supported: 10.1786 -\begin{center} 10.1787 -\begin{tabular}{l} 10.1788 -`xencons=off': disable virtual console \\ 10.1789 -`xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\ 10.1790 -`xencons=ttyS': attach console to /dev/ttyS0 10.1791 -\end{tabular} 10.1792 -\end{center} 10.1793 -The default is ttyS for dom0 and tty for all other domains. 
10.1794 -\end{description} 10.1795 - 10.1796 - 10.1797 - 10.1798 -\section{Debugging} 10.1799 -\label{s:keys} 10.1800 - 10.1801 -Xen has a set of debugging features that can be useful to try and 10.1802 -figure out what's going on. Hit 'h' on the serial line (if you 10.1803 -specified a baud rate on the Xen command line) or ScrollLock-h on the 10.1804 -keyboard to get a list of supported commands. 10.1805 - 10.1806 -If you have a crash you'll likely get a crash dump containing an EIP 10.1807 -(PC) which, along with an \path{objdump -d image}, can be useful in 10.1808 -figuring out what's happened. Debug a Xenlinux image just as you 10.1809 -would any other Linux kernel. 10.1810 - 10.1811 -%% We supply a handy debug terminal program which you can find in 10.1812 -%% \path{/usr/local/src/xen-2.0.bk/tools/misc/miniterm/} 10.1813 -%% This should be built and executed on another machine that is connected 10.1814 -%% via a null modem cable. Documentation is included. 10.1815 -%% Alternatively, if the Xen machine is connected to a serial-port server 10.1816 -%% then we supply a dumb TCP terminal client, {\tt xencons}. 10.1817 - 10.1818 - 10.1819 +%% Chapter Build, Boot and Debug Options moved to build.tex 10.1820 +\include{src/user/build} 10.1821 10.1822 10.1823 \chapter{Further Support} 10.1824 @@ -1875,6 +108,7 @@ directory of the Xen source distribution 10.1825 %Various HOWTOs are available in \path{docs/HOWTOS} but this content is 10.1826 %being integrated into this manual. 10.1827 10.1828 + 10.1829 \section{Online References} 10.1830 10.1831 The official Xen web site is found at: 10.1832 @@ -1885,6 +119,7 @@ The official Xen web site is found at: 10.1833 This contains links to the latest versions of all on-line 10.1834 documentation (including the lateset version of the FAQ). 10.1835 10.1836 + 10.1837 \section{Mailing Lists} 10.1838 10.1839 There are currently four official Xen mailing lists: 10.1840 @@ -1905,326 +140,18 @@ from the unstable and 2.0 trees - develo 10.1841 \end{description} 10.1842 10.1843 10.1844 + 10.1845 \appendix 10.1846 10.1847 +%% Chapter Installing Xen / XenLinux on Debian moved to debian.tex 10.1848 +\include{src/user/debian} 10.1849 + 10.1850 +%% Chapter Installing Xen on Red Hat moved to redhat.tex 10.1851 +\include{src/user/redhat} 10.1852 + 10.1853 10.1854 -\chapter{Installing Xen / XenLinux on Debian} 10.1855 - 10.1856 -The Debian project provides a tool called \path{debootstrap} which 10.1857 -allows a base Debian system to be installed into a filesystem without 10.1858 -requiring the host system to have any Debian-specific software (such 10.1859 -as \path{apt}. 10.1860 - 10.1861 -Here's some info how to install Debian 3.1 (Sarge) for an unprivileged 10.1862 -Xen domain: 10.1863 - 10.1864 -\begin{enumerate} 10.1865 -\item Set up Xen 2.0 and test that it's working, as described earlier in 10.1866 - this manual. 10.1867 - 10.1868 -\item Create disk images for root-fs and swap (alternatively, you 10.1869 - might create dedicated partitions, LVM logical volumes, etc. if 10.1870 - that suits your setup). 10.1871 -\begin{small}\begin{verbatim} 10.1872 -dd if=/dev/zero of=/path/diskimage bs=1024k count=size_in_mbytes 10.1873 -dd if=/dev/zero of=/path/swapimage bs=1024k count=size_in_mbytes 10.1874 -\end{verbatim}\end{small} 10.1875 - If you're going to use this filesystem / disk image only as a 10.1876 - `template' for other vm disk images, something like 300 MB should 10.1877 - be enough.. 
(of course it depends what kind of packages you are 10.1878 - planning to install to the template) 10.1879 - 10.1880 -\item Create the filesystem and initialise the swap image 10.1881 -\begin{small}\begin{verbatim} 10.1882 -mkfs.ext3 /path/diskimage 10.1883 -mkswap /path/swapimage 10.1884 -\end{verbatim}\end{small} 10.1885 - 10.1886 -\item Mount the disk image for installation 10.1887 -\begin{small}\begin{verbatim} 10.1888 -mount -o loop /path/diskimage /mnt/disk 10.1889 -\end{verbatim}\end{small} 10.1890 - 10.1891 -\item Install \path{debootstrap} 10.1892 - 10.1893 -Make sure you have debootstrap installed on the host. If you are 10.1894 -running Debian sarge (3.1 / testing) or unstable you can install it by 10.1895 -running \path{apt-get install debootstrap}. Otherwise, it can be 10.1896 -downloaded from the Debian project website. 10.1897 - 10.1898 -\item Install Debian base to the disk image: 10.1899 -\begin{small}\begin{verbatim} 10.1900 -debootstrap --arch i386 sarge /mnt/disk \ 10.1901 - http://ftp.<countrycode>.debian.org/debian 10.1902 -\end{verbatim}\end{small} 10.1903 - 10.1904 -You can use any other Debian http/ftp mirror you want. 10.1905 - 10.1906 -\item When debootstrap completes successfully, modify settings: 10.1907 -\begin{small}\begin{verbatim} 10.1908 -chroot /mnt/disk /bin/bash 10.1909 -\end{verbatim}\end{small} 10.1910 - 10.1911 -Edit the following files using vi or nano and make needed changes: 10.1912 -\begin{small}\begin{verbatim} 10.1913 -/etc/hostname 10.1914 -/etc/hosts 10.1915 -/etc/resolv.conf 10.1916 -/etc/network/interfaces 10.1917 -/etc/networks 10.1918 -\end{verbatim}\end{small} 10.1919 - 10.1920 -Set up access to the services, edit: 10.1921 -\begin{small}\begin{verbatim} 10.1922 -/etc/hosts.deny 10.1923 -/etc/hosts.allow 10.1924 -/etc/inetd.conf 10.1925 -\end{verbatim}\end{small} 10.1926 - 10.1927 -Add Debian mirror to: 10.1928 -\begin{small}\begin{verbatim} 10.1929 -/etc/apt/sources.list 10.1930 -\end{verbatim}\end{small} 10.1931 - 10.1932 -Create fstab like this: 10.1933 -\begin{small}\begin{verbatim} 10.1934 -/dev/sda1 / ext3 errors=remount-ro 0 1 10.1935 -/dev/sda2 none swap sw 0 0 10.1936 -proc /proc proc defaults 0 0 10.1937 -\end{verbatim}\end{small} 10.1938 - 10.1939 -Logout 10.1940 - 10.1941 -\item Unmount the disk image 10.1942 -\begin{small}\begin{verbatim} 10.1943 -umount /mnt/disk 10.1944 -\end{verbatim}\end{small} 10.1945 - 10.1946 -\item Create Xen 2.0 configuration file for the new domain. You can 10.1947 - use the example-configurations coming with Xen as a template. 10.1948 - 10.1949 - Make sure you have the following set up: 10.1950 -\begin{small}\begin{verbatim} 10.1951 -disk = [ 'file:/path/diskimage,sda1,w', 'file:/path/swapimage,sda2,w' ] 10.1952 -root = "/dev/sda1 ro" 10.1953 -\end{verbatim}\end{small} 10.1954 - 10.1955 -\item Start the new domain 10.1956 -\begin{small}\begin{verbatim} 10.1957 -xm create -f domain_config_file 10.1958 -\end{verbatim}\end{small} 10.1959 - 10.1960 -Check that the new domain is running: 10.1961 -\begin{small}\begin{verbatim} 10.1962 -xm list 10.1963 -\end{verbatim}\end{small} 10.1964 - 10.1965 -\item Attach to the console of the new domain. 10.1966 - You should see something like this when starting the new domain: 10.1967 - 10.1968 -\begin{small}\begin{verbatim} 10.1969 -Started domain testdomain2, console on port 9626 10.1970 -\end{verbatim}\end{small} 10.1971 - 10.1972 - There you can see the ID of the console: 26. 
You can also list 10.1973 - the consoles with \path{xm consoles} (ID is the last two 10.1974 - digits of the port number.) 10.1975 - 10.1976 - Attach to the console: 10.1977 - 10.1978 -\begin{small}\begin{verbatim} 10.1979 -xm console 26 10.1980 -\end{verbatim}\end{small} 10.1981 - 10.1982 - or by telnetting to the port 9626 of localhost (the xm console 10.1983 - program works better). 10.1984 - 10.1985 -\item Log in and run base-config 10.1986 - 10.1987 - As a default there's no password for the root. 10.1988 - 10.1989 - Check that everything looks OK, and the system started without 10.1990 - errors. Check that the swap is active, and the network settings are 10.1991 - correct. 10.1992 - 10.1993 - Run \path{/usr/sbin/base-config} to set up the Debian settings. 10.1994 - 10.1995 - Set up the password for root using passwd. 10.1996 - 10.1997 -\item Done. You can exit the console by pressing \path{Ctrl + ]} 10.1998 - 10.1999 -\end{enumerate} 10.2000 - 10.2001 -If you need to create new domains, you can just copy the contents of 10.2002 -the `template'-image to the new disk images, either by mounting the 10.2003 -template and the new image, and using \path{cp -a} or \path{tar} or by 10.2004 -simply copying the image file. Once this is done, modify the 10.2005 -image-specific settings (hostname, network settings, etc). 10.2006 - 10.2007 -\chapter{Installing Xen / XenLinux on Redhat or Fedora Core} 10.2008 - 10.2009 -When using Xen / XenLinux on a standard Linux distribution there are 10.2010 -a couple of things to watch out for: 10.2011 - 10.2012 -Note that, because domains>0 don't have any privileged access at all, 10.2013 -certain commands in the default boot sequence will fail e.g. attempts 10.2014 -to update the hwclock, change the console font, update the keytable 10.2015 -map, start apmd (power management), or gpm (mouse cursor). Either 10.2016 -ignore the errors (they should be harmless), or remove them from the 10.2017 -startup scripts. Deleting the following links are a good start: 10.2018 -{\path{S24pcmcia}}, {\path{S09isdn}}, 10.2019 -{\path{S17keytable}}, {\path{S26apmd}}, 10.2020 -{\path{S85gpm}}. 10.2021 - 10.2022 -If you want to use a single root file system that works cleanly for 10.2023 -both domain 0 and unprivileged domains, a useful trick is to use 10.2024 -different 'init' run levels. For example, use 10.2025 -run level 3 for domain 0, and run level 4 for other domains. This 10.2026 -enables different startup scripts to be run in depending on the run 10.2027 -level number passed on the kernel command line. 10.2028 - 10.2029 -If using NFS root files systems mounted either from an 10.2030 -external server or from domain0 there are a couple of other gotchas. 10.2031 -The default {\path{/etc/sysconfig/iptables}} rules block NFS, so part 10.2032 -way through the boot sequence things will suddenly go dead. 10.2033 - 10.2034 -If you're planning on having a separate NFS {\path{/usr}} partition, the 10.2035 -RH9 boot scripts don't make life easy - they attempt to mount NFS file 10.2036 -systems way to late in the boot process. 
The easiest way I found to do 10.2037 -this was to have a {\path{/linuxrc}} script run ahead of 10.2038 -{\path{/sbin/init}} that mounts {\path{/usr}}: 10.2039 - 10.2040 -\begin{quote} 10.2041 -\begin{small}\begin{verbatim} 10.2042 - #!/bin/bash 10.2043 - /sbin/ipconfig lo 127.0.0.1 10.2044 - /sbin/portmap 10.2045 - /bin/mount /usr 10.2046 - exec /sbin/init "$@" <>/dev/console 2>&1 10.2047 -\end{verbatim}\end{small} 10.2048 -\end{quote} 10.2049 - 10.2050 -%$ XXX SMH: font lock fix :-) 10.2051 - 10.2052 -The one slight complication with the above is that 10.2053 -{\path{/sbin/portmap}} is dynamically linked against 10.2054 -{\path{/usr/lib/libwrap.so.0}} Since this is in 10.2055 -{\path{/usr}}, it won't work. This can be solved by copying the 10.2056 -file (and link) below the /usr mount point, and just let the file be 10.2057 -'covered' when the mount happens. 10.2058 - 10.2059 -In some installations, where a shared read-only {\path{/usr}} is 10.2060 -being used, it may be desirable to move other large directories over 10.2061 -into the read-only {\path{/usr}}. For example, you might replace 10.2062 -{\path{/bin}}, {\path{/lib}} and {\path{/sbin}} with 10.2063 -links into {\path{/usr/root/bin}}, {\path{/usr/root/lib}} 10.2064 -and {\path{/usr/root/sbin}} respectively. This creates other 10.2065 -problems for running the {\path{/linuxrc}} script, requiring 10.2066 -bash, portmap, mount, ifconfig, and a handful of other shared 10.2067 -libraries to be copied below the mount point --- a simple 10.2068 -statically-linked C program would solve this problem. 10.2069 - 10.2070 - 10.2071 - 10.2072 - 10.2073 -\chapter{Glossary of Terms} 10.2074 - 10.2075 -\begin{description} 10.2076 -\item[Atropos] One of the CPU schedulers provided by Xen. 10.2077 - Atropos provides domains with absolute shares 10.2078 - of the CPU, with timeliness guarantees and a 10.2079 - mechanism for sharing out `slack time'. 10.2080 - 10.2081 -\item[BVT] The BVT scheduler is used to give proportional 10.2082 - fair shares of the CPU to domains. 10.2083 - 10.2084 -\item[Exokernel] A minimal piece of privileged code, similar to 10.2085 - a {\bf microkernel} but providing a more 10.2086 - `hardware-like' interface to the tasks it 10.2087 - manages. This is similar to a paravirtualising 10.2088 - VMM like {\bf Xen} but was designed as a new 10.2089 - operating system structure, rather than 10.2090 - specifically to run multiple conventional OSs. 10.2091 - 10.2092 -\item[Domain] A domain is the execution context that 10.2093 - contains a running {\bf virtual machine}. 10.2094 - The relationship between virtual machines 10.2095 - and domains on Xen is similar to that between 10.2096 - programs and processes in an operating 10.2097 - system: a virtual machine is a persistent 10.2098 - entity that resides on disk (somewhat like 10.2099 - a program). When it is loaded for execution, 10.2100 - it runs in a domain. Each domain has a 10.2101 - {\bf domain ID}. 10.2102 - 10.2103 -\item[Domain 0] The first domain to be started on a Xen 10.2104 - machine. Domain 0 is responsible for managing 10.2105 - the system. 10.2106 - 10.2107 -\item[Domain ID] A unique identifier for a {\bf domain}, 10.2108 - analogous to a process ID in an operating 10.2109 - system. 10.2110 - 10.2111 -\item[Full virtualisation] An approach to virtualisation which 10.2112 - requires no modifications to the hosted 10.2113 - operating system, providing the illusion of 10.2114 - a complete system of real hardware devices. 
10.2115 - 10.2116 -\item[Hypervisor] An alternative term for {\bf VMM}, used 10.2117 - because it means `beyond supervisor', 10.2118 - since it is responsible for managing multiple 10.2119 - `supervisor' kernels. 10.2120 - 10.2121 -\item[Live migration] A technique for moving a running virtual 10.2122 - machine to another physical host, without 10.2123 - stopping it or the services running on it. 10.2124 - 10.2125 -\item[Microkernel] A small base of code running at the highest 10.2126 - hardware privilege level. A microkernel is 10.2127 - responsible for sharing CPU and memory (and 10.2128 - sometimes other devices) between less 10.2129 - privileged tasks running on the system. 10.2130 - This is similar to a VMM, particularly a 10.2131 - {\bf paravirtualising} VMM but typically 10.2132 - addressing a different problem space and 10.2133 - providing different kind of interface. 10.2134 - 10.2135 -\item[NetBSD/Xen] A port of NetBSD to the Xen architecture. 10.2136 - 10.2137 -\item[Paravirtualisation] An approach to virtualisation which requires 10.2138 - modifications to the operating system in 10.2139 - order to run in a virtual machine. Xen 10.2140 - uses paravirtualisation but preserves 10.2141 - binary compatibility for user space 10.2142 - applications. 10.2143 - 10.2144 -\item[Shadow pagetables] A technique for hiding the layout of machine 10.2145 - memory from a virtual machine's operating 10.2146 - system. Used in some {\bf VMMs} to provide 10.2147 - the illusion of contiguous physical memory, 10.2148 - in Xen this is used during 10.2149 - {\bf live migration}. 10.2150 - 10.2151 -\item[Virtual Machine] The environment in which a hosted operating 10.2152 - system runs, providing the abstraction of a 10.2153 - dedicated machine. A virtual machine may 10.2154 - be identical to the underlying hardware (as 10.2155 - in {\bf full virtualisation}, or it may 10.2156 - differ, as in {\bf paravirtualisation}. 10.2157 - 10.2158 -\item[VMM] Virtual Machine Monitor - the software that 10.2159 - allows multiple virtual machines to be 10.2160 - multiplexed on a single physical machine. 10.2161 - 10.2162 -\item[Xen] Xen is a paravirtualising virtual machine 10.2163 - monitor, developed primarily by the 10.2164 - Systems Research Group at the University 10.2165 - of Cambridge Computer Laboratory. 10.2166 - 10.2167 -\item[XenLinux] Official name for the port of the Linux kernel 10.2168 - that runs on Xen. 10.2169 - 10.2170 -\end{description} 10.2171 +%% Chapter Glossary of Terms moved to glossary.tex 10.2172 +\include{src/user/glossary} 10.2173 10.2174 10.2175 \end{document}
11.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 11.2 +++ b/docs/src/user/build.tex Tue Sep 20 09:17:33 2005 +0000 11.3 @@ -0,0 +1,170 @@ 11.4 +\chapter{Build, Boot and Debug Options} 11.5 + 11.6 +This chapter describes the build- and boot-time options which may be 11.7 +used to tailor your Xen system. 11.8 + 11.9 + 11.10 +\section{Xen Build Options} 11.11 + 11.12 +Xen provides a number of build-time options which should be set as 11.13 +environment variables or passed on make's command-line. 11.14 + 11.15 +\begin{description} 11.16 +\item[verbose=y] Enable debugging messages when Xen detects an 11.17 + unexpected condition. Also enables console output from all domains. 11.18 +\item[debug=y] Enable debug assertions. Implies {\bf verbose=y}. 11.19 + (Primarily useful for tracing bugs in Xen). 11.20 +\item[debugger=y] Enable the in-Xen debugger. This can be used to 11.21 + debug Xen, guest OSes, and applications. 11.22 +\item[perfc=y] Enable performance counters for significant events 11.23 + within Xen. The counts can be reset or displayed on Xen's console 11.24 + via console control keys. 11.25 +\item[trace=y] Enable per-cpu trace buffers which log a range of 11.26 + events within Xen for collection by control software. 11.27 +\end{description} 11.28 + 11.29 + 11.30 +\section{Xen Boot Options} 11.31 +\label{s:xboot} 11.32 + 11.33 +These options are used to configure Xen's behaviour at runtime. They 11.34 +should be appended to Xen's command line, either manually or by 11.35 +editing \path{grub.conf}. 11.36 + 11.37 +\begin{description} 11.38 +\item [ noreboot ] Don't reboot the machine automatically on errors. 11.39 + This is useful to catch debug output if you aren't catching console 11.40 + messages via the serial line. 11.41 +\item [ nosmp ] Disable SMP support. This option is implied by 11.42 + `ignorebiostables'. 11.43 +\item [ watchdog ] Enable NMI watchdog which can report certain 11.44 + failures. 11.45 +\item [ noirqbalance ] Disable software IRQ balancing and affinity. 11.46 + This can be used on systems such as Dell 1850/2850 that have 11.47 + workarounds in hardware for IRQ-routing issues. 11.48 +\item [ badpage=$<$page number$>$,$<$page number$>$, \ldots ] Specify 11.49 + a list of pages not to be allocated for use because they contain bad 11.50 + bytes. For example, if your memory tester says that byte 0x12345678 11.51 + is bad, you would place `badpage=0x12345' on Xen's command line. 11.52 +\item [ com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ 11.53 + com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\ 11.54 + Xen supports up to two 16550-compatible serial ports. For example: 11.55 + `com1=9600, 8n1, 0x408, 5' maps COM1 to a 9600-baud port, 8 data 11.56 + bits, no parity, 1 stop bit, I/O port base 0x408, IRQ 5. If some 11.57 + configuration options are standard (e.g., I/O base and IRQ), then 11.58 + only a prefix of the full configuration string need be specified. If 11.59 + the baud rate is pre-configured (e.g., by the bootloader) then you 11.60 + can specify `auto' in place of a numeric baud rate. 11.61 +\item [ console=$<$specifier list$>$ ] Specify the destination for Xen 11.62 + console I/O. This is a comma-separated list of, for example: 11.63 + \begin{description} 11.64 + \item[ vga ] Use VGA console and allow keyboard input. 11.65 + \item[ com1 ] Use serial port com1. 11.66 + \item[ com2H ] Use serial port com2. Transmitted chars will have the 11.67 + MSB set. Received chars must have MSB set. 11.68 + \item[ com2L] Use serial port com2. 
Transmitted chars will have the 11.69 + MSB cleared. Received chars must have MSB cleared. 11.70 + \end{description} 11.71 + The latter two examples allow a single port to be shared by two 11.72 + subsystems (e.g.\ console and debugger). Sharing is controlled by 11.73 + MSB of each transmitted/received character. [NB. Default for this 11.74 + option is `com1,vga'] 11.75 +\item [ sync\_console ] Force synchronous console output. This is 11.76 + useful if your system fails unexpectedly before it has sent all 11.77 + available output to the console. In most cases Xen will 11.78 + automatically enter synchronous mode when an exceptional event 11.79 + occurs, but this option provides a manual fallback. 11.80 +\item [ conswitch=$<$switch-char$><$auto-switch-char$>$ ] Specify how 11.81 + to switch serial-console input between Xen and DOM0. The required 11.82 + sequence is CTRL-$<$switch-char$>$ pressed three times. Specifying 11.83 + the backtick character disables switching. The 11.84 + $<$auto-switch-char$>$ specifies whether Xen should auto-switch 11.85 + input to DOM0 when it boots --- if it is `x' then auto-switching is 11.86 + disabled. Any other value, or omitting the character, enables 11.87 + auto-switching. [NB. Default switch-char is `a'.] 11.88 +\item [ nmi=xxx ] 11.89 + Specify what to do with an NMI parity or I/O error. \\ 11.90 + `nmi=fatal': Xen prints a diagnostic and then hangs. \\ 11.91 + `nmi=dom0': Inform DOM0 of the NMI. \\ 11.92 + `nmi=ignore': Ignore the NMI. 11.93 +\item [ mem=xxx ] Set the physical RAM address limit. Any RAM 11.94 + appearing beyond this physical address in the memory map will be 11.95 + ignored. This parameter may be specified with a B, K, M or G suffix, 11.96 + representing bytes, kilobytes, megabytes and gigabytes respectively. 11.97 + The default unit, if no suffix is specified, is kilobytes. 11.98 +\item [ dom0\_mem=xxx ] Set the amount of memory to be allocated to 11.99 + domain0. In Xen 3.x the parameter may be specified with a B, K, M or 11.100 + G suffix, representing bytes, kilobytes, megabytes and gigabytes 11.101 + respectively; if no suffix is specified, the parameter defaults to 11.102 + kilobytes. In previous versions of Xen, suffixes were not supported 11.103 + and the value is always interpreted as kilobytes. 11.104 +\item [ tbuf\_size=xxx ] Set the size of the per-cpu trace buffers, in 11.105 + pages (default 1). Note that the trace buffers are only enabled in 11.106 + debug builds. Most users can ignore this feature completely. 11.107 +\item [ sched=xxx ] Select the CPU scheduler Xen should use. The 11.108 + current possibilities are `bvt' (default), `atropos' and `rrobin'. 11.109 + For more information see Section~\ref{s:sched}. 11.110 +\item [ apic\_verbosity=debug,verbose ] Print more detailed 11.111 + information about local APIC and IOAPIC configuration. 11.112 +\item [ lapic ] Force use of local APIC even when left disabled by 11.113 + uniprocessor BIOS. 11.114 +\item [ nolapic ] Ignore local APIC in a uniprocessor system, even if 11.115 + enabled by the BIOS. 11.116 +\item [ apic=bigsmp,default,es7000,summit ] Specify NUMA platform. 11.117 + This can usually be probed automatically. 11.118 +\end{description} 11.119 + 11.120 +In addition, the following options may be specified on the Xen command 11.121 +line. Since domain 0 shares responsibility for booting the platform, 11.122 +Xen will automatically propagate these options to its command line.
11.123 +These options are taken from Linux's command-line syntax with 11.124 +unchanged semantics. 11.125 + 11.126 +\begin{description} 11.127 +\item [ acpi=off,force,strict,ht,noirq,\ldots ] Modify how Xen (and 11.128 + domain 0) parses the BIOS ACPI tables. 11.129 +\item [ acpi\_skip\_timer\_override ] Instruct Xen (and domain~0) to 11.130 + ignore timer-interrupt override instructions specified by the BIOS 11.131 + ACPI tables. 11.132 +\item [ noapic ] Instruct Xen (and domain~0) to ignore any IOAPICs 11.133 + that are present in the system, and instead continue to use the 11.134 + legacy PIC. 11.135 +\end{description} 11.136 + 11.137 + 11.138 +\section{XenLinux Boot Options} 11.139 + 11.140 +In addition to the standard Linux kernel boot options, we support: 11.141 +\begin{description} 11.142 +\item[ xencons=xxx ] Specify the device node to which the Xen virtual 11.143 + console driver is attached. The following options are supported: 11.144 + \begin{center} 11.145 + \begin{tabular}{l} 11.146 + `xencons=off': disable virtual console \\ 11.147 + `xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\ 11.148 + `xencons=ttyS': attach console to /dev/ttyS0 11.149 + \end{tabular} 11.150 +\end{center} 11.151 +The default is ttyS for dom0 and tty for all other domains. 11.152 +\end{description} 11.153 + 11.154 + 11.155 +\section{Debugging} 11.156 +\label{s:keys} 11.157 + 11.158 +Xen has a set of debugging features that can be useful to try and 11.159 +figure out what's going on. Hit `h' on the serial line (if you 11.160 +specified a baud rate on the Xen command line) or ScrollLock-h on the 11.161 +keyboard to get a list of supported commands. 11.162 + 11.163 +If you have a crash you'll likely get a crash dump containing an EIP 11.164 +(PC) which, along with an \path{objdump -d image}, can be useful in 11.165 +figuring out what's happened. Debug a Xenlinux image just as you 11.166 +would any other Linux kernel. 11.167 + 11.168 +%% We supply a handy debug terminal program which you can find in 11.169 +%% \path{/usr/local/src/xen-2.0.bk/tools/misc/miniterm/} This should 11.170 +%% be built and executed on another machine that is connected via a 11.171 +%% null modem cable. Documentation is included. Alternatively, if the 11.172 +%% Xen machine is connected to a serial-port server then we supply a 11.173 +%% dumb TCP terminal client, {\tt xencons}.
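Putting several of the options from Section~\ref{s:xboot} together, a GRUB entry for a serial-console debugging setup might look like the following sketch (the baud rate, memory allocation and device names are illustrative, not recommendations):
\begin{quote}
\begin{verbatim}
title Xen 2.0 (serial console)
  kernel /boot/xen-2.0.gz dom0_mem=131072 com1=115200,8n1 console=com1,vga sync_console noreboot
  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0
\end{verbatim}
\end{quote}
This maps COM1 to a 115200-baud, 8n1 port, sends Xen console output to both the serial line and the VGA console, forces synchronous console output, and disables the automatic reboot so that any crash output can be captured (see Section~\ref{s:keys}).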
12.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 12.2 +++ b/docs/src/user/control_software.tex Tue Sep 20 09:17:33 2005 +0000 12.3 @@ -0,0 +1,115 @@ 12.4 +\chapter{Control Software} 12.5 + 12.6 +The Xen control software includes the \xend\ node control daemon 12.7 +(which must be running), the xm command line tools, and the prototype 12.8 +xensv web interface. 12.9 + 12.10 +\section{\Xend\ (node control daemon)} 12.11 +\label{s:xend} 12.12 + 12.13 +The Xen Daemon (\Xend) performs system management functions related to 12.14 +virtual machines. It forms a central point of control for a machine 12.15 +and can be controlled using an HTTP-based protocol. \Xend\ must be 12.16 +running in order to start and manage virtual machines. 12.17 + 12.18 +\Xend\ must be run as root because it needs access to privileged 12.19 +system management functions. A small set of commands may be issued on 12.20 +the \xend\ command line: 12.21 + 12.22 +\begin{tabular}{ll} 12.23 + \verb!# xend start! & start \xend, if not already running \\ 12.24 + \verb!# xend stop! & stop \xend\ if already running \\ 12.25 + \verb!# xend restart! & restart \xend\ if running, otherwise start it \\ 12.26 + % \verb!# xend trace_start! & start \xend, with very detailed debug logging \\ 12.27 + \verb!# xend status! & indicates \xend\ status by its return code 12.28 +\end{tabular} 12.29 + 12.30 +A SysV init script called {\tt xend} is provided to start \xend\ at 12.31 +boot time. {\tt make install} installs this script in 12.32 +\path{/etc/init.d}. To enable it, you have to make symbolic links in 12.33 +the appropriate runlevel directories or use the {\tt chkconfig} tool, 12.34 +where available. 12.35 + 12.36 +Once \xend\ is running, more sophisticated administration can be done 12.37 +using the xm tool (see Section~\ref{s:xm}) and the experimental Xensv 12.38 +web interface (see Section~\ref{s:xensv}). 12.39 + 12.40 +As \xend\ runs, events will be logged to \path{/var/log/xend.log} and, 12.41 +if the migration assistant daemon (\path{xfrd}) has been started, 12.42 +\path{/var/log/xfrd.log}. These may be of use for troubleshooting 12.43 +problems. 12.44 + 12.45 +\section{Xm (command line interface)} 12.46 +\label{s:xm} 12.47 + 12.48 +The xm tool is the primary tool for managing Xen from the console. 12.49 +The general format of an xm command line is: 12.50 + 12.51 +\begin{verbatim} 12.52 +# xm command [switches] [arguments] [variables] 12.53 +\end{verbatim} 12.54 + 12.55 +The available \emph{switches} and \emph{arguments} are dependent on 12.56 +the \emph{command} chosen. The \emph{variables} may be set using 12.57 +declarations of the form {\tt variable=value} and command line 12.58 +declarations override any of the values in the configuration file 12.59 +being used, including the standard variables described above and any 12.60 +custom variables (for instance, the \path{xmdefconfig} file uses a 12.61 +{\tt vmid} variable). 12.62 + 12.63 +The available commands are as follows: 12.64 + 12.65 +\begin{description} 12.66 +\item[set-mem] Request a domain to adjust its memory footprint. 12.67 +\item[create] Create a new domain. 12.68 +\item[destroy] Kill a domain immediately. 12.69 +\item[list] List running domains. 12.70 +\item[shutdown] Ask a domain to shutdown. 12.71 +\item[dmesg] Fetch the Xen (not Linux!) boot output. 12.72 +\item[consoles] Lists the available consoles. 12.73 +\item[console] Connect to the console for a domain. 12.74 +\item[help] Get help on xm commands. 12.75 +\item[save] Suspend a domain to disk. 
12.76 +\item[restore] Restore a domain from disk. 12.77 +\item[pause] Pause a domain's execution. 12.78 +\item[unpause] Un-pause a domain. 12.79 +\item[pincpu] Pin a domain to a CPU. 12.80 +\item[bvt] Set BVT scheduler parameters for a domain. 12.81 +\item[bvt\_ctxallow] Set the BVT context switching allowance for the 12.82 + system. 12.83 +\item[atropos] Set the atropos parameters for a domain. 12.84 +\item[rrobin] Set the round robin time slice for the system. 12.85 +\item[info] Get information about the Xen host. 12.86 +\item[call] Call a \xend\ HTTP API function directly. 12.87 +\end{description} 12.88 + 12.89 +For a detailed overview of switches, arguments and variables to each 12.90 +command try 12.91 +\begin{quote} 12.92 +\begin{verbatim} 12.93 +# xm help command 12.94 +\end{verbatim} 12.95 +\end{quote} 12.96 + 12.97 +\section{Xensv (web control interface)} 12.98 +\label{s:xensv} 12.99 + 12.100 +Xensv is the experimental web control interface for managing a Xen 12.101 +machine. It can be used to perform some (but not yet all) of the 12.102 +management tasks that can be done using the xm tool. 12.103 + 12.104 +It can be started using: 12.105 +\begin{quote} 12.106 + \verb_# xensv start_ 12.107 +\end{quote} 12.108 +and stopped using: 12.109 +\begin{quote} 12.110 + \verb_# xensv stop_ 12.111 +\end{quote} 12.112 + 12.113 +By default, Xensv will serve out the web interface on port 8080. This 12.114 +can be changed by editing 12.115 +\path{/usr/lib/python2.3/site-packages/xen/sv/params.py}. 12.116 + 12.117 +Once Xensv is running, the web interface can be used to create and 12.118 +manage running domains.
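As an illustration of the variable-passing mechanism described in Section~\ref{s:xm}, the following hypothetical invocation creates a domain from the example configuration file \path{/etc/xen/xmexample2}, overriding its \path{vmid} variable on the command line:
\begin{quote}
\begin{verbatim}
# xm create -f /etc/xen/xmexample2 vmid=3
# xm list
\end{verbatim}
\end{quote}
Any variable referenced by the configuration file can be overridden in this way; \path{xm list} then shows the newly created domain.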
13.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 13.2 +++ b/docs/src/user/debian.tex Tue Sep 20 09:17:33 2005 +0000 13.3 @@ -0,0 +1,154 @@ 13.4 +\chapter{Installing Xen / XenLinux on Debian} 13.5 + 13.6 +The Debian project provides a tool called \path{debootstrap} which 13.7 +allows a base Debian system to be installed into a filesystem without 13.8 +requiring the host system to have any Debian-specific software (such 13.9 +as \path{apt}). 13.10 + 13.11 +Here's some info how to install Debian 3.1 (Sarge) for an unprivileged 13.12 +Xen domain: 13.13 + 13.14 +\begin{enumerate} 13.15 + 13.16 +\item Set up Xen and test that it's working, as described earlier in 13.17 + this manual. 13.18 + 13.19 +\item Create disk images for rootfs and swap. Alternatively, you might 13.20 + create dedicated partitions, LVM logical volumes, etc.\ if that 13.21 + suits your setup. 13.22 +\begin{verbatim} 13.23 +dd if=/dev/zero of=/path/diskimage bs=1024k count=size_in_mbytes 13.24 +dd if=/dev/zero of=/path/swapimage bs=1024k count=size_in_mbytes 13.25 +\end{verbatim} 13.26 + 13.27 + If you're going to use this filesystem / disk image only as a 13.28 + `template' for other vm disk images, something like 300 MB should be 13.29 + enough. (of course it depends what kind of packages you are planning 13.30 + to install to the template) 13.31 + 13.32 +\item Create the filesystem and initialise the swap image 13.33 +\begin{verbatim} 13.34 +mkfs.ext3 /path/diskimage 13.35 +mkswap /path/swapimage 13.36 +\end{verbatim} 13.37 + 13.38 +\item Mount the disk image for installation 13.39 +\begin{verbatim} 13.40 +mount -o loop /path/diskimage /mnt/disk 13.41 +\end{verbatim} 13.42 + 13.43 +\item Install \path{debootstrap}. Make sure you have debootstrap 13.44 + installed on the host. If you are running Debian Sarge (3.1 / 13.45 + testing) or unstable you can install it by running \path{apt-get 13.46 + install debootstrap}. Otherwise, it can be downloaded from the 13.47 + Debian project website. 13.48 + 13.49 +\item Install Debian base to the disk image: 13.50 +\begin{verbatim} 13.51 +debootstrap --arch i386 sarge /mnt/disk \ 13.52 + http://ftp.<countrycode>.debian.org/debian 13.53 +\end{verbatim} 13.54 + 13.55 + You can use any other Debian http/ftp mirror you want. 13.56 + 13.57 +\item When debootstrap completes successfully, modify settings: 13.58 +\begin{verbatim} 13.59 +chroot /mnt/disk /bin/bash 13.60 +\end{verbatim} 13.61 + 13.62 +Edit the following files using vi or nano and make needed changes: 13.63 +\begin{verbatim} 13.64 +/etc/hostname 13.65 +/etc/hosts 13.66 +/etc/resolv.conf 13.67 +/etc/network/interfaces 13.68 +/etc/networks 13.69 +\end{verbatim} 13.70 + 13.71 +Set up access to the services, edit: 13.72 +\begin{verbatim} 13.73 +/etc/hosts.deny 13.74 +/etc/hosts.allow 13.75 +/etc/inetd.conf 13.76 +\end{verbatim} 13.77 + 13.78 +Add Debian mirror to: 13.79 +\begin{verbatim} 13.80 +/etc/apt/sources.list 13.81 +\end{verbatim} 13.82 + 13.83 +Create fstab like this: 13.84 +\begin{verbatim} 13.85 +/dev/sda1 / ext3 errors=remount-ro 0 1 13.86 +/dev/sda2 none swap sw 0 0 13.87 +proc /proc proc defaults 0 0 13.88 +\end{verbatim} 13.89 + 13.90 +Logout 13.91 + 13.92 +\item Unmount the disk image 13.93 +\begin{verbatim} 13.94 +umount /mnt/disk 13.95 +\end{verbatim} 13.96 + 13.97 +\item Create Xen 2.0 configuration file for the new domain. You can 13.98 + use the example-configurations coming with Xen as a template. 
13.99 + 13.100 + Make sure you have the following set up: 13.101 +\begin{verbatim} 13.102 +disk = [ 'file:/path/diskimage,sda1,w', 'file:/path/swapimage,sda2,w' ] 13.103 +root = "/dev/sda1 ro" 13.104 +\end{verbatim} 13.105 + 13.106 +\item Start the new domain 13.107 +\begin{verbatim} 13.108 +xm create -f domain_config_file 13.109 +\end{verbatim} 13.110 + 13.111 +Check that the new domain is running: 13.112 +\begin{verbatim} 13.113 +xm list 13.114 +\end{verbatim} 13.115 + 13.116 +\item Attach to the console of the new domain. You should see 13.117 + something like this when starting the new domain: 13.118 + 13.119 +\begin{verbatim} 13.120 +Started domain testdomain2, console on port 9626 13.121 +\end{verbatim} 13.122 + 13.123 + There you can see the ID of the console: 26. You can also list the 13.124 + consoles with \path{xm consoles} (ID is the last two digits of the 13.125 + port number.) 13.126 + 13.127 + Attach to the console: 13.128 + 13.129 +\begin{verbatim} 13.130 +xm console 26 13.131 +\end{verbatim} 13.132 + 13.133 + or by telnetting to the port 9626 of localhost (the xm console 13.134 + program works better). 13.135 + 13.136 +\item Log in and run base-config 13.137 + 13.138 + As a default there's no password for the root. 13.139 + 13.140 + Check that everything looks OK, and the system started without 13.141 + errors. Check that the swap is active, and the network settings are 13.142 + correct. 13.143 + 13.144 + Run \path{/usr/sbin/base-config} to set up the Debian settings. 13.145 + 13.146 + Set up the password for root using passwd. 13.147 + 13.148 +\item Done. You can exit the console by pressing {\path{Ctrl + ]}} 13.149 + 13.150 +\end{enumerate} 13.151 + 13.152 + 13.153 +If you need to create new domains, you can just copy the contents of 13.154 +the `template'-image to the new disk images, either by mounting the 13.155 +template and the new image, and using \path{cp -a} or \path{tar} or by 13.156 +simply copying the image file. Once this is done, modify the 13.157 +image-specific settings (hostname, network settings, etc).
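For example, cloning the template image for a second domain might look like the following sketch (all paths are illustrative):
\begin{verbatim}
cp /path/diskimage /path/diskimage2
mount -o loop /path/diskimage2 /mnt/disk
# ... edit hostname, network settings, etc. under /mnt/disk ...
umount /mnt/disk
\end{verbatim}
The new domain then needs its own configuration file pointing at \path{/path/diskimage2} (and, if desired, its own swap image).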
14.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 14.2 +++ b/docs/src/user/domain_configuration.tex Tue Sep 20 09:17:33 2005 +0000 14.3 @@ -0,0 +1,281 @@ 14.4 +\chapter{Domain Configuration} 14.5 +\label{cha:config} 14.6 + 14.7 +The following contains the syntax of the domain configuration files 14.8 +and description of how to further specify networking, driver domain 14.9 +and general scheduling behavior. 14.10 + 14.11 + 14.12 +\section{Configuration Files} 14.13 +\label{s:cfiles} 14.14 + 14.15 +Xen configuration files contain the following standard variables. 14.16 +Unless otherwise stated, configuration items should be enclosed in 14.17 +quotes: see \path{/etc/xen/xmexample1} and \path{/etc/xen/xmexample2} 14.18 +for concrete examples of the syntax. 14.19 + 14.20 +\begin{description} 14.21 +\item[kernel] Path to the kernel image. 14.22 +\item[ramdisk] Path to a ramdisk image (optional). 14.23 + % \item[builder] The name of the domain build function (e.g. 14.24 + % {\tt'linux'} or {\tt'netbsd'}. 14.25 +\item[memory] Memory size in megabytes. 14.26 +\item[cpu] CPU to run this domain on, or {\tt -1} for auto-allocation. 14.27 +\item[console] Port to export the domain console on (default 9600 + 14.28 + domain ID). 14.29 +\item[nics] Number of virtual network interfaces. 14.30 +\item[vif] List of MAC addresses (random addresses are assigned if not 14.31 + given) and bridges to use for the domain's network interfaces, e.g.\ 14.32 +\begin{verbatim} 14.33 +vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0', 14.34 + 'bridge=xen-br1' ] 14.35 +\end{verbatim} 14.36 + to assign a MAC address and bridge to the first interface and assign 14.37 + a different bridge to the second interface, leaving \xend\ to choose 14.38 + the MAC address. 14.39 +\item[disk] List of block devices to export to the domain, e.g.\ \\ 14.40 + \verb_disk = [ 'phy:hda1,sda1,r' ]_ \\ 14.41 + exports physical device \path{/dev/hda1} to the domain as 14.42 + \path{/dev/sda1} with read-only access. Exporting a disk read-write 14.43 + which is currently mounted is dangerous -- if you are \emph{certain} 14.44 + you wish to do this, you can specify \path{w!} as the mode. 14.45 +\item[dhcp] Set to {\tt `dhcp'} if you want to use DHCP to configure 14.46 + networking. 14.47 +\item[netmask] Manually configured IP netmask. 14.48 +\item[gateway] Manually configured IP gateway. 14.49 +\item[hostname] Set the hostname for the virtual machine. 14.50 +\item[root] Specify the root device parameter on the kernel command 14.51 + line. 14.52 +\item[nfs\_server] IP address for the NFS server (if any). 14.53 +\item[nfs\_root] Path of the root filesystem on the NFS server (if 14.54 + any). 14.55 +\item[extra] Extra string to append to the kernel command line (if 14.56 + any) 14.57 +\item[restart] Three possible options: 14.58 + \begin{description} 14.59 + \item[always] Always restart the domain, no matter what its exit 14.60 + code is. 14.61 + \item[never] Never restart the domain. 14.62 + \item[onreboot] Restart the domain iff it requests reboot. 14.63 + \end{description} 14.64 +\end{description} 14.65 + 14.66 +For additional flexibility, it is also possible to include Python 14.67 +scripting commands in configuration files. An example of this is the 14.68 +\path{xmexample2} file, which uses Python code to handle the 14.69 +\path{vmid} variable. 14.70 + 14.71 + 14.72 +%\part{Advanced Topics} 14.73 + 14.74 + 14.75 +\section{Network Configuration} 14.76 + 14.77 +For many users, the default installation should work ``out of the 14.78 +box''. 
More complicated network setups, for instance with multiple 14.79 +Ethernet interfaces and/or existing bridging setups will require some 14.80 +special configuration. 14.81 + 14.82 +The purpose of this section is to describe the mechanisms provided by 14.83 +\xend\ to allow a flexible configuration for Xen's virtual networking. 14.84 + 14.85 +\subsection{Xen virtual network topology} 14.86 + 14.87 +Each domain network interface is connected to a virtual network 14.88 +interface in dom0 by a point to point link (effectively a ``virtual 14.89 +crossover cable''). These devices are named {\tt 14.90 + vif$<$domid$>$.$<$vifid$>$} (e.g.\ {\tt vif1.0} for the first 14.91 +interface in domain~1, {\tt vif3.1} for the second interface in 14.92 +domain~3). 14.93 + 14.94 +Traffic on these virtual interfaces is handled in domain~0 using 14.95 +standard Linux mechanisms for bridging, routing, rate limiting, etc. 14.96 +Xend calls on two shell scripts to perform initial configuration of 14.97 +the network and configuration of new virtual interfaces. By default, 14.98 +these scripts configure a single bridge for all the virtual 14.99 +interfaces. Arbitrary routing / bridging configurations can be 14.100 +configured by customizing the scripts, as described in the following 14.101 +section. 14.102 + 14.103 +\subsection{Xen networking scripts} 14.104 + 14.105 +Xen's virtual networking is configured by two shell scripts (by 14.106 +default \path{network} and \path{vif-bridge}). These are called 14.107 +automatically by \xend\ when certain events occur, with arguments to 14.108 +the scripts providing further contextual information. These scripts 14.109 +are found by default in \path{/etc/xen/scripts}. The names and 14.110 +locations of the scripts can be configured in 14.111 +\path{/etc/xen/xend-config.sxp}. 14.112 + 14.113 +\begin{description} 14.114 +\item[network:] This script is called whenever \xend\ is started or 14.115 + stopped to respectively initialize or tear down the Xen virtual 14.116 + network. In the default configuration initialization creates the 14.117 + bridge `xen-br0' and moves eth0 onto that bridge, modifying the 14.118 + routing accordingly. When \xend\ exits, it deletes the Xen bridge 14.119 + and removes eth0, restoring the normal IP and routing configuration. 14.120 + 14.121 + %% In configurations where the bridge already exists, this script 14.122 + %% could be replaced with a link to \path{/bin/true} (for instance). 14.123 + 14.124 +\item[vif-bridge:] This script is called for every domain virtual 14.125 + interface and can configure firewalling rules and add the vif to the 14.126 + appropriate bridge. By default, this adds and removes VIFs on the 14.127 + default Xen bridge. 14.128 +\end{description} 14.129 + 14.130 +For more complex network setups (e.g.\ where routing is required or 14.131 +integrate with existing bridges) these scripts may be replaced with 14.132 +customized variants for your site's preferred configuration. 14.133 + 14.134 +%% There are two possible types of privileges: IO privileges and 14.135 +%% administration privileges. 14.136 + 14.137 + 14.138 +\section{Driver Domain Configuration} 14.139 + 14.140 +I/O privileges can be assigned to allow a domain to directly access 14.141 +PCI devices itself. This is used to support driver domains. 14.142 + 14.143 +Setting back-end privileges is currently only supported in SXP format 14.144 +config files. 
To allow a domain to function as a back-end for others, 14.145 +somewhere within the {\tt vm} element of its configuration file must 14.146 +be a {\tt back-end} element of the form {\tt (back-end ({\em type}))} 14.147 +where {\tt \em type} may be either {\tt netif} or {\tt blkif}, 14.148 +according to the type of virtual device this domain will service. 14.149 +%% After this domain has been built, \xend will connect all new and 14.150 +%% existing {\em virtual} devices (of the appropriate type) to that 14.151 +%% back-end. 14.152 + 14.153 +Note that a block back-end cannot currently import virtual block 14.154 +devices from other domains, and a network back-end cannot import 14.155 +virtual network devices from other domains. Thus (particularly in the 14.156 +case of block back-ends, which cannot import a virtual block device as 14.157 +their root filesystem), you may need to boot a back-end domain from a 14.158 +ramdisk or a network device. 14.159 + 14.160 +Access to PCI devices may be configured on a per-device basis. Xen 14.161 +will assign the minimal set of hardware privileges to a domain that 14.162 +are required to control its devices. This can be configured in either 14.163 +format of configuration file: 14.164 + 14.165 +\begin{itemize} 14.166 +\item SXP Format: Include device elements of the form: \\ 14.167 + \centerline{ {\tt (device (pci (bus {\em x}) (dev {\em y}) (func {\em z})))}} \\ 14.168 + inside the top-level {\tt vm} element. Each one specifies the 14.169 + address of a device this domain is allowed to access --- the numbers 14.170 + \emph{x},\emph{y} and \emph{z} may be in either decimal or 14.171 + hexadecimal format. 14.172 +\item Flat Format: Include a list of PCI device addresses of the 14.173 + format: \\ 14.174 + \centerline{{\tt pci = ['x,y,z', \ldots]}} \\ 14.175 + where each element in the list is a string specifying the components 14.176 + of the PCI device address, separated by commas. The components 14.177 + ({\tt \em x}, {\tt \em y} and {\tt \em z}) of the list may be 14.178 + formatted as either decimal or hexadecimal. 14.179 +\end{itemize} 14.180 + 14.181 +%% \section{Administration Domains} 14.182 + 14.183 +%% Administration privileges allow a domain to use the `dom0 14.184 +%% operations' (so called because they are usually available only to 14.185 +%% domain 0). A privileged domain can build other domains, set 14.186 +%% scheduling parameters, etc. 14.187 + 14.188 +% Support for other administrative domains is not yet available... 14.189 +% perhaps we should plumb it in some time 14.190 + 14.191 + 14.192 +\section{Scheduler Configuration} 14.193 +\label{s:sched} 14.194 + 14.195 +Xen offers a boot time choice between multiple schedulers. To select 14.196 +a scheduler, pass the boot parameter \emph{sched=sched\_name} to Xen, 14.197 +substituting the appropriate scheduler name. Details of the 14.198 +schedulers and their parameters are included below; future versions of 14.199 +the tools will provide a higher-level interface to these tools. 14.200 + 14.201 +It is expected that system administrators configure their system to 14.202 +use the scheduler most appropriate to their needs. Currently, the BVT 14.203 +scheduler is the recommended choice. 14.204 + 14.205 +\subsection{Borrowed Virtual Time} 14.206 + 14.207 +{\tt sched=bvt} (the default) \\ 14.208 + 14.209 +BVT provides proportional fair shares of the CPU time. 
It has been 14.210 +observed to penalize domains that block frequently (e.g.\ I/O 14.211 +intensive domains), but this can be compensated for by using warping. 14.212 + 14.213 +\subsubsection{Global Parameters} 14.214 + 14.215 +\begin{description} 14.216 +\item[ctx\_allow] The context switch allowance is similar to the 14.217 + ``quantum'' in traditional schedulers. It is the minimum time that 14.218 + a scheduled domain will be allowed to run before being preempted. 14.219 +\end{description} 14.220 + 14.221 +\subsubsection{Per-domain parameters} 14.222 + 14.223 +\begin{description} 14.224 +\item[mcuadv] The MCU (Minimum Charging Unit) advance determines the 14.225 + proportional share of the CPU that a domain receives. It is set 14.226 + inversely proportionally to a domain's sharing weight. 14.227 +\item[warp] The amount of ``virtual time'' the domain is allowed to 14.228 + warp backwards. 14.229 +\item[warpl] The warp limit is the maximum time a domain can run 14.230 + warped for. 14.231 +\item[warpu] The unwarp requirement is the minimum time a domain must 14.232 + run unwarped for before it can warp again. 14.233 +\end{description} 14.234 + 14.235 +\subsection{Atropos} 14.236 + 14.237 +{\tt sched=atropos} \\ 14.238 + 14.239 +Atropos is a soft real time scheduler. It provides guarantees about 14.240 +absolute shares of the CPU, with a facility for sharing slack CPU time 14.241 +on a best-effort basis. It can provide timeliness guarantees for 14.242 +latency-sensitive domains. 14.243 + 14.244 +Every domain has an associated period and slice. The domain should 14.245 +receive `slice' nanoseconds every `period' nanoseconds. This allows 14.246 +the administrator to configure both the absolute share of the CPU a 14.247 +domain receives and the frequency with which it is scheduled. 14.248 + 14.249 +%% When domains unblock, their period is reduced to the value of the 14.250 +%% latency hint (the slice is scaled accordingly so that they still 14.251 +%% get the same proportion of the CPU). For each subsequent period, 14.252 +%% the slice and period times are doubled until they reach their 14.253 +%% original values. 14.254 + 14.255 +Note: don't over-commit the CPU when using Atropos (i.e.\ don't reserve 14.256 +more CPU than is available --- the utilization should be kept to 14.257 +slightly less than 100\% in order to ensure predictable behavior). 14.258 + 14.259 +\subsubsection{Per-domain parameters} 14.260 + 14.261 +\begin{description} 14.262 +\item[period] The regular time interval during which a domain is 14.263 + guaranteed to receive its allocation of CPU time. 14.264 +\item[slice] The length of time per period that a domain is guaranteed 14.265 + to run for (in the absence of voluntary yielding of the CPU). 14.266 +\item[latency] The latency hint is used to control how soon after 14.267 + waking up a domain it should be scheduled. 14.268 +\item[xtratime] This is a boolean flag that specifies whether a domain 14.269 + should be allowed a share of the system slack time. 14.270 +\end{description} 14.271 + 14.272 +\subsection{Round Robin} 14.273 + 14.274 +{\tt sched=rrobin} \\ 14.275 + 14.276 +The round robin scheduler is included as a simple demonstration of 14.277 +Xen's internal scheduler API. It is not intended for production use. 14.278 + 14.279 +\subsubsection{Global Parameters} 14.280 + 14.281 +\begin{description} 14.282 +\item[rr\_slice] The maximum time each domain runs before the next 14.283 + scheduling decision is made. 14.284 +\end{description}
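Pulling together the variables described in Section~\ref{s:cfiles}, a minimal configuration file for an unprivileged domain might look like the following sketch (the kernel path, memory size, MAC address and devices are purely illustrative):
\begin{verbatim}
kernel  = "/boot/vmlinuz-2.6-xenU"
memory  = 64
nics    = 1
vif     = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0' ]
disk    = [ 'phy:hda3,sda1,w' ]
root    = "/dev/sda1 ro"
restart = 'onreboot'
\end{verbatim}
A domain can then be started from such a file with \path{xm create -f <file>}, and its scheduling behaviour adjusted as described in Section~\ref{s:sched}.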
15.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 15.2 +++ b/docs/src/user/domain_filesystem.tex Tue Sep 20 09:17:33 2005 +0000 15.3 @@ -0,0 +1,243 @@ 15.4 +\chapter{Domain Filesystem Storage} 15.5 + 15.6 +It is possible to directly export any Linux block device in dom0 to 15.7 +another domain, or to export filesystems / devices to virtual machines 15.8 +using standard network protocols (e.g.\ NBD, iSCSI, NFS, etc.). This 15.9 +chapter covers some of the possibilities. 15.10 + 15.11 + 15.12 +\section{Exporting Physical Devices as VBDs} 15.13 +\label{s:exporting-physical-devices-as-vbds} 15.14 + 15.15 +One of the simplest configurations is to directly export individual 15.16 +partitions from domain~0 to other domains. To achieve this use the 15.17 +\path{phy:} specifier in your domain configuration file. For example a 15.18 +line like 15.19 +\begin{quote} 15.20 + \verb_disk = ['phy:hda3,sda1,w']_ 15.21 +\end{quote} 15.22 +specifies that the partition \path{/dev/hda3} in domain~0 should be 15.23 +exported read-write to the new domain as \path{/dev/sda1}; one could 15.24 +equally well export it as \path{/dev/hda} or \path{/dev/sdb5} should 15.25 +one wish. 15.26 + 15.27 +In addition to local disks and partitions, it is possible to export 15.28 +any device that Linux considers to be ``a disk'' in the same manner. 15.29 +For example, if you have iSCSI disks or GNBD volumes imported into 15.30 +domain~0 you can export these to other domains using the \path{phy:} 15.31 +disk syntax. E.g.: 15.32 +\begin{quote} 15.33 + \verb_disk = ['phy:vg/lvm1,sda2,w']_ 15.34 +\end{quote} 15.35 + 15.36 +\begin{center} 15.37 + \framebox{\bf Warning: Block device sharing} 15.38 +\end{center} 15.39 +\begin{quote} 15.40 + Block devices should typically only be shared between domains in a 15.41 + read-only fashion otherwise the Linux kernel's file systems will get 15.42 + very confused as the file system structure may change underneath 15.43 + them (having the same ext3 partition mounted \path{rw} twice is a 15.44 + sure fire way to cause irreparable damage)! \Xend\ will attempt to 15.45 + prevent you from doing this by checking that the device is not 15.46 + mounted read-write in domain~0, and hasn't already been exported 15.47 + read-write to another domain. If you want read-write sharing, 15.48 + export the directory to other domains via NFS from domain~0 (or use 15.49 + a cluster file system such as GFS or ocfs2). 15.50 +\end{quote} 15.51 + 15.52 + 15.53 +\section{Using File-backed VBDs} 15.54 + 15.55 +It is also possible to use a file in Domain~0 as the primary storage 15.56 +for a virtual machine. As well as being convenient, this also has the 15.57 +advantage that the virtual block device will be \emph{sparse} --- 15.58 +space will only really be allocated as parts of the file are used. So 15.59 +if a virtual machine uses only half of its disk space then the file 15.60 +really takes up half of the size allocated. 
15.61 + 15.62 +For example, to create a 2GB sparse file-backed virtual block device 15.63 +(actually only consumes 1KB of disk): 15.64 +\begin{quote} 15.65 + \verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_ 15.66 +\end{quote} 15.67 + 15.68 +Make a file system in the disk file: 15.69 +\begin{quote} 15.70 + \verb_# mkfs -t ext3 vm1disk_ 15.71 +\end{quote} 15.72 + 15.73 +(when the tool asks for confirmation, answer `y') 15.74 + 15.75 +Populate the file system e.g.\ by copying from the current root: 15.76 +\begin{quote} 15.77 +\begin{verbatim} 15.78 +# mount -o loop vm1disk /mnt 15.79 +# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt 15.80 +# mkdir /mnt/{proc,sys,home,tmp} 15.81 +\end{verbatim} 15.82 +\end{quote} 15.83 + 15.84 +Tailor the file system by editing \path{/etc/fstab}, 15.85 +\path{/etc/hostname}, etc.\ Don't forget to edit the files in the 15.86 +mounted file system, instead of your domain~0 filesystem, e.g.\ you 15.87 +would edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab}. For 15.88 +this example put \path{/dev/sda1} to root in fstab. 15.89 + 15.90 +Now unmount (this is important!): 15.91 +\begin{quote} 15.92 + \verb_# umount /mnt_ 15.93 +\end{quote} 15.94 + 15.95 +In the configuration file set: 15.96 +\begin{quote} 15.97 + \verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_ 15.98 +\end{quote} 15.99 + 15.100 +As the virtual machine writes to its `disk', the sparse file will be 15.101 +filled in and consume more space up to the original 2GB. 15.102 + 15.103 +{\bf Note that file-backed VBDs may not be appropriate for backing 15.104 + I/O-intensive domains.} File-backed VBDs are known to experience 15.105 +substantial slowdowns under heavy I/O workloads, due to the I/O 15.106 +handling by the loopback block device used to support file-backed VBDs 15.107 +in dom0. Better I/O performance can be achieved by using either 15.108 +LVM-backed VBDs (Section~\ref{s:using-lvm-backed-vbds}) or physical 15.109 +devices as VBDs (Section~\ref{s:exporting-physical-devices-as-vbds}). 15.110 + 15.111 +Linux supports a maximum of eight file-backed VBDs across all domains 15.112 +by default. This limit can be statically increased by using the 15.113 +\emph{max\_loop} module parameter if CONFIG\_BLK\_DEV\_LOOP is 15.114 +compiled as a module in the dom0 kernel, or by using the 15.115 +\emph{max\_loop=n} boot option if CONFIG\_BLK\_DEV\_LOOP is compiled 15.116 +directly into the dom0 kernel. 15.117 + 15.118 + 15.119 +\section{Using LVM-backed VBDs} 15.120 +\label{s:using-lvm-backed-vbds} 15.121 + 15.122 +A particularly appealing solution is to use LVM volumes as backing for 15.123 +domain file-systems since this allows dynamic growing/shrinking of 15.124 +volumes as well as snapshot and other features. 
15.125 + 15.126 +To initialize a partition to support LVM volumes: 15.127 +\begin{quote} 15.128 +\begin{verbatim} 15.129 +# pvcreate /dev/sda10 15.130 +\end{verbatim} 15.131 +\end{quote} 15.132 + 15.133 +Create a volume group named `vg' on the physical partition: 15.134 +\begin{quote} 15.135 +\begin{verbatim} 15.136 +# vgcreate vg /dev/sda10 15.137 +\end{verbatim} 15.138 +\end{quote} 15.139 + 15.140 +Create a logical volume of size 4GB named `myvmdisk1': 15.141 +\begin{quote} 15.142 +\begin{verbatim} 15.143 +# lvcreate -L4096M -n myvmdisk1 vg 15.144 +\end{verbatim} 15.145 +\end{quote} 15.146 + 15.147 +You should now see that you have a \path{/dev/vg/myvmdisk1} Make a 15.148 +filesystem, mount it and populate it, e.g.: 15.149 +\begin{quote} 15.150 +\begin{verbatim} 15.151 +# mkfs -t ext3 /dev/vg/myvmdisk1 15.152 +# mount /dev/vg/myvmdisk1 /mnt 15.153 +# cp -ax / /mnt 15.154 +# umount /mnt 15.155 +\end{verbatim} 15.156 +\end{quote} 15.157 + 15.158 +Now configure your VM with the following disk configuration: 15.159 +\begin{quote} 15.160 +\begin{verbatim} 15.161 + disk = [ 'phy:vg/myvmdisk1,sda1,w' ] 15.162 +\end{verbatim} 15.163 +\end{quote} 15.164 + 15.165 +LVM enables you to grow the size of logical volumes, but you'll need 15.166 +to resize the corresponding file system to make use of the new space. 15.167 +Some file systems (e.g.\ ext3) now support online resize. See the LVM 15.168 +manuals for more details. 15.169 + 15.170 +You can also use LVM for creating copy-on-write (CoW) clones of LVM 15.171 +volumes (known as writable persistent snapshots in LVM terminology). 15.172 +This facility is new in Linux 2.6.8, so isn't as stable as one might 15.173 +hope. In particular, using lots of CoW LVM disks consumes a lot of 15.174 +dom0 memory, and error conditions such as running out of disk space 15.175 +are not handled well. Hopefully this will improve in future. 15.176 + 15.177 +To create two copy-on-write clone of the above file system you would 15.178 +use the following commands: 15.179 + 15.180 +\begin{quote} 15.181 +\begin{verbatim} 15.182 +# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1 15.183 +# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1 15.184 +\end{verbatim} 15.185 +\end{quote} 15.186 + 15.187 +Each of these can grow to have 1GB of differences from the master 15.188 +volume. You can grow the amount of space for storing the differences 15.189 +using the lvextend command, e.g.: 15.190 +\begin{quote} 15.191 +\begin{verbatim} 15.192 +# lvextend +100M /dev/vg/myclonedisk1 15.193 +\end{verbatim} 15.194 +\end{quote} 15.195 + 15.196 +Don't let the `differences volume' ever fill up otherwise LVM gets 15.197 +rather confused. It may be possible to automate the growing process by 15.198 +using \path{dmsetup wait} to spot the volume getting full and then 15.199 +issue an \path{lvextend}. 15.200 + 15.201 +In principle, it is possible to continue writing to the volume that 15.202 +has been cloned (the changes will not be visible to the clones), but 15.203 +we wouldn't recommend this: have the cloned volume as a `pristine' 15.204 +file system install that isn't mounted directly by any of the virtual 15.205 +machines. 15.206 + 15.207 + 15.208 +\section{Using NFS Root} 15.209 + 15.210 +First, populate a root filesystem in a directory on the server 15.211 +machine. This can be on a distinct physical machine, or simply run 15.212 +within a virtual machine on the same node. 
15.213 + 15.214 +Now configure the NFS server to export this filesystem over the 15.215 +network by adding a line to \path{/etc/exports}, for instance: 15.216 + 15.217 +\begin{quote} 15.218 + \begin{small} 15.219 +\begin{verbatim} 15.220 +/export/vm1root 1.2.3.4/24(rw,sync,no_root_squash) 15.221 +\end{verbatim} 15.222 + \end{small} 15.223 +\end{quote} 15.224 + 15.225 +(Note that there must be no space between the client specification and the bracketed options; with a space, the options would apply to all hosts.) Finally, configure the domain to use NFS root. In addition to the 15.226 +normal variables, you should make sure to set the following values in 15.227 +the domain's configuration file: 15.228 + 15.229 +\begin{quote} 15.230 + \begin{small} 15.231 +\begin{verbatim} 15.232 +root = '/dev/nfs' 15.233 +nfs_server = '2.3.4.5' # substitute IP address of server 15.234 +nfs_root = '/path/to/root' # path to root FS on the server 15.235 +\end{verbatim} 15.236 + \end{small} 15.237 +\end{quote} 15.238 + 15.239 +The domain will need network access at boot time, so either statically 15.240 +configure an IP address using the config variables \path{ip}, 15.241 +\path{netmask}, \path{gateway}, \path{hostname}; or enable DHCP 15.242 +(\path{dhcp='dhcp'}). 15.243 + 15.244 +Note that the Linux NFS root implementation is known to have stability 15.245 +problems under high load (this is not a Xen-specific problem), so this 15.246 +configuration may not be appropriate for critical servers.
16.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 16.2 +++ b/docs/src/user/domain_mgmt.tex Tue Sep 20 09:17:33 2005 +0000 16.3 @@ -0,0 +1,203 @@ 16.4 +\chapter{Domain Management Tools} 16.5 + 16.6 +The previous chapter described a simple example of how to configure 16.7 +and start a domain. This chapter summarises the tools available to 16.8 +manage running domains. 16.9 + 16.10 + 16.11 +\section{Command-line Management} 16.12 + 16.13 +Command line management tasks are also performed using the \path{xm} 16.14 +tool. For online help for the commands available, type: 16.15 +\begin{quote} 16.16 + \verb_# xm help_ 16.17 +\end{quote} 16.18 + 16.19 +You can also type \path{xm help $<$command$>$} for more information on 16.20 +a given command. 16.21 + 16.22 +\subsection{Basic Management Commands} 16.23 + 16.24 +The most important \path{xm} commands are: 16.25 +\begin{quote} 16.26 + \verb_# xm list_: Lists all domains running.\\ 16.27 + \verb_# xm consoles_: Gives information about the domain consoles.\\ 16.28 + \verb_# xm console_: Opens a console to a domain (e.g.\ 16.29 + \verb_# xm console myVM_) 16.30 +\end{quote} 16.31 + 16.32 +\subsection{\tt xm list} 16.33 + 16.34 +The output of \path{xm list} is in rows of the following format: 16.35 +\begin{center} {\tt name domid memory cpu state cputime console} 16.36 +\end{center} 16.37 + 16.38 +\begin{quote} 16.39 + \begin{description} 16.40 + \item[name] The descriptive name of the virtual machine. 16.41 + \item[domid] The number of the domain ID this virtual machine is 16.42 + running in. 16.43 + \item[memory] Memory size in megabytes. 16.44 + \item[cpu] The CPU this domain is running on. 16.45 + \item[state] Domain state consists of 5 fields: 16.46 + \begin{description} 16.47 + \item[r] running 16.48 + \item[b] blocked 16.49 + \item[p] paused 16.50 + \item[s] shutdown 16.51 + \item[c] crashed 16.52 + \end{description} 16.53 + \item[cputime] How much CPU time (in seconds) the domain has used so 16.54 + far. 16.55 + \item[console] TCP port accepting connections to the domain's 16.56 + console. 16.57 + \end{description} 16.58 +\end{quote} 16.59 + 16.60 +The \path{xm list} command also supports a long output format when the 16.61 +\path{-l} switch is used. This outputs the fulls details of the 16.62 +running domains in \xend's SXP configuration format. 16.63 + 16.64 +For example, suppose the system is running the ttylinux domain as 16.65 +described earlier. The list command should produce output somewhat 16.66 +like the following: 16.67 +\begin{verbatim} 16.68 +# xm list 16.69 +Name Id Mem(MB) CPU State Time(s) Console 16.70 +Domain-0 0 251 0 r---- 172.2 16.71 +ttylinux 5 63 0 -b--- 3.0 9605 16.72 +\end{verbatim} 16.73 + 16.74 +Here we can see the details for the ttylinux domain, as well as for 16.75 +domain~0 (which, of course, is always running). Note that the console 16.76 +port for the ttylinux domain is 9605. This can be connected to by TCP 16.77 +using a terminal program (e.g. \path{telnet} or, better, 16.78 +\path{xencons}). The simplest way to connect is to use the 16.79 +\path{xm~console} command, specifying the domain name or ID. 
To 16.80 +connect to the console of the ttylinux domain, we could use any of the 16.81 +following: 16.82 +\begin{verbatim} 16.83 +# xm console ttylinux 16.84 +# xm console 5 16.85 +# xencons localhost 9605 16.86 +\end{verbatim} 16.87 + 16.88 +\section{Domain Save and Restore} 16.89 + 16.90 +The administrator of a Xen system may suspend a virtual machine's 16.91 +current state into a disk file in domain~0, allowing it to be resumed 16.92 +at a later time. 16.93 + 16.94 +The ttylinux domain described earlier can be suspended to disk using 16.95 +the command: 16.96 +\begin{verbatim} 16.97 +# xm save ttylinux ttylinux.xen 16.98 +\end{verbatim} 16.99 + 16.100 +This will stop the domain named `ttylinux' and save its current state 16.101 +into a file called \path{ttylinux.xen}. 16.102 + 16.103 +To resume execution of this domain, use the \path{xm restore} command: 16.104 +\begin{verbatim} 16.105 +# xm restore ttylinux.xen 16.106 +\end{verbatim} 16.107 + 16.108 +This will restore the state of the domain and restart it. The domain 16.109 +will carry on as before and the console may be reconnected using the 16.110 +\path{xm console} command, as above. 16.111 + 16.112 +\section{Live Migration} 16.113 + 16.114 +Live migration is used to transfer a domain between physical hosts 16.115 +whilst that domain continues to perform its usual activities --- from 16.116 +the user's perspective, the migration should be imperceptible. 16.117 + 16.118 +To perform a live migration, both hosts must be running Xen / \xend\ 16.119 +and the destination host must have sufficient resources (e.g.\ memory 16.120 +capacity) to accommodate the domain after the move. Furthermore we 16.121 +currently require both source and destination machines to be on the 16.122 +same L2 subnet. 16.123 + 16.124 +Currently, there is no support for providing automatic remote access 16.125 +to filesystems stored on local disk when a domain is migrated. 16.126 +Administrators should choose an appropriate storage solution (i.e.\ 16.127 +SAN, NAS, etc.) to ensure that domain filesystems are also available 16.128 +on their destination node. GNBD is a good method for exporting a 16.129 +volume from one machine to another. iSCSI can do a similar job, but is 16.130 +more complex to set up. 16.131 + 16.132 +When a domain migrates, it's MAC and IP address move with it, thus it 16.133 +is only possible to migrate VMs within the same layer-2 network and IP 16.134 +subnet. If the destination node is on a different subnet, the 16.135 +administrator would need to manually configure a suitable etherip or 16.136 +IP tunnel in the domain~0 of the remote node. 16.137 + 16.138 +A domain may be migrated using the \path{xm migrate} command. To live 16.139 +migrate a domain to another machine, we would use the command: 16.140 + 16.141 +\begin{verbatim} 16.142 +# xm migrate --live mydomain destination.ournetwork.com 16.143 +\end{verbatim} 16.144 + 16.145 +Without the \path{--live} flag, \xend\ simply stops the domain and 16.146 +copies the memory image over to the new node and restarts it. Since 16.147 +domains can have large allocations this can be quite time consuming, 16.148 +even on a Gigabit network. With the \path{--live} flag \xend\ attempts 16.149 +to keep the domain running while the migration is in progress, 16.150 +resulting in typical `downtimes' of just 60--300ms. 16.151 + 16.152 +For now it will be necessary to reconnect to the domain's console on 16.153 +the new machine using the \path{xm console} command. 
If a migrated 16.154 +domain has any open network connections then they will be preserved, 16.155 +so SSH connections do not have this limitation. 16.156 + 16.157 + 16.158 +\section{Managing Domain Memory} 16.159 + 16.160 +XenLinux domains have the ability to relinquish / reclaim machine 16.161 +memory at the request of the administrator or the user of the domain. 16.162 + 16.163 +\subsection{Setting memory footprints from dom0} 16.164 + 16.165 +The machine administrator can request that a domain alter its memory 16.166 +footprint using the \path{xm set-mem} command. For instance, we can 16.167 +request that our example ttylinux domain reduce its memory footprint 16.168 +to 32 megabytes. 16.169 + 16.170 +\begin{verbatim} 16.171 +# xm set-mem ttylinux 32 16.172 +\end{verbatim} 16.173 + 16.174 +We can now see the result of this in the output of \path{xm list}: 16.175 + 16.176 +\begin{verbatim} 16.177 +# xm list 16.178 +Name Id Mem(MB) CPU State Time(s) Console 16.179 +Domain-0 0 251 0 r---- 172.2 16.180 +ttylinux 5 31 0 -b--- 4.3 9605 16.181 +\end{verbatim} 16.182 + 16.183 +The domain has responded to the request by returning memory to Xen. We 16.184 +can restore the domain to its original size using the command line: 16.185 + 16.186 +\begin{verbatim} 16.187 +# xm set-mem ttylinux 64 16.188 +\end{verbatim} 16.189 + 16.190 +\subsection{Setting memory footprints from within a domain} 16.191 + 16.192 +The virtual file \path{/proc/xen/balloon} allows the owner of a domain 16.193 +to adjust their own memory footprint. Reading the file (e.g.\ 16.194 +\path{cat /proc/xen/balloon}) prints out the current memory footprint 16.195 +of the domain. Writing the file (e.g.\ \path{echo new\_target > 16.196 + /proc/xen/balloon}) requests that the kernel adjust the domain's 16.197 +memory footprint to a new value. 16.198 + 16.199 +\subsection{Setting memory limits} 16.200 + 16.201 +Xen associates a memory size limit with each domain. By default, this 16.202 +is the amount of memory the domain is originally started with, 16.203 +preventing the domain from ever growing beyond this size. To permit a 16.204 +domain to grow beyond its original allocation or to prevent a domain 16.205 +you've shrunk from reclaiming the memory it relinquished, use the 16.206 +\path{xm maxmem} command.
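For example, to raise the limit for the ttylinux domain so that it can later grow back to 128MB, one might use something like the following (the argument syntax is assumed here to mirror \path{xm set-mem}, i.e.\ a domain name or ID followed by a size in megabytes; see \path{xm help maxmem} to confirm):
\begin{verbatim}
# xm maxmem ttylinux 128
# xm set-mem ttylinux 128
\end{verbatim}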
17.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 17.2 +++ b/docs/src/user/glossary.tex Tue Sep 20 09:17:33 2005 +0000 17.3 @@ -0,0 +1,79 @@ 17.4 +\chapter{Glossary of Terms} 17.5 + 17.6 +\begin{description} 17.7 + 17.8 +\item[Atropos] One of the CPU schedulers provided by Xen. Atropos 17.9 + provides domains with absolute shares of the CPU, with timeliness 17.10 + guarantees and a mechanism for sharing out `slack time'. 17.11 + 17.12 +\item[BVT] The BVT scheduler is used to give proportional fair shares 17.13 + of the CPU to domains. 17.14 + 17.15 +\item[Exokernel] A minimal piece of privileged code, similar to a {\bf 17.16 + microkernel} but providing a more `hardware-like' interface to the 17.17 + tasks it manages. This is similar to a paravirtualising VMM like 17.18 + {\bf Xen} but was designed as a new operating system structure, 17.19 + rather than specifically to run multiple conventional OSs. 17.20 + 17.21 +\item[Domain] A domain is the execution context that contains a 17.22 + running {\bf virtual machine}. The relationship between virtual 17.23 + machines and domains on Xen is similar to that between programs and 17.24 + processes in an operating system: a virtual machine is a persistent 17.25 + entity that resides on disk (somewhat like a program). When it is 17.26 + loaded for execution, it runs in a domain. Each domain has a {\bf 17.27 + domain ID}. 17.28 + 17.29 +\item[Domain 0] The first domain to be started on a Xen machine. 17.30 + Domain 0 is responsible for managing the system. 17.31 + 17.32 +\item[Domain ID] A unique identifier for a {\bf domain}, analogous to 17.33 + a process ID in an operating system. 17.34 + 17.35 +\item[Full virtualisation] An approach to virtualisation which 17.36 + requires no modifications to the hosted operating system, providing 17.37 + the illusion of a complete system of real hardware devices. 17.38 + 17.39 +\item[Hypervisor] An alternative term for {\bf VMM}, used because it 17.40 + means `beyond supervisor', since it is responsible for managing 17.41 + multiple `supervisor' kernels. 17.42 + 17.43 +\item[Live migration] A technique for moving a running virtual machine 17.44 + to another physical host, without stopping it or the services 17.45 + running on it. 17.46 + 17.47 +\item[Microkernel] A small base of code running at the highest 17.48 + hardware privilege level. A microkernel is responsible for sharing 17.49 + CPU and memory (and sometimes other devices) between less privileged 17.50 + tasks running on the system. This is similar to a VMM, particularly 17.51 + a {\bf paravirtualising} VMM but typically addressing a different 17.52 + problem space and providing different kind of interface. 17.53 + 17.54 +\item[NetBSD/Xen] A port of NetBSD to the Xen architecture. 17.55 + 17.56 +\item[Paravirtualisation] An approach to virtualisation which requires 17.57 + modifications to the operating system in order to run in a virtual 17.58 + machine. Xen uses paravirtualisation but preserves binary 17.59 + compatibility for user space applications. 17.60 + 17.61 +\item[Shadow pagetables] A technique for hiding the layout of machine 17.62 + memory from a virtual machine's operating system. Used in some {\bf 17.63 + VMMs} to provide the illusion of contiguous physical memory, in 17.64 + Xen this is used during {\bf live migration}. 17.65 + 17.66 +\item[Virtual Machine] The environment in which a hosted operating 17.67 + system runs, providing the abstraction of a dedicated machine. 
A 17.68 + virtual machine may be identical to the underlying hardware (as in 17.69 + {\bf full virtualisation}), or it may differ (as in {\bf 17.70 + paravirtualisation}). 17.71 + 17.72 +\item[VMM] Virtual Machine Monitor - the software that allows multiple 17.73 + virtual machines to be multiplexed on a single physical machine. 17.74 + 17.75 +\item[Xen] Xen is a paravirtualising virtual machine monitor, 17.76 + developed primarily by the Systems Research Group at the University 17.77 + of Cambridge Computer Laboratory. 17.78 + 17.79 +\item[XenLinux] Official name for the port of the Linux kernel that 17.80 + runs on Xen. 17.81 + 17.82 +\end{description}
18.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 18.2 +++ b/docs/src/user/installation.tex Tue Sep 20 09:17:33 2005 +0000 18.3 @@ -0,0 +1,394 @@ 18.4 +\chapter{Installation} 18.5 + 18.6 +The Xen distribution includes three main components: Xen itself, ports 18.7 +of Linux 2.4 and 2.6 and NetBSD to run on Xen, and the userspace 18.8 +tools required to manage a Xen-based system. This chapter describes 18.9 +how to install the Xen~2.0 distribution from source. Alternatively, 18.10 +there may be pre-built packages available as part of your operating 18.11 +system distribution. 18.12 + 18.13 + 18.14 +\section{Prerequisites} 18.15 +\label{sec:prerequisites} 18.16 + 18.17 +The following is a full list of prerequisites. Items marked `$\dag$' 18.18 +are required by the \xend\ control tools, and hence required if you 18.19 +want to run more than one virtual machine; items marked `$*$' are only 18.20 +required if you wish to build from source. 18.21 +\begin{itemize} 18.22 +\item A working Linux distribution using the GRUB bootloader and 18.23 + running on a P6-class (or newer) CPU. 18.24 +\item [$\dag$] The \path{iproute2} package. 18.25 +\item [$\dag$] The Linux bridge-utils\footnote{Available from {\tt 18.26 + http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl}) 18.27 +\item [$\dag$] An installation of Twisted~v1.3 or 18.28 + above\footnote{Available from {\tt http://www.twistedmatrix.com}}. 18.29 + There may be a binary package available for your distribution; 18.30 + alternatively it can be installed by running `{\sl make 18.31 + install-twisted}' in the root of the Xen source tree. 18.32 +\item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make). 18.33 +\item [$*$] Development installation of libcurl (e.g., libcurl-devel) 18.34 +\item [$*$] Development installation of zlib (e.g., zlib-dev). 18.35 +\item [$*$] Development installation of Python v2.2 or later (e.g., 18.36 + python-dev). 18.37 +\item [$*$] \LaTeX\ and transfig are required to build the 18.38 + documentation. 18.39 +\end{itemize} 18.40 + 18.41 +Once you have satisfied the relevant prerequisites, you can now 18.42 +install either a binary or source distribution of Xen. 18.43 + 18.44 + 18.45 +\section{Installing from Binary Tarball} 18.46 + 18.47 +Pre-built tarballs are available for download from the Xen download 18.48 +page 18.49 +\begin{quote} {\tt http://xen.sf.net} 18.50 +\end{quote} 18.51 + 18.52 +Once you've downloaded the tarball, simply unpack and install: 18.53 +\begin{verbatim} 18.54 +# tar zxvf xen-2.0-install.tgz 18.55 +# cd xen-2.0-install 18.56 +# sh ./install.sh 18.57 +\end{verbatim} 18.58 + 18.59 +Once you've installed the binaries you need to configure your system 18.60 +as described in Section~\ref{s:configure}. 18.61 + 18.62 + 18.63 +\section{Installing from Source} 18.64 + 18.65 +This section describes how to obtain, build, and install Xen from 18.66 +source. 18.67 + 18.68 +\subsection{Obtaining the Source} 18.69 + 18.70 +The Xen source tree is available as either a compressed source tar 18.71 +ball or as a clone of our master BitKeeper repository. 
18.72 + 18.73 +\begin{description} 18.74 +\item[Obtaining the Source Tarball]\mbox{} \\ 18.75 + Stable versions (and daily snapshots) of the Xen source tree are 18.76 + available as compressed tarballs from the Xen download page 18.77 + \begin{quote} {\tt http://xen.sf.net} 18.78 + \end{quote} 18.79 + 18.80 +\item[Using BitKeeper]\mbox{} \\ 18.81 + If you wish to install Xen from a clone of our latest BitKeeper 18.82 + repository then you will need to install the BitKeeper tools. 18.83 + Download instructions for BitKeeper can be obtained by filling out 18.84 + the form at: 18.85 + \begin{quote} {\tt http://www.bitmover.com/cgi-bin/download.cgi} 18.86 +\end{quote} 18.87 +The public master BK repository for the 2.0 release lives at: 18.88 +\begin{quote} {\tt bk://xen.bkbits.net/xen-2.0.bk} 18.89 +\end{quote} 18.90 +You can use BitKeeper to download it and keep it updated with the 18.91 +latest features and fixes. 18.92 + 18.93 +Change to the directory in which you want to put the source code, then 18.94 +run: 18.95 +\begin{verbatim} 18.96 +# bk clone bk://xen.bkbits.net/xen-2.0.bk 18.97 +\end{verbatim} 18.98 + 18.99 +Under your current directory, a new directory named \path{xen-2.0.bk} 18.100 +has been created, which contains all the source code for Xen, the OS 18.101 +ports, and the control tools. You can update your repository with the 18.102 +latest changes at any time by running: 18.103 +\begin{verbatim} 18.104 +# cd xen-2.0.bk # to change into the local repository 18.105 +# bk pull # to update the repository 18.106 +\end{verbatim} 18.107 +\end{description} 18.108 + 18.109 +% \section{The distribution} 18.110 +% 18.111 +% The Xen source code repository is structured as follows: 18.112 +% 18.113 +% \begin{description} 18.114 +% \item[\path{tools/}] Xen node controller daemon (Xend), command line 18.115 +% tools, control libraries 18.116 +% \item[\path{xen/}] The Xen VMM. 18.117 +% \item[\path{linux-*-xen-sparse/}] Xen support for Linux. 18.118 +% \item[\path{linux-*-patches/}] Experimental patches for Linux. 18.119 +% \item[\path{netbsd-*-xen-sparse/}] Xen support for NetBSD. 18.120 +% \item[\path{docs/}] Various documentation files for users and 18.121 +% developers. 18.122 +% \item[\path{extras/}] Bonus extras. 18.123 +% \end{description} 18.124 + 18.125 +\subsection{Building from Source} 18.126 + 18.127 +The top-level Xen Makefile includes a target `world' that will do the 18.128 +following: 18.129 + 18.130 +\begin{itemize} 18.131 +\item Build Xen. 18.132 +\item Build the control tools, including \xend. 18.133 +\item Download (if necessary) and unpack the Linux 2.6 source code, 18.134 + and patch it for use with Xen. 18.135 +\item Build a Linux kernel to use in domain 0 and a smaller 18.136 + unprivileged kernel, which can optionally be used for unprivileged 18.137 + virtual machines. 18.138 +\end{itemize} 18.139 + 18.140 +After the build has completed you should have a top-level directory 18.141 +called \path{dist/} in which all resulting targets will be placed; of 18.142 +particular interest are the two kernels XenLinux kernel images, one 18.143 +with a `-xen0' extension which contains hardware device drivers and 18.144 +drivers for Xen's virtual devices, and one with a `-xenU' extension 18.145 +that just contains the virtual ones. These are found in 18.146 +\path{dist/install/boot/} along with the image for Xen itself and the 18.147 +configuration files used during the build. 
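For example, a complete build with debug assertions enabled (see the chapter on build, boot and debug options) might be run from the top of the source tree as follows:
\begin{quote}
\begin{verbatim}
# cd xen-2.0.bk
# make debug=y world
\end{verbatim}
\end{quote}
On completion, the Xen image, the XenLinux kernels and the build configuration files appear under \path{dist/install/boot/} as described above; installing them is covered below.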
+
+The NetBSD port can be built using:
+\begin{quote}
+\begin{verbatim}
+# make netbsd20
+\end{verbatim}
+\end{quote}
+The NetBSD port is built using a snapshot of the netbsd-2-0 cvs branch.
+The snapshot is downloaded as part of the build process if it is not
+yet present in the \path{NETBSD\_SRC\_PATH} search path. The build
+process also downloads a toolchain which includes all the tools
+necessary to build the NetBSD kernel under Linux.
+
+To further customize the set of kernels built, edit the
+top-level Makefile. Look for the line:
+
+\begin{quote}
+\begin{verbatim}
+KERNELS ?= mk.linux-2.6-xen0 mk.linux-2.6-xenU
+\end{verbatim}
+\end{quote}
+
+You can edit this line to include any set of operating system kernels
+which have configurations in the top-level \path{buildconfigs/}
+directory, for example \path{mk.linux-2.4-xenU} to build a Linux 2.4
+kernel containing only virtual device drivers.
+
+%% Inspect the Makefile if you want to see what goes on during a
+%% build. Building Xen and the tools is straightforward, but XenLinux
+%% is more complicated. The makefile needs a `pristine' Linux kernel
+%% tree to which it will then add the Xen architecture files. You can
+%% tell the makefile the location of the appropriate Linux compressed
+%% tar file by
+%% setting the LINUX\_SRC environment variable, e.g. \\
+%% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by
+%% placing the tar file somewhere in the search path of {\tt
+%%   LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'. If the
+%% makefile can't find a suitable kernel tar file it attempts to
+%% download it from kernel.org (this won't work if you're behind a
+%% firewall).
+
+%% After untaring the pristine kernel tree, the makefile uses the {\tt
+%%   mkbuildtree} script to add the Xen patches to the kernel.
+
+
+%% The procedure is similar to build the Linux 2.4 port: \\
+%% \verb!# LINUX_SRC=/path/to/linux2.4/source make linux24!
+
+
+%% \framebox{\parbox{5in}{
+%%     {\bf Distro specific:} \\
+%%     {\it Gentoo} --- if not using udev (most installations,
+%%     currently), you'll need to enable devfs and devfs mount at boot
+%%     time in the xen0 config. }}
+
+\subsection{Custom XenLinux Builds}
+
+% If you have an SMP machine you may wish to give the {\tt '-j4'}
+% argument to make to get a parallel build.
+
+If you wish to build a customized XenLinux kernel (e.g.\ to support
+additional devices or enable distribution-required features), you can
+use the standard Linux configuration mechanisms, specifying that the
+architecture being built for is \path{xen}, e.g.:
+\begin{quote}
+\begin{verbatim}
+# cd linux-2.6.11-xen0
+# make ARCH=xen xconfig
+# cd ..
+# make
+\end{verbatim}
+\end{quote}
+
+You can also copy an existing Linux configuration (\path{.config})
+into \path{linux-2.6.11-xen0} and execute:
+\begin{quote}
+\begin{verbatim}
+# make ARCH=xen oldconfig
+\end{verbatim}
+\end{quote}
+
+You may be prompted with some Xen-specific options; we advise
+accepting the defaults for these options.
+
+Note that the only difference between the two types of Linux kernel
+that are built is the configuration file used for each. The `U'-
+suffixed (unprivileged) versions don't contain any of the physical
+hardware device drivers, leading to a 30\% reduction in size; hence
+you may prefer these for your non-privileged domains. The `0'-
+suffixed privileged versions can be used to boot the system, as well
+as in driver domains and unprivileged domains.
+
+\subsection{Installing the Binaries}
+
+The files produced by the build process are stored under the
+\path{dist/install/} directory. To install them in their default
+locations, do:
+\begin{quote}
+\begin{verbatim}
+# make install
+\end{verbatim}
+\end{quote}
+
+Alternatively, users with special installation requirements may wish
+to install them manually by copying the files to their appropriate
+destinations.
+
+%% Files in \path{install/boot/} include:
+%% \begin{itemize}
+%% \item \path{install/boot/xen-2.0.gz} Link to the Xen 'kernel'
+%% \item \path{install/boot/vmlinuz-2.6-xen0} Link to domain 0
+%%   XenLinux kernel
+%% \item \path{install/boot/vmlinuz-2.6-xenU} Link to unprivileged
+%%   XenLinux kernel
+%% \end{itemize}
+
+The \path{dist/install/boot} directory will also contain the config
+files used for building the XenLinux kernels, as well as versions of Xen
+and XenLinux kernels that contain debug symbols (\path{xen-syms-2.0.6}
+and \path{vmlinux-syms-2.6.11.11-xen0}) which are essential for
+interpreting crash dumps. Retain these files as the developers may
+wish to see them if you post on the mailing list.
+
+
+\section{Configuration}
+\label{s:configure}
+
+Once you have built and installed the Xen distribution, it is simple
+to prepare the machine for booting and running Xen.
+
+\subsection{GRUB Configuration}
+
+An entry should be added to \path{grub.conf} (often found under
+\path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot.
+This file is sometimes called \path{menu.lst}, depending on your
+distribution. The entry should look something like the following:
+
+{\small
+\begin{verbatim}
+title Xen 2.0 / XenLinux 2.6
+  kernel /boot/xen-2.0.gz dom0_mem=131072
+  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0
+\end{verbatim}
+}
+
+The kernel line tells GRUB where to find Xen itself and what boot
+parameters should be passed to it (in this case, setting domain 0's
+memory allocation in kilobytes).
+For more details on the various Xen boot parameters see
+Section~\ref{s:xboot}.
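+
+For example, to give domain 0 a larger allocation (the figure of
+262144kB, i.e.\ 256MB, is purely illustrative; choose a value to suit
+your machine), the kernel line could instead read:
+\begin{quote}
+{\small
+\begin{verbatim}
+  kernel /boot/xen-2.0.gz dom0_mem=262144
+\end{verbatim}}
+\end{quote}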
+
+The module line of the configuration describes the location of the
+XenLinux kernel that Xen should start and the parameters that should
+be passed to it (these are standard Linux parameters, identifying the
+root device, specifying that it be initially mounted read-only and
+instructing that console output be sent to the screen). Some
+distributions such as SuSE do not require the \path{ro} parameter.
+
+%% \framebox{\parbox{5in}{
+%%     {\bf Distro specific:} \\
+%%     {\it SuSE} --- Omit the {\tt ro} option from the XenLinux
+%%     kernel command line, since the partition won't be remounted rw
+%%     during boot. }}
+
+
+If you want to use an initrd, just add another \path{module} line to
+the configuration, as usual:
+
+{\small
+\begin{verbatim}
+  module /boot/my_initrd.gz
+\end{verbatim}
+}
+
+As always when installing a new kernel, it is recommended that you do
+not delete existing menu options from \path{menu.lst} --- you may want
+to boot your old Linux kernel in future, particularly if you have
+problems.
+
+\subsection{Serial Console (optional)}
+
+%% kernel /boot/xen-2.0.gz dom0_mem=131072 com1=115200,8n1
+%% module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro
+
+
+In order to configure Xen serial console output, it is necessary to
+add a boot option to your GRUB config; e.g.\ replace the above kernel
+line with:
+\begin{quote}
+{\small
+\begin{verbatim}
+   kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1
+\end{verbatim}}
+\end{quote}
+
+This configures Xen to output on COM1 at 115,200 baud, 8 data bits, 1
+stop bit and no parity. Modify these parameters for your setup.
+
+One can also configure XenLinux to share the serial console; to
+achieve this append ``\path{console=ttyS0}'' to your module line.
+
+If you wish to be able to log in over the XenLinux serial console it
+is necessary to add a line into \path{/etc/inittab}, just as per
+regular Linux. Simply add the line:
+\begin{quote} {\small {\tt c:2345:respawn:/sbin/mingetty ttyS0}}
+\end{quote}
+
+and you should be able to log in. Note that to successfully log in as
+root over the serial line will require adding \path{ttyS0} to
+\path{/etc/securetty} in most modern distributions.
+
+\subsection{TLS Libraries}
+
+Users of the XenLinux 2.6 kernel should disable Thread Local Storage
+(e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before
+attempting to run with a XenLinux kernel\footnote{If you boot without
+  first disabling TLS, you will get a warning message during the boot
+  process. In this case, simply perform the rename after the machine
+  is up and then run \texttt{/sbin/ldconfig} to make it take effect.}.
+You can always re-enable it by restoring the directory to its original
+location (i.e.\ \path{mv /lib/tls.disabled /lib/tls}).
+
+The reason for this is that the current TLS implementation uses
+segmentation in a way that is not permissible under Xen. If TLS is
+not disabled, an emulation mode is used within Xen which reduces
+performance substantially.
+
+We hope that this issue can be resolved by working with Linux
+distribution vendors to implement a minor backward-compatible change
+to the TLS library.
+
+
+\section{Booting Xen}
+
+It should now be possible to restart the system and use Xen. Reboot
+as usual but choose the new Xen option when the GRUB screen appears.
+
+What follows should look much like a conventional Linux boot. The
+first portion of the output comes from Xen itself, supplying low-level
+information about itself and the machine it is running on. The
+following portion of the output comes from XenLinux.
+
+You may see some errors during the XenLinux boot. These are not
+necessarily anything to worry about --- they may result from kernel
+configuration differences between your XenLinux kernel and the one you
+usually use.
+
+When the boot completes, you should be able to log into your system as
+usual. If you are unable to log in to your system running Xen, you
+should still be able to reboot with your normal Linux kernel.
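+
+Once you have logged in and started the \xend\ control daemon
+(described in the following chapters), a quick sanity check is to list
+the running domains; on a freshly booted system only domain 0 should
+be present. The exact output format may vary between releases:
+\begin{quote}
+\begin{verbatim}
+# xend start
+# xm list
+\end{verbatim}
+\end{quote}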
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/src/user/introduction.tex Tue Sep 20 09:17:33 2005 +0000
@@ -0,0 +1,143 @@
+\chapter{Introduction}
+
+
+Xen is a \emph{paravirtualising} virtual machine monitor (VMM), or
+`hypervisor', for the x86 processor architecture. Xen can securely
+execute multiple virtual machines on a single physical system with
+close-to-native performance. The virtual machine technology
+facilitates enterprise-grade functionality, including:
+
+\begin{itemize}
+\item Virtual machines with performance close to native hardware.
+\item Live migration of running virtual machines between physical
+  hosts.
+\item Excellent hardware support (supports most Linux device drivers).
+\item Sandboxed, re-startable device drivers.
+\end{itemize}
+
+Paravirtualisation permits very high performance virtualisation, even
+on architectures like x86 that are traditionally very hard to
+virtualise.
+
+The drawback of this approach is that it requires operating systems to
+be \emph{ported} to run on Xen. Porting an OS to run on Xen is
+similar to supporting a new hardware platform; however, the process is
+simplified because the paravirtual machine architecture is very
+similar to the underlying native hardware. Even though operating
+system kernels must explicitly support Xen, a key feature is that user
+space applications and libraries \emph{do not} require modification.
+
+Xen support is available for an increasing number of operating systems:
+right now, Linux 2.4, Linux 2.6 and NetBSD are available for Xen 2.0.
+A FreeBSD port is undergoing testing and will be incorporated into the
+release soon. Other OS ports, including Plan 9, are in progress. We
+hope that the arch-xen patches will be incorporated into the
+mainstream releases of these operating systems in due course (as has
+already happened for NetBSD).
+
+Possible usage scenarios for Xen include:
+
+\begin{description}
+\item [Kernel development.] Test and debug kernel modifications in a
+  sandboxed virtual machine --- no need for a separate test machine.
+\item [Multiple OS configurations.] Run multiple operating systems
+  simultaneously, for instance for compatibility or QA purposes.
+\item [Server consolidation.] Move multiple servers onto a single
+  physical host with performance and fault isolation provided at
+  virtual machine boundaries.
+\item [Cluster computing.] Management at VM granularity provides more
+  flexibility than separately managing each physical host, but better
+  control and isolation than single-system image solutions,
+  particularly by using live migration for load balancing.
+\item [Hardware support for custom OSes.] Allow development of new
+  OSes while benefiting from the wide-ranging hardware support of
+  existing OSes such as Linux.
+\end{description}
+
+
+\section{Structure of a Xen-Based System}
+
+A Xen system has multiple layers, the lowest and most privileged of
+which is Xen itself.
+
+Xen in turn may host multiple \emph{guest} operating systems, each of
+which is executed within a secure virtual machine (in Xen terminology,
+a \emph{domain}).
+Domains are scheduled by Xen to make effective use
+of the available physical CPUs. Each guest OS manages its own
+applications, which includes responsibility for scheduling each
+application within the time allotted to the VM by Xen.
+
+The first domain, \emph{domain 0}, is created automatically when the
+system boots and has special management privileges. Domain 0 builds
+other domains and manages their virtual devices. It also performs
+administrative tasks such as suspending, resuming and migrating other
+virtual machines.
+
+Within domain 0, a process called \emph{xend} runs to manage the
+system. \Xend is responsible for managing virtual machines and
+providing access to their consoles. Commands are issued to \xend over
+an HTTP interface, either from a command-line tool or from a web
+browser.
+
+
+\section{Hardware Support}
+
+Xen currently runs only on the x86 architecture, requiring a `P6' or
+newer processor (e.g.\ Pentium Pro, Celeron, Pentium II, Pentium III,
+Pentium IV, Xeon, AMD Athlon, AMD Duron). Multiprocessor machines are
+supported, and we also have basic support for HyperThreading (SMT),
+although this remains a topic for ongoing research. A port
+specifically for x86/64 is in progress, although Xen already runs on
+such systems in 32-bit legacy mode. In addition, a port to the IA64
+architecture is approaching completion. We hope to add other
+architectures such as PPC and ARM in due course.
+
+Xen can currently use up to 4GB of memory. It is possible for x86
+machines to address up to 64GB of physical memory but there are no
+current plans to support these systems: the x86/64 port is the planned
+route to supporting larger memory sizes.
+
+Xen offloads most of the hardware support issues to the guest OS
+running in Domain~0. Xen itself contains only the code required to
+detect and start secondary processors, set up interrupt routing, and
+perform PCI bus enumeration. Device drivers run within a privileged
+guest OS rather than within Xen itself. This approach provides
+compatibility with the majority of device hardware supported by Linux.
+The default XenLinux build contains support for relatively modern
+server-class network and disk hardware, but you can add support for
+other hardware by configuring your XenLinux kernel in the normal way.
+
+
+\section{History}
+
+Xen was originally developed by the Systems Research Group at the
+University of Cambridge Computer Laboratory as part of the XenoServers
+project, funded by the UK-EPSRC.
+
+XenoServers aim to provide a `public infrastructure for global
+distributed computing', and Xen plays a key part in that, allowing us
+to efficiently partition a single machine to enable multiple
+independent clients to run their operating systems and applications in
+an environment providing protection, resource isolation and
+accounting.
+The project web page contains further information along
+with pointers to papers and technical reports:
+\path{http://www.cl.cam.ac.uk/xeno}
+
+Xen has since grown into a fully-fledged project in its own right,
+enabling us to investigate interesting research issues regarding the
+best techniques for virtualising resources such as the CPU, memory,
+disk and network. The project has been bolstered by support from
+Intel Research Cambridge and HP Labs, who are now working closely
+with us.
+
+Xen was first described in a paper presented at SOSP in
+2003\footnote{\tt
+  http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the
+first public release (1.0) was made that October. Since then, Xen has
+significantly matured and is now used in production scenarios on many
+sites.
+
+Xen 2.0 features greatly enhanced hardware support, configuration
+flexibility, usability and a larger complement of supported operating
+systems. This latest release takes Xen a step closer to becoming the
+definitive open source solution for virtualisation.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/src/user/redhat.tex Tue Sep 20 09:17:33 2005 +0000
@@ -0,0 +1,61 @@
+\chapter{Installing Xen / XenLinux on Red~Hat or Fedora Core}
+
+When using Xen / XenLinux on a standard Linux distribution there are a
+couple of things to watch out for:
+
+Note that, because domains greater than 0 don't have any privileged
+access at all, certain commands in the default boot sequence will fail,
+e.g.\ attempts to update the hwclock, change the console font, update
+the keytable map, start apmd (power management), or gpm (mouse
+cursor). Either ignore the errors (they should be harmless), or
+remove them from the startup scripts. Deleting the following links
+is a good start: {\path{S24pcmcia}}, {\path{S09isdn}},
+{\path{S17keytable}}, {\path{S26apmd}}, {\path{S85gpm}}.
+
+If you want to use a single root file system that works cleanly for
+both domain~0 and unprivileged domains, a useful trick is to use
+different `init' run levels. For example, use run level 3 for
+domain~0, and run level 4 for other domains. This enables different
+startup scripts to be run depending on the run level number passed
+on the kernel command line.
+
+If using NFS root file systems mounted either from an external server
+or from domain0 there are a couple of other gotchas. The default
+{\path{/etc/sysconfig/iptables}} rules block NFS, so part way through
+the boot sequence things will suddenly go dead.
+
+If you're planning on having a separate NFS {\path{/usr}} partition,
+the RH9 boot scripts don't make life easy --- they attempt to mount NFS
+file systems way too late in the boot process. The easiest way I found
+to do this was to have a {\path{/linuxrc}} script run ahead of
+{\path{/sbin/init}} that mounts {\path{/usr}}:
+
+\begin{quote}
+  \begin{small}\begin{verbatim}
+ #!/bin/bash
+ /sbin/ifconfig lo 127.0.0.1
+ /sbin/portmap
+ /bin/mount /usr
+ exec /sbin/init "$@" <>/dev/console 2>&1
+\end{verbatim}\end{small}
+\end{quote}
+
+%% $ XXX SMH: font lock fix :-)
+
+The one slight complication with the above is that
+{\path{/sbin/portmap}} is dynamically linked against
+{\path{/usr/lib/libwrap.so.0}}. Since this is in {\path{/usr}}, it
+won't work. This can be solved by copying the file (and link) below
+the {\path{/usr}} mount point, and just letting the file be `covered'
+when the mount happens.
+
+In some installations, where a shared read-only {\path{/usr}} is being
+used, it may be desirable to move other large directories over into
+the read-only {\path{/usr}}. For example, you might replace
+{\path{/bin}}, {\path{/lib}} and {\path{/sbin}} with links into
+{\path{/usr/root/bin}}, {\path{/usr/root/lib}} and
+{\path{/usr/root/sbin}} respectively. This creates other problems for
+running the {\path{/linuxrc}} script, requiring bash, portmap, mount,
+ifconfig, and a handful of other shared libraries to be copied below
+the mount point --- a simple statically-linked C program would solve
+this problem.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/src/user/start_addl_dom.tex Tue Sep 20 09:17:33 2005 +0000
@@ -0,0 +1,172 @@
+\chapter{Starting Additional Domains}
+
+The first step in creating a new domain is to prepare a root
+filesystem for it to boot from. Typically, this might be stored in a
+normal partition, an LVM or other volume manager partition, a disk
+file or on an NFS server. A simple way to do this is to boot
+from your standard OS install CD and install the distribution into
+another partition on your hard drive.
+
+To start the \xend\ control daemon, type
+\begin{quote}
+  \verb!# xend start!
+\end{quote}
+
+If you wish the daemon to start automatically, see the instructions in
+Section~\ref{s:xend}. Once the daemon is running, you can use the
+\path{xm} tool to monitor and maintain the domains running on your
+system. This chapter provides only a brief tutorial. We provide full
+details of the \path{xm} tool in the next chapter.
+
+% \section{From the web interface}
+%
+% Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv}
+% for more details) using the command: \\
+% \verb_# xensv start_ \\
+% This will also start Xend (see Chapter~\ref{cha:xend} for more
+% information).
+%
+% The domain management interface will then be available at {\tt
+%   http://your\_machine:8080/}. This provides a user friendly wizard
+% for starting domains and functions for managing running domains.
+%
+% \section{From the command line}
+
+
+\section{Creating a Domain Configuration File}
+
+Before you can start an additional domain, you must create a
+configuration file. We provide two example files which you can use as
+a starting point:
+\begin{itemize}
+\item \path{/etc/xen/xmexample1} is a simple template configuration
+  file for describing a single VM.
+
+\item \path{/etc/xen/xmexample2} is a template description that
+  is intended to be reused for multiple virtual machines. Setting the
+  value of the \path{vmid} variable on the \path{xm} command line
+  fills in parts of this template.
+\end{itemize}
+
+Copy one of these files and edit it as appropriate. Typical values
+you may wish to edit include:
+
+\begin{quote}
+\begin{description}
+\item[kernel] Set this to the path of the kernel you compiled for use
+  with Xen (e.g.\ \path{kernel = `/boot/vmlinuz-2.6-xenU'})
+\item[memory] Set this to the size of the domain's memory in megabytes
+  (e.g.\ \path{memory = 64})
+\item[disk] Set the first entry in this list to calculate the offset
+  of the domain's root partition, based on the domain ID. Set the
+  second to the location of \path{/usr} if you are sharing it between
+  domains (e.g.\ \path{disk = [`phy:your\_hard\_drive\%d,sda1,w' \%
+    (base\_partition\_number + vmid),
+    `phy:your\_usr\_partition,sda6,r' ]})
+\item[dhcp] Uncomment the dhcp variable, so that the domain will
+  receive its IP address from a DHCP server (e.g.\ \path{dhcp=`dhcp'})
+\end{description}
+\end{quote}
+
+You may also want to edit the {\bf vif} variable in order to choose
+the MAC address of the virtual ethernet interface yourself.
+For example:
+\begin{quote}
+\verb_vif = [`mac=00:06:AA:F6:BB:B3']_
+\end{quote}
+If you do not set this variable, \xend\ will automatically generate a
+random MAC address from an unused range.
+
+
+\section{Booting the Domain}
+
+The \path{xm} tool provides a variety of commands for managing
+domains. Use the \path{create} command to start new domains. Assuming
+you've created a configuration file \path{myvmconf} based around
+\path{/etc/xen/xmexample2}, to start a domain with virtual machine
+ID~1 you should type:
+
+\begin{quote}
+\begin{verbatim}
+# xm create -c myvmconf vmid=1
+\end{verbatim}
+\end{quote}
+
+The \path{-c} switch causes \path{xm} to connect to the domain's
+console after creation. The \path{vmid=1} sets the \path{vmid}
+variable used in the \path{myvmconf} file.
+
+You should see the console boot messages from the new domain appearing
+in the terminal in which you typed the command, culminating in a login
+prompt.
+
+
+\section{Example: ttylinux}
+
+Ttylinux is a very small Linux distribution, designed to require very
+few resources. We will use it as a concrete example of how to start a
+Xen domain. Most users will probably want to install a full-featured
+distribution once they have mastered the basics\footnote{ttylinux is
+  maintained by Pascal Schmidt. You can download source packages from
+  the distribution's home page: {\tt
+    http://www.minimalinux.org/ttylinux/}}.
+
+\begin{enumerate}
+\item Download and extract the ttylinux disk image from the Files
+  section of the project's SourceForge site (see
+  \path{http://sf.net/projects/xen/}).
+\item Create a configuration file like the following:
+\begin{verbatim}
+kernel = "/boot/vmlinuz-2.6-xenU"
+memory = 64
+name = "ttylinux"
+nics = 1
+ip = "1.2.3.4"
+disk = ['file:/path/to/ttylinux/rootfs,sda1,w']
+root = "/dev/sda1 ro"
+\end{verbatim}
+\item Now start the domain and connect to its console:
+\begin{verbatim}
+xm create configfile -c
+\end{verbatim}
+\item Log in as root; the password is also root.
+\end{enumerate}
+
+
+\section{Starting / Stopping Domains Automatically}
+
+It is possible to have certain domains start automatically at boot
+time and to have dom0 wait for all running domains to shut down before
+it shuts down the system.
+
+To specify that a domain should start at boot time, place its
+configuration file (or a link to it) under \path{/etc/xen/auto/}.
+
+A Sys-V style init script for Red Hat and LSB-compliant systems is
+provided and will be automatically copied to \path{/etc/init.d/}
+during install. You can then enable it in the appropriate way for
+your distribution.
+
+For instance, on Red Hat:
+
+\begin{quote}
+  \verb_# chkconfig --add xendomains_
+\end{quote}
+
+By default, this will start the boot-time domains in runlevels 3, 4
+and 5.
+
+You can also use the \path{service} command to run this script
+manually, e.g.:
+
+\begin{quote}
+  \verb_# service xendomains start_
+
+  Starts all the domains with config files under /etc/xen/auto/.
+\end{quote}
+
+\begin{quote}
+  \verb_# service xendomains stop_
+
+  Shuts down ALL running Xen domains.
+\end{quote}
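+
+For example, to have the \path{myvmconf} domain from earlier in this
+chapter start automatically at boot time, link its configuration file
+into \path{/etc/xen/auto/} (this assumes you saved the file as
+\path{/etc/xen/myvmconf}; adjust the path to wherever your
+configuration file actually lives):
+\begin{quote}
+\begin{verbatim}
+# ln -s /etc/xen/myvmconf /etc/xen/auto/myvmconf
+\end{verbatim}
+\end{quote}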