view tools/xenmon/README @ 9685:7b9dacaf3340

This is a patch for XenMon which only applies to the userspace tools.
The primary purpose of this patch is to add support for non-polling
access to the xen trace buffers. The hypervisor changes have already
been accepted.

Also included are a few bug fixes and some minor new features:

1. If xenmon is run without first allocating trace buffers (via
'setsize') and enabling them (via 'tbctl'), then this is done
automatically using sensible defaults.

2. Fixed a bug that caused the first second's worth of data output
from xenmon to be erroneous.

3. Fixed a bug that sometimes caused xenmon not to display data for
newly created domains.

4. The xenmon display now has a 'heartbeat' which flickers once per
second. This shows that xenmon is still alive even when the display
isn't changing, which can happen when there is nothing at all happening
on a particular cpu.

5. Added cpu utilization display to the top of the xenmon window.

6. Added a number of options in xenmon to control exactly which metrics
are displayed, so the screen doesn't get cluttered with data you're not
interested in.

7. Added an option ("--cpu=N") to xenmon to specify which physical cpu
you'd like data displayed for.

8. Updated the README with information about default trace buffer size, etc.

Signed-off-by: Rob Gardner <>
date Fri Apr 14 14:21:12 2006 +0100 (2006-04-14)
parents 394390f6ff85
Xen Performance Monitor
-----------------------
The xenmon tools make use of the existing xen tracing feature to provide
fine-grained reporting of various domain-related metrics. It should be stressed
that the script included here is just an example of the data that may be
displayed. The xenbaked daemon keeps a large amount of history in a shared
memory area that may be accessed by tools such as xenmon.

For each domain, xenmon reports various metrics. One part of the display is a
group of metrics that have been accumulated over the last second, while another
part of the display shows data measured over 10 seconds. Other measurement
intervals are possible, but we have chosen 1s and 10s as examples.

Execution Count
---------------
o The number of times that a domain was scheduled to run (i.e., dispatched)
  over the measurement interval

CPU usage
---------
o Total time used over the measurement interval
o Usage expressed as a percentage of the measurement interval
o Average cpu time used during each execution of the domain
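
The derived figures above follow directly from two raw per-domain counters and
the interval length. A minimal sketch (hypothetical helper, not xenmon's or
xenbaked's actual code) of that arithmetic:

```python
def cpu_usage_metrics(cpu_time_ns, exec_count, interval_ns):
    """Derive the three displayed CPU-usage figures from raw counters.

    cpu_time_ns: total cpu time the domain used during the interval
    exec_count:  how many times the domain was dispatched
    interval_ns: length of the measurement interval (1s or 10s)
    """
    percent = 100.0 * cpu_time_ns / interval_ns
    # Guard against intervals in which the domain never ran.
    avg_per_exec = cpu_time_ns / exec_count if exec_count else 0.0
    return cpu_time_ns, percent, avg_per_exec

# Example: a domain dispatched 50 times used 0.25 s of a 1 s interval.
total, pct, avg = cpu_usage_metrics(250_000_000, 50, 1_000_000_000)
```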

Waiting time
------------
This is how much time the domain spent waiting to run, or put another way, the
amount of time the domain spent in the "runnable" state (or on the run queue)
but not actually running. Xenmon displays:

o Total time waiting over the measurement interval
o Wait time expressed as a percentage of the measurement interval
o Average waiting time for each execution of the domain

Blocked time
------------
This is how much time the domain spent blocked (or sleeping); put another way,
the amount of time the domain spent not needing/wanting the cpu because it was
waiting for some event (e.g., I/O). Xenmon reports:

o Total time blocked over the measurement interval
o Blocked time expressed as a percentage of the measurement interval
o Blocked time per I/O (see I/O count below)
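
The per-I/O figure is just the blocked total divided by the I/O count for the
same interval; a sketch (hypothetical helper name) with the zero-I/O case that
a real implementation must guard against:

```python
def blocked_time_per_io(blocked_ns, page_flips):
    """Average blocked time attributed to each page exchange.

    With no page flips in the interval, the ratio is undefined,
    so report zero rather than dividing by zero.
    """
    return blocked_ns / page_flips if page_flips else 0.0

# Example: 1 ms spent blocked across 4 page flips.
per_io = blocked_time_per_io(1_000_000, 4)
```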

Allocation time
---------------
This is how much cpu time was allocated to the domain by the scheduler; this is
distinct from cpu usage since the "time slice" given to a domain is frequently
cut short for one reason or another, e.g., the domain requests I/O and blocks.
Xenmon reports:

o Average allocation time per execution (i.e., time slice)
o Min and Max allocation times

I/O Count
---------
This is a rough measure of I/O requested by the domain. The number of page
exchanges (or page "flips") between the domain and dom0 is counted. The
number of pages exchanged may not accurately reflect the number of bytes
transferred to/from a domain due to partial pages being used by the network
protocols, etc. But it does give a good sense of the magnitude of I/O being
requested by a domain. Xenmon reports:

o Total number of page exchanges during the measurement interval
o Average number of page exchanges per execution of the domain
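
Because a flipped page may be only partially filled, the flip count gives an
upper bound on bytes transferred rather than an exact byte count. A sketch of
that bound, assuming the usual 4 KiB x86 page size:

```python
PAGE_SIZE = 4096  # bytes per page on x86

def io_byte_upper_bound(page_flips):
    # Each exchanged page may carry less than a full page of payload
    # (short packets, protocol headers, etc.), so the true byte count
    # is at most, not exactly, this value.
    return page_flips * PAGE_SIZE

# Example: 100 page flips bound the transfer at 400 KiB.
bound = io_byte_upper_bound(100)
```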

Usage Notes and issues
----------------------
- Start xenmon by simply running it; the xenbaked daemon is started and
  stopped automatically by xenmon.
- To see the various options for xenmon, run xenmon -h. Ditto for xenbaked.
- xenmon also has an option (-n) to output log data to a file instead of the
  curses interface.
- NDOMAINS is defined to be 32, but can be changed by recompiling xenbaked
- xenmon appears to create 1-2% cpu overhead; part of this is just the
  overhead of the python interpreter. Part of it may be the number of trace
  records being generated. The number of trace records generated can be
  limited by setting the trace mask (with a dom0 Op), which controls which
  events cause a trace record to be emitted.
- To exit xenmon, type 'q'
- To cycle the display to other physical cpus, type 'c'
- The first time xenmon is run, it attempts to allocate xen trace buffers
  using a default size. If you wish to use a non-default value for the
  trace buffer size, run the 'setsize' program (located in tools/xentrace)
  and specify the number of memory pages as a parameter. The default is 20.
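
As a back-of-envelope check on that default (assuming the usual 4 KiB x86
page size; the exact per-cpu layout is xen's, not described here):

```python
PAGE_SIZE = 4096     # bytes per page on x86
default_pages = 20   # the 'setsize' default described above

# The default trace buffer allocation therefore comes to 80 KiB.
buffer_bytes = default_pages * PAGE_SIZE
```

Passing a larger page count to 'setsize' scales this linearly, at the cost of
hypervisor memory held by the trace buffers.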
- Not well tested with domains using more than 1 virtual cpu
- If you create a lot of domains, or repeatedly kill a domain and restart it,
  and the domain ids get to be bigger than NDOMAINS, then xenmon behaves badly.
  This is a bug that is due to xenbaked's treatment of domain ids vs. domain
  indices in a data array. It will be fixed in a future release; workaround:
  increase NDOMAINS in xenbaked and rebuild.
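
The id-vs-index problem can be pictured with a sketch (a hypothetical
illustration in python, not xenbaked's actual C code): if domain ids are used
directly to index a fixed-size array, any id at or above NDOMAINS falls
outside the array, which is what happens after enough create/destroy cycles.

```python
NDOMAINS = 32  # compiled-in array size, as in xenbaked

domain_data = [0] * NDOMAINS

def record_sample(dom_id, value):
    # Indexing by raw domain id works only while ids stay below
    # NDOMAINS; ids grow monotonically as domains are created, so
    # this bound is eventually exceeded.
    if dom_id >= NDOMAINS:
        raise IndexError(f"domain id {dom_id} >= NDOMAINS ({NDOMAINS})")
    domain_data[dom_id] = value
```

Rebuilding with a larger NDOMAINS (the workaround above) simply pushes the
bound out; mapping ids to free array slots would remove it.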

Future Work
-----------
o RPC interface to allow external entities to programmatically access processed data
o I/O Count batching to reduce number of trace records generated

Case Study
----------
We have written a case study which demonstrates some of the usefulness of
this tool and the metrics reported. It is available at:

Authors
-------
Diwaker Gupta <>
Rob Gardner <>
Lucy Cherkasova <>