Xen Performance Monitor
-----------------------

The xenmon tools make use of the existing Xen tracing feature to provide
fine-grained reporting of various domain-related metrics. It should be
stressed that the xenmon.py script included here is just one example of how
the data may be displayed. The xenbaked daemon keeps a large amount of
history in a shared memory area that may be accessed by tools such as xenmon.

For each domain, xenmon reports various metrics. One part of the display is a
group of metrics that have been accumulated over the last second, while
another part of the display shows data measured over the last 10 seconds.
Other measurement intervals are possible, but we have simply chosen 1s and
10s as examples.

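To make the two reporting intervals concrete, here is a minimal sketch (plain
Python with invented names; it does not reflect xenbaked's actual shared
memory layout) of keeping one accumulator per domain and per interval and
rolling a window over once it expires:

    import time
    from collections import defaultdict

    WINDOWS = {"1s": 1.0, "10s": 10.0}      # reporting intervals in seconds

    class Accumulator:
        """Totals and counts gathered since the window started (hypothetical)."""
        def __init__(self):
            self.start = time.time()
            self.totals = defaultdict(float)   # e.g. "cpu", "wait", "blocked" (ns)
            self.counts = defaultdict(int)     # e.g. "exec", "io"

        def expired(self, length):
            return time.time() - self.start >= length

        def reset(self):
            self.__init__()

    acc = defaultdict(Accumulator)             # keyed by (domain id, window name)

    def record(domid, metric, value=1, is_count=False):
        # Add one sample to every window; a real tool would display the old
        # contents before resetting.
        for name, length in WINDOWS.items():
            a = acc[(domid, name)]
            if a.expired(length):
                a.reset()
            if is_count:
                a.counts[metric] += int(value)
            else:
                a.totals[metric] += float(value)

    # Example: domain 1 was dispatched once and ran for 2 ms.
    record(1, "exec", 1, is_count=True)
    record(1, "cpu", 2_000_000)
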
Execution Count
---------------
o The number of times that a domain was scheduled to run (i.e., dispatched)
  over the measurement interval

CPU usage
---------
o Total time used over the measurement interval
o Usage expressed as a percentage of the measurement interval
o Average cpu time used during each execution of the domain

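As a purely illustrative calculation (the numbers below are made up, not taken
from a real run), the last two figures are simple derivations from the
accumulated cpu time and the execution count:

    # One domain over a 1-second measurement interval (invented sample).
    interval_ns = 1_000_000_000     # measurement interval: 1 s
    cpu_ns      = 250_000_000       # total cpu time used in the interval
    exec_count  = 50                # times the domain was dispatched

    cpu_pct      = 100.0 * cpu_ns / interval_ns   # 25.0 %
    avg_per_exec = cpu_ns / exec_count            # 5,000,000 ns = 5 ms

    print("cpu: %.1f%%, avg per execution: %.1f ms"
          % (cpu_pct, avg_per_exec / 1e6))
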
Waiting time
------------
This is how much time the domain spent waiting to run, or put another way, the
amount of time the domain spent in the "runnable" state (or on the run queue)
but not actually running. Xenmon displays:

o Total time waiting over the measurement interval
o Wait time expressed as a percentage of the measurement interval
o Average waiting time for each execution of the domain

Blocked time
------------
This is how much time the domain spent blocked (or sleeping); put another way,
the amount of time the domain spent not needing/wanting the cpu because it was
waiting for some event (i.e., I/O). Xenmon reports:

o Total time blocked over the measurement interval
o Blocked time expressed as a percentage of the measurement interval
o Blocked time per I/O (see I/O count below)

Allocation time
---------------
This is how much cpu time was allocated to the domain by the scheduler; this
is distinct from cpu usage since the "time slice" given to a domain is
frequently cut short for one reason or another, e.g., the domain requests I/O
and blocks. Xenmon reports:

o Average allocation time per execution (i.e., time slice)
o Min and Max allocation times

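The distinction between allocation and usage can be seen in a toy calculation
(invented numbers again): every dispatch is granted a slice, but the domain
may block before consuming it, so usage can be well below allocation.

    # (allocated_ns, used_ns) for four executions of a hypothetical domain;
    # in the second and fourth slices the domain blocked early.
    slices = [(30_000_000, 30_000_000),
              (30_000_000,  4_000_000),
              (15_000_000, 15_000_000),
              (30_000_000,  2_000_000)]

    allocated = [a for a, _ in slices]
    used      = [u for _, u in slices]

    avg_alloc_ms = sum(allocated) / len(allocated) / 1e6   # 26.25 ms
    min_alloc_ms = min(allocated) / 1e6                    # 15.0 ms
    max_alloc_ms = max(allocated) / 1e6                    # 30.0 ms
    used_ms      = sum(used) / 1e6                         # 51.0 ms actually used

    print(avg_alloc_ms, min_alloc_ms, max_alloc_ms, used_ms)
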
I/O Count
---------
This is a rough measure of the I/O requested by the domain. The number of page
exchanges (or page "flips") between the domain and dom0 is counted. The number
of pages exchanged may not accurately reflect the number of bytes transferred
to/from a domain, due to partial pages being used by the network protocols,
etc., but it does give a good sense of the magnitude of I/O being requested by
a domain. Xenmon reports:

o Total number of page exchanges during the measurement interval
o Average number of page exchanges per execution of the domain

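With one more set of invented totals, the per-execution figure here and the
"blocked time per I/O" figure from the Blocked time section (presumably total
blocked time divided by this page-flip count) come out as simple ratios:

    # Invented totals for one domain over a 1-second interval.
    page_flips = 400                # page exchanges with dom0
    exec_count = 50                 # times the domain was dispatched
    blocked_ns = 600_000_000        # total time spent blocked

    flips_per_exec    = page_flips / exec_count         # 8.0 flips/execution
    blocked_per_io_us = blocked_ns / page_flips / 1e3   # 1500.0 us per flip

    print(flips_per_exec, blocked_per_io_us)
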
Usage Notes and issues
----------------------
- Start xenmon by simply running xenmon.py; the xenbaked daemon is started
  and stopped automatically by xenmon.
- To see the various options for xenmon, run xenmon -h. Ditto for xenbaked.
- xenmon also has an option (-n) to output log data to a file instead of
  using the curses interface.
- NDOMAINS is defined to be 32, but can be changed by recompiling xenbaked.
- xenmon.py appears to create 1-2% cpu overhead; part of this is just the
  overhead of the python interpreter, and part of it may be the number of
  trace records being generated. The number of trace records generated can be
  limited by setting the trace mask (with a dom0 Op), which controls which
  events cause a trace record to be emitted (see the sketch after this list).
- To exit xenmon, type 'q'.
- To cycle the display to other physical cpus, type 'c'.
- The first time xenmon is run, it attempts to allocate Xen trace buffers
  using a default size. If you wish to use a non-default value for the
  trace buffer size, run the 'setsize' program (located in tools/xentrace)
  and specify the number of memory pages as a parameter. The default is 20.
- Not well tested with domains using more than 1 virtual cpu.
- If you create a lot of domains, or repeatedly kill a domain and restart it,
  and the domain ids get to be bigger than NDOMAINS, then xenmon behaves
  badly. This is a bug due to xenbaked's treatment of domain ids vs. domain
  indices in a data array; it will be fixed in a future release. Workaround:
  increase NDOMAINS in xenbaked and rebuild.

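The following sketch shows how an event mask like the one mentioned in the
overhead note above might be assembled. The class names and bit positions are
invented for illustration only; the real class definitions and the dom0 Op
used to install the mask are part of the Xen source, not of this example.

    # Invented event-class bits; do not confuse these with Xen's real values.
    EVT_CLASS_SCHED = 1 << 0    # scheduler events (dispatch, block, wake, ...)
    EVT_CLASS_MEM   = 1 << 1    # memory/page events
    EVT_CLASS_MISC  = 1 << 2    # everything else, for the sake of the example

    def build_mask(*classes):
        """OR together the classes whose events should produce trace records."""
        mask = 0
        for c in classes:
            mask |= c
        return mask

    # A monitoring tool interested only in scheduling activity might install
    # a mask like this one, keeping the trace record rate (and overhead) low:
    mask = build_mask(EVT_CLASS_SCHED)
    print(hex(mask))
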
Future Work
-----------
o RPC interface to allow external entities to programmatically access
  processed data
o I/O count batching to reduce the number of trace records generated

Case Study
----------
We have written a case study which demonstrates some of the usefulness of
this tool and the metrics reported. It is available at:
http://www.hpl.hp.com/techreports/2005/HPL-2005-187.html

Authors
-------
Diwaker Gupta <diwaker.gupta@hp.com>
Rob Gardner <rob.gardner@hp.com>
Lucy Cherkasova <lucy.cherkasova@hp.com>