
annotate tools/xenmon/README @ 22848:6341fe0f4e5a

Added tag 4.1.0-rc2 for changeset 9dca60d88c63
author Keir Fraser <keir@xen.org>
date Tue Jan 25 14:06:55 2011 +0000 (2011-01-25)
parents 7b9dacaf3340
children
Xen Performance Monitor
-----------------------

The xenmon tools make use of the existing Xen tracing feature to provide
fine-grained reporting of various domain-related metrics. It should be stressed
that the xenmon.py script included here is just an example of the data that may
be displayed. The xenbaked daemon keeps a large amount of history in a shared
memory area that may be accessed by tools such as xenmon.

For each domain, xenmon reports various metrics. One part of the display is a
group of metrics that have been accumulated over the last second, while another
part of the display shows data measured over 10 seconds. Other measurement
intervals are possible, but we have just chosen 1s and 10s as an example.


Execution Count
---------------
o The number of times that a domain was scheduled to run (i.e., dispatched)
over the measurement interval


CPU usage
---------
o Total CPU time used over the measurement interval
o Usage expressed as a percentage of the measurement interval
o Average CPU time used during each execution of the domain


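As an illustration only (this is not xenmon's actual code, and the sample
values are made up), the three CPU-usage figures above can be derived from the
per-execution CPU times recorded during one measurement interval:

```python
# Hypothetical sketch: xenmon derives these values from Xen trace records;
# the function name and data here are invented for demonstration.

def cpu_usage_metrics(run_times_ns, interval_ns):
    """Summarize per-execution CPU times over one measurement interval."""
    total = sum(run_times_ns)                  # total CPU time used
    percent = 100.0 * total / interval_ns      # usage as % of the interval
    avg = total / len(run_times_ns) if run_times_ns else 0  # per execution
    return total, percent, avg

# A domain dispatched 4 times during a 1-second interval:
total, percent, avg = cpu_usage_metrics(
    [3_000_000, 5_000_000, 2_000_000, 10_000_000], 1_000_000_000)
print(total, percent, avg)   # 20000000 2.0 5000000.0
```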
Waiting time
------------
This is how much time the domain spent waiting to run or, put another way, the
amount of time the domain spent in the "runnable" state (or on the run queue)
but not actually running. Xenmon displays:

o Total time waiting over the measurement interval
o Wait time expressed as a percentage of the measurement interval
o Average waiting time for each execution of the domain

Blocked time
------------
This is how much time the domain spent blocked (or sleeping); put another way,
it is the amount of time the domain spent not needing/wanting the CPU because
it was waiting for some event (e.g., I/O). Xenmon reports:

o Total time blocked over the measurement interval
o Blocked time expressed as a percentage of the measurement interval
o Blocked time per I/O (see I/O count below)

Allocation time
---------------
This is how much CPU time was allocated to the domain by the scheduler. It is
distinct from CPU usage since the "time slice" given to a domain is frequently
cut short for one reason or another, e.g., when the domain requests I/O and
blocks. Xenmon reports:

o Average allocation time per execution (i.e., time slice)
o Min and max allocation times

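To make the distinction from CPU usage concrete, here is a hypothetical sketch
(not xenmon's code; the values are invented) where each entry is the time
slice the scheduler granted for one execution of a domain:

```python
# Hypothetical sketch: each entry is one allocated time slice (in ns).
# Slices cut short by blocking I/O pull the average well below the max.

def allocation_stats(slices_ns):
    """Return (average, min, max) allocation time per execution."""
    if not slices_ns:
        return 0, 0, 0
    return sum(slices_ns) / len(slices_ns), min(slices_ns), max(slices_ns)

avg, lo, hi = allocation_stats([30_000_000, 5_000_000, 30_000_000, 15_000_000])
print(avg, lo, hi)   # 20000000.0 5000000 30000000
```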
I/O Count
---------
This is a rough measure of the I/O requested by the domain. The number of page
exchanges (or page "flips") between the domain and dom0 is counted. The
number of pages exchanged may not accurately reflect the number of bytes
transferred to/from a domain, due to partial pages being used by the network
protocols, etc., but it does give a good sense of the magnitude of I/O being
requested by a domain. Xenmon reports:

o Total number of page exchanges during the measurement interval
o Average number of page exchanges per execution of the domain


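The two I/O figures follow directly from a tally of page flips observed in
each dispatch of the domain; the sketch below is illustrative only (the
function name and counts are invented, not taken from xenmon):

```python
# Hypothetical sketch: flips_per_execution holds the number of page
# exchanges ("flips") seen during each dispatch within one interval.

def io_count_metrics(flips_per_execution):
    """Return (total flips in interval, average flips per execution)."""
    total = sum(flips_per_execution)
    executions = len(flips_per_execution)
    avg = total / executions if executions else 0
    return total, avg

total, avg = io_count_metrics([12, 0, 40, 8])
print(total, avg)   # 60 15.0
```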
Usage Notes and Issues
----------------------
- Start xenmon by simply running xenmon.py; the xenbaked daemon is started and
stopped automatically by xenmon.
- To see the various options for xenmon, run xenmon.py -h. Ditto for xenbaked.
- xenmon also has an option (-n) to output log data to a file instead of the
curses interface.
- NDOMAINS is defined to be 32, but can be changed by recompiling xenbaked.
- xenmon.py appears to create 1-2% CPU overhead. Part of this is just the
overhead of the Python interpreter; part of it may be the number of trace
records being generated. The number of trace records generated can be
limited by setting the trace mask (with a dom0 op), which controls which
events cause a trace record to be emitted.
- To exit xenmon, type 'q'.
- To cycle the display to other physical CPUs, type 'c'.
- The first time xenmon is run, it attempts to allocate Xen trace buffers
using a default size. If you wish to use a non-default value for the
trace buffer size, run the 'setsize' program (located in tools/xentrace)
and specify the number of memory pages as a parameter. The default is 20.
- xenmon is not well tested with domains using more than one virtual CPU.
- If you create a lot of domains, or repeatedly kill a domain and restart it,
and the domain IDs get to be bigger than NDOMAINS, then xenmon behaves badly.
This is a bug due to xenbaked's treatment of domain IDs vs. domain indices
in a data array. It will be fixed in a future release; as a workaround,
increase NDOMAINS in xenbaked and rebuild.

Future Work
-----------
o RPC interface to allow external entities to programmatically access processed data
o I/O count batching to reduce the number of trace records generated

Case Study
----------
We have written a case study which demonstrates some of the usefulness of
this tool and the metrics reported. It is available at:
http://www.hpl.hp.com/techreports/2005/HPL-2005-187.html

Authors
-------
Diwaker Gupta <diwaker.gupta@hp.com>
Rob Gardner <rob.gardner@hp.com>
Lucy Cherkasova <lucy.cherkasova@hp.com>
