docs/misc/xenstore.txt @ 16715:c5deb251b9dc

Update version to 3.2.0-rc4
author Keir Fraser <>
date Sat Dec 29 17:57:37 2007 +0000 (2007-12-29)
parents 95bb6485d29d
children ad064e48b6f2

Xenstore protocol specification
-------------------------------

Xenstore implements a database which maps filename-like pathnames
(also known as `keys') to values. Clients may read and write values,
watch for changes, and set permissions to allow or deny access. There
is a rudimentary transaction system.

While xenstore and most tools and APIs are capable of dealing with
arbitrary binary data as values, this should generally be avoided.
Data should generally be human-readable for ease of management and
debugging; xenstore is not a high-performance facility and should be
used only for small amounts of control plane data. Therefore xenstore
values should normally be 7-bit ASCII text strings containing bytes
0x20..0x7f only, and should not contain a trailing nul byte. (The
APIs used for accessing xenstore generally add a nul when reading, for
the caller's convenience.)

A separate specification will detail the keys and values which are
used in the Xen system and what their meanings are. (Sadly that
specification currently exists only in multiple out-of-date versions.)


Paths are /-separated and start with a /, just like Unix filenames.

We say that <child> is a child of <parent> if the two paths are
identical, if <parent> is /, or if <parent>/ is an initial substring
of <child>. (So every path is a child of itself.)

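The child relation above can be written as a small C predicate. This is a minimal sketch written for this document. The function name xs_is_child is hypothetical and is not taken from the xenstore sources.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `child` is a child of `parent` under the
 * rule above (identical paths, parent "/", or "<parent>/" an initial
 * substring of child). Every path is therefore a child of itself. */
bool xs_is_child(const char *child, const char *parent)
{
    size_t plen = strlen(parent);

    if (strcmp(parent, "/") == 0 || strcmp(child, parent) == 0)
        return true;
    /* "<parent>/" must be an initial substring of child. */
    return strncmp(child, parent, plen) == 0 && child[plen] == '/';
}
```

The explicit check for a following '/' is what stops /localfoo from counting as a child of /local, even though /local is a prefix of it.
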
If a particular path exists, all of its parents do too. Every
existing path maps to a possibly empty value, and may also have zero
or more immediate children. There is thus no particular distinction
between directories and leaf nodes. However, it is conventional not
to store nonempty values at nodes which also have children.

The set of characters permitted in paths is the ASCII alphanumerics
plus the four punctuation characters -/_@ (hyphen, slash, underscore,
at sign). @ should be avoided except to specify special watches (see
below). Doubled slashes and trailing slashes (except to specify the
root) are forbidden. The empty path is also forbidden. Paths longer
than 3072 bytes are forbidden; clients specifying relative paths
should keep them to within 2048 bytes. (See XENSTORE_*_PATH_MAX in
xs_wire.h.)


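A client-side check of these rules might look like the sketch below. xs_path_valid, allowed_char and the ABS_PATH_MAX/REL_PATH_MAX constants are hypothetical names. The constant values are copied from the text above, and xs_wire.h remains authoritative. The sketch treats the 2048-byte guidance for relative paths as a hard limit, which is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Limits as quoted in the text; XENSTORE_*_PATH_MAX in xs_wire.h is
 * authoritative. These macro names are hypothetical. */
#define ABS_PATH_MAX 3072
#define REL_PATH_MAX 2048  /* assumption: guidance treated as a hard limit */

/* ASCII alphanumerics plus the four punctuation characters -/_@. */
static bool allowed_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '/' ||
           c == '_' || c == '@';
}

/* Returns true if `path` obeys the character, slash and length rules. */
bool xs_path_valid(const char *path)
{
    size_t len = strlen(path);
    size_t max = (path[0] == '/') ? ABS_PATH_MAX : REL_PATH_MAX;

    if (len == 0 || len > max)
        return false;                /* empty or too long */
    if (strcmp(path, "/") == 0)
        return true;                 /* the root may end in a slash */
    if (path[len - 1] == '/')
        return false;                /* trailing slash */
    for (size_t i = 0; i < len; i++) {
        if (!allowed_char(path[i]))
            return false;            /* outside the permitted set */
        if (path[i] == '/' && i > 0 && path[i - 1] == '/')
            return false;            /* doubled slash */
    }
    return true;
}
```
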
Communication with xenstore is via either sockets or an event channel
and shared memory, as specified in io/xs_wire.h: each message in
either direction is a header formatted as a struct xsd_sockmsg
followed by xsd_sockmsg.len bytes of payload.

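To show the framing, the sketch below copies a header and payload into one buffer. It declares a local struct xs_hdr that mirrors the four uint32_t fields of struct xsd_sockmsg (type, req_id, tx_id, len) instead of including io/xs_wire.h. It assumes the fields travel in host byte order, as a raw struct copy. It also enforces the 4096-byte payload limit described further down. The XS_* numeric type codes are not reproduced here, and xs_frame is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as struct xsd_sockmsg in io/xs_wire.h (assumed to be sent
 * in host byte order); declared locally to keep the sketch self-contained. */
struct xs_hdr {
    uint32_t type;    /* request type, an XS_* code from xs_wire.h */
    uint32_t req_id;  /* echoed back in the reply */
    uint32_t tx_id;   /* transaction id, 0 if none */
    uint32_t len;     /* number of payload bytes that follow */
};

#define XS_PAYLOAD_MAX 4096  /* XENSTORE_PAYLOAD_MAX */

/* Returns a malloc'd header+payload buffer and its size in *out_len, or
 * NULL if the payload is too long or allocation fails. */
unsigned char *xs_frame(uint32_t type, uint32_t req_id, uint32_t tx_id,
                        const void *payload, uint32_t len, size_t *out_len)
{
    struct xs_hdr hdr = { type, req_id, tx_id, len };
    unsigned char *buf;

    if (len > XS_PAYLOAD_MAX)
        return NULL;          /* xenstored would kill the connection */
    buf = malloc(sizeof hdr + len);
    if (buf == NULL)
        return NULL;
    memcpy(buf, &hdr, sizeof hdr);
    if (len > 0)
        memcpy(buf + sizeof hdr, payload, len);
    *out_len = sizeof hdr + len;
    return buf;
}
```
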
The payload syntax varies according to the type field. Generally,
each request generates a reply with an identical type, req_id and
tx_id. However, if an error occurs, a reply will be returned with
type ERROR, and only req_id and tx_id copied from the request.

A caller who sends several requests may receive the replies in any
order and must use req_id (and tx_id, if applicable) to match up
replies to requests. (The current implementation always replies to
requests in the order received but this should not be relied on.)

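Because replies may arrive out of order, a client typically keeps a table of outstanding requests. The sketch below shows one hypothetical way to do the lookup. The struct pending layout, MAX_PENDING and match_reply are illustrative choices. Matching uses req_id and tx_id only, so an ERROR reply still finds its request even though it does not echo the request type.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 16   /* illustrative capacity */

/* Hypothetical record of a request still waiting for its reply. */
struct pending {
    bool in_use;
    uint32_t req_id;
    uint32_t tx_id;
    uint32_t type;       /* type of the request that was sent */
};

/* Finds and releases the slot whose req_id and tx_id match a reply.
 * The reply's type is not compared, because an ERROR reply replaces it.
 * Returns the slot index, or -1 if no outstanding request matches. */
int match_reply(struct pending tbl[], uint32_t req_id, uint32_t tx_id)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (tbl[i].in_use && tbl[i].req_id == req_id &&
            tbl[i].tx_id == tx_id) {
            tbl[i].in_use = false;
            return i;
        }
    }
    return -1;
}
```

Unsolicited WATCH_EVENT messages carry req_id 0, so a client using this scheme would dispatch them before the lookup and would avoid giving its own requests req_id 0.
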
The payload length (len field of the header) is limited to 4096
(XENSTORE_PAYLOAD_MAX) in both directions. If a client exceeds the
limit, its xenstored connection will be immediately killed by
xenstored, which is usually catastrophic from the client's point of
view. Clients (particularly domains, which cannot just reconnect)
should avoid this.

Existing clients do not always contain defences against overly long
payloads. Increasing xenstored's limit is therefore difficult; it
would require negotiation with the client, and obviously would make
parts of xenstore inaccessible to some clients. In any case, passing
bulk data through xenstore is not recommended as the performance
properties are poor.


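A client can defend itself by checking the len field before reading any payload. The sketch below assumes the 16-byte header layout (four native-order uint32_t fields, with len last). XS_PAYLOAD_MAX and xs_check_header are hypothetical names, and the constant is set to the 4096 quoted above.

```c
#include <stdint.h>
#include <string.h>

#define XS_PAYLOAD_MAX 4096   /* XENSTORE_PAYLOAD_MAX, as quoted above */

/* Inspects a raw 16-byte header (type, req_id, tx_id, len as native
 * uint32_t values) before any payload is read. Returns the payload
 * length, or -1 if the peer announced more than the limit allows. */
long xs_check_header(const unsigned char hdr[16])
{
    uint32_t len;

    memcpy(&len, hdr + 12, sizeof len);  /* len is the fourth field */
    if (len > XS_PAYLOAD_MAX)
        return -1;                       /* refuse rather than over-read */
    return (long)len;
}
```
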
---------- Xenstore protocol details - introduction ----------

The payload syntax and semantics of the requests and replies are
described below. In the payload syntax specifications we use the
following notations:

        |               A nul (zero) byte.
        <foo>           A string guaranteed not to contain any nul bytes.
        <foo|>          Binary data (which may contain zero or more nul bytes).
        <foo>|*         Zero or more strings, each followed by a trailing nul.
        <foo>|+         One or more strings, each followed by a trailing nul.
        ?               Reserved value (may not contain nuls).
        ??              Reserved value (may contain nuls).

Except as otherwise noted, reserved values are believed to be sent as
empty strings by all current clients. Clients should not send
nonempty strings for reserved values; those parts of the protocol may
be used for extension in the future.


Error replies are as follows:

ERROR                   E<something>|
        Where E<something> is the name of an errno value
        listed in io/xs_wire.h. Note that the string name
        is transmitted, not a numeric value.

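Converting an ERROR reply back into a local errno value is a table lookup on the transmitted name. The table below is only a subset of common errno names, chosen for illustration. Check io/xs_wire.h for the list xenstored actually uses. Mapping unknown names to EINVAL is an assumption of this sketch, and xs_error_to_errno is a hypothetical name.

```c
#include <errno.h>
#include <string.h>

/* Illustrative subset; io/xs_wire.h holds the authoritative list. */
static const struct { const char *name; int value; } xs_err_table[] = {
    { "EINVAL", EINVAL }, { "EACCES", EACCES }, { "EEXIST", EEXIST },
    { "EISDIR", EISDIR }, { "ENOENT", ENOENT }, { "ENOMEM", ENOMEM },
    { "ENOSPC", ENOSPC }, { "EIO", EIO },       { "ENOTEMPTY", ENOTEMPTY },
    { "ENOSYS", ENOSYS }, { "EROFS", EROFS },   { "EBUSY", EBUSY },
    { "EAGAIN", EAGAIN }, { "EISCONN", EISCONN },
};

/* Converts an ERROR payload ("E<something>" plus its nul, len bytes in
 * total) into a local errno value; unknown names map to EINVAL here. */
int xs_error_to_errno(const char *payload, size_t len)
{
    for (size_t i = 0; i < sizeof xs_err_table / sizeof xs_err_table[0]; i++) {
        size_t n = strlen(xs_err_table[i].name);
        /* The name plus its trailing nul must fill the payload exactly. */
        if (len == n + 1 && memcmp(payload, xs_err_table[i].name, n + 1) == 0)
            return xs_err_table[i].value;
    }
    return EINVAL;
}
```
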
Where no reply payload format is specified below, success responses
have the following payload:
        OK|

Values commonly found in payloads include:

    <path>
        Specifies a path in the hierarchical key structure.
        If <path> starts with a / it simply represents that path.

        <path> is allowed not to start with /, in which case the
        caller must be a domain (rather than connected via a socket)
        and the path is taken to be relative to /local/domain/<domid>
        (eg, `x/y' sent by domain 3 would mean `/local/domain/3/x/y').

    <domid>
        Integer domid, represented as decimal number 0..65535.
        Parsing errors and values out of range generally go
        undetected. The special DOMID_... values (see xen.h) are
        represented as integers; unless otherwise specified it
        is an error not to specify a real domain id.


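Because xenstored does not reliably detect malformed domids, a client may prefer to validate them itself. The hypothetical xs_parse_domid below is stricter than the daemon as described above. It accepts only plain decimal digits and rejects values above 65535.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses a <domid> field strictly: decimal digits only, 0..65535.
 * Returns the domid, or -1 on any parse error. */
long xs_parse_domid(const char *s)
{
    char *end;
    long v;

    if (*s < '0' || *s > '9')
        return -1;          /* reject empty strings, signs and spaces */
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > 65535)
        return -1;          /* overflow, trailing junk or out of range */
    return v;
}
```
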
The following are the actual type values, including the request and
reply payloads as applicable:


---------- Database read, write and permissions operations ----------

READ                    <path>|                 <value|>
WRITE                   <path>|<value|>
        Store and read the octet string <value> at <path>.
        WRITE creates any missing parent paths, with empty values.

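The WRITE payload is the only one here that mixes a nul-terminated string with raw binary data. Below is a minimal sketch of assembling it, using a hypothetical helper name. The path and its nul come first, then the value bytes with no trailing nul.

```c
#include <stdlib.h>
#include <string.h>

/* Builds the WRITE payload "<path>|<value|>": the path, one nul byte, then
 * the raw value bytes with no trailing nul. Returns a malloc'd buffer and
 * stores its size in *out_len, or NULL on allocation failure. */
char *xs_write_payload(const char *path, const void *value, size_t vlen,
                       size_t *out_len)
{
    size_t plen = strlen(path) + 1;          /* include the separating nul */
    char *buf = malloc(plen + vlen);

    if (buf == NULL)
        return NULL;
    memcpy(buf, path, plen);
    if (vlen > 0)
        memcpy(buf + plen, value, vlen);     /* value is not nul-terminated */
    *out_len = plen + vlen;
    return buf;
}
```
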
MKDIR                   <path>|
        Ensures that <path> exists, if necessary by creating
        it and any missing parents with empty values. If <path>
        or any parent already exists, its value is left unchanged.

RM                      <path>|
        Ensures that <path> does not exist, by deleting
        it and all of its children. It is not an error if <path> does
        not exist, but it _is_ an error if <path>'s immediate parent
        does not exist either.

DIRECTORY               <path>|                 <child-leaf-name>|*
        Gives a list of the immediate children of <path>, as only the
        leafnames. The resulting children are each named
        <path>/<child-leaf-name>.

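Replies such as DIRECTORY use the <foo>|* form. The hypothetical helper below splits such a payload into pointers into the original buffer without copying. Rejecting a payload whose last byte is not a nul is a defensive choice of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Splits a "<foo>|*" payload (zero or more strings, each followed by a
 * nul) into pointers into the buffer. Returns the number of strings, or
 * -1 if the payload is not nul-terminated or holds more than max strings. */
int xs_split_strings(const char *payload, size_t len,
                     const char **out, int max)
{
    int count = 0;
    size_t pos = 0;

    if (len > 0 && payload[len - 1] != '\0')
        return -1;                         /* last string is unterminated */
    while (pos < len) {
        if (count == max)
            return -1;
        out[count++] = payload + pos;
        pos += strlen(payload + pos) + 1;  /* skip the string and its nul */
    }
    return count;
}
```
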
GET_PERMS               <path>|                 <perm-as-string>|+
SET_PERMS               <path>|<perm-as-string>|+?
        <perm-as-string> is one of the following:
                w<domid>        write only
                r<domid>        read only
                b<domid>        both read and write
                n<domid>        no access
        See section `Permissions' for details of the permissions system.

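Each <perm-as-string> is a single access letter followed by a decimal domid. The parser below is a sketch. The perm_mode enum and xs_parse_perm are illustrative names invented here, not the permission flags used by the xenstore libraries.

```c
#include <stdlib.h>

/* Illustrative access modes; not the flags used by the libraries. */
enum perm_mode { PERM_NONE, PERM_READ, PERM_WRITE, PERM_BOTH };

/* Parses one <perm-as-string> such as "r5" or "b0". Returns 0 on success,
 * -1 on an unknown letter or a missing or out-of-range domid. */
int xs_parse_perm(const char *s, enum perm_mode *mode, unsigned int *domid)
{
    char *end;
    unsigned long v;

    switch (s[0]) {
    case 'n': *mode = PERM_NONE;  break;
    case 'r': *mode = PERM_READ;  break;
    case 'w': *mode = PERM_WRITE; break;
    case 'b': *mode = PERM_BOTH;  break;
    default:  return -1;
    }
    if (s[1] < '0' || s[1] > '9')
        return -1;                        /* domid must follow the letter */
    v = strtoul(s + 1, &end, 10);
    if (*end != '\0' || v > 65535)
        return -1;
    *domid = (unsigned int)v;
    return 0;
}
```
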
---------- Watches ----------

WATCH                   <wpath>|<token>|?
        Adds a watch.

        When a <path> is modified (including path creation, removal,
        contents change or permissions change) this generates an event
        on the changed <path>. Changes made in transactions cause an
        event only if and when committed. Each occurring event is
        matched against all the watches currently set up, and each
        matching watch results in a WATCH_EVENT message (see below).

        The event's path matches the watch's <wpath> if it is a child
        of <wpath>.

        <wpath> can be a <path> to watch or @<wspecial>. In the
        latter case <wspecial> may have any syntax but it matches
        (according to the rules above) only the following special
        events which are invented by xenstored:
            @introduceDomain    occurs on INTRODUCE
            @releaseDomain      occurs on any domain crash or
                                shutdown, and also on RELEASE
                                and domain destruction

        When a watch is first set up it is triggered once straight
        away, with <path> equal to <wpath>. Watches may be triggered
        spuriously. The tx_id in a WATCH request is ignored.

        Watches are supposed to be restricted by the permissions
        system but in practice the implementation is imperfect.
        Applications should not rely on being sent a notification for
        paths that they cannot read; however, an application may rely
        on being sent a watch event when a path which it _is_ able to
        read is deleted, even if that leaves only a nonexistent
        unreadable parent. A notification may be omitted if a node's
        permissions are changed so as to make it unreadable, in which
        case future notifications may be suppressed (and if the node
        is later made readable, some notifications may have been lost).

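The matching rule can be sketched as a predicate over the watch path and the event path. For ordinary paths it applies the child relation defined earlier. The text above does not spell out how @ names interact with ordinary watches, so the sketch assumes that a special event matches only a watch on the identical @ name. Relative watch paths would have to be resolved first. xs_watch_matches is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Decides whether an event on `epath` fires the watch on `wpath`.
 * Assumption: @ events match only a watch on exactly the same @ name. */
bool xs_watch_matches(const char *wpath, const char *epath)
{
    size_t wlen = strlen(wpath);

    if (wpath[0] == '@' || epath[0] == '@')
        return strcmp(wpath, epath) == 0;
    /* Child rule: identical, watch on "/", or "<wpath>/" is a prefix. */
    if (strcmp(wpath, "/") == 0 || strcmp(wpath, epath) == 0)
        return true;
    return strncmp(epath, wpath, wlen) == 0 && epath[wlen] == '/';
}
```
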
WATCH_EVENT                                     <epath>|<token>|
        Unsolicited `reply' generated for matching modification events
        as described above. req_id and tx_id are both 0.

        <epath> is the event's path, ie the actual path that was
        modified; however, if the event was the recursive removal of a
        parent of <wpath>, <epath> is just <wpath> (rather than the
        actual path which was removed). So <epath> is a child of
        <wpath>, regardless.

        Iff <wpath> for the watch was specified as a relative pathname,
        the <epath> path will also be relative (with the same base,
        obviously).

UNWATCH                 <wpath>|<token>|?

---------- Transactions ----------

TRANSACTION_START       |                       <transid>|
        <transid> is an opaque uint32_t allocated by xenstored,
        represented as unsigned decimal. After this, the transaction
        may be referenced by using <transid> (as 32-bit binary) in the
        tx_id request header field. When a transaction is started,
        the whole db is copied; reads and writes happen on the copy.
        It is not legal to send a non-0 tx_id in TRANSACTION_START.
        Currently xenstored has the bug that after 2^32 transactions
        it will allocate the transid 0 for an actual transaction.

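The transaction id comes back as decimal text but must be sent as binary in the header, so the client has to convert it. Below is a minimal sketch using a hypothetical helper name. It also rejects replies that do not fit in 32 bits.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Converts the TRANSACTION_START reply (unsigned decimal plus a nul, len
 * bytes in total) into the value for the tx_id header field of later
 * requests. Returns 0 on success, -1 if the reply is malformed. */
int xs_parse_transid(const char *payload, size_t len, uint32_t *tx_id)
{
    char *end;
    unsigned long long v;

    if (len < 2 || payload[len - 1] != '\0' ||
        payload[0] < '0' || payload[0] > '9')
        return -1;
    errno = 0;
    v = strtoull(payload, &end, 10);
    if (errno != 0 || end != payload + len - 1 || v > UINT32_MAX)
        return -1;                 /* junk, overflow or wider than 32 bits */
    *tx_id = (uint32_t)v;
    return 0;
}
```
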
TRANSACTION_END         F|
TRANSACTION_END         T|
        tx_id must refer to an existing transaction. After this
        request the tx_id is no longer valid and may be reused by
        xenstore. If F, the transaction is discarded. If T,
        it is committed: if there were any other intervening writes
        then our END gets EAGAIN.

        The plan is that in the future only intervening `conflicting'
        writes cause EAGAIN, meaning only writes or other commits
        which changed paths which were read or written in the
        transaction at hand.

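A commit can fail with EAGAIN when other writes intervene, so callers normally wrap the whole transaction in a retry loop. The sketch assumes a hypothetical callback that performs one complete attempt: TRANSACTION_START, the reads and writes, then TRANSACTION_END with T. The callback returns 0 or an errno value. demo_attempt is a stand-in that only simulates conflicts.

```c
#include <errno.h>

/* Hypothetical callback: one complete transaction attempt, returning 0
 * on a successful commit or an errno value (EAGAIN on conflict). */
typedef int (*xs_tx_attempt)(void *ctx);

/* Re-runs the whole attempt, reads included, while the commit reports
 * EAGAIN, up to max_tries times. Returns the last attempt's result. */
int xs_run_transaction(xs_tx_attempt attempt, void *ctx, int max_tries)
{
    int err = EAGAIN;

    for (int i = 0; i < max_tries && err == EAGAIN; i++)
        err = attempt(ctx);
    return err;
}

/* Demonstration stand-in: reports EAGAIN until the conflict count in
 * *ctx is used up, then succeeds. */
static int demo_attempt(void *ctx)
{
    int *conflicts_left = ctx;

    if (*conflicts_left > 0) {
        (*conflicts_left)--;
        return EAGAIN;
    }
    return 0;
}
```

Retrying the whole body, not just the END request, matters because values read during a failed attempt may already be stale.
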
---------- Domain management and xenstored communications ----------

INTRODUCE               <domid>|<mfn>|<evtchn>|?
        Notifies xenstored to communicate with this domain.

        INTRODUCE is currently only used by xend (during domain
        startup and various forms of restore and resume), and
        xenstored prevents its use other than by dom0.

        <domid> must be a real domain id (not 0 and not a special
        DOMID_... value). <mfn> must be a machine page in that domain
        represented in signed decimal (!). <evtchn> must be an unbound
        event channel in <domid> (likewise in decimal), on which
        xenstored will call bind_interdomain. Violations of these
        rules may result in undefined behaviour; for example, passing
        a high-bit-set 32-bit mfn as an unsigned decimal will attempt
        to use 0x7fffffff instead (!).

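The INTRODUCE payload can be assembled field by field. The reserved ? field is sent empty, so the payload ends with the nul after <evtchn>. The sketch takes the mfn as a signed long so that it is printed in signed decimal. Converting a 32-bit machine frame number to that signed form is left to the caller. xs_introduce_payload and put_field are hypothetical names, and the sketch covers encoding only, not the dom0-only policy.

```c
#include <stdio.h>
#include <string.h>

/* Appends `text` plus its terminating nul at buf[*pos]; -1 if no room. */
static int put_field(char *buf, size_t size, size_t *pos, const char *text)
{
    size_t n = strlen(text) + 1;

    if (n > size - *pos)
        return -1;
    memcpy(buf + *pos, text, n);
    *pos += n;
    return 0;
}

/* Builds "<domid>|<mfn>|<evtchn>|?" into buf. Returns the payload length,
 * or -1 if buf is too small. */
int xs_introduce_payload(char *buf, size_t size, unsigned int domid,
                         long mfn, unsigned int evtchn)
{
    char d[16], m[24], e[16];
    size_t pos = 0;

    snprintf(d, sizeof d, "%u", domid);
    snprintf(m, sizeof m, "%ld", mfn);     /* signed decimal, as required */
    snprintf(e, sizeof e, "%u", evtchn);
    if (put_field(buf, size, &pos, d) || put_field(buf, size, &pos, m) ||
        put_field(buf, size, &pos, e))
        return -1;
    /* The reserved "?" field is sent empty, so nothing more is appended. */
    return (int)pos;
}
```
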
RELEASE                 <domid>|
        Manually requests that xenstored disconnect from the domain.
        The event channel is unbound at the xenstored end and the page
        unmapped. If the domain is still running it won't be able to
        communicate with xenstored. NB that xenstored will in any
        case detect domain destruction and disconnect by itself.
        xenstored prevents the use of RELEASE other than by dom0.

GET_DOMAIN_PATH         <domid>|                <path>|
        Returns the domain's base path, as used for relative paths:
        ie, /local/domain/<domid> (with <domid> normalised). The
        answer will be useless unless <domid> is a real domain id.

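Relative paths sent by a domain expand onto exactly this base path. The hypothetical helper below shows the expansion for a given domid. It leaves absolute paths unchanged and, as a client-side choice, refuses relative paths longer than the 2048 bytes recommended earlier.

```c
#include <stdio.h>
#include <string.h>

/* Expands a request path: absolute paths are copied unchanged, relative
 * ones are placed under /local/domain/<domid>. Returns the snprintf result
 * (>= size means truncation), or -1 for an over-long relative path. */
int xs_resolve_path(char *out, size_t size, unsigned int domid,
                    const char *path)
{
    if (path[0] == '/')
        return snprintf(out, size, "%s", path);    /* already absolute */
    if (strlen(path) > 2048)
        return -1;          /* beyond the relative-path guidance */
    return snprintf(out, size, "/local/domain/%u/%s", domid, path);
}
```
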
IS_DOMAIN_INTRODUCED    <domid>|                T| or F|
        Returns T if xenstored is in communication with the domain:
        ie, if INTRODUCE for the domain has not yet been followed by
        domain destruction or explicit RELEASE.

RESUME                  <domid>|

        Arranges that @releaseDomain events will once more be
        generated when the domain is shut down. This might have
        to be used if a domain were to be shut down (generating one
        @releaseDomain) and then subsequently restarted, since the
        state-sensitive algorithm in xenstored will not otherwise send
        further watch event notifications if the domain were to be
        shut down again.

        It is not clear whether this is possible since one would
        normally expect a domain not to be restarted after being shut
        down without being destroyed in the meantime. There are
        currently no users of this request in xen-unstable.

        xenstored prevents the use of RESUME other than by dom0.

---------- Miscellaneous ----------

DEBUG                   print|<string>|??       sends <string> to debug log
DEBUG                   print|<thing-with-no-nul>  EINVAL
DEBUG                   check|??                checks xenstored innards
DEBUG                   <anything-else|>        no-op (future extension)
        These requests should not generally be used and may be
        withdrawn in the future.