Intel Cache Allocation Technology and Code and Data Prioritization Features

Revision 1.17

1 Basics

Status: Tech Preview
Architecture(s): Intel x86
Component(s): Hypervisor, toolstack
Hardware: L3 CAT: Haswell and beyond CPUs
          CDP   : Broadwell and beyond CPUs
          L2 CAT: Atom codename Goldmont and beyond CPUs

2 Terminology

3 Overview

Intel provides a set of allocation capabilities including Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP).

CAT allows an OS or hypervisor to control allocation of a CPU’s shared cache based on application/domain priority or Class of Service (COS). Each COS is configured using capacity bitmasks (CBMs) which represent cache capacity and indicate the degree of overlap and isolation between classes. Once CAT is configured, the processor allows access to portions of cache according to the established COS. The Intel Xeon processor E5 v4 family (and some others) introduces capabilities to configure and make use of the CAT mechanism on the L3 cache. The Intel Goldmont processor provides support for control over the L2 cache.

Code and Data Prioritization (CDP) Technology is an extension of CAT. CDP enables isolation and separate prioritization of code and data fetches to the L3 cache in a SW-configurable manner, which can enable workload prioritization and tuning of cache capacity to the characteristics of the workload. CDP extends CAT by providing separate code and data masks per Class of Service (COS). When SW enables CDP, L3 CAT is disabled.

4 User details

5 Technical details

L3 CAT/CDP and L2 CAT are all members of the Intel PSR feature family and share the base PSR infrastructure in Xen.

5.1 Hardware perspective

CAT/CDP defines a range of MSRs that hold cache access patterns known as CBMs. Each CBM is associated with a COS.

E.g. L2 CAT:

   IA32_PQR_ASSOC       | MSR (per socket)           |    Address     |
 +----+---+-------+     +----------------------------+----------------+
 |    |COS|       |     | IA32_L2_QOS_MASK_0         |     0xD10      |
 +----+---+-------+     +----------------------------+----------------+
        +-------------> | ...                        |  ...           |
                        | IA32_L2_QOS_MASK_n         | 0xD10+n (n<64) |

L3 CAT/CDP uses a range of MSRs from 0xC90 ~ 0xC90+n (n<128).

L2 CAT uses a range of MSRs from 0xD10 ~ 0xD10+n (n<64), which follows the L3 CAT/CDP MSRs. Setting L2 cache access patterns that differ from those of the L3 cache is supported.
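
The address arithmetic above can be sketched as a small helper. This is only an illustration of the ranges quoted in this section; the function name and constants are hypothetical and are not Xen identifiers.

```python
# Base addresses and range limits as quoted in this section.
L3_MASK_BASE = 0xC90   # first L3 CAT/CDP mask MSR
L2_MASK_BASE = 0xD10   # IA32_L2_QOS_MASK_0
L3_MAX_MASKS = 128     # n < 128
L2_MAX_MASKS = 64      # n < 64


def mask_msr_address(level, cos):
    """Return the address of the CBM MSR for `cos` at cache `level` (2 or 3).

    Hypothetical helper for illustration; it assumes CDP is disabled.
    """
    if level == 3:
        base, limit = L3_MASK_BASE, L3_MAX_MASKS
    elif level == 2:
        base, limit = L2_MASK_BASE, L2_MAX_MASKS
    else:
        raise ValueError("only L2 and L3 are covered here")
    if not 0 <= cos < limit:
        raise ValueError("COS is outside the MSR range")
    return base + cos


print(hex(mask_msr_address(2, 1)))
print(hex(mask_msr_address(3, 127)))
```

Note that the last possible L3 mask MSR (0xC90+127) sits directly below 0xD10, which is why the L2 range is described as following the L3 range.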

Every MSR stores a CBM value. A capacity bitmask (CBM) provides a hint to the hardware indicating the cache space a domain should be limited to as well as providing an indication of overlap and isolation in the CAT-capable cache from other domains contending for the cache.

Sample cache capacity bitmasks for a bitlength of 8 are shown below. Please note that all (and only) contiguous ‘1’ combinations are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).

       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Default Bitmask
  COS1 | A  | A  | A  | A  | A  | A  | A  | A  |
  COS2 | A  | A  | A  | A  | A  | A  | A  | A  |

       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
  COS0 | A  | A  | A  | A  | A  | A  | A  | A  | Overlapped Bitmask
  COS1 |    |    |    |    | A  | A  | A  | A  |
  COS2 |    |    |    |    |    |    | A  | A  |

       | M7 | M6 | M5 | M4 | M3 | M2 | M1 | M0 |
  COS0 | A  | A  | A  | A  |    |    |    |    | Isolated Bitmask
  COS1 |    |    |    |    | A  | A  |    |    |
  COS2 |    |    |    |    |    |    | A  | A  |
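
The contiguity rule and the overlapped/isolated layouts shown in the tables can be checked with a few bit operations. The sketch below is illustrative only; the function names are made up for this example.

```python
def is_contiguous(cbm):
    """True if cbm is non-zero and its set bits form one unbroken run."""
    if cbm <= 0:
        return False
    # Drop trailing zeros; a single run of ones then has the form 2**k - 1.
    run = cbm >> ((cbm & -cbm).bit_length() - 1)
    return (run & (run + 1)) == 0


def shared_ways(cbm_a, cbm_b):
    """Bits set in both masks, i.e. the cache portion two COS share."""
    return cbm_a & cbm_b


# Masks taken from the "Overlapped" and "Isolated" tables (M7 is bit 7).
overlapped = {0: 0xFF, 1: 0x0F, 2: 0x03}
isolated = {0: 0xF0, 1: 0x0C, 2: 0x03}

print(all(is_contiguous(m) for m in overlapped.values()))
print(hex(shared_ways(overlapped[0], overlapped[1])))
print(hex(shared_ways(isolated[0], isolated[1])))
```

A mask such as 0x05 (binary 101) fails the check because its set bits are separated by a zero.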

We can get the CBM length through CPUID. The default value of CBM is calculated by (1ull << cbm_len) - 1. That is a fully open, all-ones bitmask. COS[0] always holds the default value and is never changed.
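
As a worked example of the default CBM calculation, the sketch below decodes the CAT enumeration values and derives the default mask. The field positions (CBM length minus one in EAX[4:0] and highest COS number in EDX[15:0] of CPUID leaf 0x10) come from the Intel SDM rather than from this document, and the function names are hypothetical.

```python
def decode_cat_cpuid(eax, edx):
    """Decode the raw EAX/EDX values of a CAT sub-leaf of CPUID leaf 0x10.

    Assumed layout (Intel SDM): EAX[4:0] holds the CBM length minus one,
    EDX[15:0] holds the highest COS number supported.
    """
    cbm_len = (eax & 0x1F) + 1
    cos_max = edx & 0xFFFF
    return cbm_len, cos_max


def default_cbm(cbm_len):
    """Fully open (all-ones) mask, same as (1ull << cbm_len) - 1 in C."""
    if cbm_len <= 0:
        raise ValueError("cbm_len must be positive")
    return (1 << cbm_len) - 1


cbm_len, cos_max = decode_cat_cpuid(eax=0x7, edx=0x3)
print(cbm_len, cos_max, hex(default_cbm(cbm_len)))
```

With a CBM length of 8 this yields 0xff, the default CBM reported by `xl psr-hwinfo --cat` in the Testing section.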

There is an IA32_PQR_ASSOC register which stores the COS ID of the VCPU. HW enforces cache allocation according to the corresponding CBM.
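
To make the association concrete, the sketch below packs a COS ID into an IA32_PQR_ASSOC value. The register address (0xC8F) and the placement of the COS field in bits 63:32 are taken from the Intel SDM, not from this document, and the helper names are hypothetical.

```python
IA32_PQR_ASSOC = 0xC8F   # MSR address per the Intel SDM (assumption here)
COS_SHIFT = 32           # COS field occupies bits 63:32 (assumption here)
LOW_MASK = 0xFFFFFFFF    # low half, which holds the monitoring RMID


def pqr_assoc_with_cos(old_value, cos):
    """Return a new register value carrying `cos`, keeping the low 32 bits."""
    if not 0 <= cos <= LOW_MASK:
        raise ValueError("COS does not fit in the 32-bit field")
    return (cos << COS_SHIFT) | (old_value & LOW_MASK)


def cos_of(value):
    """Extract the COS ID from a register value."""
    return (value >> COS_SHIFT) & LOW_MASK


new_value = pqr_assoc_with_cos(old_value=0x5, cos=3)
print(hex(new_value), cos_of(new_value))
```

Preserving the low half matters because other PSR features use it; only the COS field changes when a VCPU with a different COS is scheduled.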

5.2 The relationship between L3 CAT/CDP and L2 CAT

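HW may support all features. By default, CDP is disabled on the processor. If the L3 CAT MSRs are used without enabling CDP, the processor operates in a traditional CAT-only mode. When CDP is enabled, the L3 CAT mask MSRs are re-mapped into interleaved pairs of mask MSRs for data and code fetches, and the COS range is re-indexed so that only the lower half of the COS range is available for CDP.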
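
The effect of enabling CDP on the L3 mask MSRs can be sketched as follows. The layout assumed here (data mask at the even index, code mask at the odd index, usable COS count halved) is taken from the Intel SDM and is not spelled out in this document; the helper names are hypothetical.

```python
L3_MASK_BASE = 0xC90  # first L3 CAT/CDP mask MSR, as given in section 5.1


def cdp_mask_msrs(cos):
    """Return (data_msr, code_msr) addresses for `cos` when CDP is enabled.

    Assumes the SDM layout: mask 2*cos is the data mask and mask
    2*cos + 1 is the code mask.
    """
    if cos < 0:
        raise ValueError("COS must be non-negative")
    data_msr = L3_MASK_BASE + 2 * cos
    return data_msr, data_msr + 1


def cdp_cos_count(cat_cos_count):
    """Each CDP COS consumes two masks, so only half as many COS remain."""
    return cat_cos_count // 2


data_msr, code_msr = cdp_mask_msrs(1)
print(hex(data_msr), hex(code_msr), cdp_cos_count(16))
```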

L2 CAT is independent of L3 CAT/CDP, which means L2 CAT can be enabled while L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP are both enabled.

As a requirement, the set bits of a CAT/CDP CBM must be contiguous.

N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same association register, IA32_PQR_ASSOC, which means one COS is associated with a pair of CBMs: one for L2 CAT and one for L3 CAT/CDP.

Besides, the max COS of L2 CAT may differ from that of L3 CAT/CDP (or of other PSR features in future). In some cases, a domain is permitted to have a COS that is beyond the max COS of one or more PSR features but within that of the others. For instance, assume the max COS of L2 CAT is 8 and the max COS of L3 CAT is 16. When a domain is assigned COS 9, the L3 CAT CBM associated with COS 9 is enforced, but for L2 CAT the HW behaves as if the default value were set, since COS 9 is beyond the max COS (8) of L2 CAT.
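
The out-of-range behaviour described above can be modelled in a few lines. This is a toy model of the rule, not the Xen implementation; the function name, the dictionary layout and the example CBM values are invented for illustration.

```python
def effective_cbm(cos, cos_max, cbms, default):
    """Return the CBM a feature applies for `cos`.

    If `cos` exceeds the feature's max COS, the feature behaves as if the
    default (fully open) CBM were set. `cbms` maps COS ID -> configured CBM.
    """
    if cos > cos_max:
        return default
    return cbms.get(cos, default)


# Example from the text: max COS is 8 for L2 CAT and 16 for L3 CAT.
l2_cbms = {1: 0x7F}      # illustrative values only
l3_cbms = {9: 0x00FF0}   # illustrative values only

print(hex(effective_cbm(9, 16, l3_cbms, default=0xFFFFF)))
print(hex(effective_cbm(9, 8, l2_cbms, default=0xFF)))
```

For COS 9 the L3 lookup returns the configured mask, while the L2 lookup falls back to the default because 9 is greater than the L2 max COS.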

5.3 Design Overview

5.4 Implementation Description

6 Limitations

CAT/CDP works only on HW that supports it (check via CPUID). So far, no HW supports both L2 CAT and L3 CAT/CDP, but the SW implementation already handles the scenario in which both are enabled.

7 Testing

We can execute the xl commands above to verify L2 CAT and L3 CAT/CDP on HW that supports them.

For example:

root@:~$ xl psr-hwinfo --cat
Cache Allocation Technology (CAT): L2
Socket ID       : 0
Maximum COS     : 3
CBM length      : 8
Default CBM     : 0xff

root@:~$ xl psr-cat-cbm-set -l2 1 0x7f

root@:~$ xl psr-cat-show -l2 1
Socket ID       : 0
Default CBM     : 0xff
   ID                     NAME             CBM
    1                 ubuntu14            0x7f

8 Areas for improvement

A hexadecimal number is currently used to set and show the CBM for a domain. Although this makes it easy to express overlapped or isolated bitmasks, it is not user-friendly.

To improve this, the libxl interfaces can be wrapped in libvirt to offer a more user-friendly interface, e.g. setting and showing the cache allocation as a percentage.
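
A percentage-based interface of this kind could translate to and from CBMs roughly as sketched below. Nothing here exists in libxl or libvirt; the function names, the rounding policy (round down, minimum one bit) and the choice to align the mask at bit 0 are all assumptions made for illustration.

```python
def percent_to_cbm(percent, cbm_len):
    """Map a cache percentage to a contiguous CBM starting at bit 0.

    Hypothetical policy: round down to whole bits, but always grant at
    least one bit because an all-zero CBM is not a contiguous mask.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    bits = max(1, (percent * cbm_len) // 100)
    return (1 << bits) - 1


def cbm_to_percent(cbm, cbm_len):
    """Share of the cache covered by `cbm`, as a percentage."""
    return 100 * bin(cbm).count("1") / cbm_len


print(hex(percent_to_cbm(50, 8)))
print(cbm_to_percent(0x7F, 8))
```

A real wrapper would also need a placement policy, since a percentage alone does not say whether domains should overlap or be isolated from each other.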

9 Known issues


10 References

“Intel Resource Director Technology (Intel RDT) Allocation Features”, Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3

11 History

 Date       | Revision | Version  | Notes
------------+----------+----------+----------------------------------------------
 2016-08-12 | 1.0      | Xen 4.9  | Design document written
 2017-02-13 | 1.7      | Xen 4.9  | Changes:
            |          |          | 1. Modify the design document to cover L3 CAT/CDP and L2 CAT;
            |          |          | 2. Fix typos;
            |          |          | 3. Amend description of feat_mask to make it clearer;
            |          |          | 4. Other minor changes.
 2017-02-15 | 1.8      | Xen 4.9  | Changes:
            |          |          | 1. Add content in ‘Areas for improvement’;
            |          |          | 2. Adjust revision number.
 2017-03-16 | 1.9      | Xen 4.9  | Changes:
            |          |          | 1. Add ‘CMT’ in ‘Terminology’;
            |          |          | 2. Change ‘feature list’ to ‘feature array’.
            |          |          | 3. Modify data structure descriptions.
            |          |          | 4. Adjust revision number.
 2017-05-03 | 1.11     | Xen 4.9  | Changes:
            |          |          | 1. Modify data structure descriptions.
            |          |          | 2. Adjust revision number.
 2017-07-13 | 1.14     | Xen 4.10 | Changes:
            |          |          | 1. Fix a typo.
 2017-08-01 | 1.15     | Xen 4.10 | Changes:
            |          |          | 1. Add ‘alt_type’ in ‘feat_props’ structure.
 2017-08-04 | 1.16     | Xen 4.10 | Changes:
            |          |          | 1. Remove special character which may cause html creation failure.
 2018-07-10 | 1.17     | Xen 4.12 | Changes:
            |          |          | 1. Reformat complete document to enable PDF creation.