Using Intel Advanced Matrix Extensions with Oracle Linux

April 19, 2023 | 3 minute read

The application of deep-learning techniques to problem domains such as natural language processing, image processing, recommendation systems, and AI/machine learning continues to proliferate, consuming massive amounts of data in search of (probable) answers. These algorithms are rooted in processing matrices drawn from very large datasets.

Classic matrix processing algorithms are built from nested loops, often with little real work inside each iteration. A common optimization is to unroll the loops entirely to keep the compute pipeline full and avoid stalls or bubbles caused by loop-control decisions. A drawback of this approach is that each combination of matrix size, data type, and operation needs its own dedicated routine, and likely further fine-tuning.

The Intel Advanced Matrix Extensions (AMX) instruction set elevates matrix operation performance even further by providing dedicated matrix processing hardware.

Intel Advanced Matrix Extensions (AMX)

The Intel AMX [1] instruction set extension provides a set of registers that hold matrix data, together with instructions that operate on the data within those registers. The matrix dimensions are programmable, yielding a much more compute-efficient mechanism for handling enormous streams of matrix operations.

For a deeper dive into AMX technology itself, please examine the Intel documentation [2,3].

Intel’s Sapphire Rapids is the first processor to feature the AMX extension.

Software Support

Unbreakable Enterprise Kernel

Oracle’s Unbreakable Enterprise Kernel 7 Update 1 includes kernel support for AMX. This covers both userspace use of AMX and virtualization support for AMX.

The kernel does not require an explicit configuration option for AMX. On AMX-capable CPUs, the kernel detects the feature at run-time and enables its use.

One can determine if the current CPU is AMX capable via the following:

$ cpuid -1 | grep AMX
AMX-BF16: tile bfloat16 support         = true
AMX-TILE: tile architecture support     = true
AMX-INT8: tile 8-bit integer support    = true
AMX-FP16: FP16 tile operations          = true  

Or alternatively, from the Flags list:

$ lscpu | grep amx

When programming for AMX, be aware that the application must first request, and be granted, permission to use AMX via the arch_prctl() system call:

arch_prctl(ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA);

This is because the matrix registers add about 8KiB (on Sapphire Rapids) to the kernel-managed state for processes. For performance reasons, it is best not to have to save/restore these registers unless actually needed by the process!

Once granted, the configuration and usage of AMX can commence.

AMX is supported by the Intel compiler, of course, as well as GNU GCC 11[4] and LLVM 12[5]. Each compiler also provides AMX intrinsics, which make it possible to use AMX without writing all the low-level nuts and bolts by hand.

As a starting point, the kernel selftest for AMX in tools/testing/selftests/x86/amx.c shows how to program AMX directly.

The AMX operations are now being incorporated into software suites, such as oneDNN[6] and libXSMM[7].


Virtualization support for AMX within QEMU is available with Oracle’s QEMU 6.1.1-5. When running on an AMX-capable host, supplying the “-cpu host” option enables AMX features for the guest:

qemu-system-x86_64 -cpu host ...  

Then, from within the guest, one can check for the existence of AMX features using the cpuid command as outlined above.


The Intel AMX technology is still emerging, so the following are good places to find more information:

  1. Intel AMX Marketing

  2. Intel 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4
    • In particular, Chapter 18 Programming with Intel Advanced Matrix Extensions
  3. Intel Developer AMX Example

  4. GNU GCC 11 Release Series

  5. LLVM 11 Release Notes

  6. oneAPI Deep Neural Network Library (oneDNN)
  7. libXSMM


Eric DeVolder
