@zboralski
Created March 15, 2026 20:41
New Apple GPU Targets in Xcode 26.4 Beta 3 Metal Toolchain

Date: 2026-03-13
Toolchain cryptex: v17.5.5179.6 (dated 2026-03-03)
AIR-NT version: 32023.883 (metalfe-32023.883)

Overview

The Metal toolchain shipped with Xcode 26.4 beta 3 adds new Apple GPU architecture targets to the AIRNT offline compiler plugins. These targets were previously absent from all AIRNT plugin variants, blocking offline cross-compilation for M5 Pro and A19 Pro GPUs.

This update removes the last architecture gaps in the current Apple GPU lineup.

New GPU Targets

The main libapplegpu-nt.dylib plugin now reports 19 architectures via -archs:

applegpu_g13d  applegpu_g13g  applegpu_g13p  applegpu_g13s
applegpu_g14d  applegpu_g14g  applegpu_g14p  applegpu_g14s
applegpu_g15d  applegpu_g15g  applegpu_g15p  applegpu_g15s
applegpu_g16g  applegpu_g16p  applegpu_g16s
applegpu_g17g  applegpu_g17p  applegpu_g17s   ← NEW (g17s only)
applegpu_g18p                                  ← NEW

New target mapping:

Target          Product           Generation
applegpu_g17s   M5 Pro / M5 Max   G17
applegpu_g18p   A19 Pro           G18

Previously known targets (still present):

Target          Product           Generation
applegpu_g17g   M5                G17
applegpu_g17p   A18 Pro           G17
applegpu_g16g   M4                G16
applegpu_g16s   M4 Pro / M4 Max   G16
applegpu_g16p   A18               G16
applegpu_g15g   M3                G15
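One way to confirm which targets a toolchain update adds is to save the -archs output of the old and new plugin to text files and diff them. The sketch below assumes that has already been done; the function name and the input file names are hypothetical, and the exact command that produces the -archs listing is not shown here.

```shell
# Print architecture names present in the second list but not in the first.
# Both arguments are text files of whitespace-separated names, such as saved
# -archs output. Tokens that do not start with "applegpu_" (for example the
# "← NEW" annotations) are ignored.
# Assumption: the first file is non-empty (the NR == FNR idiom relies on it).
new_archs() {
    awk '
        NR == FNR { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^applegpu_/ && !($i in seen)) print $i
        }
    ' "$1" "$2"
}
```

Called as `new_archs archs_old.txt archs_new.txt`, it prints one added target per line, in the order they appear in the new list.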

Why This Matters

Before this update, cross-architecture shader compilation was blocked for:

Target                   Before                    After
M5 Pro / M5 Max (g17s)   Not in any AIRNT plugin   Supported (main plugin)
A19 Pro (g18p)           Not in any AIRNT plugin   Supported (main plugin)

This enables:

  • Cross-arch ISA diffs covering M3 through A19 Pro (8 targets, 4 GPU generations)
  • Instruction-level comparison between M5 (runtime) and M5 Pro (offline) to confirm ISA equivalence
  • First look at G18 codegen without needing physical A19 Pro hardware

Plugin Size Changes

The main plugin shrank 37% while gaining new targets:

Plugin                      v17.3 (Xcode 26.3)   v17.5 (beta 3)   Delta
libapplegpu-nt.dylib        168 MB               106 MB           −37%
libapplegpu23-nt.dylib      131 MB               129 MB           −1.5%
libapplegpu24-nt.dylib      171 MB               169 MB           −1.2%
libapplegpuG9G12-nt.dylib   182 MB               181 MB           −0.5%
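The Delta column is the relative change from the v17.3 size to the v17.5 size. A minimal sketch of that arithmetic, using a hypothetical helper name and the rounded MB figures from the table:

```shell
# Percentage change from an old size to a new size, to one decimal place.
pct_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

pct_delta 168 106   # libapplegpu-nt.dylib      -> -36.9
pct_delta 131 129   # libapplegpu23-nt.dylib    -> -1.5
```

Because the inputs are already rounded to whole megabytes, the results only approximate the true byte-level change.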

Metal 4.0 Standard

-std=metal4.0 is now accepted by the Metal frontend:

xcrun metal -std=metal4.0 -c shader.metal -o shader.air

New Metallib Files

37 metallib files ship in the applegpu-nt/ directory, including:

  • tex_atomic_emu_g17.metallib — texture atomics emulation for G17
  • tensor.metallib — tensor operations support library
  • vft_rt_gen1_agx3.metallib — AGX3-specific ray-tracing variant
  • ei_rt_g16p_*.metallib — ray-tracing for G16 phone (A18)
  • runtime.gen15.metallib — G15 runtime support

Ray-tracing coverage now spans G13 through G16P with per-stepping variants (a0/b0/c0).

How to Compile for the New Targets

The new targets live in the main libapplegpu-nt.dylib, not the versioned plugins. They require a macOS 26 AIR triple:

# 1. Compile Metal source to AIR v2.5
xcrun metal -std=macos-metal2.4 -mmacosx-version-min=13.0 \
    -c -o shader.air shader.metal

# 2. Disassemble to LLVM IR
air-opt -S shader.air -o shader.ll

# 3. Patch triple to macOS 26
sed -i '' 's/macosx13.0.0/macosx26.0.0/g' shader.ll

# 4. Reassemble
air-opt shader.ll -o shader_patched.air

# 5. Link (use macos 13.0 platform_version to keep AIR v2.5 container)
air-lld -arch air64_v25 \
    -platform_version macos 13.0 13.0 \
    -o shader.metallib shader_patched.air

# 6. Compile with AIRNT main plugin
applegpu-nt \
    -load libMTLPasses.dylib \
    -load libapplegpu-nt.dylib \
    -force-legacy-2024-arch \
    -arch applegpu_g17s \
    -N pipeline.mtlp-json \
    shader.metallib \
    -o output.metallibpackage
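Step 3 is easy to get silently wrong, since sed exits successfully even when nothing matched. The sketch below is one hedged variant: it writes to a separate file (avoiding the BSD-specific `sed -i ''` form) and then checks that a target triple line carries the new version. The function name is hypothetical, and the check assumes the disassembled IR contains a standard LLVM `target triple = "..."` line.

```shell
# Rewrite the macOS version in an LLVM IR file, then verify the result.
# Usage: patch_triple in.ll out.ll [from] [to]
# As in the sed command above, dots in the version act as regex wildcards.
patch_triple() {
    src=$1; dst=$2
    from=${3:-macosx13.0.0}
    to=${4:-macosx26.0.0}
    sed "s/$from/$to/g" "$src" > "$dst" || return 1
    # Return non-zero if no target triple line mentions the new version.
    grep -q "^target triple = .*$to" "$dst"
}
```

For the airnt_v24 path the same helper could be called with `macosx15.0.0` as the fourth argument.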

The pipeline script (pipeline.mtlp-json) is documented in the metal-pipelines-script(5) man page. A minimal script for a single compute kernel:

{"pipelines":{"compute_pipelines":[{"compute_function":"kernel_name"}]}}
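When compiling several kernels, the one-line script can be generated per kernel rather than edited by hand. This is a sketch with a hypothetical helper name; it assumes the kernel name is a plain identifier that needs no JSON escaping, and it only covers the single-compute-function form shown above.

```shell
# Emit a minimal pipeline script for one compute kernel on stdout.
write_pipeline_script() {
    printf '{"pipelines":{"compute_pipelines":[{"compute_function":"%s"}]}}\n' "$1"
}
```

For example, `write_pipeline_script cos_op > pipeline.mtlp-json` produces the file passed to `-N` in step 6.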

Method comparison

Three AIRNT methods cover the full target space:

Method       Targets                     Plugin                   Triple patch
airnt        M3, M4, A18 (G15/G16)       libapplegpu23-nt.dylib   none needed
airnt_v24    M4Pro, A18Pro (G16s/G17p)   libapplegpu24-nt.dylib   macosx15.0.0
airnt_main   M5Pro, A19Pro (G17s/G18p)   libapplegpu-nt.dylib     macosx26.0.0

The v24 and main plugins require -force-legacy-2024-arch; the v23 plugin works without it. M5 (g17g) uses gpu_compile at runtime on the local GPU and does not need AIRNT.
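The method table can be folded into a small lookup so a build script picks the plugin and triple patch from the target name. This is a sketch only: the function name and the output format are made up for illustration, and the arch-to-product mapping is taken from the target tables earlier in this note.

```shell
# Print "<plugin> <triple-patch>" for an AIRNT target, or fail with a message.
airnt_plugin_for() {
    case $1 in
        applegpu_g15g|applegpu_g16g|applegpu_g16p)   # M3, M4, A18
            echo "libapplegpu23-nt.dylib none" ;;
        applegpu_g16s|applegpu_g17p)                 # M4 Pro, A18 Pro
            echo "libapplegpu24-nt.dylib macosx15.0.0" ;;
        applegpu_g17s|applegpu_g18p)                 # M5 Pro, A19 Pro
            echo "libapplegpu-nt.dylib macosx26.0.0" ;;
        applegpu_g17g)                               # M5: runtime path
            echo "g17g compiles via gpu_compile at runtime, not AIRNT" >&2
            return 2 ;;
        *)
            echo "unknown target: $1" >&2
            return 1 ;;
    esac
}
```

Targets outside the eight covered here fall through to the error branch rather than guessing a plugin.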

Cross-Architecture ISA Results

Instruction counts for cos_op across all 8 architectures:

M3       40 real   (G15)   via AIRNT offline   (applegpu23-nt)
M4       38 real   (G16)   via AIRNT offline   (applegpu23-nt)
M4Pro    38 real   (G16)   via AIRNT offline   (applegpu24-nt)
A18      38 real   (G16)   via AIRNT offline   (applegpu23-nt)
A18Pro   38 real   (G17)   via AIRNT offline   (applegpu24-nt)
M5       37 real   (G17)   via gpu_compile     (local M5 hardware)
M5Pro    37 real   (G17)   via AIRNT offline   (applegpu-nt main)
A19Pro   37 real   (G18)   via AIRNT offline   (applegpu-nt main)

M5, M5Pro, and A19Pro produce byte-identical ISA. This is not a local-GPU artifact: M5 compiles via runtime gpu_compile on physical M5 hardware, while M5Pro and A19Pro compile via the AIRNT offline compiler with no GPU involvement. Three targets, built through two independent compilation paths, converge to the same binary, which suggests the compiler backend for these G17 and G18 targets is unified.
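A byte-identity claim like this can be checked with cmp once the ISA for each target has been extracted to its own file. The sketch below assumes that extraction has already happened; the function name and the file names in the usage line are hypothetical.

```shell
# Succeed only if every listed file is byte-identical to the first one.
all_identical() {
    ref=$1; shift
    for f in "$@"; do
        cmp -s "$ref" "$f" || { echo "differs from $ref: $f" >&2; return 1; }
    done
}
```

Usage would look like `all_identical cos_op_m5.bin cos_op_m5pro.bin cos_op_a19pro.bin`. Comparing raw bytes is stricter than comparing instruction counts, so equal counts alone would not pass this check.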

Additional kernels:

Kernel            M3 (G15)   M4 (G16)   M5 (G17)   A19Pro (G18)
cos_op            40         38         37         37
exp_op            18         16         14         14
oracle_half_sin   31         29         28         28
int_heavy         81         80         89         89

The int_heavy result is notable: G17/G18 emits more instructions (89) than G15/G16 (81 and 80) for this integer-heavy kernel, suggesting different scheduling or instruction selection. The relationship is not a simple "newer = fewer instructions".
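With more kernels, spotting cases like int_heavy by eye gets tedious. A hedged sketch of a filter that flags them, assuming rows of the form "kernel M3 M4 M5 A19Pro" on stdin (the function name is hypothetical; the data is the table above):

```shell
# Print kernels whose M5 (G17) count exceeds the M4 (G16) count.
# Rows that do not have exactly five fields (such as a header) are skipped.
g17_longer() {
    awk 'NF == 5 && $4 + 0 > $3 + 0 { print $1 }'
}

g17_longer <<'EOF'
cos_op 40 38 37 37
exp_op 18 16 14 14
oracle_half_sin 31 29 28 28
int_heavy 81 80 89 89
EOF
```

On this data it prints only int_heavy. Instruction count is a rough proxy, so a flagged kernel is a prompt to diff the ISA, not evidence of slower code.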

Cryptex History

Three Metal toolchain cryptex mounts are present on this machine:

Version         Date         Notes
v17.3.7003.10   2026-02-17   Xcode 26.3 stable
v17.5.5170.4    2026-02-16   Beta 2
v17.5.5179.6    2026-03-03   Beta 3 (current)