@runlevel5
Created March 27, 2026 02:07
NaCl SFI Design for ppc64le

Overview

Google Native Client (NaCl) uses Software Fault Isolation (SFI) to sandbox untrusted native code. Each supported architecture has a unique SFI implementation. The closest existing analogy for ppc64le is the MIPS 32-bit port: both use fixed-length instructions, pure software isolation (no hardware segments), and dedicated mask registers.

NaCl currently supports: x86-32 (hardware segments), x86-64 (base register + zero-extension), ARM 32-bit (bic masking), and MIPS 32-bit (mask registers). No PPC support has ever existed.

Common SFI Invariants (All Architectures)

  1. No unsafe instructions (syscalls, privileged ops forbidden)
  2. All indirect control flow sandboxed via mask-and-jump pseudo-instructions
  3. All memory accesses sandboxed (masked or otherwise constrained)
  4. Fixed-size instruction bundles; pseudo-instructions cannot straddle boundaries
  5. Guard regions catch off-by-displacement accesses near sandbox boundaries
  6. Trampolines provide the only controlled exit from untrusted to trusted code
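
Invariant 4 implies that an assembler or code generator has to pad with NOPs whenever a pseudo-instruction would otherwise cross a bundle boundary. A minimal sketch of that padding rule, assuming the 16-byte bundle proposed in this document; the function name `nops_before` is hypothetical:

```python
BUNDLE_SIZE = 16  # bytes, i.e. NACL_BLOCK_SHIFT=4 as proposed in this document
INSN_SIZE = 4     # every permitted POWER instruction is 4 bytes long


def nops_before(offset: int, n_insns: int) -> int:
    """Return how many NOPs to emit at `offset` (a multiple of 4) so that a
    pseudo-instruction of `n_insns` instructions stays inside one bundle."""
    length = n_insns * INSN_SIZE
    if length > BUNDLE_SIZE:
        raise ValueError("pseudo-instruction is longer than a bundle")
    # Bytes left in the current bundle, counting from `offset`.
    room = BUNDLE_SIZE - (offset % BUNDLE_SIZE)
    # If the sequence fits in the remaining room, no padding is needed;
    # otherwise pad to the start of the next bundle.
    return 0 if length <= room else room // INSN_SIZE
```

For example, a 3-instruction sequence starting 8 bytes into a bundle needs 2 NOPs, while a 4-instruction sequence must always start on a bundle boundary.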

Proposed SFI Parameters

| Parameter | Value | Rationale |
| --- | --- | --- |
| Bundle size | 16 bytes (4 instructions) | NACL_BLOCK_SHIFT=4. Fixed 4-byte POWER instructions. Matches ARM/MIPS. The longest pseudo-instruction (naclret, 4 insns = 16 bytes) fills one bundle exactly. |
| Sandbox size | 1 GiB | NACL_MAX_ADDR_BITS=30. Matches ARM/MIPS model. Simpler than x86-64's 4 GiB + guard region approach. |
| Halt instruction | trap = 0x7FE00008 | Unconditional trap. Raises SIGTRAP, caught by NaCl signal handler. NACL_HALT_LEN=4. |
| NOP instruction | ori 0,0,0 = 0x60000000 | Standard POWER NOP encoding. |
| Control flow mask | 0x0FFFFFF0 | 256 MB code region, 16-byte aligned (clears upper bits + bottom 4 bits). |
| Data flow mask | 0x3FFFFFFF | 1 GiB data region. |
| Reserved registers | r13, r14, r15 | r13 = thread context pointer, r14 = control flow mask, r15 = data flow mask. |
| Upper guard | 128 KB | Covers the largest D-form and DS-form displacements. Both reach at most ±32 KB (DS-form encodes a 14-bit field shifted left by 2, giving 4-byte granularity), so 128 KB leaves margin for the access width. |
| Stack alignment | 16 bytes | ELFv2 ABI requirement. NACL_STACK_ALIGN_MASK = ~0xF. |
| Red zone | 288 bytes | ELFv2 ABI. NACL_STACK_RED_ZONE = 288. |
| Data segment start | 0x10000000 (256 MB) | Same as MIPS. Code region: 0–256 MB; data region: 256 MB–1 GiB. |
| Trampoline region | 0x10000–0x1FFFF (64 KB) | Standard NaCl trampoline area. |
| NACL_TRAMPRET_FIX | -8 | TBD based on detailed ABI analysis. |
| NACL_USERRET_FIX | -4 | TBD. |
| NACL_SYSARGS_FIX | 0 | TBD. |
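
To make the two masks concrete, the sketch below applies them to arbitrary 64-bit values. It only models the arithmetic of an `and` with r14 or r15; the function names are hypothetical and not part of NaCl.

```python
CONTROL_FLOW_MASK = 0x0FFFFFF0  # proposed r14 value
DATA_FLOW_MASK = 0x3FFFFFFF     # proposed r15 value


def sandbox_branch_target(addr: int) -> int:
    """Model `and rX, rX, r14`: force the target into the 256 MB code
    region and onto a 16-byte bundle boundary."""
    return addr & CONTROL_FLOW_MASK


def sandbox_data_address(addr: int) -> int:
    """Model `and rX, rX, r15`: force the address into the 1 GiB region."""
    return addr & DATA_FLOW_MASK
```

Whatever value untrusted code puts in the register, the masked branch target is below 256 MB and a multiple of 16, and the masked data address is below 1 GiB.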

Pseudo-Instructions

Indirect Branch Sandboxing

# nacljmp rX — sandbox indirect branch via CTR
and     rX, rX, r14        # Mask to 256 MB code region + 16-byte alignment
mtctr   rX                  # Move to Count Register
bctr                        # Branch to CTR
# Total: 3 instructions = 12 bytes (fits 16-byte bundle with 1 NOP pad)

Indirect Branch via LR (Function Returns)

# naclret — sandbox return via LR
mflr    rX                  # Move LR to GPR
and     rX, rX, r14        # Mask to code region + alignment
mtctr   rX                  # Move to CTR (not LR, to use bctr)
bctr                        # Branch to CTR
# Total: 4 instructions = 16 bytes (exactly fills a 16-byte bundle)

Note: Returns use CTR instead of LR to avoid an unsandboxed blr. The validator must ensure all returns go through this sequence.
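
The validator rule for these two sequences can be sketched on a symbolic view of one bundle. This is an illustrative model only: instructions are represented as hypothetical tuples of mnemonic and operands rather than decoded machine words, and the function name is an assumption.

```python
def bundle_branches_are_sandboxed(bundle):
    """`bundle` is a list of up to 4 tuples such as ("and", "r5", "r5", "r14").
    Returns True only if every indirect branch in it is properly masked."""
    for i, insn in enumerate(bundle):
        if insn[0] in ("blr", "blrl"):
            return False  # returns must go through CTR, never LR
        if insn[0] in ("bctr", "bctrl"):
            if i < 2:
                return False  # no room for mask + mtctr in this bundle
            mask, move = bundle[i - 2], bundle[i - 1]
            if move[0] != "mtctr":
                return False
            reg = move[1]
            # The register moved to CTR must have just been masked with r14.
            if mask != ("and", reg, reg, "r14"):
                return False
    return True
```

A nacljmp bundle (`and`, `mtctr`, `bctr`, `nop`) and a naclret bundle (`mflr`, `and`, `mtctr`, `bctr`) pass this check, while a bare `blr` or a `bctr` whose mask sits in the previous bundle does not.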

Data Access Sandboxing

# Before any load/store with explicit address register
and     rX, rX, r15        # Mask to 1 GiB data region
lwz     rY, offset(rX)     # Load word (D-form, ±32 KB displacement)

For loads/stores with small immediate displacement off a known-safe register (SP/r1), the guard region (128 KB) exceeds the maximum displacement, so no masking is needed.
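
The guard-region argument can be checked numerically. The sketch below models only the upper bound (a masked base plus the largest D-form displacement plus the access width); the lower side is not modeled, and the helper names are hypothetical.

```python
SANDBOX_SIZE = 1 << 30        # 1 GiB
GUARD_SIZE = 128 * 1024       # proposed upper guard
DATA_FLOW_MASK = 0x3FFFFFFF   # proposed r15 value
D_FORM_MIN, D_FORM_MAX = -(1 << 15), (1 << 15) - 1  # signed 16-bit


def effective_address(base: int, disp: int) -> int:
    """Model `and rX, rX, r15` followed by a D-form access at disp(rX)."""
    if not D_FORM_MIN <= disp <= D_FORM_MAX:
        raise ValueError("displacement does not fit a D-form instruction")
    return (base & DATA_FLOW_MASK) + disp


def stays_below_guard_end(ea: int, size: int = 8) -> bool:
    """True if the last byte accessed is inside the sandbox or upper guard."""
    return ea + size <= SANDBOX_SIZE + GUARD_SIZE
```

Even for the worst case (base masked to 0x3FFFFFFF, displacement +32767, 8-byte access) the access ends well inside the 128 KB guard.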

Stack Pointer Modification

# After arbitrary SP changes
and     r1, r1, r15        # Mask SP to data region

Small immediate adjustments to SP (e.g., addi r1, r1, -frame_size) don't need masking if the adjustment is within the guard region size.


Forbidden Instructions

The validator must reject all of the following:

| Category | Instructions |
| --- | --- |
| System calls | sc, scv (POWER9+) |
| Privileged returns | rfid, rfscv, rfebb |
| MSR access | mtmsr, mtmsrd, mfmsr |
| TLB/SLB management | tlbie, tlbiel, tlbia, slbie, slbia, slbmte, slbmfee, slbmfev |
| Privileged SPR writes | mtspr to SRR0, SRR1, DAR, DSISR, SPRG0-3, and other supervisor SPRs |
| Privileged SPR reads | mfspr from privileged SPRs (info leak prevention) |
| Hypervisor | hrfid, and all hypervisor-privileged instructions |
| Debug/attention | attn |
| Reserved register writes | Any instruction that modifies r13, r14, or r15 (except through approved pseudo-instruction sequences) |
| Prefixed instructions | All POWER10 8-byte prefixed instructions (pld, pstd, paddi, etc.); their 34-bit displacements (±8 GiB) exceed the guard regions |
| Unsandboxed branches | blr and blrl in all cases (returns must use the naclret sequence); bctr and bctrl unless preceded by the required masking sequence within the same bundle |
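
A few of these categories can be recognized from the instruction word alone. The sketch below decodes the 6-bit primary opcode and, for XL-form instructions, the 10-bit extended opcode; it covers only system calls, prefixed instructions, and branches to LR, and the function names are hypothetical.

```python
def primary_opcode(word: int) -> int:
    """Top 6 bits of a 32-bit instruction word."""
    return (word >> 26) & 0x3F


def xl_extended_opcode(word: int) -> int:
    """10-bit extended opcode of an XL-form instruction (opcode 19)."""
    return (word >> 1) & 0x3FF


def reject_reason(word: int):
    """Return a reason string for a few always-forbidden encodings, else None.
    A real validator would cover every category in the table above."""
    op = primary_opcode(word)
    if op == 1:
        return "prefixed instruction"
    if op == 17:
        return "system call (sc/scv)"
    if op == 19 and xl_extended_opcode(word) == 16:
        return "branch to LR (bclr family, including blr/blrl)"
    return None
```

`bctr` (0x4E800420) returns `None` here because it is only acceptable in context: the check that a mask and `mtctr` precede it in the same bundle has to be done separately.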

Trampoline Design

Each trampoline slot is 32 bytes (NACL_SYSCALL_BLOCK_SHIFT=5), located at 0x10000 + (syscall_num * 32):

# Trampoline slot (8 instructions, 32 bytes)
# Located at 0x10000 + (syscall_num * 32)
li      r0, SYSCALL_NUM             # Load syscall number into r0
lis     r12, HI(NaClSyscallSeg)     # Load upper half of target (PATCHED at load time)
ori     r12, r12, LO(NaClSyscallSeg) # Load lower half (PATCHED at load time)
mtctr   r12
bctr                                 # Jump to syscall handler
nop                                  # Pad to 32 bytes
nop
nop

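The lis/ori pair is patched at load time with the actual address of NaClSyscallSeg. This is the same pattern used by the MIPS port. Note that lis/ori can only materialize a sign-extended 32-bit value, so this slot layout assumes NaClSyscallSeg is mapped below 2 GiB. A full 64-bit address needs the five-instruction sequence (lis, ori, sldi, oris, ori), which would consume the three padding NOPs and still fit the 32-byte slot.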
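
As an illustration of the load-time patching, the sketch below computes a slot address and the two patched instruction words for `lis r12, HI` and `ori r12, r12, LO`. It assumes the target address fits in 31 bits, since `lis` sign-extends on a 64-bit core; the helper names are hypothetical.

```python
TRAMPOLINE_BASE = 0x10000
TRAMPOLINE_END = 0x20000   # exclusive end of the 64 KB region
SLOT_SIZE = 32             # NACL_SYSCALL_BLOCK_SHIFT=5

LIS_R12 = 0x3D800000       # addis r12, 0, imm  (opcode 15, RT=12, RA=0)
ORI_R12_R12 = 0x618C0000   # ori r12, r12, imm  (opcode 24, RS=12, RA=12)


def slot_address(syscall_num: int) -> int:
    addr = TRAMPOLINE_BASE + syscall_num * SLOT_SIZE
    if syscall_num < 0 or addr + SLOT_SIZE > TRAMPOLINE_END:
        raise ValueError("syscall number outside the trampoline region")
    return addr


def patch_words(target: int):
    """Return the (lis, ori) instruction words that load `target` into r12."""
    if not 0 <= target < 0x80000000:
        raise ValueError("lis/ori cannot materialize this address")
    hi = (target >> 16) & 0xFFFF
    lo = target & 0xFFFF   # ori zero-extends, so no carry adjustment is needed
    return LIS_R12 | hi, ORI_R12_R12 | lo
```

With 32-byte slots the 64 KB region holds 2048 trampolines, so `slot_address(2047)` is the last valid slot.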


NaClThreadContext

struct NaClThreadContext {
    /* Callee-saved GPRs (ELFv2: r14-r31, but r14/r15 are mask regs) */
    uint64_t r16, r17, r18, r19, r20, r21, r22, r23;
    uint64_t r24, r25, r26, r27, r28, r29, r30, r31;

    /* Special registers */
    uint64_t r1;            /* stack pointer */
    uint64_t r2;            /* TOC pointer */
    uint64_t prog_ctr;      /* resume PC */
    uint64_t lr;            /* link register */
    uint64_t cr;            /* condition register (callee-saved fields) */

    /* NaCl control */
    uint64_t sysret;
    uint64_t new_prog_ctr;
    uint64_t trusted_stack_ptr;
    uint32_t tls_idx;
    uint64_t tls_value1;
    uint64_t tls_value2;
    uint64_t guard_token;
    uint64_t syscall_routine;

    /* FPU state (callee-saved: f14-f31) */
    double   f14, f15, f16, f17, f18, f19, f20, f21;
    double   f22, f23, f24, f25, f26, f27, f28, f29, f30, f31;

    /* VSX/AltiVec state (callee-saved: v20-v31) */
    __uint128_t v20, v21, v22, v23, v24, v25, v26, v27;
    __uint128_t v28, v29, v30, v31;
};

r14 and r15 are not saved — they always hold the mask constants and are reloaded by NaClSwitch.
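
The assembly stubs refer to this structure through constants such as R16_OFFSET and SP_OFFSET. A sketch of how those offsets fall out of the field order above, assuming natural alignment (8 bytes for uint64_t and double, 4 for uint32_t, 16 for __uint128_t); the `layout` helper is hypothetical and not part of NaCl:

```python
# (name, size, alignment) in declaration order.
FIELDS = (
    [(f"r{n}", 8, 8) for n in range(16, 32)]
    + [(name, 8, 8) for name in ("r1", "r2", "prog_ctr", "lr", "cr",
                                 "sysret", "new_prog_ctr", "trusted_stack_ptr")]
    + [("tls_idx", 4, 4)]
    + [(name, 8, 8) for name in ("tls_value1", "tls_value2",
                                 "guard_token", "syscall_routine")]
    + [(f"f{n}", 8, 8) for n in range(14, 32)]
    + [(f"v{n}", 16, 16) for n in range(20, 32)]
)


def layout(fields):
    """Return ({field: offset}, total_size) using C-style padding rules."""
    offsets, cursor, max_align = {}, 0, 1
    for name, size, align in fields:
        cursor = (cursor + align - 1) // align * align  # pad up to alignment
        offsets[name] = cursor
        cursor += size
        max_align = max(max_align, align)
    total = (cursor + max_align - 1) // max_align * max_align
    return offsets, total


OFFSETS, TOTAL_SIZE = layout(FIELDS)
```

Under these assumptions the 4-byte tls_idx is followed by 4 bytes of padding, and another 8 bytes of padding appear before v20 to reach 16-byte alignment.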


NaClSyscallSeg (Untrusted to Trusted Transition)

NaClSyscallSeg:
    # Validate r13 via guard_token
    ld      r12, GUARD_TOKEN_OFFSET(r13)
    cmpdi   r12, EXPECTED_GUARD_VALUE
    bne     abort

    # Save untrusted callee-saved GPRs into NaClThreadContext
    std     r16, R16_OFFSET(r13)
    std     r17, R17_OFFSET(r13)
    # ... r18-r31 ...
    std     r31, R31_OFFSET(r13)
    std     r1,  SP_OFFSET(r13)
    std     r2,  TOC_OFFSET(r13)
    mflr    r12
    std     r12, LR_OFFSET(r13)
    mfcr    r12
    std     r12, CR_OFFSET(r13)

    # Save callee-saved FPRs
    stfd    f14, F14_OFFSET(r13)
    # ... f15-f31 ...
    stfd    f31, F31_OFFSET(r13)

    # Save callee-saved VRs (v20-v31) if AltiVec/VSX is used
    # stvx    v20, 0, rSCRATCH  (with appropriate addressing)
    # ...

    # Switch to trusted stack
    ld      r1, TRUSTED_STACK_OFFSET(r13)

    # Load trusted TOC
    ld      r2, TRUSTED_TOC_ADDR    # From a known fixed location

    # Call NaClSyscallCSegHook(natp)
    mr      r3, r13                 # First argument = thread context pointer
    bl      NaClSyscallCSegHook
    nop                             # TOC restore nop (ELFv2 convention)

NaClSwitch (Trusted to Untrusted Transition)

NaClSwitch:
    # r3 = pointer to NaClThreadContext

    # Clear caller-saved FPRs and VRs to prevent info leaks (f0-f13, v0-v19)
    xxlxor  vs0, vs0, vs0          # Clear vs0, which overlays f0 (v0 is vs32)
    # ... clear vs1-vs13 (f1-f13) and vs32-vs51 (v0-v19) ...

    # Restore callee-saved GPRs
    ld      r16, R16_OFFSET(r3)
    # ... r17-r31 ...
    ld      r31, R31_OFFSET(r3)
    ld      r1,  SP_OFFSET(r3)
    ld      r2,  TOC_OFFSET(r3)
    ld      r12, LR_OFFSET(r3)
    mtlr    r12
    ld      r12, CR_OFFSET(r3)
    mtcr    r12

    # Restore callee-saved FPRs
    lfd     f14, F14_OFFSET(r3)
    # ... f15-f31 ...
    lfd     f31, F31_OFFSET(r3)

    # Restore callee-saved VRs (v20-v31)
    # lvx     v20, 0, rSCRATCH
    # ...

    # Load mask registers (constants)
    lis     r14, 0x0FFF
    ori     r14, r14, 0xFFF0       # r14 = 0x0FFFFFF0 (control flow mask)
    lis     r15, 0x3FFF
    ori     r15, r15, 0xFFFF       # r15 = 0x3FFFFFFF (data flow mask)

    # Set thread context pointer
    mr      r13, r3

    # Jump to untrusted code
    ld      r12, NEW_PROG_CTR_OFFSET(r3)
    mtctr   r12
    bctr

Open Issues

1. Dual Indirect Branch Targets (CTR and LR)

POWER has two indirect branch mechanisms: bctr (Branch to Count Register) and blr (Branch to Link Register). Both must be sandboxed. The proposed design routes all indirect branches through CTR, including returns (which normally use blr). This means the compiler must emit a 4-instruction return sequence (mflr + and + mtctr + bctr) instead of a single blr. The validator must reject any bare blr instruction.

2. TOC Pointer (r2) Handling

ELFv2 ABI uses r2 as the TOC (Table of Contents) pointer for position-independent code. Untrusted NaCl code needs its own TOC for function calls and global variable access. r2 must be:

  • Saved/restored at sandbox transitions (included in NaClThreadContext)
  • Allowed to be modified by untrusted code (it's not a reserved register)
  • Kept separate from the trusted runtime's TOC, which is loaded from a known location during NaClSyscallSeg

TOC-relative loads (ld rX, offset(r2)) are safe as long as r2 points within the sandbox and the displacement is within the guard region.

3. Atomic Operations

POWER uses lwarx/stwcx. (load-reserved/store-conditional) and ldarx/stdcx. pairs for atomics. These pairs must:

  • Have their memory operands sandboxed (masked with r15)
  • Ideally stay within the same bundle as the mask, so the validator can confirm that both instructions use the masked address (a context switch between them only clears the reservation, so the stwcx. fails and the loop retries)
  • Place the mask before the lwarx, not between the pair: an and + lwarx + stwcx. sequence is 3 instructions (12 bytes), which fits in a 16-byte bundle

Proposed pattern:

and     rADDR, rADDR, r15   # Mask address
lwarx   rDATA, 0, rADDR     # Load-reserved
stwcx.  rNEW, 0, rADDR      # Store-conditional
# 3 instructions = 12 bytes, fits in 16-byte bundle

CAS (compare-and-swap) loops that include a branch will span multiple bundles, which is fine: only the lwarx/stwcx. pair and its address masking need to be in the same bundle.

4. POWER10 Prefixed Instructions

POWER10 introduced 8-byte "prefixed" instructions (e.g., pld, pstd, paddi) with 34-bit immediate fields, enabling displacements up to ±8 GiB. These must be forbidden entirely in the initial port:

  • They break the fixed 4-byte instruction assumption
  • 34-bit displacements far exceed any practical guard region
  • The validator would need to handle variable-length instruction decoding

Future work could allow prefixed instructions with validated displacements, but this significantly complicates the validator.
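
The displacement argument is simple arithmetic. The sketch below compares the reach of a signed 16-bit and a signed 34-bit displacement against the proposed 128 KB guard; the helper names are hypothetical.

```python
GUARD_SIZE = 128 * 1024  # proposed guard region, in bytes


def signed_range(bits: int):
    """Inclusive (min, max) of a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_guard(bits: int) -> bool:
    """True if every displacement of this width stays within the guard."""
    low, high = signed_range(bits)
    return -GUARD_SIZE <= low and high < GUARD_SIZE
```

A 16-bit displacement reaches ±32 KB and fits; a 34-bit displacement reaches ±8 GiB, which is 65,536 times the guard size.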

5. Signal Handling

POWER has different signal frame layouts than x86/ARM. The NaCl signal handler that catches sandbox faults (SIGSEGV, SIGTRAP from trap halt instructions) needs to:

  • Understand the POWER ucontext_t / mcontext_t structures
  • Extract the faulting PC and register state
  • Determine if the fault is within the sandbox (expected) or indicates a bug

6. Validator Complexity

The POWER ISA is large. The instruction encoding uses a 6-bit primary opcode with extended opcodes in varying bit positions across many instruction formats (I, B, SC, D, DS, DQ, DX, X, XL, XFX, XFL, XS, XO, A, M, MD, MDS, VA, VC, VX, etc.). The recommended approach is an allowlist — only permit known-safe instructions — rather than a denylist. Estimated at 5,000–15,000 lines of code, making it the largest component of the port.
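
The allowlist approach amounts to default-deny: anything not explicitly listed is rejected. A minimal sketch keyed on the primary opcode only, with a deliberately tiny, hypothetical starter table; a real validator would also key on extended opcodes and check operands for reserved registers and masking.

```python
# Hypothetical starter allowlist: primary opcode -> mnemonic.
ALLOWED_PRIMARY_OPCODES = {
    14: "addi",
    15: "addis",
    24: "ori",
    32: "lwz",
    36: "stw",
}


def is_allowed(word: int) -> bool:
    """Default-deny check on the 6-bit primary opcode of one instruction."""
    return ((word >> 26) & 0x3F) in ALLOWED_PRIMARY_OPCODES
```

With this shape, a new or unknown instruction (for example anything under primary opcode 31 that has not been reviewed) is rejected until someone adds it, which is the property a denylist cannot provide.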


Component Inventory and Effort Estimate

| Component | Estimated Complexity | Estimated LOC |
| --- | --- | --- |
| Instruction validator | Very High | 5,000–15,000 |
| sel_ldr arch support (trampoline, syscall, context switch, address space) | High | 2,000–4,000 (asm+C) |
| LLVM/Clang NaCl backend (ppc64le-nacl code generation) | Very High | 3,000–10,000 |
| nacl_config.h additions | Low | ~50 |
| Daemon CMake integration | Low | ~30 |
| IRT compilation | Medium | Build system work |
| Testing infrastructure | High | Significant |

Total estimated effort: Several engineer-months minimum.


References

  • NaCl SFI paper: "Native Client: A Sandbox for Portable, Untrusted x86 Native Code" (Yee et al., 2009)
  • ARM NaCl SFI: "Adapting Software Fault Isolation to Contemporary CPU Architectures" (Sehr et al., 2010)
  • MIPS NaCl: chromium.googlesource.com/native_client/src/native_client, src/trusted/service_runtime/arch/mips/
  • POWER ISA: "Power ISA Version 3.1" (IBM, 2020)
  • ELFv2 ABI: "64-Bit ELF V2 ABI Specification: Power Architecture" (OpenPOWER Foundation)
  • Daemon NaCl integration: github.com/DaemonEngine/Daemon, src/engine/framework/VirtualMachine.cpp
  • DaemonEngine NaCl fork: github.com/DaemonEngine/native_client
  • Saigo toolchain: github.com/DaemonEngine/DaemonSaigoNativeClientToolkit