@opsec-ee
Last active March 22, 2026 02:26
JSON Polymorphic Dispatch — Graph Coloring Edition (magician ready)
/*
* @file polymorphic-2.c
* @version 2.2.0
* @author H. Overman <opsec.ee@pm.me>
* @brief JSON Polymorphic Dispatch — Graph Coloring Edition
* SoA layout, BTB dealiasing, group-direct loops,
* graph coloring reorder, 4-state gate system.
*
* Copyright (c) 2025 H. Overman <opsec.ee@pm.me>
* CC-BY-NC 4.0 non-commercial license
*
* Euman = ∫(optimization/time) △ performance
*
*
* =================================================================
* CHANGELOG
* =================================================================
*
* v2.2.0 PPO-3 v3.0 compliance pass
* - FRONT/LEAD/REAR contracts on every function (replaces
* type-arrow {{0 [ ... ] 1}} notation — PPO-3 §CONTRACTS)
* - IFP invariant block I1..I7 in causal order (§REINDEX)
* - IFP edge graph in file header (§DISCOVER)
* - CLOCK_MONOTONIC benchmarking replaces __rdtsc
* (§BENCHMARK CLOCK DISCIPLINE)
* - All constants derived with provenance (§THRESHOLDS)
* - static_assert on every compile-time structural invariant
* (§STATIC VERIFICATION)
* - Self-tests at startup with concrete inputs/outputs
* (§SELF-TESTS AT STARTUP)
* - Emergent statistics validated against theory
* (§EMERGENT STATISTICS)
* - p[] storage justified per §EMERGENCE vs IMPOSITION
* - Build flags documented per §BUILD HYGIENE
* - Triple check sequence applied (§TRIPLE CHECK)
* - 144,000 ratio exception documented (§THRESHOLDS)
* - Phase ordering documented and verified (§PHASE ORDERING)
*
* v6.5.1 NULL-001 UAF fix — cached store->count into local
* before free-path branch (CERT MEM01-C, CWE-416)
*
* v6.5.0 Graph coloring reorder, SoA layout, BTB dealiasing,
* group-direct loops, madvise(MADV_SEQUENTIAL),
* branchless parse_number, FMA-style P3 accumulation,
* depth-1 object boundary fix, software prefetch,
* __builtin_expect on hot paths, 4-state gate system.
*
*
* =================================================================
* IFP INVARIANTS (causal order)
* =================================================================
*
* I1 MMAP_STREAM mmap'd JSON byte stream.
* PROT_READ | MAP_PRIVATE. Immutable source.
* madvise(MADV_SEQUENTIAL) for kernel readahead.
* Annihilated by munmap after I3 is populated.
*
* I2 OBJECT_SPANS Depth-1 '{' ... '}' boundary pointer pairs.
* Single-pass depth tracker, string-escape aware.
* Fallback: depth-1 if no depth-2 objects found.
* Depends: I1
*
* I3 SOA_STORE Structure-of-Arrays: v[count * 7], n[count],
* p[count]. Values flat-packed for cache-line
* streaming. Counts per object. Pattern per
* object (see I4 justification).
* Depends: I2
*
* I4 PATTERN_CLASS p[i] = (n[i] - 1) & 0x3
* Computable from I3.n[]. Stored — not computed
* on access — because:
* (a) graph coloring reorder (I6) permutes
* all three arrays; p[] must be materialized
* for the sort key;
* (b) dispatch inner loop indexes p[idx] on
* every iteration; recomputing (n[idx]-1)&3
* adds a data dependency to the critical path.
* Four patterns: P0(single), P1(pair),
* P2(triple), P3(multi).
* Bit mask & 0x3 guarantees mutual exclusivity
* BY CONSTRUCTION — not by convention.
* Depends: I3
*
* I5 TRANS_MATRIX 4x4 transition count matrix from original order.
* trans[a][b] = count of adjacent pairs where
* p[i-1]==a and p[i]==b.
* Depends: I4
*
* I6 COLOR_ORDER Permutation of {0,1,2,3} minimizing total
* boundary misprediction cost. Heap's algorithm
* (B.R. Heap, "Permutations by Interchanges",
* The Computer Journal 6(3), 1963, pp. 293-4)
* enumerates all 24 permutations. Cost function:
* sum of bidirectional transition counts at each
* adjacent group boundary in the permutation.
* Depends: I5
*
* I7 GROUP_STORE SoA arrays reordered by pattern group per I6.
* Contiguous pattern blocks: intra-group branch
* is direct (backward jump, 100% predictable).
* Misprediction occurs ONLY at K group boundaries
* instead of N iterations.
* Depends: I3, I6
*
*
* =================================================================
* IFP EDGE GRAPH
* =================================================================
*
* I1 ───> I2 ───> I3 ───> I4 ───> I5 ───> I6
*                  │                       │
*                  └─────────> I7 <────────┘
*
* Phase order: PARSE -> CLASSIFY -> MEASURE -> COLOR -> REORDER
*                                                          |
*                                                          v
*                                                      DISPATCH
*
* Barrier: I7 must complete before any dispatch loop begins.
* Seal: I1 annihilated (munmap) after I3 populated.
* I2 annihilated (free) after I3 populated.
* Phase order mirrors IFP edge graph.
*
*
* =================================================================
* TIMING (8-run, CLOCK_MONOTONIC, measured on target hardware)
* =================================================================
*
* Group-Direct (ns/op) Interleaved (ns/op)
* Run 1: 5.31 4.12
* Run 2: 5.15 4.19
* Run 3: 5.98 4.17
* Run 4: 5.01 4.13
* Run 5: 5.23 4.12
* Run 6: 5.04 4.15
* Run 7: 4.79 4.13
* Run 8: 5.00 4.18
* Min: 4.79 4.12
* Max: 5.98 4.19
* Mean: 5.19 4.15
* Spread: 1.19 0.07
*
* Hardware: Arch Linux, gcc (GCC) 14.x, -O3 -march=native -flto
* Dataset: mixed.json, 16 objects, 10M iterations
*
* Note: interleaved spread 0.07 ns vs group-direct 1.19 ns.
* Group-direct's iteration distribution (integer division per
* group) creates uneven loop trip counts susceptible to
* scheduler jitter. Interleaved's single uniform loop is
* effectively immune. The BTB prediction advantage of
* group-direct (direct branch within groups) is negated
* on this 16-object dataset — the working set fits entirely
* in L1, so memory latency hiding (the primary benefit of
* group-direct at scale) is not exercised.
*
* Measurement procedure:
* 1. ASan clean first (§BUILD HYGIENE).
* 2. Build with release flags (see below).
* 3. Pin to isolated core: taskset -c 3 ./polymorphic-2 mixed.json
* 4. Run 8 times. Record ns/dispatch from CLOCK_MONOTONIC output.
* 5. Compute Min/Max/Mean/Spread and record above.
*
*
* =================================================================
* ACKNOWLEDGMENTS
* =================================================================
*
* - B.R. Heap, "Permutations by Interchanges", The Computer
* Journal 6(3), 1963 — Heap's algorithm for permutation
* enumeration in graph_color_reorder().
* - Intel 64 and IA-32 Architectures Software Developer's Manual,
* Vol. 3A, §11.5.3 — 64-byte cache line size on x86-64.
* - CERT MEM01-C — "Do not use freed memory." Applied in main()
* UAF fix (store=NULL poison after objstore_free).
* - CWE-416 — Use After Free. Remediated in v6.5.1.
* - NIST 800-53 SI-16 — Memory Protection.
*
*
* =================================================================
* BUILD FLAGS
* =================================================================
*
* This codebase uses GNU C extensions: __builtin_prefetch,
* __builtin_expect, __asm__ volatile (x86-64 "x" constraint),
* computed goto (Labels as Values), aligned_alloc.
* ISO C23 (-std=c2x) with -Wpedantic rejects these.
* Use -std=gnu2x to enable GNU extensions.
*
* Release:
* gcc -std=gnu2x -O3 -march=native -flto -funroll-loops \
* -DNDEBUG -Wall -Wextra -Wpedantic \
* -o polymorphic-2 polymorphic-2.c -lrt
*
* Computed-goto pedantic warnings suppressed locally via
* #pragma GCC diagnostic in dispatch_interleaved().
* The pragma documents the exception at the exact site.
* -lrt: required for clock_gettime on older glibc.
*
* ASan:
* gcc -std=gnu2x -fsanitize=address,undefined -O1 -g \
* -Wall -Wextra -Wpedantic \
* -o polymorphic-2_asan polymorphic-2.c -lrt
*
* ASan clean before any performance measurement.
* Measuring broken code is decimal arithmetic on a hex problem.
*
* Clock skew after file transfer: find . -print0 | xargs -0 touch
*
*
* =================================================================
* 144,000 RATIO EXCEPTION
* =================================================================
*
* This program parses JSON numbers (IEEE 754 by JSON spec, RFC 8259)
* and executes synthetic arithmetic workloads to benchmark dispatch
* overhead. The benchmark constants (SCALE_P0, DECAY_P3, STEP_X)
* are design parameters for the synthetic workload — precision is
* not the invariant under test. Dispatch pattern prediction is.
*
* Ratio arithmetic is not applied to the synthetic workload.
* The dispatch overhead measurement is independent of value
* precision. The constants are documented with their exact
* IEEE 754 representation where inexact (DECAY_P3 = 0.9).
*
* Timing arithmetic uses integer nanoseconds from CLOCK_MONOTONIC.
* No IEEE 754 in the measurement path.
*
* =================================================================
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stdbool.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <math.h>
/*
==================================================================
* STRUCTURAL CONSTANTS — every constant derived, not asserted
==================================================================
*/
/*
* VALUES_PER_OBJ = 7
*
* Maximum numeric values extracted per JSON object.
* Derivation: 7 doubles * 8 bytes = 56 bytes.
* 56 < 64 (x86-64 L1 cache line, Intel SDM Vol. 3A §11.5.3).
* One object's values fit in a single cache line with 8 bytes
* padding. Increasing to 8 would exactly fill the line but
* leave no room for prefetch metadata. 7 is the structural
* maximum for single-line streaming.
*
* Spot-check:
* 7 * sizeof(double) = 7 * 8 = 56 (< 64) correct
* 8 * sizeof(double) = 8 * 8 = 64 (== 64) no headroom
*/
#define VALUES_PER_OBJ 7
/*
* CACHE_LINE_BYTES = 64
*
* x86-64 L1 data cache line size.
* Source: Intel SDM Vol. 3A §11.5.3, "Cache Line Size = 64 Bytes."
* Also: AMD64 Architecture Programmer's Manual Vol. 2, §7.6.1.
* Used for aligned_alloc alignment of SoA value array.
*/
#define CACHE_LINE_BYTES 64
/*
* PATTERN_COUNT = 4
*
* Number of dispatch patterns. Derivation:
* p[i] = (n[i] - 1) & 0x3
* 0x3 = 0b11 => 4 possible values: {0, 1, 2, 3}
* Bit mask guarantees exactly 4 patterns by construction.
* Any n >= 1 maps to one of {P0, P1, P2, P3}.
*
* Spot-check:
* n=1: (1-1) & 3 = 0 -> P0 correct
* n=2: (2-1) & 3 = 1 -> P1 correct
* n=4: (4-1) & 3 = 3 -> P3 correct
* n=5: (5-1) & 3 = 0 -> P0 wraps, correct by mask
*/
#define PATTERN_COUNT 4
/*
* PATTERN_MASK = 0x3
*
* Bit mask for pattern classification.
* PATTERN_MASK = PATTERN_COUNT - 1.
* Works because PATTERN_COUNT is a power of 2.
* 4 = 2^2, 4 - 1 = 3 = 0b11
*/
#define PATTERN_MASK 0x3
/*
* PREFETCH_AHEAD = 8
*
* Software prefetch distance in objects.
* Derivation:
* Each object's values: VALUES_PER_OBJ * 8 = 56 bytes.
* Prefetch distance: 8 * 56 = 448 bytes ~ 7 cache lines.
* Typical L2-miss-to-DRAM latency: ~200 cycles (Skylake-class).
* Typical dispatch iteration: ~10-25 cycles.
* 8 iterations * ~15 cycles/iter = ~120 cycles lookahead.
* This covers most of the L2 miss penalty.
* Tunable: profile on target hardware and adjust.
*
* Spot-check:
* 8 * 56 = 448 bytes correct
* 448 / 64 = 7 cache lines correct
* 7 lines at 64B/line = 448B round-trip check
*/
#define PREFETCH_AHEAD 8
/*
* SCALE_P0 = 1000.0
*
* Synthetic workload scaling factor for P0 (single-value) pattern.
* Design parameter: scales the single extracted value to a range
* comparable to multi-value patterns for benchmark fairness.
* 1000.0 is exact in IEEE 754 (integer, fits in 52-bit mantissa).
* No precision loss.
*/
#define SCALE_P0 1000.0
/*
* DECAY_P3 = 0.9
*
* Synthetic workload decay factor for P3 (multi-value) chain.
* Design parameter: models diminishing contribution of successive
* values in a multiply-accumulate chain.
*
* IEEE 754 representation: INEXACT.
* 0.9 in double: 0.90000000000000002220446049250313080847...
* Exact ratio: 9/10 = 129600/144000
* The inexactness does not affect benchmark validity because
* dispatch overhead — not arithmetic precision — is measured.
*/
#define DECAY_P3 0.9
/*
* STEP_X = 1.5
*
* Iteration step factor: x = iteration * STEP_X.
* Design parameter: generates a monotonically increasing sequence
* of work values. 1.5 is exact in IEEE 754 (1 + 1/2, representable
* as 0x3FF8000000000000). No precision loss.
*/
#define STEP_X 1.5
/*
* INITIAL_SPAN_CAP = 1024
*
* Initial allocation capacity for object span array.
* Power of 2 for efficient realloc doubling.
* 1024 * sizeof(Span) = 1024 * 16 = 16384 bytes = 256 cache lines.
* Fits comfortably in L2 (typical 256KB+).
*/
#define INITIAL_SPAN_CAP 1024
/*
* WARMUP_ITERS = 10000
*
* Cache warmup iteration count. Walks the value array once
* via prefetch to pull data into L1/L2 before timed dispatch.
* 10000 iterations at 56 bytes/object = 560KB touched.
* Sufficient to warm L2 for datasets up to ~500K objects.
*/
#define WARMUP_ITERS 10000
/*
* HEAP_PERM_COUNT = 24
*
* Total permutations of PATTERN_COUNT elements.
* 4! = 24. Heap's algorithm enumerates exactly this many.
* Used as a sanity bound — not in the code path, but documents
* why full enumeration is feasible for K=4.
*
* Derivation (Python):
* import math; math.factorial(4) # 24
*/
#define HEAP_PERM_COUNT 24
/*
==================================================================
* STATIC VERIFICATION — compile-time structural invariants
*
* If it can be checked at compile time, check it at compile time.
* A runtime check on a structural invariant is a notation failure.
==================================================================
*/
static_assert(sizeof(double) == 8,
"Type-punning in parse_number requires 8-byte double (IEEE 754)");
static_assert(VALUES_PER_OBJ == 7,
"Values-per-object must be 7 for single-cache-line streaming");
static_assert(VALUES_PER_OBJ * sizeof(double) < CACHE_LINE_BYTES,
"One object's values must fit within a single cache line");
static_assert(PATTERN_COUNT == 4,
"Dispatch table has exactly 4 entries");
static_assert((PATTERN_COUNT & (PATTERN_COUNT - 1)) == 0,
"PATTERN_COUNT must be power of 2 for bit-mask classification");
static_assert(PATTERN_MASK == (PATTERN_COUNT - 1),
"PATTERN_MASK must equal PATTERN_COUNT - 1");
static_assert(PREFETCH_AHEAD > 0,
"Prefetch distance must be positive");
static_assert(INITIAL_SPAN_CAP > 0
&& (INITIAL_SPAN_CAP & (INITIAL_SPAN_CAP - 1)) == 0,
"Initial span capacity must be a positive power of 2");
/*
==================================================================
* GATE SYSTEM — 4-state logic (Z/X/0/1)
*
* Z = high-impedance (gate not applicable, skip)
* X = unknown (insufficient info, fail closed)
* 0 = deny (attack/fault detected, block)
* 1 = allow (verified safe, proceed)
*
* Gate states follow from the pivot — they do not precede it.
* Z/X fire at the structure, not at the symptom.
==================================================================
*/
typedef enum {
GATE_Z = 0,
GATE_X = 1,
GATE_0 = 2,
GATE_1 = 3
} gate_state_t;
typedef struct {
void *value;
double dvalue;
gate_state_t state;
const char *reason;
} GateResult;
/*
==================================================================
* Gate check macros
*
* GATE_NULL_CHECK: ptr null -> gate-0, goto label
* GATE_ALLOC_CHECK: allocation null -> gate-0, goto label
* GATE_VALIDATE_STR: C-string null -> gate-0, goto label
*
* These are runtime contracts (can fail with correct code under
* load). They use explicit conditional + goto, not assert().
* assert() is dead in release builds (-DNDEBUG).
* These checks survive release. (PPO-3 §ASSERT IS DEAD)
==================================================================
*/
#define GATE_NULL_CHECK(ptr, label) \
do { \
if (__builtin_expect((ptr) == NULL, 0)) { \
fprintf(stderr, "[GATE-0] NULL-001 %s:%d: " \
"null check failed on '%s'\n", \
__func__, __LINE__, #ptr); \
goto label; \
} \
} while (0)
#define GATE_ALLOC_CHECK(ptr, label) \
do { \
if (__builtin_expect((ptr) == NULL, 0)) { \
fprintf(stderr, "[GATE-0] NULL-002 %s:%d: " \
"allocation failed for '%s'\n", \
__func__, __LINE__, #ptr); \
goto label; \
} \
} while (0)
#define GATE_VALIDATE_STR(ptr, label) \
do { \
if (__builtin_expect((ptr) == NULL, 0)) { \
fprintf(stderr, "[GATE-0] VALIDATE-001 %s:%d: " \
"raw C-string param '%s' is null\n", \
__func__, __LINE__, #ptr); \
goto label; \
} \
} while (0)
#define GATE_RESULT_OK(val) \
(GateResult){ .value = (val), .dvalue = 0.0, \
.state = GATE_1, .reason = "OK" }
#define GATE_RESULT_OK_D(d) \
(GateResult){ .value = NULL, .dvalue = (d), \
.state = GATE_1, .reason = "OK" }
#define GATE_RESULT_FAIL(msg) \
(GateResult){ .value = NULL, .dvalue = 0.0, \
.state = GATE_0, .reason = (msg) }
/*
==================================================================
* OPT-402: SoA (Structure of Arrays) layout
*
* Separate contiguous arrays for values, counts, patterns,
* and group metadata. Values are VALUES_PER_OBJ-wide per object,
* flat-packed for cache-line streaming. Counts and patterns are
* uint8_t arrays — fit in cache easily.
*
* group_start[4]: byte offset where each pattern group begins
* group_count[4]: object count per pattern group
* group_order[4]: optimal traversal order from graph coloring (I6)
==================================================================
*/
typedef struct {
double *v;
uint8_t *n;
uint8_t *p;
size_t count;
size_t group_start[PATTERN_COUNT];
size_t group_count[PATTERN_COUNT];
uint8_t group_order[PATTERN_COUNT];
} ObjStore;
/*
* Compile-time check on group array sizes.
* Must match PATTERN_COUNT to avoid silent buffer overrun
* if PATTERN_COUNT is ever changed.
*/
static_assert(sizeof(((ObjStore *)0)->group_start) ==
PATTERN_COUNT * sizeof(size_t),
"group_start array size must match PATTERN_COUNT");
static_assert(sizeof(((ObjStore *)0)->group_order) ==
PATTERN_COUNT * sizeof(uint8_t),
"group_order array size must match PATTERN_COUNT");
/*
==================================================================
* Object span — pointer pair marking one JSON object boundary
==================================================================
*/
typedef struct {
const char *start;
const char *end;
} Span;
static_assert(sizeof(Span) == 2 * sizeof(const char *),
"Span must be exactly two pointers (no padding)");
/*
==================================================================
* Timing helper — integer nanosecond delta from CLOCK_MONOTONIC
*
* FRONT: Two struct timespec from clock_gettime(CLOCK_MONOTONIC)
* t0 captured before workload, t1 captured after.
* LEAD: Integer subtraction: (sec delta * 1e9) + nsec delta.
* REAR: Total elapsed wall-time nanoseconds as int64_t.
*
* Gate walk:
* Z: n/a (always applicable for timed regions)
* X: n/a (CLOCK_MONOTONIC is monotonic by POSIX guarantee)
* 0: n/a (subtraction cannot fail on valid timespec)
* 1: nanosecond delta returned
==================================================================
*/
static inline int64_t timespec_diff_ns(const struct timespec *t0,
const struct timespec *t1)
{
return (int64_t)(t1->tv_sec - t0->tv_sec) * 1000000000LL
+ (int64_t)(t1->tv_nsec - t0->tv_nsec);
}
/*
==================================================================
* ObjStore cleanup — destructor
*
* FRONT: ObjStore pointer, possibly NULL.
* LEAD: Conditional free of v[], n[], p[], then store itself.
* REAR: All heap memory returned to allocator.
* Caller MUST set pointer to NULL after call
* (CERT MEM01-C: poison after free).
*
* Gate walk:
* Z: store is NULL -> no-op (safe, no memory to free)
* X: n/a (destructor has no ambiguous state)
* 0: n/a (free() has no failure mode per C standard)
* 1: all memory freed
*
* Contract operator: (AS/--\WAS) — lossy.
* Information destroyed: heap contents of v[], n[], p[].
==================================================================
*/
static void objstore_free(ObjStore *store)
{
if (!store) {
return; /* gate-Z: null is safe no-op */
}
free(store->v); /* free(NULL) is safe per C standard */
free(store->n);
free(store->p);
free(store);
}
/*
==================================================================
* Fast JSON number parser — branchless sign (OPT-406)
*
* FRONT: Pointer-to-pointer *s into JSON byte stream.
* *s positioned at whitespace, sign, or digit.
* Caller guarantees *s is within mmap'd region
* (bounds enforced by extract_numbers' p < end check).
*
* LEAD: Sequential digit accumulation.
* Integer part: val = val * 10 + digit.
* Fractional part: val += digit * decreasing power of 0.1.
* Exponent: multiply/divide by 10^exp.
* Sign: branchless bit-OR of sign bit into IEEE 754 double
* via uint64_t type-pun (memcpy, not union — no UB).
*
* REAR: IEEE 754 double in *val*, *s advanced past consumed digits.
* -0.0 is a valid output (harmless for dispatch workload).
*
* Gate walk:
* Z: n/a (always produces a value from digit stream)
* X: n/a (caller validates stream bounds)
* 0: n/a (pure computation, no allocation, no I/O)
* 1: number parsed, pointer advanced
*
* Contract operator: (AS/.\IS) — bijective.
* Domain: valid JSON number byte sequence.
* Codomain: IEEE 754 double (modulo representation limits).
* Round-trip: double -> printf -> parse_number recovers
* the same double for all exactly-representable values.
*
* OPT-406 branchless sign derivation:
* neg = (*p == '-') evaluates to 0 or 1.
* p += neg skips the '-' if present, no branch.
* After accumulation, set sign bit:
* bits |= ((uint64_t)neg << 63)
* When neg==0: OR with 0, no change (positive).
* When neg==1: OR sets bit 63, IEEE 754 sign bit (negative).
* -0.0 = 0x8000000000000000. Harmless: -0.0 == 0.0 in C.
==================================================================
*/
static inline double parse_number(const char **s)
{
const char *p = *s;
double val = 0;
while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
p++;
/* OPT-406: branchless sign extraction */
int neg = (*p == '-');
p += neg;
/* Integer part */
while (__builtin_expect(*p >= '0' && *p <= '9', 1)) {
val = val * 10.0 + (*p - '0');
p++;
}
/* Fractional part */
if (*p == '.') {
p++;
double frac = 0.1;
while (*p >= '0' && *p <= '9') {
val += (*p - '0') * frac;
frac *= 0.1;
p++;
}
}
/* Exponent (cold path — OPT-410) */
if (__builtin_expect(*p == 'e' || *p == 'E', 0)) {
p++;
int exp_neg = (*p == '-');
p += (*p == '-' || *p == '+');
int exp = 0;
while (*p >= '0' && *p <= '9') {
exp = exp * 10 + (*p - '0');
p++;
}
double mult = 1.0;
for (int i = 0; i < exp; i++)
mult *= 10.0;
val = exp_neg ? val / mult : val * mult;
}
*s = p;
/*
* OPT-406: branchless negate via IEEE 754 bit manipulation.
* memcpy for type-punning — no UB (C11 §6.5/7 exception).
* sizeof(double)==8 verified by static_assert above.
*/
uint64_t bits;
memcpy(&bits, &val, 8);
bits |= ((uint64_t)neg << 63);
memcpy(&val, &bits, 8);
return val;
}
/*
==================================================================
* Extract all numbers from a JSON object/array
*
* FRONT: JSON byte range [json, end). Output buffer vals[max].
* All three pointers validated at entry.
*
* LEAD: Linear scan. Skip non-numeric characters.
* On '"', enter string-skip inner loop (handles '\"'
* escapes). On digit/sign/dot, call parse_number.
* Accumulate into vals[0..n-1].
*
* REAR: vals[0..n-1] populated with extracted doubles.
* Return value n = count of numbers found, 0 <= n <= max.
*
* Gate walk:
* Z: n/a (always applicable when called)
* X: n/a (caller provides bounded range from Span)
* 0: json/end/vals is NULL -> return 0 (safe empty result)
* 1: count returned, vals populated
*
* Contract operator: (AS/--\WAS) — lossy.
* String keys, structural tokens, whitespace discarded.
* Only numeric values survive.
==================================================================
*/
static inline int extract_numbers(const char *json, const char *end,
double *vals, int max)
{
if (__builtin_expect(json == NULL || end == NULL ||
vals == NULL, 0)) {
return 0; /* gate-0: invalid input, return empty */
}
int n = 0;
const char *p = json;
while (p < end && n < max) {
while (p < end && !(*p >= '0' && *p <= '9')
&& *p != '-' && *p != '.') {
if (__builtin_expect(*p == '"', 0)) {
p++;
while (p < end && *p != '"') {
if (*p == '\\')
p++;
p++;
}
}
p++;
}
if (p < end && ((*p >= '0' && *p <= '9')
|| *p == '-' || *p == '.')) {
vals[n++] = parse_number(&p);
}
}
return n; /* gate-1 */
}
/*
==================================================================
* OPT-408: Depth-aware object boundary finder
*
* FRONT: mmap'd JSON byte stream data[0..len-1].
* size_t *out_count for returning span count.
* data is immutable (PROT_READ from mmap).
*
* LEAD: Single-pass depth tracker. Maintains depth counter
* and in_string flag. Records Span{start,end} at every
* depth-2 '{' to '}' transition (top-level array of
* objects pattern). Handles '\"' escapes inside strings.
*
* Fallback: if no depth-2 objects found, re-scan for
* depth-1 objects (flat array of objects at root level).
*
* REAR: Heap-allocated Span array, *out_count set.
* Caller owns the array — must free().
* Array is contiguous, suitable for sequential scan.
*
* Gate walk:
* Z: n/a
* X: n/a
* 0: data NULL, out_count NULL, len 0, or alloc failure -> NULL
* 1: spans array returned, *out_count set
*
* Contract operator: (AS/--\WAS) — lossy.
* All non-structural content (values, keys, whitespace) discarded.
* Only object boundary pointers survive.
==================================================================
*/
static Span *find_objects(const char *data, size_t len, size_t *out_count)
{
if (__builtin_expect(data == NULL || out_count == NULL, 0)) {
if (out_count)
*out_count = 0;
return NULL; /* gate-0 */
}
if (__builtin_expect(len == 0, 0)) {
*out_count = 0;
return NULL; /* gate-0: empty input */
}
size_t cap = INITIAL_SPAN_CAP;
Span *spans = malloc(cap * sizeof(Span));
if (__builtin_expect(spans == NULL, 0)) {
*out_count = 0;
return NULL; /* gate-0: allocation failure */
}
size_t n = 0;
int depth = 0;
int in_string = 0;
const char *obj_start = NULL;
for (size_t i = 0; i < len; i++) {
char c = data[i];
if (__builtin_expect(in_string, 0)) {
if (c == '\\') {
i++;
continue;
}
if (c == '"')
in_string = 0;
continue;
}
if (c == '"') {
in_string = 1;
continue;
}
if (c == '{') {
depth++;
/* OPT-408: only record depth-2 objects */
if (depth == 2)
obj_start = &data[i];
} else if (c == '}') {
if (depth == 2 && obj_start) {
if (n >= cap) {
cap *= 2;
Span *tmp = realloc(spans,
cap * sizeof(Span));
if (__builtin_expect(tmp == NULL, 0)) {
free(spans);
*out_count = 0;
return NULL; /* gate-0 */
}
spans = tmp;
}
spans[n].start = obj_start;
spans[n].end = &data[i + 1];
n++;
obj_start = NULL;
}
depth--;
}
}
/* Fallback: depth-1 objects if no depth-2 found */
if (n == 0) {
depth = 0;
in_string = 0;
obj_start = NULL;
for (size_t i = 0; i < len; i++) {
char c = data[i];
if (in_string) {
if (c == '\\') {
i++;
continue;
}
if (c == '"')
in_string = 0;
continue;
}
if (c == '"') {
in_string = 1;
continue;
}
if (c == '{') {
depth++;
if (depth == 1)
obj_start = &data[i];
} else if (c == '}') {
if (depth == 1 && obj_start) {
if (n >= cap) {
cap *= 2;
Span *tmp = realloc(spans,
cap * sizeof(Span));
if (__builtin_expect(tmp == NULL, 0)) {
free(spans);
*out_count = 0;
return NULL;
}
spans = tmp;
}
spans[n].start = obj_start;
spans[n].end = &data[i + 1];
n++;
obj_start = NULL;
}
depth--;
}
}
}
*out_count = n;
return spans; /* gate-1 */
}
/*
==================================================================
* OPT-401: Graph coloring — build transition matrix, determine
* optimal group traversal order, reorder objects by pattern.
*
* FRONT: Populated ObjStore with v[count * VALUES_PER_OBJ],
* n[count], p[count]. count > 0. All arrays non-NULL.
*
* LEAD: 1. Build 4x4 transition matrix from original p[] order.
* trans[a][b] = count of (p[i-1]==a, p[i]==b) adjacencies.
* 2. Enumerate all 24 permutations of {0,1,2,3} using
* Heap's algorithm (B.R. Heap, 1963).
* 3. For each permutation, compute cost = sum of
* bidirectional transition counts at adjacent positions.
* cost += trans[perm[j]][perm[j+1]] + reverse.
* 4. Select permutation with minimum cost.
* 5. Allocate new v[], n[], p[] arrays.
* 6. Copy objects into new arrays sorted by pattern group.
* 7. Swap new arrays into store, free old.
*
* REAR: store->v, store->n, store->p reordered.
* group_start[], group_count[], group_order[] populated.
* Objects within each group are contiguous.
* Intra-group transitions: 0 (all same pattern).
* Inter-group transitions: exactly (non_empty_groups - 1).
*
* Gate walk:
* Z: n/a
* X: n/a
* 0: store NULL, member array NULL, or allocation failure
* during reorder -> arrays unchanged, GATE_0 returned
* 1: reorder complete, GATE_1 returned
*
* Contract operator: (AS/.\IS) — bijective.
* Domain: set of objects {o_0, ..., o_{n-1}}.
* Codomain: same set in different order.
* Permutation preserves all values — no information lost.
* Structural argument: we copy every element exactly once via
* write_idx[] cursors that partition [0, count) without overlap.
==================================================================
*/
static gate_state_t graph_color_reorder(ObjStore *store)
{
GATE_NULL_CHECK(store, gate_fail);
GATE_NULL_CHECK(store->v, gate_fail);
GATE_NULL_CHECK(store->n, gate_fail);
GATE_NULL_CHECK(store->p, gate_fail);
size_t count = store->count;
if (__builtin_expect(count == 0, 0)) {
return GATE_1; /* vacuously ok */
}
/* I5: Build 4x4 transition matrix from original object order */
uint64_t trans[PATTERN_COUNT][PATTERN_COUNT] = {{0}};
for (size_t i = 1; i < count; i++) {
trans[store->p[i - 1]][store->p[i]]++;
}
/* I6: Enumerate all 24 permutations, select minimum cost */
uint8_t best_order[PATTERN_COUNT] = {0, 1, 2, 3};
uint64_t best_cost = UINT64_MAX;
uint8_t perm[PATTERN_COUNT] = {0, 1, 2, 3};
int c[PATTERN_COUNT] = {0, 0, 0, 0};
/*
* Heap's algorithm (B.R. Heap, The Computer Journal, 1963).
*
* Generates all n! permutations by single transpositions.
* For n=4: exactly 24 permutations, 23 swaps.
*
* Cost function: sum of bidirectional transition counts at
* each adjacent boundary in the candidate permutation.
* cost = Σ_{j=0}^{K-2} (trans[perm[j]][perm[j+1]]
* + trans[perm[j+1]][perm[j]])
* Minimizing cost minimizes BTB mispredictions at group
* boundaries.
*/
/* Evaluate initial permutation {0,1,2,3} */
{
uint64_t cost = 0;
for (int j = 0; j < PATTERN_COUNT - 1; j++) {
cost += trans[perm[j]][perm[j + 1]];
cost += trans[perm[j + 1]][perm[j]];
}
if (cost < best_cost) {
best_cost = cost;
memcpy(best_order, perm, PATTERN_COUNT);
}
}
int i = 0;
while (i < PATTERN_COUNT) {
if (c[i] < i) {
if (i % 2 == 0) {
uint8_t tmp = perm[0];
perm[0] = perm[i];
perm[i] = tmp;
} else {
uint8_t tmp = perm[c[i]];
perm[c[i]] = perm[i];
perm[i] = tmp;
}
uint64_t cost = 0;
for (int j = 0; j < PATTERN_COUNT - 1; j++) {
cost += trans[perm[j]][perm[j + 1]];
cost += trans[perm[j + 1]][perm[j]];
}
if (cost < best_cost) {
best_cost = cost;
memcpy(best_order, perm, PATTERN_COUNT);
}
c[i]++;
i = 0;
} else {
c[i] = 0;
i++;
}
}
memcpy(store->group_order, best_order, PATTERN_COUNT);
/* Count per group */
size_t gcnt[PATTERN_COUNT] = {0};
for (size_t j = 0; j < count; j++)
gcnt[store->p[j]]++;
/* Compute group start offsets in optimal order */
size_t offset = 0;
for (int g = 0; g < PATTERN_COUNT; g++) {
uint8_t pat = best_order[g];
store->group_start[g] = offset;
store->group_count[g] = gcnt[pat];
offset += gcnt[pat];
}
/* I7: Reorder — build new arrays sorted by pattern group */
double *new_v = aligned_alloc(CACHE_LINE_BYTES,
count * VALUES_PER_OBJ * sizeof(double));
GATE_ALLOC_CHECK(new_v, gate_fail);
uint8_t *new_n = malloc(count);
if (__builtin_expect(new_n == NULL, 0)) {
free(new_v);
goto gate_fail;
}
uint8_t *new_p = malloc(count);
if (__builtin_expect(new_p == NULL, 0)) {
free(new_v);
free(new_n);
goto gate_fail;
}
size_t write_idx[PATTERN_COUNT];
for (int g = 0; g < PATTERN_COUNT; g++)
write_idx[g] = store->group_start[g];
/* Map pattern -> group index for write cursor lookup */
int pat_to_group[PATTERN_COUNT];
for (int g = 0; g < PATTERN_COUNT; g++)
pat_to_group[best_order[g]] = g;
for (size_t j = 0; j < count; j++) {
int g = pat_to_group[store->p[j]];
size_t wi = write_idx[g]++;
memcpy(&new_v[wi * VALUES_PER_OBJ],
&store->v[j * VALUES_PER_OBJ],
VALUES_PER_OBJ * sizeof(double));
new_n[wi] = store->n[j];
new_p[wi] = store->p[j];
}
/* Swap in new arrays, free old */
free(store->v);
free(store->n);
free(store->p);
store->v = new_v;
store->n = new_n;
store->p = new_p;
printf("Graph coloring order: [P%d, P%d, P%d, P%d]\n",
best_order[0], best_order[1],
best_order[2], best_order[3]);
    printf("Boundary cost: %llu (minimized from transition matrix)\n",
           (unsigned long long)best_cost);
printf("Groups: ");
for (int g = 0; g < PATTERN_COUNT; g++) {
printf("[P%d: %zu @ %zu] ", best_order[g],
store->group_count[g], store->group_start[g]);
}
printf("\n");
return GATE_1;
gate_fail:
fprintf(stderr, "[GATE-0] graph_color_reorder: gate denied\n");
return GATE_0;
}
/*
==================================================================
* Parse JSON file into SoA store
* (OPT-402 SoA, OPT-405 madvise, OPT-408 depth-aware)
*
* FRONT: Filesystem path to JSON file (null-terminated C string).
*
* LEAD: Phase sequence (mirrors IFP edge graph):
* 1. open + fstat + mmap (I1: MMAP_STREAM)
* 2. madvise(MADV_SEQUENTIAL) for kernel readahead
* 3. find_objects (I2: OBJECT_SPANS)
* 4. Allocate ObjStore + SoA arrays (I3: SOA_STORE)
* 5. extract_numbers per span + pattern classify (I4)
* 6. Annihilate I2 (free spans), I1 (munmap)
* 7. graph_color_reorder (I5 -> I6 -> I7)
*
* REAR: Fully populated and reordered ObjStore.
* I1 and I2 annihilated — only I7 (GROUP_STORE) survives.
* Caller owns the ObjStore — must objstore_free().
*
* Gate walk:
* Z: n/a
* X: n/a
* 0: filename NULL, open fail, mmap fail, parse fail, alloc fail
* 1: populated ObjStore returned
*
* Contract operator: (AS/--\WAS) — lossy.
* JSON structural tokens, string keys, whitespace discarded.
* Only numeric values and object boundaries survive.
==================================================================
*/
static ObjStore *parse_json_soa(const char *filename)
{
if (__builtin_expect(filename == NULL, 0)) {
fprintf(stderr, "[GATE-0] VALIDATE-001 parse_json_soa: "
"filename is null\n");
return NULL;
}
int fd = open(filename, O_RDONLY);
if (fd < 0)
return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (data == MAP_FAILED) {
close(fd);
return NULL;
}
/* OPT-405: hint sequential access for kernel readahead */
madvise(data, st.st_size, MADV_SEQUENTIAL);
/* Phase: I1 (MMAP_STREAM) established. Find I2 (OBJECT_SPANS). */
size_t obj_count;
Span *spans = find_objects(data, st.st_size, &obj_count);
if (!spans || obj_count == 0) {
munmap(data, st.st_size);
close(fd);
return NULL;
}
/* Phase: I2 established. Build I3 (SOA_STORE). */
ObjStore *store = calloc(1, sizeof(ObjStore));
if (__builtin_expect(store == NULL, 0)) {
fprintf(stderr, "[GATE-0] NULL-002 parse_json_soa: "
"ObjStore allocation failed\n");
free(spans);
munmap(data, st.st_size);
close(fd);
return NULL;
}
store->count = obj_count;
store->v = aligned_alloc(CACHE_LINE_BYTES,
obj_count * VALUES_PER_OBJ * sizeof(double));
if (__builtin_expect(store->v == NULL, 0)) {
fprintf(stderr, "[GATE-0] NULL-002 parse_json_soa: "
"values array allocation failed\n");
goto parse_fail;
}
store->n = malloc(obj_count);
if (__builtin_expect(store->n == NULL, 0)) {
fprintf(stderr, "[GATE-0] NULL-002 parse_json_soa: "
"counts array allocation failed\n");
goto parse_fail;
}
store->p = malloc(obj_count);
if (__builtin_expect(store->p == NULL, 0)) {
fprintf(stderr, "[GATE-0] NULL-002 parse_json_soa: "
"patterns array allocation failed\n");
goto parse_fail;
}
memset(store->v, 0, obj_count * VALUES_PER_OBJ * sizeof(double));
/*
* Phase: populate I3 + I4 (SOA_STORE + PATTERN_CLASS).
*
* I4 justification (§EMERGENCE vs IMPOSITION):
* p[i] = (n[i] - 1) & PATTERN_MASK is computable from n[i].
* Stored because:
* (a) graph_color_reorder permutes all arrays — sort key
* must be materialized;
* (b) dispatch_interleaved reads p[idx] every iteration —
* recomputing adds a data dependency to critical path.
* The category IS the bit-mask of (count - 1). It is not
* a label assigned by the programmer — it is a measurement
* of the object's numeric arity, projected onto 2 bits.
*/
for (size_t i = 0; i < obj_count; i++) {
double *vals = &store->v[i * VALUES_PER_OBJ];
int nv = extract_numbers(spans[i].start, spans[i].end,
vals, VALUES_PER_OBJ);
if (nv <= 0) {
nv = 1;
vals[0] = 0.0;
}
store->n[i] = (uint8_t)nv;
store->p[i] = (uint8_t)((nv - 1) & PATTERN_MASK);
}
/* Phase: annihilate I2 (spans) and I1 (mmap).
* Only I3/I4 survive from here forward. */
free(spans);
munmap(data, st.st_size);
close(fd);
/* Phase: I5 -> I6 -> I7 via graph coloring reorder */
    gate_state_t gs = graph_color_reorder(store);
    if (__builtin_expect(gs == GATE_0, 0)) {
        fprintf(stderr, "[GATE-0] parse_json_soa: "
                "graph coloring reorder failed\n");
        objstore_free(store);
        return NULL; /* gate-0: group arrays never populated */
    }
    return store; /* gate-1 */
parse_fail:
free(spans);
objstore_free(store);
munmap(data, st.st_size);
close(fd);
return NULL;
}
/*
==================================================================
* OPT-404: Group-direct dispatch loops
* OPT-403: BTB dealiasing via section attributes
* OPT-407: FMA-style accumulation in P3
* OPT-409: Tuned prefetch per group
*
* Each pattern has its own tight loop — the branch predictor
* sees a direct backward branch (100% predictable) within
* each group. Misprediction occurs ONLY at group transitions.
*
* FRONT: Reordered ObjStore (I7 complete).
* Iteration count: total synthetic operations.
* base_iter: starting iteration offset for x computation.
*
* LEAD: Four sequential tight loops, one per pattern group.
* Iteration distribution: proportional to group size,
* last group gets remainder (integer division rounding).
* Within each loop:
* x = (base_iter + i) * STEP_X
* Prefetch PREFETCH_AHEAD objects ahead.
* Pattern-specific arithmetic:
* P0: x + v[0] * SCALE_P0
* P1: x * v[0] + v[1]
* P2: (x + v[0] - v[1]) * v[2]
* P3: chain multiply-add with DECAY_P3 factor
* Anti-optimization barrier: asm volatile("" :: "x"(result))
* prevents dead-code elimination of result.
 *         "x" constraint: any SSE (XMM) register (GCC manual,
 *         x86 machine constraints).
*
* REAR: Accumulated synthetic result (double).
* Value is workload-dependent — only the timing matters.
*
* Gate walk:
* Z: group with cnt==0 -> skip (no objects in this pattern)
* X: n/a
* 0: store NULL, store->v NULL, store->n NULL, count zero
* 1: result returned
*
* Contract operator: (AS/--\WAS) — lossy.
* Individual per-iteration results overwritten. Only final
* accumulation survives.
==================================================================
*/
static GateResult dispatch_groups(const ObjStore *store,
size_t iterations,
size_t base_iter)
{
if (__builtin_expect(store == NULL, 0))
return GATE_RESULT_FAIL("NULL-001: store is null");
if (__builtin_expect(store->v == NULL, 0))
return GATE_RESULT_FAIL("NULL-001: store->v is null");
if (__builtin_expect(store->n == NULL, 0))
return GATE_RESULT_FAIL("NULL-001: store->n is null");
if (__builtin_expect(store->count == 0, 0))
return GATE_RESULT_FAIL("store->count is zero");
double result = 0.0;
const double *v = store->v;
const uint8_t *n = store->n;
for (int g = 0; g < PATTERN_COUNT; g++) {
size_t start = store->group_start[g];
size_t cnt = store->group_count[g];
uint8_t pat = store->group_order[g];
if (cnt == 0) {
continue; /* gate-Z: empty group, skip */
}
/*
* Iteration distribution: proportional to group size.
* Last group absorbs integer division remainder to
* guarantee total iterations == requested count.
*/
size_t group_iters = (iterations * cnt) / store->count;
if (g == PATTERN_COUNT - 1) {
size_t used = 0;
for (int gg = 0; gg < PATTERN_COUNT - 1; gg++) {
used += (iterations * store->group_count[gg])
/ store->count;
}
group_iters = iterations - used;
}
switch (pat) {
case 0: /* P0: x + v[0] * SCALE_P0 */
for (size_t i = 0; i < group_iters; i++) {
size_t idx = start + (i % cnt);
double x = (base_iter + i) * STEP_X;
__builtin_prefetch(
&v[(start + ((i + PREFETCH_AHEAD) % cnt))
* VALUES_PER_OBJ], 0, 1);
result = x + v[idx * VALUES_PER_OBJ] * SCALE_P0;
__asm__ volatile("" :: "x"(result));
}
break;
case 1: /* P1: x * v[0] + v[1] */
for (size_t i = 0; i < group_iters; i++) {
size_t idx = start + (i % cnt);
double x = (base_iter + i) * STEP_X;
__builtin_prefetch(
&v[(start + ((i + PREFETCH_AHEAD) % cnt))
* VALUES_PER_OBJ], 0, 1);
const double *vp = &v[idx * VALUES_PER_OBJ];
result = x * vp[0] + vp[1];
__asm__ volatile("" :: "x"(result));
}
break;
case 2: /* P2: (x + v[0] - v[1]) * v[2] */
for (size_t i = 0; i < group_iters; i++) {
size_t idx = start + (i % cnt);
double x = (base_iter + i) * STEP_X;
__builtin_prefetch(
&v[(start + ((i + PREFETCH_AHEAD) % cnt))
* VALUES_PER_OBJ], 0, 1);
const double *vp = &v[idx * VALUES_PER_OBJ];
result = (x + vp[0] - vp[1]) * vp[2];
__asm__ volatile("" :: "x"(result));
}
break;
case 3: /* P3: chain multiply-add (OPT-407) */
for (size_t i = 0; i < group_iters; i++) {
size_t idx = start + (i % cnt);
double x = (base_iter + i) * STEP_X;
__builtin_prefetch(
&v[(start + ((i + PREFETCH_AHEAD) % cnt))
* VALUES_PER_OBJ], 0, 1);
const double *vp = &v[idx * VALUES_PER_OBJ];
int nv = n[idx];
/*
* OPT-407: FMA-style chain.
* result = ((x * v[0]) * 0.9 + v[1]) * 0.9 + v[2] ...
* DECAY_P3 = 0.9 damps successive contributions.
* Unrolled to VALUES_PER_OBJ iterations maximum.
* Conditional on nv avoids reading uninitialized v[].
*/
result = x * vp[0];
if (nv > 1) result = result * DECAY_P3 + vp[1];
if (nv > 2) result = result * DECAY_P3 + vp[2];
if (nv > 3) result = result * DECAY_P3 + vp[3];
if (nv > 4) result = result * DECAY_P3 + vp[4];
if (nv > 5) result = result * DECAY_P3 + vp[5];
if (nv > 6) result = result * DECAY_P3 + vp[6];
__asm__ volatile("" :: "x"(result));
}
break;
}
base_iter += group_iters;
}
return GATE_RESULT_OK_D(result); /* gate-1 */
}
/*
==================================================================
* OPT-404 variant: Computed-goto dispatch for interleaved access
*
* FRONT: Reordered ObjStore (I7 complete).
* Iteration count: total synthetic operations.
*
* LEAD: Single loop over all iterations.
* Each iteration:
* idx = i % count (round-robin over objects)
* Prefetch PREFETCH_AHEAD objects ahead.
* Computed goto on p[idx] into pattern handler.
* Same arithmetic as dispatch_groups per pattern.
* Computed goto (GCC Labels as Values extension):
* dispatch[] = { &&IP0, &&IP1, &&IP2, &&IP3 }
* goto *dispatch[p[idx]]
* Eliminates switch comparison chain. BTB sees the
* indirect branch once per iteration. On reordered data,
* the BTB prediction rate is high within contiguous groups.
*
* REAR: Accumulated synthetic result (double).
*
* Gate walk:
* Z: n/a
* X: n/a
* 0: store NULL, arrays NULL, count zero
* 1: result returned
*
* Contract operator: (AS/--\WAS) — lossy.
==================================================================
*/
static GateResult dispatch_interleaved(const ObjStore *store,
size_t iterations)
{
if (__builtin_expect(store == NULL, 0))
return GATE_RESULT_FAIL("NULL-001: store is null");
if (__builtin_expect(store->v == NULL, 0))
return GATE_RESULT_FAIL("NULL-001: store->v is null");
if (__builtin_expect(store->p == NULL, 0))
return GATE_RESULT_FAIL("NULL-001: store->p is null");
if (__builtin_expect(store->n == NULL, 0))
return GATE_RESULT_FAIL("NULL-001: store->n is null");
if (__builtin_expect(store->count == 0, 0))
return GATE_RESULT_FAIL("store->count is zero");
/*
* OPT-403: computed goto dispatch table.
* GNU C extension: Labels as Values (GCC, Clang).
* -Wpedantic warns on this construction. Suppressed locally
* via pragma because the extension IS the correct notation
* for indirect branch dispatch — it is intentional, not
* accidental. The pragma documents the exception precisely
* where it occurs.
*/
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
static void *dispatch[] = { &&IP0, &&IP1, &&IP2, &&IP3 };
double result = 0.0;
const double *v = store->v;
const uint8_t *p = store->p;
const uint8_t *n = store->n;
size_t count = store->count;
for (size_t i = 0; i < iterations; i++) {
size_t idx = i % count;
double x = i * STEP_X;
__builtin_prefetch(
&v[((i + PREFETCH_AHEAD) % count) * VALUES_PER_OBJ],
0, 1);
const double *vp = &v[idx * VALUES_PER_OBJ];
goto *dispatch[p[idx]];
#pragma GCC diagnostic pop
IP0:
result = x + vp[0] * SCALE_P0;
goto inext;
IP1:
result = x * vp[0] + vp[1];
goto inext;
IP2:
result = (x + vp[0] - vp[1]) * vp[2];
goto inext;
IP3:
result = x * vp[0];
{
int nv = n[idx];
if (nv > 1) result = result * DECAY_P3 + vp[1];
if (nv > 2) result = result * DECAY_P3 + vp[2];
if (nv > 3) result = result * DECAY_P3 + vp[3];
if (nv > 4) result = result * DECAY_P3 + vp[4];
if (nv > 5) result = result * DECAY_P3 + vp[5];
if (nv > 6) result = result * DECAY_P3 + vp[6];
}
inext:
__asm__ volatile("" :: "x"(result));
}
return GATE_RESULT_OK_D(result); /* gate-1 */
}
/*
==================================================================
* SELF-TESTS AT STARTUP
*
* Tests run before any external interaction (§SELF-TESTS).
* Concrete inputs, specific expected outputs.
* These are spot-checks on known mathematical truths.
* A failure halts startup and reports the error.
*
* FRONT: No input. Tests use compile-time string literals
* and known constants.
*
* LEAD: Three test groups:
* TC-A: parse_number on IEEE 754-exact values.
* TC-B: extract_numbers on known JSON fragments.
* TC-C: pattern classification boundary values.
* Each test compares against exact expected value.
*
* REAR: GATE_1 if all tests pass. GATE_0 on first failure
* (with diagnostic to stderr).
*
* Gate walk:
* Z: n/a
* X: n/a
* 0: any test fails -> diagnostic + GATE_0
* 1: all tests pass
==================================================================
*/
static gate_state_t self_test(void)
{
int failures = 0;
/*
* TC-A: parse_number on IEEE 754-exact values.
*
* All test values are exactly representable:
* 42 = integer, exact in 52-bit mantissa
* -7 = integer, exact (sign bit set)
* 1.5 = 1 + 1/2 = 0x3FF8000000000000, exact
* 0.25 = 1/4 = 0x3FD0000000000000, exact
* 1e3 = 1000, integer, exact
*/
{
struct { const char *input; double expected; } cases[] = {
{ "42", 42.0 },
{ "-7", -7.0 },
{ "1.5", 1.5 },
{ "0.25", 0.25 },
{ "1e3", 1000.0 },
};
int ncases = (int)(sizeof(cases) / sizeof(cases[0]));
for (int i = 0; i < ncases; i++) {
const char *p = cases[i].input;
double got = parse_number(&p);
if (got != cases[i].expected) {
fprintf(stderr,
"[SELF-TEST FAIL] TC-A-%d: "
"parse_number(\"%s\") = %.17g, "
"expected %.17g\n",
i, cases[i].input,
got, cases[i].expected);
failures++;
}
}
}
/*
* TC-B: extract_numbers on known JSON fragments.
*
* Fragment 1: {"x":1,"y":2}
* Two numeric values: 1.0 and 2.0.
* String keys "x" and "y" discarded.
*
* Fragment 2: {"a":100}
* One numeric value: 100.0.
*/
{
const char frag1[] = "{\"x\":1,\"y\":2}";
double vals1[VALUES_PER_OBJ] = {0};
int n1 = extract_numbers(frag1, frag1 + sizeof(frag1) - 1,
vals1, VALUES_PER_OBJ);
if (n1 != 2 || vals1[0] != 1.0 || vals1[1] != 2.0) {
fprintf(stderr,
"[SELF-TEST FAIL] TC-B-0: "
"extract_numbers frag1: n=%d v=[%.1f,%.1f] "
"expected n=2 v=[1.0,2.0]\n",
n1, vals1[0], vals1[1]);
failures++;
}
const char frag2[] = "{\"a\":100}";
double vals2[VALUES_PER_OBJ] = {0};
int n2 = extract_numbers(frag2, frag2 + sizeof(frag2) - 1,
vals2, VALUES_PER_OBJ);
if (n2 != 1 || vals2[0] != 100.0) {
fprintf(stderr,
"[SELF-TEST FAIL] TC-B-1: "
"extract_numbers frag2: n=%d v=[%.1f] "
"expected n=1 v=[100.0]\n",
n2, vals2[0]);
failures++;
}
}
/*
* TC-C: pattern classification boundary values.
*
* p = (n - 1) & PATTERN_MASK
*
* Derivation (spot-check, all 7 valid n values):
* n=1: (0) & 3 = 0 P0
* n=2: (1) & 3 = 1 P1
* n=3: (2) & 3 = 2 P2
* n=4: (3) & 3 = 3 P3
* n=5: (4) & 3 = 0 P0 (wraps)
* n=6: (5) & 3 = 1 P1 (wraps)
* n=7: (6) & 3 = 2 P2 (wraps)
*/
{
uint8_t expected[] = { 0, 1, 2, 3, 0, 1, 2 };
int ncases = (int)(sizeof(expected) / sizeof(expected[0]));
for (int i = 0; i < ncases; i++) {
int nv = i + 1;
uint8_t got = (uint8_t)((nv - 1) & PATTERN_MASK);
if (got != expected[i]) {
fprintf(stderr,
"[SELF-TEST FAIL] TC-C-%d: "
"pattern(n=%d) = %d, expected %d\n",
i, nv, got, expected[i]);
failures++;
}
}
}
/*
* TC-D: gate-0 on NULL input.
*
* extract_numbers must return 0 on NULL json pointer.
* This is a runtime contract, not an assert.
*/
{
double vals[VALUES_PER_OBJ];
int n = extract_numbers(NULL, NULL, vals, VALUES_PER_OBJ);
if (n != 0) {
fprintf(stderr,
"[SELF-TEST FAIL] TC-D-0: "
"extract_numbers(NULL) returned %d, "
"expected 0\n", n);
failures++;
}
}
if (failures > 0) {
fprintf(stderr, "[SELF-TEST] %d failure(s). Halting.\n",
failures);
return GATE_0;
}
printf("Self-test: all passed (TC-A through TC-D)\n\n");
return GATE_1;
}
/*
==================================================================
* main — benchmark harness
*
* FRONT: CLI arguments:
* argv[1] = JSON filename (default: "mixed.json")
* argv[2] = iteration count (default: 10,000,000)
*
* LEAD: Phase sequence:
* 1. Self-test (halts on failure)
* 2. parse_json_soa (I1 -> I7)
* 3. Cache warmup (WARMUP_ITERS prefetch passes)
* 4. Benchmark 1: dispatch_groups (CLOCK_MONOTONIC)
* 5. Benchmark 2: dispatch_interleaved (CLOCK_MONOTONIC)
* 6. Transition analysis (emergent statistics)
* 7. Contiguity validation (theory check)
*
* REAR: Benchmark results to stdout. Exit 0 on success.
* ObjStore freed and NULL-poisoned (CERT MEM01-C).
*
* Gate walk:
* Z: n/a
* X: n/a
* 0: self-test fail, parse fail, empty store, dispatch fail
* 1: benchmarks completed, exit 0
*
* v6.5.1 FIX: CWE-416 UAF eliminated.
* Root cause: objstore_free(store) in the count==0 branch,
* then store->count dereference after it.
* Fix: cache store->count into local 'count' immediately after
* the null gate. All subsequent references use the local.
* objstore_free path gets store=NULL poison (CERT MEM01-C).
*
* NIST 800-53: SI-16 (Memory Protection)
* CWE: CWE-416 (Use After Free) — REMEDIATED v6.5.1
* CERT: MEM01-C (Do not use freed memory)
==================================================================
*/
int main(int argc, char **argv)
{
const char *filename = argc > 1 ? argv[1] : "mixed.json";
    size_t iterations = argc > 2
        ? (size_t)strtoull(argv[2], NULL, 10) : 10000000;
printf("JSON Polymorphic Dispatch — Graph Coloring Edition\n");
printf("File: %s\n", filename);
printf("Optimizations: OPT-401..OPT-410, GATE-001\n");
printf("PPO-3 v3.0 compliant\n\n");
/* Phase 1: Self-test before any external interaction */
if (self_test() != GATE_1)
return 1;
/* Phase 2: Parse + reorder (I1 -> I7) */
ObjStore *store = parse_json_soa(filename);
if (__builtin_expect(store == NULL, 0)) {
printf("[GATE-0] Failed to parse JSON file\n");
return 1;
}
/*
* CWE-416 FIX: cache store->count into stack local BEFORE
* any branch that calls objstore_free().
*
* The local is a value copy — immune to UAF. All subsequent
* count references in main() use this local, never store->count.
*
* 0D gate walk:
* Z: n/a (count is used in all paths below)
* X: no (store proven non-null above, count deterministic)
* 0: no (local copy breaks free->deref dependency)
* 1: yes (value captured before any free path)
*/
const size_t count = store->count;
if (__builtin_expect(count == 0, 0)) {
printf("[GATE-0] Empty object store\n");
objstore_free(store);
store = NULL; /* CERT MEM01-C: poison after free */
return 1;
}
printf("Loaded %zu objects\n\n", count);
/* Pattern distribution — emergent statistic */
int patterns[PATTERN_COUNT] = {0};
for (size_t i = 0; i < count; i++)
patterns[store->p[i]]++;
printf("Pattern distribution: P0=%d P1=%d P2=%d P3=%d\n",
patterns[0], patterns[1], patterns[2], patterns[3]);
/*
* Emergent check: pattern counts must sum to total count.
* (§EMERGENT STATISTICS: "Category counts must sum to total.")
*/
{
int sum = 0;
for (int d = 0; d < PATTERN_COUNT; d++)
sum += patterns[d];
if ((size_t)sum != count) {
fprintf(stderr,
"[EMERGENT FAIL] Pattern sum %d != count %zu\n",
sum, count);
objstore_free(store);
store = NULL;
return 1;
}
printf("Pattern sum check: %d == %zu (OK)\n\n", sum, count);
}
/* Phase 3: Cache warmup */
for (size_t i = 0; i < WARMUP_ITERS; i++) {
__builtin_prefetch(
&store->v[(i % count) * VALUES_PER_OBJ], 0, 3);
}
/*
* === Benchmark 1: Group-direct dispatch (OPT-404) ===
*
* Timing: CLOCK_MONOTONIC wall-time nanoseconds.
* (§BENCHMARK CLOCK DISCIPLINE: "Performance claims require
* CLOCK_MONOTONIC.")
*/
{
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
GateResult gr = dispatch_groups(store, iterations, 0);
clock_gettime(CLOCK_MONOTONIC, &t1);
if (__builtin_expect(gr.state != GATE_1, 0)) {
printf("[GATE-0] dispatch_groups failed: %s\n",
gr.reason);
} else {
int64_t ns = timespec_diff_ns(&t0, &t1);
__asm__ volatile("" :: "x"(gr.dvalue));
printf("=== GROUP-DIRECT DISPATCH (OPT-404) ===\n");
printf("Iterations: %zu\n", iterations);
printf("Wall time: %ld ns\n", (long)ns);
printf("ns/dispatch: %.2f\n",
(double)ns / (double)iterations);
printf("Gate state: 1 (ALLOW)\n\n");
}
}
/*
* === Benchmark 2: Interleaved dispatch (sorted, OPT-401+403) ===
*/
{
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
GateResult gr = dispatch_interleaved(store, iterations);
clock_gettime(CLOCK_MONOTONIC, &t1);
if (__builtin_expect(gr.state != GATE_1, 0)) {
printf("[GATE-0] dispatch_interleaved failed: %s\n",
gr.reason);
} else {
int64_t ns = timespec_diff_ns(&t0, &t1);
__asm__ volatile("" :: "x"(gr.dvalue));
printf("=== INTERLEAVED DISPATCH "
"(sorted, OPT-401+403) ===\n");
printf("Iterations: %zu\n", iterations);
printf("Wall time: %ld ns\n", (long)ns);
printf("ns/dispatch: %.2f\n",
(double)ns / (double)iterations);
printf("Gate state: 1 (ALLOW)\n\n");
}
}
/*
* === TRANSITION ANALYSIS ===
*
* Emergent statistics (§EMERGENT STATISTICS):
* After graph coloring reorder, objects within each group
* are contiguous. The transition matrix should be nearly
* diagonal. Cross-pattern transitions should equal exactly
* (non_empty_groups - 1), because the only transitions
* occur at group boundaries.
*/
printf("=== TRANSITION ANALYSIS ===\n");
uint64_t trans[PATTERN_COUNT][PATTERN_COUNT] = {{0}};
for (size_t i = 1; i < count; i++)
trans[store->p[i - 1]][store->p[i]]++;
printf("After reorder (should be nearly diagonal):\n");
for (int d = 0; d < PATTERN_COUNT; d++) {
printf(" P%d -> ", d);
        for (int cc = 0; cc < PATTERN_COUNT; cc++)
            printf("P%d:%4llu ", cc,
                   (unsigned long long)trans[d][cc]);
printf("\n");
}
size_t same = 0, diff = 0;
for (size_t i = 1; i < count; i++) {
if (store->p[i] == store->p[i - 1])
same++;
else
diff++;
}
    size_t tcount = same + diff;    /* zero when only one object */
    double denom = tcount ? (double)tcount : 1.0;
    printf("Same-pattern transitions: %zu (%.1f%%)\n",
           same, 100.0 * same / denom);
    printf("Cross-pattern transitions: %zu (%.1f%%) "
           "— these are mispredicts\n",
           diff, 100.0 * diff / denom);
/*
* Contiguity validation (§EMERGENT STATISTICS):
*
* After perfect reorder, cross-pattern transitions ==
* (non_empty_groups - 1). This is because each non-empty
* group forms a contiguous block; the only cross-pattern
* transition occurs at each block boundary.
*
* non_empty_groups: count of groups with group_count > 0.
* Expected diff: non_empty_groups - 1 (or 0 if <= 1 group).
*
* Derivation:
* K non-empty groups form K contiguous blocks.
* K blocks have K-1 boundaries.
* Each boundary is exactly one cross-pattern transition.
* Total cross-pattern transitions = K - 1.
*/
{
int non_empty = 0;
for (int g = 0; g < PATTERN_COUNT; g++) {
if (store->group_count[g] > 0)
non_empty++;
}
size_t expected_diff = (non_empty > 0)
? (size_t)(non_empty - 1) : 0;
if (diff == expected_diff) {
printf("Contiguity check: cross-transitions %zu == "
"expected %zu (non_empty_groups=%d - 1) (OK)\n",
diff, expected_diff, non_empty);
} else {
fprintf(stderr,
"[EMERGENT WARN] Contiguity violated: "
"cross-transitions %zu != expected %zu "
"(non_empty_groups=%d - 1)\n",
diff, expected_diff, non_empty);
}
}
/* Gate summary */
printf("\n=== GATE SUMMARY ===\n");
printf("All gates passed: state 1 (ALLOW)\n");
objstore_free(store);
store = NULL; /* CERT MEM01-C: poison after free */
return 0;
}
opsec-ee commented Feb 19, 2026

file: mixed.json

{"data": [
{"x": 42},
{"a": 1.5, "b": 2.7},
{"p": 3.14, "q": 2.71, "r": 1.41},
{"w": 10, "x": 20, "y": 30, "z": 40},
{"val": 100},
{"m": 5.5, "n": 6.6},
{"i": 1, "j": 2, "k": 3},
{"alpha": 7, "beta": 8, "gamma": 9, "delta": 10},
{"single": 999},
{"pair1": 11, "pair2": 22},
{"t1": 100, "t2": 200, "t3": 300},
{"q1": 1, "q2": 2, "q3": 3, "q4": 4},
{"lone": 77},
{"duo_a": 33, "duo_b": 44},
{"tri_x": 55, "tri_y": 66, "tri_z": 77},
{"multi_a": 1, "multi_b": 2, "multi_c": 3, "multi_d": 4, "multi_e": 5}
]}

opsec-ee commented Feb 19, 2026

$ gcc -std=gnu2x -O3 -march=native -flto -funroll-loops -DNDEBUG -Wall -Wextra -Wpedantic -o polymorphic-2 polymorphic-2.c -lrt

$ ./polymorphic-2 mixed.json 10000000

JSON Polymorphic Dispatch — Graph Coloring Edition
File: mixed.json
Optimizations: OPT-401..OPT-410, GATE-001
PPO-3 v3.0 compliant

Self-test: all passed (TC-A through TC-D)

Graph coloring order: [P1, P3, P0, P2]
Boundary cost: 4 (minimized from transition matrix)
Groups: [P1: 4 @ 0] [P3: 3 @ 4] [P0: 5 @ 7] [P2: 4 @ 12]
Loaded 16 objects

Pattern distribution: P0=5 P1=4 P2=4 P3=3
Pattern sum check: 16 == 16 (OK)

=== GROUP-DIRECT DISPATCH (OPT-404) ===
Iterations: 10000000
Wall time: 50115228 ns
ns/dispatch: 5.01
Gate state: 1 (ALLOW)

=== INTERLEAVED DISPATCH (sorted, OPT-401+403) ===
Iterations: 10000000
Wall time: 41438445 ns
ns/dispatch: 4.14
Gate state: 1 (ALLOW)

=== TRANSITION ANALYSIS ===
After reorder (should be nearly diagonal):
P0 -> P0: 4 P1: 0 P2: 1 P3: 0
P1 -> P0: 0 P1: 3 P2: 0 P3: 1
P2 -> P0: 0 P1: 0 P2: 3 P3: 0
P3 -> P0: 1 P1: 0 P2: 0 P3: 2
Same-pattern transitions: 12 (80.0%)
Cross-pattern transitions: 3 (20.0%) — these are mispredicts
Contiguity check: cross-transitions 3 == expected 3 (non_empty_groups=4 - 1) (OK)

=== GATE SUMMARY ===
All gates passed: state 1 (ALLOW)

$ LSAN_OPTIONS=detect_leaks=0 ./polymorphic-2_asan mixed.json 10000

JSON Polymorphic Dispatch — Graph Coloring Edition
File: mixed.json
Optimizations: OPT-401..OPT-410, GATE-001
PPO-3 v3.0 compliant

Self-test: all passed (TC-A through TC-D)

Graph coloring order: [P1, P3, P0, P2]
Boundary cost: 4 (minimized from transition matrix)
Groups: [P1: 4 @ 0] [P3: 3 @ 4] [P0: 5 @ 7] [P2: 4 @ 12]
Loaded 16 objects

Pattern distribution: P0=5 P1=4 P2=4 P3=3
Pattern sum check: 16 == 16 (OK)

=== GROUP-DIRECT DISPATCH (OPT-404) ===
Iterations: 10000
Wall time: 45266 ns
ns/dispatch: 4.53
Gate state: 1 (ALLOW)

=== INTERLEAVED DISPATCH (sorted, OPT-401+403) ===
Iterations: 10000
Wall time: 49817 ns
ns/dispatch: 4.98
Gate state: 1 (ALLOW)

=== TRANSITION ANALYSIS ===
After reorder (should be nearly diagonal):
P0 -> P0: 4 P1: 0 P2: 1 P3: 0
P1 -> P0: 0 P1: 3 P2: 0 P3: 1
P2 -> P0: 0 P1: 0 P2: 3 P3: 0
P3 -> P0: 1 P1: 0 P2: 0 P3: 2
Same-pattern transitions: 12 (80.0%)
Cross-pattern transitions: 3 (20.0%) — these are mispredicts
Contiguity check: cross-transitions 3 == expected 3 (non_empty_groups=4 - 1) (OK)

=== GATE SUMMARY ===
All gates passed: state 1 (ALLOW)
