@alexispurslane
Last active April 2, 2026 14:34
Embryo Engine Architecture 2.0

Custom Space ImSim Engine Architecture

Core Philosophy: Aggressive scoping. Maximize behavioral emergence via a message bus and array of structs architecture similar to Caves of Qud (ImSim mechanics), game world scale enabled by a 64-bit physics engine, and nested physics grids for complex space architectures and interactions (such as “space legs”, ship boarding, seamless EVA), while cutting corners on graphics (Blinn-Phong, dynamic lights only) and rigid-body rotational physics to maintain high performance and solo-developer feasibility.

Core Technology: Written entirely in Odin, leveraging native manual memory management (Arenas, implicit contexts), built-in array programming for vector math (component-wise operators on fixed-size arrays, backed by core:math/linalg), tagged unions for ADTs, and zero-overhead C bindings for Raylib 5.5 and the official Lua 5.4 implementation.

1. Fundamental Data Flow (The RAM and PCB)

The engine relies on a strictly managed memory layout. Odin has no garbage collector, so GC pauses cannot occur, and custom context.allocator assignments let us safely and concisely control where every allocation lands. This keeps allocation costs out of the hot path and makes frame times predictable.

1.1 The Entity-Component Storage & Prefab Registry

  • The Component Prefab Registry: To avoid wasting memory on per-instance metadata, class-level data (like Priority and initialization functions) is stripped out of the component instances and stored in a central, globally accessible array. This array is also the single place that holds everything needed to construct a new instance of any component type.
ComponentTypeID :: u8 // Strict 0-63 integer index assigned at runtime

ComponentPrefab :: struct {
    name:             string,
    priority:         int,
    base_update_proc: UpdateProc, // The native proc, or the universal Lua wrapper proc

    // Modding fields (Zero if native)
    is_lua:           bool,
    lua_update_ref:   i32, // luaL_ref to the mod's update() closure
    lua_default_ref:  i32, // luaL_ref to the mod's default data table
}

// The central definition of all component types in the game
GlobalRegistry: [64]ComponentPrefab
  • Registration: Native Odin components register themselves into this array automatically during engine startup (via init procs or compile-time generation). Lua components register themselves dynamically when mod scripts are evaluated.
  • The Concrete Component Definition: We explicitly avoid OOP polymorphism (struct embedding). A Component is a strict, concrete struct holding its type, its update procedure pointer, and an untyped pointer (rawptr) to its specific state data.
UpdateProc :: proc(comp: ^Component, entity_id: EntityID, game_state: ^GameState, entity_bus: ^ShadowBus, out: ^CommandArena)

Component :: struct {
    type_id:  ComponentTypeID,
    update:   UpdateProc, 
    data:     rawptr, // Points to arbitrary backing memory (e.g., ^HealthData)
}
  • Native Component Implementation & Persistent Allocation: To define a native behavior, we declare a pure data struct. When the engine instantiates this component, it must ensure the data memory outlives the current frame.
    • The Allocation Fix (The OS Heap): Even though the Single-Threaded Resolution phase (which executes Spawn commands) defaults to using the sim_frame_arena, we explicitly pass engine.persistent_allocator into Odin's new() function. This allocator maps directly to the standard OS heap. We must use the OS heap here, rather than a long-lived arena, because entities and their components are destroyed at random, unpredictable times during gameplay. This requires individual, piecemeal memory freeing (e.g., free(comp.data, engine.persistent_allocator)) which bump arenas do not support.
// 1. Define the pure data layout
HealthData :: struct {
    hp:     f32,
    max_hp: f32,
}

// 2. Define the specific update logic
health_update :: proc(comp: ^Component, entity_id: EntityID, game_state: ^GameState, entity_bus: ^ShadowBus, out: ^CommandArena) {
    // Safely downcast the untyped data pointer
    data := cast(^HealthData)comp.data

    // Access custom data natively
    if data.hp <= 0 {
        // Because we receive entity_id, we know exactly which entity to destroy
        append(&out.commands, CmdDestroyEntity{id = entity_id})
    }
}

// 3. Spawning the component (Execution during Resolution Phase)
// Explicitly allocate using a persistent allocator so it survives the frame flush!
comp.data = new(HealthData, engine.persistent_allocator)
  • Entity Definition & Lifecycle: Entities are referred to by Generational IDs (an EntityID struct containing an index and a generation). These IDs point to slots within a massive, pre-allocated flat array [dynamic]Entity (the Entity Buffer). This massive buffer is initialized at startup using the exact same engine.persistent_allocator (the OS heap) as the individual component data structs.
Entity :: struct {
    generation:     u32,

    // Wrapped in Odin's 'Maybe' to allow abstract entities to safely omit them 
    // without requiring heap-allocated pointers.
    transform:      Maybe(TransformComponent),
    physics:        Maybe(PhysicsBodyComponent),

    // Separated Arrays: Both share the same 0-63 ID space defined by the Registry
    components:     [64]^Component,    
    lua_components: [64]^LuaComponent, 
}
  • The Flat Data Exemption: The hardcoded Transform and Physics components act purely as flat data containers. By wrapping them in Odin's Maybe(), they remain completely flat in memory (implemented as a tagged union under the hood, avoiding pointer indirection and cache misses entirely) while safely allowing abstract manager entities to omit them. They contain no update procedure pointers. Their state is read and mutated exclusively by external systems (like the 64-bit Physics Engine). Excluding them from the dynamic arrays prevents thousands of useless function calls per frame.
  • Creation: When a new Entity is created, first the Free List Queue is consulted to see if there are any existing entity slots that can be reused. If there are, that Entity ID and generation are returned. Otherwise, a new slot at the end of the [dynamic]Entity buffer is utilized. If capacity is reached, the buffer grows natively via Odin's dynamic array resizing (which safely draws from the persistent_allocator).
  • Destruction: Entities are never destroyed mid-tick. Components issue a deferred CmdDestroyEntity command. During the single-threaded resolution phase, the entity is zeroed-out except for its generation number, which is incremented (instantly invalidating any leftover IDs in the wild). Its underlying component data pointers are explicitly freed via the persistent_allocator, and its array index is pushed onto the Free List Queue.
  • Zero-Cost Lookups: When a system calls entity.components[id], the engine performs an instantaneous, zero-overhead memory offset. Deduplication is implicit (only one component can occupy index 42).
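The creation and destruction rules above can be sketched as follows. This is a minimal illustration, not the engine's actual code: Slot, World, create_entity, destroy_entity, and is_valid are hypothetical names, the slot carries no components, and the free list is used as a stack rather than a queue for brevity.

```odin
package main

import "core:fmt"

// Hypothetical minimal types; the real Entity also carries components.
EntityID :: struct {
    index:      u32,
    generation: u32,
}

Slot :: struct {
    generation: u32,
    alive:      bool,
}

World :: struct {
    slots:     [dynamic]Slot,
    free_list: [dynamic]u32, // indices of reusable slots
}

create_entity :: proc(w: ^World) -> EntityID {
    // Reuse a freed slot if one exists, otherwise grow the buffer.
    if len(w.free_list) > 0 {
        idx := pop(&w.free_list)
        w.slots[idx].alive = true
        return EntityID{index = idx, generation = w.slots[idx].generation}
    }
    append(&w.slots, Slot{generation = 0, alive = true})
    return EntityID{index = u32(len(w.slots) - 1), generation = 0}
}

is_valid :: proc(w: ^World, id: EntityID) -> bool {
    if int(id.index) >= len(w.slots) { return false }
    slot := w.slots[id.index]
    return slot.alive && slot.generation == id.generation
}

destroy_entity :: proc(w: ^World, id: EntityID) {
    if !is_valid(w, id) { return }
    slot := &w.slots[id.index]
    // Zero the slot but bump the generation so stale IDs stop matching.
    slot^ = Slot{generation = slot.generation + 1}
    append(&w.free_list, id.index)
}

main :: proc() {
    w: World
    defer delete(w.slots)
    defer delete(w.free_list)

    a := create_entity(&w)
    destroy_entity(&w, a)
    b := create_entity(&w) // reuses the same slot with a newer generation

    fmt.println(is_valid(&w, a), is_valid(&w, b))
}
```

The stale handle a fails validation after the slot is recycled, which is the property that makes deferred references safe to hold across ticks.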

1.2 The Tripartite Message Bus & Tagged Unions

Instead of interface-based messaging (which requires slow type assertions), the engine utilizes Odin's Tagged Unions (union) to represent all events in the game perfectly as Abstract Data Types (ADTs).

Message :: union {
    MsgDamage,
    MsgExplosion,
    MsgSensorPing,
}

The engine routes the previous frame's messages into three distinct, read-only routing structures:

  1. The Direct Inbox: Messages explicitly targeted at a specific entity.
  2. The Spatial Bus: Area-of-Effect messages bucketed by spatial hash grid cells. These messages strictly carry their local position and area radius.
  3. The Global Bus: Full universe broadcasts.
  • The Simulation Frame Arena: Because messages strictly live for exactly one frame, we do not need complex object pooling. All single-threaded parts of the Simulation Thread (such as resolving commands, populating the new Message Buses, and any miscellaneous per-frame calculations) allocate their memory exclusively from one giant Simulation Frame Arena whose massive memory buffer is allocated exactly once at startup. At the end of the simulation tick, the engine simply calls free_all(sim_frame_arena). Note: In Odin, free_all on an arena does not return memory to the OS; it simply resets the internal pointer offset to zero. This guarantees zero memory leaks, zero fragmentation, zero thread contention, and a completely GC-free frame loop.
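As a rough sketch of how tagged-union messages and a per-frame arena fit together, the example below allocates a small inbox from an arena each tick and resets it with free_all. The payload fields (amount, radius), the magnitude helper, and the 4 KiB backing buffer are assumptions made for illustration only.

```odin
package main

import "core:fmt"
import "core:mem"

// Hypothetical payloads; the real message structs are not specified above.
MsgDamage    :: struct { amount: f32 }
MsgExplosion :: struct { radius: f64 }

Message :: union {
    MsgDamage,
    MsgExplosion,
}

// A type switch dispatches on the union's tag without any interface lookup.
magnitude :: proc(msg: Message) -> f64 {
    switch m in msg {
    case MsgDamage:    return f64(m.amount)
    case MsgExplosion: return m.radius
    }
    return 0 // empty (nil) union
}

main :: proc() {
    backing: [4096]byte // stand-in for the buffer allocated once at startup
    arena: mem.Arena
    mem.arena_init(&arena, backing[:])
    frame_alloc := mem.arena_allocator(&arena)

    for tick in 0..<3 {
        inbox := make([dynamic]Message, 0, 8, frame_alloc)
        append(&inbox, MsgDamage{amount = 5})
        append(&inbox, MsgExplosion{radius = 50})
        for msg in inbox {
            fmt.println(tick, magnitude(msg))
        }
        // Resets the arena's offset; nothing is returned to the OS.
        free_all(frame_alloc)
    }
    fmt.println("arena offset after flush:", arena.offset)
}
```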

1.3 Worker Command Arenas (The Context Swapping Trick)

  • The Problem: If thousands of entities generate state-change commands concurrently, writing to a shared array requires locks or atomics, destroying parallel scaling. Furthermore, applying changes immediately causes "skew" (later entities reading data from the future). If native components or Lua scripts need to allocate temporary memory during their Update() loop, hitting the OS heap causes severe fragmentation.
  • The Solution: The engine provides each worker with a mem.Arena and an isolated Lua VM natively bound to it. By overwriting context.allocator for the worker thread, the arena becomes the universal, lock-free scratchpad for everything that happens natively on that core, while the Lua VM handles its own memory directly from the same arena.
// ECS Commands are strictly for modifying components or emitting messages
Command :: union { CmdModifyComponent, CmdAddComponent, CmdRemoveComponent, CmdEmitMessage, CmdSpawnPrefab, CmdDestroyEntity }

// Physics Handoffs are processed in a completely separate resolution loop
GridHandoff :: struct { 
    entity:        EntityID, 
    old_grid:      EntityID, 
    new_grid:      EntityID, 
    new_transform: TransformComponent,
    new_velocity:  [3]f64, // Linear velocity is relative to the new grid!
}

WorkerContext :: struct {
    thread_arena: mem.Arena,        
    commands:     [dynamic]Command,     // For the ECS Workers
    handoffs:     [dynamic]GridHandoff, // For the Physics Workers
    lua_vm:       ^lua.State,           // Strictly isolated per-worker 
}

workers: [NUM_CPU]WorkerContext

// Initialization (done exactly ONCE per worker at startup)
mem.arena_init(&workers[i].thread_arena, massive_buffer)

// We pass a custom Odin allocator wrapper to Lua, providing a pointer 
// to this specific worker's arena as the userdata (ud) argument.
workers[i].lua_vm = lua.newstate(custom_odin_lua_alloc, &workers[i].thread_arena)
lua.L_openlibs(workers[i].lua_vm) // luaL_openlibs, as named in the Odin bindings

// --- Inside the worker thread ---
process_entity :: proc(worker: ^WorkerContext, entity_id: EntityID) {

    // THE MAGIC TRICK: Override the default allocator for this entire scope!
    // Any Odin procedure called from here on will implicitly use the Thread Arena.
    context.allocator = mem.arena_allocator(&worker.thread_arena)

    // Because we call free_all at the end of every frame, the arena is wiped.
    // Assuming the flush also zeroes this header, zero capacity means this is
    // the worker's first entity of the tick, so we re-create the array once.
    if cap(worker.commands) == 0 {
        worker.commands = make([dynamic]Command, 0, 1024)
    }

    // 1. If a native component calls `new()` or `make()` inside its Update(),
    // it seamlessly allocates into the Thread Arena! Zero OS heap hits.
    // The array is passed by pointer so appended commands reach the worker's list.
    run_components(entity_id, worker.lua_vm, &worker.commands)
}
  • The Entity Resolution Lifecycle:
      1. Parallel Update: Workers read the buses and execute component logic, implicitly dropping all native allocations into their specific thread arena. Lua scripts executing concurrently on the worker.lua_vm draw from the exact same arena.
      2. Resolution Phase (Single Thread): A nested loop on the main Simulation Thread (for w in workers, for c in w.commands) executes the state mutations, allocates new component data on the engine's persistent allocator, and allocates the resulting messages for the three message buses from the single-threaded Simulation Frame Arena.
      3. The Flush: After the subsequent parallel physics phase and single-threaded physics resolution phase have run, the engine calls free_all on all thread-local worker arenas. This instantly invalidates all temporary memory (including the commands arrays, handoffs arrays, and Lua allocations) used by the components and scripts that frame, readying the massive_buffer for the next tick.

2. The Entity Update Loop (The CPU)

The engine utilizes a hybrid of the Actor Model and Double-Buffered State Resolution. Throughout this entire phase, the global game state tuple remains completely immutable.

2.1 The Worker Pool (OS Threads & Work-Stealing)

Because we know exactly how many cores the CPU has, we don't need the overhead of an M:N green-thread scheduler (like Go's goroutines). We map exactly one OS thread to one physical core.

  • The Queue: The engine maintains a work queue (core:sync/chan) populated with entity IDs.
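  • Work-Stealing: Workers do not have statically assigned entities. An idle thread pulls the next ID from the shared channel until it is empty, so load balances dynamically (a shared queue rather than per-worker deques, but with the same effect).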
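// Requires: import "core:thread" and import "core:sync/chan"
work_queue: chan.Chan(EntityID)

worker_main :: proc(t: ^thread.Thread) {
    // Odin procedure literals cannot capture locals, so the worker index
    // travels through the thread's user_index field.
    worker := &workers[t.user_index]
    for {
        id, ok := chan.recv(work_queue)
        if !ok { break } // channel closed
        process_entity(worker, id)
    }
}

// Spawn 1:1 OS threads
for i in 0..<NUM_CPU {
    t := thread.create(worker_main)
    t.user_index = i
    thread.start(t) // threads do not run until started
}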

2.2 The Entity Bus (The Shadow Bus)

When a worker begins updating an Entity, it constructs an ephemeral Entity Bus.

  1. Filtering (Thread-Local Allocation): The worker rapidly builds a local slice of pointers to messages relevant to this entity (combining Direct, Global, and relevant Spatial messages). Because context.allocator was swapped to the thread_arena, this temporary slice is constructed entirely lock-free.
  2. The Component Loop: Once all mods and native systems are initialized, the engine reads the central GlobalRegistry, sorts the active ComponentTypeIDs based strictly on their declared priority, and caches the result. The worker loop strictly iterates over this pre-sorted slice, passing the specific entity_id into the update call so the component knows exactly which entity it's acting on:
for type_id in Engine.sorted_update_order {
    prefab := GlobalRegistry[type_id]

    if prefab.is_lua {
        if comp := entity.lua_components[type_id]; comp != nil {
            // Execute the universal Lua wrapper stored on the prefab.
            // UpdateProc takes ^Component, so the wrapper casts back to ^LuaComponent.
            prefab.base_update_proc(cast(^Component)comp, entity_id, game_state, entity_bus, out_arena)
        }
    } else {
        if comp := entity.components[type_id]; comp != nil {
            // Execute the direct native procedure pointer
            comp.update(comp, entity_id, game_state, entity_bus, out_arena) 
        }
    }
}
  3. Deferred Swap-and-Pop: Components can “consume” a message on the Entity Bus, using an O(1) unordered remove (swap with the last element and pop) to prevent subsequent components from seeing it.
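The consume step can be illustrated with Odin's built-in unordered_remove. The sketch below is an assumption about how a component might drain damage messages: EntityBus, its messages field, the payload layout, and consume_damage are all hypothetical names.

```odin
package main

import "core:fmt"

// Hypothetical payloads for illustration.
MsgDamage     :: struct { amount: f32 }
MsgSensorPing :: struct {}

Message :: union { MsgDamage, MsgSensorPing }

// Hypothetical Entity Bus: a worker-local list of pointers into the frame's messages.
EntityBus :: struct {
    messages: [dynamic]^Message,
}

// Removes every MsgDamage from the bus and returns the total damage.
consume_damage :: proc(bus: ^EntityBus) -> f32 {
    total: f32
    // Walk backwards: unordered_remove moves the last element into slot i,
    // and that element has already been visited.
    for i := len(bus.messages) - 1; i >= 0; i -= 1 {
        msg := bus.messages[i]
        if dmg, ok := msg^.(MsgDamage); ok {
            total += dmg.amount
            unordered_remove(&bus.messages, i) // O(1) swap-and-pop
        }
    }
    return total
}

main :: proc() {
    frame := [3]Message{MsgDamage{10}, MsgSensorPing{}, MsgDamage{5}}

    bus: EntityBus
    defer delete(bus.messages)
    for i in 0..<len(frame) {
        append(&bus.messages, &frame[i])
    }

    total := consume_damage(&bus)
    fmt.println("damage:", total, "messages left:", len(bus.messages))
}
```

Because only the local pointer list is edited, the shared read-only buses stay untouched and other entities still see the original messages.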

3. The Physics Engine (64-bit Parallel Grids)

The physics engine runs after the ECS resolution phase. It strictly handles Kinematics and Linear Dynamics (no rotational torque). All physical calculations are performed using strict 64-bit math (f64) to ensure precision across astronomical distances without jitter.

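  • Odin Advantage: Odin's fixed-size arrays support component-wise arithmetic natively, so 64-bit vectors ([3]f64) can be updated with plain expressions such as vel += acc * dt, with no operator overloading involved. core:math/linalg adds dot, cross, and quaternion helpers on top. This dramatically improves physics code readability compared to languages that require function calls for vector math.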

3.1 Physics Grids as Components

  • The Grid Component: Instead of existing as abstract engine concepts, Physics Grids are standard components holding a base radius, a Spatial Hash, and a flat [dynamic]EntityID slice of inhabitants. It derives its origin and rotation purely from its host entity's TransformComponent (which is safely unpacked from its Maybe wrapper).
  • The Dense Grid Array (Performance): To prevent the physics solver from scanning the massive generic Entity Buffer every frame, the engine maintains a separate, dense slice of pointers to all active PhysicsGridComponents. This is what the physics worker pool pulls from.
  • Transform Data: The TransformComponent stores CurrentLocalPosition ([3]f64), PreviousLocalPosition ([3]f64 - crucial for CCD), LocalRotation (quaternion128), and GridEntityID.
  • Intra-Grid Parenting (Hardpoints): To attach entities within the same grid (e.g., a turret to a ship hull), the TransformComponent supports an optional ParentEntityID. During the resolution phase, the engine recursively calculates the final LocalPosition of child entities relative to their parents before passing the flattened grid list to the physics solver.
  • Physics Isolation & Interaction Example: Sibling grids cannot physically overlap. Because siblings never overlap, there is zero ambiguity regarding ownership. The physics solver processes each PhysicsGridComponent entirely independently.
    • Example: If two crew members are inside a Ship Grid, they only interact with each other and the ship's interior. A spacewalker floating perfectly still in the parent Space Grid will collide with the ship's exterior hull as it flies by, but will never interact with the crew inside. To maintain this illusion, physics grids must always be strictly contained within convex, impassable collision meshes (like a ship's hull) so that objects cannot clip through grid boundaries without physically triggering a transition.

3.2 The Parallel Physics Worker Pool (In-Place Mutation)

  • The Workers: A dedicated pool of Physics OS Threads pulls grid pointers from a channel.
  • The WorkerContext Pattern: The physics engine reuses the exact same [NUM_CPU]WorkerContext memory pattern (thread-local mem.Arena + [dynamic]GridHandoff array) used by the ECS worker pool. By overriding context.allocator at the top of the physics tick, workers handle temporary math allocations and handoff triggers completely lock-free.
  • Direct Mutation: Because sibling grids do not physically overlap, there is no risk of two threads trying to update the exact same entity's transform or check collisions simultaneously. Therefore, the physics workers do not use deferred commands for standard movement. They directly and aggressively mutate the TransformComponent and PhysicsBodyComponent values of their grid's inhabitants in parallel.

3.3 Per-Grid Spatial Hashing & Re-bucketing

  • Nested Hashes: Because Grids are individual components, each PhysicsGridComponent manages its own isolated, infinite sparse hash map (map[u64][dynamic]EntityID) for broad-phase collision and AoE message routing.
  • Scaled Buckets: The bucket size for each grid is dynamically scaled according to the radius of its physics grid. The root Space Grid might use 10km wide buckets, while a Capital Ship's interior Grid uses 5m wide buckets, ensuring optimal memory usage at all scales.
  • The Re-bucketing Step (Staleness Fix): As the physics worker directly mutates an entity's position, its new hash bucket is calculated. If it crosses a bucket boundary (NewBucket != OldBucket), its ID is instantly removed from the old bucket array and appended to the new one. This strictly prevents "ghost" collisions or entities missing AoE messages due to stale bucketing.
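A minimal sketch of the re-bucketing step follows. The key layout (21 bits per axis packed into a u64) and the names SpatialHash, cell_key, bucket_insert, bucket_remove, and rebucket are assumptions for illustration; the document does not specify how cell coordinates are hashed.

```odin
package main

import "core:fmt"
import "core:math"

EntityID :: struct { index, generation: u32 }

SpatialHash :: struct {
    bucket_size: f64,
    buckets:     map[u64][dynamic]EntityID,
}

// Maps one coordinate to its cell index, truncated to 21 bits
// (an assumed layout, valid while cell indices stay within +/- 2^20).
axis_cell :: proc(v, bucket_size: f64) -> u64 {
    cell := i64(math.floor(v / bucket_size))
    return u64(cell) & 0x1F_FFFF
}

cell_key :: proc(h: ^SpatialHash, p: [3]f64) -> u64 {
    x := axis_cell(p.x, h.bucket_size)
    y := axis_cell(p.y, h.bucket_size)
    z := axis_cell(p.z, h.bucket_size)
    return (x << 42) | (y << 21) | z
}

bucket_insert :: proc(h: ^SpatialHash, key: u64, id: EntityID) {
    if key not_in h.buckets {
        h.buckets[key] = make([dynamic]EntityID)
    }
    append(&h.buckets[key], id)
}

bucket_remove :: proc(h: ^SpatialHash, key: u64, id: EntityID) {
    if key not_in h.buckets { return }
    bucket := &h.buckets[key]
    for other, i in bucket^ {
        if other == id {
            unordered_remove(bucket, i)
            return
        }
    }
}

// Called right after a worker mutates an entity's position.
rebucket :: proc(h: ^SpatialHash, id: EntityID, old_pos, new_pos: [3]f64) {
    old_key := cell_key(h, old_pos)
    new_key := cell_key(h, new_pos)
    if new_key == old_key { return } // still in the same bucket
    bucket_remove(h, old_key, id)
    bucket_insert(h, new_key, id)
}

main :: proc() {
    h := SpatialHash{bucket_size = 5} // 5 m buckets, as for a ship interior
    id := EntityID{index = 7, generation = 0}
    old_pos := [3]f64{1, 1, 1}
    new_pos := [3]f64{7, 1, 1}

    bucket_insert(&h, cell_key(&h, old_pos), id)
    rebucket(&h, id, old_pos, new_pos)

    fmt.println(len(h.buckets[cell_key(&h, old_pos)]), len(h.buckets[cell_key(&h, new_pos)]))
}
```

The demo skips cleanup of the per-bucket arrays for brevity; in the engine these would live in whatever allocator owns the grid component.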

3.4 Narrow Phase: Continuous Collision Detection (CCD)

  • Shapes: Strictly limited to Spheres and OBBs (Oriented Bounding Boxes). Complex ships use Compound Colliders (arrays of Spheres/OBBs).
  • The Anti-Tunneling Sweep (CCD): To prevent the "Fast-Projectile Trap" (where objects move so fast they teleport through walls between frames), the engine performs Continuous Collision Detection for all moving objects. The narrow phase doesn't just check static overlaps; it sweeps the object's collider mathematically from its PreviousLocalPosition to its CurrentLocalPosition.
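  • Math: Uses a swept-sphere test for Spheres (solving for the first moment along the sweep at which the center-to-center distance equals the sum of the radii), and Swept Separating Axis Theorem (SAT) or Raycasting for OBBs.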
  • Resolution (Supreme Authority): Objects do not spin upon impact. If a sweep detects a hit, the entity is snapped to the exact point of impact along the normal, bouncing or sliding based on its restitution. To prevent 1-frame visual lag ("spongy" physics), this mutation happens instantly. Simultaneously, the worker pushes a MsgCollisionOccurred message to the Direct Inbox of the involved entities so the ECS components can react to the consequences on the next frame.
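For the sphere case, one common way to implement the sweep is to solve a quadratic for the first time of contact between a moving sphere and a static one. The sketch below assumes that formulation; sweep_sphere and its parameter names are hypothetical, and it handles only a static target.

```odin
package main

import "core:fmt"
import "core:math"
import "core:math/linalg"

// Sweeps a sphere from prev to curr against a static sphere.
// Returns the fraction t in [0, 1] of the sweep at which they first touch.
sweep_sphere :: proc(prev, curr: [3]f64, radius: f64, center: [3]f64, other_radius: f64) -> (t: f64, hit: bool) {
    d := curr - prev           // sweep vector for this tick
    m := prev - center         // start position relative to the target
    sum := radius + other_radius

    // |m + t*d|^2 = sum^2  expands to  a*t^2 + 2*b*t + c = 0
    a := linalg.dot(d, d)
    b := linalg.dot(m, d)
    c := linalg.dot(m, m) - sum * sum

    if c <= 0 { return 0, true }            // already overlapping at the start
    if a == 0 || b >= 0 { return 0, false } // not moving, or moving away

    disc := b * b - a * c
    if disc < 0 { return 0, false }         // closest approach still misses

    t = (-b - math.sqrt(disc)) / a
    if t > 1 { return 0, false }            // contact lies beyond this tick
    return t, true
}

main :: proc() {
    // A projectile crossing 20 m in one tick, straight through a 1 m sphere.
    t, hit := sweep_sphere({-10, 0, 0}, {10, 0, 0}, 1, {0, 0, 0}, 1)
    fmt.println(hit, t)
}
```

A plain overlap check at either endpoint of this sweep would report no collision, which is exactly the tunneling case the swept test exists to catch.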

3.5 The Airlock Handoff (Deferred Grid Transitions)

Because the physics solver is parallelized, an entity cannot be teleported between grids instantly. Modifying one grid's inhabitant array from a thread that is processing a different grid would cause a data race.

  • The Handoff Trigger (Smallest Grid Wins): The engine cannot simply wait for an entity to go "out of bounds" of its current grid to trigger a transition. For example, an entity residing in the infinite "Space" grid is never technically out of bounds! Instead, after applying impulses and moving an object, the physics worker uses the BVH collision algorithm to re-evaluate the entity's new global position against the entire grid hierarchy. It searches for the smallest (deepest nested) physics grid that fully envelops the entity. If this new target grid differs from the entity's current GridEntityID (e.g., the spacewalker flew through an open/retracted hangar bay door and is now contained within the Ship Grid's bounding volume), a handoff is initiated.
  • The Math (On-Demand Matrix Calculation): When a handoff is triggered, the worker traverses up from the old grid to the lowest common ancestor of the old and new grids, accumulating relative transforms.
    1. New Local Pos = Exited_Grid Pos + (Exited_Grid Rot * Entity Local Pos)
    2. Tangential Velocity = CrossProduct(Exited_Grid Angular Velocity, Radius Vector to Entity), where the radius vector is Exited_Grid Rot * Entity Local Pos
    3. New Local Velocity = Exited_Grid Linear Velocity + Tangential Velocity + (Exited_Grid Rot * Entity Local Velocity)
    Then, the worker traverses down from the lowest common ancestor to the new physics grid, adjusting the position, rotation, and velocity to eventually yield values that are purely relative to the newly chosen physics grid.
  • The Handoff Record: Because modifying the grids' dynamic arrays in parallel is unsafe, the worker does not finalize the move. It packages the pre-calculated transform and new relative velocity into a GridHandoff struct and appends it to its thread-local handoffs array.
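One upward step of the handoff math (from a grid into its parent's frame) might look like the sketch below. GridFrame and to_parent are hypothetical names, and the sketch uses quaternion256 so the rotation matches the f64 vectors, which is an assumption rather than something the document specifies. It also rotates the entity's local velocity into the parent's axes before summing.

```odin
package main

import "core:fmt"
import "core:math/linalg"

// Hypothetical snapshot of a grid's motion relative to its parent grid.
GridFrame :: struct {
    pos:         [3]f64,
    rot:         quaternion256,
    linear_vel:  [3]f64,
    angular_vel: [3]f64, // radians per second, in the parent's axes
}

// Re-expresses a position and velocity local to `grid` in the parent's frame.
to_parent :: proc(grid: GridFrame, local_pos, local_vel: [3]f64) -> (pos, vel: [3]f64) {
    // Radius vector from the grid origin to the entity, in the parent's axes.
    r := linalg.quaternion_mul_vector3(grid.rot, local_pos)
    pos = grid.pos + r

    // Velocity contributed by the grid's own spin at that radius.
    tangential := linalg.cross(grid.angular_vel, r)

    vel = grid.linear_vel + tangential + linalg.quaternion_mul_vector3(grid.rot, local_vel)
    return
}

main :: proc() {
    ship := GridFrame{
        pos         = {100, 0, 0},
        rot         = linalg.QUATERNIONF64_IDENTITY,
        linear_vel  = {0, 5, 0},
        angular_vel = {0, 0, 1}, // spinning about +Z
    }

    // A crew member standing still, 2 m from the ship's origin.
    pos, vel := to_parent(ship, {2, 0, 0}, {0, 0, 0})
    fmt.println(pos, vel)
}
```

Walking from the old grid to the lowest common ancestor would apply this step once per level; the downward walk applies the inverse at each level.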

3.6 Single-Threaded Physics Resolution (The Grid Swap)

Unlike the main ECS resolution phase which handles all logical state changes via commands, the Physics Resolution Phase has exactly one job: safely moving entities between grids based on explicit handoff triggers.

  • Once all physics workers finish their ticks, a completely separate, single-threaded loop executes.
  • It iterates through every worker's handoffs array looking strictly for GridHandoff triggers.
  • For each trigger, it performs three tasks:
    1. It explicitly removes the entity ID from the old PhysicsGridComponent's inhabitant array and appends it to the new PhysicsGridComponent's inhabitant array.
    2. It updates the entity's TransformComponent with the newly computed local transform.
    3. It updates the entity's PhysicsBodyComponent with the newly computed local linear velocity.
  • Because this step is single-threaded, it completely eliminates array-mutation race conditions while allowing the heavy math (CCD, SAT, BVH traversal, and matrix multiplication) to remain fully parallelized in the step prior.
  • Grid Destruction (The "Spacing" Mechanic): This same single-threaded step processes grid destruction. If a parent entity hosting a grid (like a Capital Ship) is destroyed, its inhabitants are not deleted. Instead, the engine treats the destruction as a forced handoff, using the exact same math to eject surviving entities into the next surviving ancestor grid (e.g., Space) — inheriting the correct global transform and explosive velocity.

3.7 Spatial Queries & Raycasting (The BVH Shortcut)

Immersive Sims rely heavily on line-of-sight checks and laser raycasts. Because entities exist inside completely isolated nested physics grids, the engine splits raycasting into two strict paradigms to maintain performance:

  • Local Raycasts (Default): Because physics grids cannot overlap and typically represent enclosed spaces (like a ship hull), the vast majority of interactions (e.g., a security camera looking for the player) are confined to the same grid. A standard gameState.Raycast(origin, direction) strictly queries the current PhysicsGridComponent's spatial hash.
  • Global Raycasts (GlobalRaycast): When a weapon or sensor must cross grid boundaries (e.g., a Capital Ship firing into Space), the engine leverages the strict, non-overlapping Physics Grid hierarchy as a massive, "free" Bounding Volume Hierarchy (BVH).
    • The algorithm traverses down the grid tree, summing up the transforms of each physics grid as it goes to establish global coordinates.
    • It performs broad-phase ray intersections against the bounding volumes (radius) of the grids.
    • For each bounding volume (which is really a physics grid) that the ray intersects, we resolve the ray’s intersections within that physics grid’s spatial hash, then proceed to scan its children for intersecting BVs/PGs and resolve them; if the ray doesn’t intersect the BV/PG, we don’t even look at its children.

4. The Renderer (Raylib Integration)

The engine completely divorces the f64 simulation from the f32 visual representation. To guarantee cross-platform native execution, the pipeline targets OpenGL 3.3 Core using Odin's native vendor:raylib bindings.

4.1 Recursive Matrix Multiplication, Camera Authority, & Additive Polish

  • Rotation Authority (Main Thread Fast Path): The Main Thread has 100% authority over the camera's Local Rotation. To render the frame with zero latency, it calculates the camera's temporary Global Rotation. The updated Local Rotation is sent to the Simulation Thread via channel, which blindly accepts it.
  • Position Authority (Simulation Interpolated Path): The Main Thread has 0% authority over the player's position. Input events are sent to the Simulation Thread, which calculates collisions and rigidly moves the player.
  • Interpolation & Additive Polish: To keep positional movement looking smooth, the Main Thread uses an Alpha blend factor (calculated from the fixed-timestep accumulator) to interpolate the camera and mesh positions between the last two completed physics ticks. Additionally, visual flair like head-bobbing or screen-shake is added as an additive offset entirely on the Main Thread, completely independent of the actual 64-bit collision cylinder tracking the player's true position.
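The alpha blend can be sketched as a linear interpolation between the two most recent tick positions. FIXED_DT, interpolate, and the 25 ms frame time are illustrative assumptions; the document does not state the tick rate.

```odin
package main

import "core:fmt"

FIXED_DT :: 1.0 / 60.0 // assumed simulation tick length in seconds

// Blends between the last two completed physics ticks.
// alpha = 0 yields prev, alpha = 1 yields curr.
interpolate :: proc(prev, curr: [3]f64, alpha: f64) -> [3]f64 {
    return prev + (curr - prev) * alpha
}

main :: proc() {
    // Fixed-timestep accumulator: render time that has not yet been
    // consumed by whole simulation ticks.
    accumulator := 0.0
    accumulator += 0.025 // hypothetical 25 ms render frame
    for accumulator >= FIXED_DT {
        accumulator -= FIXED_DT // one simulation tick would run here
    }
    alpha := accumulator / FIXED_DT

    prev := [3]f64{0, 0, 0}
    curr := [3]f64{10, 0, 0}
    fmt.println(alpha, interpolate(prev, curr, alpha))
}
```

Additive effects such as head-bob would be applied to the interpolated result on the Main Thread, leaving the simulated position untouched.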

4.2 Culling, Downcasting, and the Origin Camera Trick

  • Recursive Math: The Main thread calculates absolute 64-bit Global Positions for the Camera and all entities by walking up the Physics Grid Hierarchy.
  • The Downcast & Culling: RelativePos = ObjectGlobal - CameraGlobal. If the length of RelativePos exceeds the far clipping distance (e.g., 20km), the object is culled entirely. If within visual range, this f64 relative position is safely cast to an f32 vector ([3]f32) and passed directly to Raylib.
  • The Origin Camera: To bypass floating-point jitter, the engine then translates the universe around the camera. rl.Camera3D is permanently locked at (0, 0, 0).
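A small sketch of the downcast, assuming a 20 km far clip: the subtraction happens in f64, and only the small camera-relative result is narrowed to f32. FAR_CLIP and to_render_space are hypothetical names.

```odin
package main

import "core:fmt"
import "core:math/linalg"

FAR_CLIP :: 20_000.0 // metres; the 20km example distance

// Returns the camera-relative f32 position, or ok = false if the object is culled.
to_render_space :: proc(object_global, camera_global: [3]f64) -> (pos: [3]f32, ok: bool) {
    relative := object_global - camera_global // precise: both operands are f64
    if linalg.length(relative) > FAR_CLIP {
        return {}, false
    }
    // Only the small relative offset is narrowed, so precision survives.
    return [3]f32{f32(relative.x), f32(relative.y), f32(relative.z)}, true
}

main :: proc() {
    camera := [3]f64{1.0e12, 0, 0} // very far from the universe origin
    near   := camera + [3]f64{3, 4, 0}
    far    := camera + [3]f64{0, 0, 50_000}

    near_pos, near_ok := to_render_space(near, camera)
    far_pos, far_ok := to_render_space(far, camera)
    fmt.println(near_pos, near_ok)
    fmt.println(far_pos, far_ok)
}
```

Casting the global position itself to f32 at this distance from the origin would lose the 3 m and 4 m offsets entirely, which is the jitter the origin-camera approach avoids.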

4.3 The Raylib API Contract (High vs. Low Level)

  • High-Level (rl): The engine populates rl.Camera3D at the origin and calls rl.DrawModel or rl.DrawMeshInstanced. Raylib internally handles the VAO bindings and MVP matrix generation.
  • Low-Level (rlgl): During initialization, the engine drops into the rlgl backend to manually construct Framebuffer Objects (FBOs) with floating-point RGBA16F attachments and set FBO modes for deferred shading and HDR support, bypassing Raylib's default 8-bit limits.

4.4 The OpenGL 3.3 Deferred Rendering Pipeline

  • Pass 0 (Shadows): Renders the POVs of shadow-casting spotlights and point lights into depth textures/cubemaps. Whether a point light casts shadows is toggled via map data.
  • Pass 1 (G-Buffer): Renders camera-relative meshes into three textures:
    • Albedo (RGB) + Metallic (A)
    • Normal (RGB) + Roughness (A)
    • Position (RGBA32F), camera-relative like the meshes themselves
  • Pass 2 (Lighting): Binds G-Buffer.
    • Renders specific geometric volumes (Spheres for point lights, Cones for spotlights).
    • Custom deferred rendering shader calculates Blinn-Phong lighting for each rendered pixel in each lighting shape using:
      • the light’s information (from a uniform)
      • the material and surface info from the G-Buffer
      • converting the GLTF PBR values (Metallic/Roughness) to Blinn-Phong parameters strictly via ALU math to save memory bandwidth
    • Renders into an HDR buffer with additive blending turned on.
  • Pass 3 & 4 (Tone Mapping): Ping-pong downsamples the HDR buffer to a 1x1 texture to find average luminance, then applies Auto-Exposure and the Timothy Lottes Tone Mapper to present a filmic LDR image to the screen.

5. Modding System (Native Lua 5.4 Bindings)

Because Odin supports the C ABI natively, the engine directly embeds the real, official C implementation of Lua 5.4 using Odin's vendored bindings (vendor:lua/5.4). There is no intermediate wrapper layer between Odin and Lua, and we get full feature parity with the standard Lua C API.

5.1 The Lua Component Definition & Prototype Instantiation

Because we split Component and LuaComponent into two distinctly typed arrays, a LuaComponent does not need to embed or inherit any logic. It is a pure, independent data struct managed explicitly by the Registry's branching loop.

LuaComponent :: struct {
    type_id:      ComponentTypeID,
    instance_ref: i32, // luaL_ref to THIS specific entity's Lua data table
}
  • Instantiating a Mod Component (Table Inheritance): When a command requires attaching a Lua component to an entity, we initialize the component directly from its class prefab. However, we don't just assign the prefab's default data table to instance_ref. If we did, every entity would share the exact same health pool! Instead, the engine uses the Lua C-API to execute Prototype Inheritance:
    1. It fetches the prefab's default table via GlobalRegistry[id].lua_default_ref.
    2. It executes lua_newtable(L) to create a completely empty, lightweight instance table for the new entity.
    3. It creates a metatable with an __index field pointing back to the prefab's default table, and attaches it via lua_setmetatable(L, -2).
    4. It stores this new instance table in the Lua Registry via luaL_ref and saves the resulting integer in instance_ref.
    5. Result: Reading a value like self_data.hp seamlessly falls back to the prefab's defaults, but writing self_data.hp = 50 stores the mutation strictly on the entity's individual instance table. Zero deep-copying required!
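The five steps above might be written against the vendored bindings roughly as follows. This is a sketch under assumptions: instantiate and read_hp are hypothetical helpers, the defaults table is built by hand as a stand-in for a mod's default_data, and the binding names (createtable, rawgeti, setfield, setmetatable, L_ref, tonumberx) are used as I understand vendor:lua/5.4 to expose them.

```odin
package main

import "core:fmt"
import lua "vendor:lua/5.4"

// Creates an empty instance table whose metatable's __index points at the
// prefab's default table, and returns a registry reference to the instance.
instantiate :: proc(L: ^lua.State, default_ref: i32) -> i32 {
    lua.createtable(L, 0, 0)                                    // stack: instance
    lua.createtable(L, 0, 1)                                    // stack: instance, meta
    lua.rawgeti(L, lua.REGISTRYINDEX, lua.Integer(default_ref)) // stack: instance, meta, defaults
    lua.setfield(L, -2, "__index")                              // meta.__index = defaults
    lua.setmetatable(L, -2)                                     // attach meta to instance
    return i32(lua.L_ref(L, lua.REGISTRYINDEX))                 // pops instance
}

// Reads instance.hp, following __index if the instance has no own value.
read_hp :: proc(L: ^lua.State, instance_ref: i32) -> f64 {
    lua.rawgeti(L, lua.REGISTRYINDEX, lua.Integer(instance_ref))
    lua.getfield(L, -1, "hp")
    hp := f64(lua.tonumberx(L, -1, nil))
    lua.settop(L, 0) // clear the stack
    return hp
}

main :: proc() {
    L := lua.L_newstate()
    defer lua.close(L)

    // Stand-in for a prefab's default_data table: { hp = 100 }
    lua.createtable(L, 0, 1)
    lua.pushnumber(L, 100)
    lua.setfield(L, -2, "hp")
    default_ref := i32(lua.L_ref(L, lua.REGISTRYINDEX))

    a := instantiate(L, default_ref)
    b := instantiate(L, default_ref)

    // Write hp = 50 on instance a only.
    lua.rawgeti(L, lua.REGISTRYINDEX, lua.Integer(a))
    lua.pushnumber(L, 50)
    lua.setfield(L, -2, "hp")
    lua.settop(L, 0)

    fmt.println(read_hp(L, a), read_hp(L, b))
}
```

Instance a ends up with its own hp while b still reads the shared default, so no per-entity copy of the defaults table is ever made.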

5.2 The Zero-Copy Odin-Lua Bridge (Userdata and Metatables)

Serializing Odin structs into Lua tables every frame generates massive garbage collection overhead. Instead, the engine relies on the Lua C API's Userdata and Metamethods to perform zero-copy reads.

  1. Native Structs (GameState, EntityBus): Instead of copying data, Odin allocates a lightweight Lua Userdata (lua_newuserdatauv) and places the raw Odin pointer (e.g., ^GameState) inside it.
  2. The __index Metamethod: Odin attaches a metatable to this Userdata containing a custom C-closure proc "c" bound to the __index event. When the Lua script tries to read a property (e.g., gameState.tick_count), Lua triggers the C-closure. The closure retrieves the raw Odin pointer from the Userdata, reads the actual memory (state.tick_count), and pushes the resulting integer onto the Lua stack.
  3. Strict Read-Only Guarantee: Because the engine deliberately omits binding a __newindex metamethod (which governs writing), any attempt by a Lua script to assign a field on the Game State or Entity Bus raises a Lua error instead of mutating engine memory in place.

5.3 Mod Initialization (Registration via Prefab)

When the engine boots, it mounts the modder's .lua files into the Lua states. The modder registers their components using the exposed API:

-- Inside a modder's .lua file
Engine.RegisterComponent({
    name = "MyMod:AntiGravity",
    priority = 10,
    default_data = { charge = 100.0, active = true },
    
    update = function(self_data, entity_id, game_state, entity_bus, out)
        if self_data.active and self_data.charge < 0 then
            out:ModifyComponent(entity_id, "MyMod:AntiGravity", { active = false })
            out:EmitAoEMessage(entity_id, "EMP", { radius = 50 })
        end
    end
})
  • The C-API Translation: When the script calls Engine.RegisterComponent, the bound native procedure handles the call. It assigns the next available ComponentTypeID, then stores the name and priority and sets is_lua = true in GlobalRegistry[id].
  • Registry Referencing: It calls luaL_ref on both the update function and the default_data table, saving those integers into GlobalRegistry[id].lua_update_ref and lua_default_ref.

5.4 The Unified Wrapper Execution & Memory Binding (Zero-GC)

When the worker thread checks the Registry and detects is_lua == true, it bypasses native execution and triggers lua_component_update_wrapper. The following zero-GC sequence occurs:

  1. State & Prefab Retrieval: The wrapper receives the ^LuaComponent pointer. It uses comp.type_id to query the GlobalRegistry and retrieve the lua_update_ref. It knows which OS thread is executing, so it retrieves the persistent worker.lua_vm.
  2. Pushing the Stack: The wrapper uses lua_rawgeti to fetch the specific mod's update function (from the class prefab) and the specific entity's data table (from comp.instance_ref). It pushes the lightweight Userdata wrappers for the GameState, the EntityBus, and the CommandArena, along with the EntityID (typically converted to a lightweight integer representing the generational ID), onto the Lua stack.
  3. The Call: The wrapper executes lua_pcall.
  4. Zero-GC Allocations: Crucially, when worker.lua_vm was initialized at startup (lua.newstate), its custom C-allocation callback was passed a direct pointer to the worker's thread_arena as its userdata argument. Therefore, any strings, tables, or generic userdata created by the Lua script during lua_pcall are silently and automatically allocated out of the lock-free native bump arena. At the end of the frame, when free_all is called on the worker's arena, all of this Lua-generated memory is instantly reclaimed, ensuring the Lua script never triggers a heavy C-level garbage collection pause. This only holds for memory that dies with the frame: the VM's own state, the Lua Registry, and every table or function pinned with luaL_ref must come from a longer-lived allocator, and Lua's collector must not be left holding references into the region that was reset.
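
As a rough sketch of the allocation callback described in step 4, the procedure below has the same shape as Lua's lua_Alloc and serves every request from a core:mem arena. The name arena_lua_alloc is hypothetical, and the sketch calls the callback directly rather than through a live VM, so it shows only the bump-and-reset behaviour. It assumes that anything the VM needs across frames lives in a separate allocator that is never reset.

```odin
package main

import "base:runtime"
import "core:c"
import "core:fmt"
import "core:mem"

// Same parameter shape as lua_Alloc: (ud, ptr, osize, nsize) -> new pointer.
// In the engine this would be handed to lua.newstate along with the arena pointer.
arena_lua_alloc :: proc "c" (ud: rawptr, ptr: rawptr, osize, nsize: c.size_t) -> rawptr {
    context = runtime.default_context()
    arena := (^mem.Arena)(ud)

    // A request for zero bytes is a free. Bump arenas reclaim in bulk, so do nothing.
    if nsize == 0 {
        return nil
    }

    new_ptr, err := mem.alloc(int(nsize), 16, mem.arena_allocator(arena))
    if err != .None || new_ptr == nil {
        return nil // Lua treats nil as an allocation failure
    }

    // Realloc: the block cannot grow in place, so copy the surviving bytes.
    // When ptr is nil, Lua passes a type tag in osize, so it is ignored here.
    if ptr != nil {
        mem.copy(new_ptr, ptr, int(min(osize, nsize)))
    }
    return new_ptr
}

main :: proc() {
    backing := make([]byte, 64 * 1024)
    defer delete(backing)

    arena: mem.Arena
    mem.arena_init(&arena, backing)

    // Simulate a script growing a buffer from 8 to 32 bytes during lua_pcall.
    p := (^u64)(arena_lua_alloc(&arena, nil, 0, 8))
    p^ = 0xC0FFEE
    q := (^u64)(arena_lua_alloc(&arena, p, 8, 32))
    fmt.println("contents preserved:", q^ == 0xC0FFEE, "bytes used:", arena.offset)

    free_all(mem.arena_allocator(&arena)) // end-of-frame reset
    fmt.println("bytes used after reset:", arena.offset)
}
```

The cost of this design is that every realloc leaves the old block behind until the reset, so the arena has to be sized for the worst-case churn of a single frame.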

5.5 Issuing Commands from Lua (Strictly Deferred)

Because the game state tuple is strictly immutable during the parallel update phase, Lua scripts cannot directly mutate anything. All interaction happens via the out Userdata (which wraps a pointer to the worker's [dynamic]Command array).

  • Command Methods: The out Userdata has an __index metatable that exposes C-functions mimicking object-oriented methods.
  • Direct Array Injection: When the Lua script calls out:AddComponent({ type = "Explosion" }), it invokes the bound Odin procedure. This procedure reads the arguments off the Lua stack, constructs an Odin command struct, and executes append(&worker.commands, cmd). The command lands in the worker's arena-backed dynamic array, entirely bypassing Lua heap allocations.
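
A minimal sketch of that deferred command path, leaving out the Lua stack handling. The command structs, their fields, and the Worker layout are hypothetical stand-ins. The point is only that commands are appended to an arena-backed [dynamic]Command and dispatched later by union tag.

```odin
package main

import "core:fmt"
import "core:mem"

EntityID :: distinct u64

// Hypothetical command shapes; the engine's real fields may differ.
CmdAddComponent :: struct { target: EntityID, type_name: string }
CmdEmitMessage  :: struct { source: EntityID, topic: string, radius: f32 }

Command :: union { CmdAddComponent, CmdEmitMessage }

Worker :: struct {
    arena:    mem.Arena,
    commands: [dynamic]Command,
}

worker_init :: proc(w: ^Worker, backing: []byte) {
    mem.arena_init(&w.arena, backing)
    // The array's storage comes from the worker's own arena, not the heap.
    w.commands = make([dynamic]Command, 0, 64, mem.arena_allocator(&w.arena))
}

// What a bound out:AddComponent(...) would do after reading its Lua arguments.
out_add_component :: proc(w: ^Worker, target: EntityID, type_name: string) {
    append(&w.commands, CmdAddComponent{target, type_name})
}

main :: proc() {
    backing := make([]byte, 16 * 1024)
    defer delete(backing)

    w: Worker
    worker_init(&w, backing)

    out_add_component(&w, 7, "Explosion")
    append(&w.commands, CmdEmitMessage{7, "EMP", 50})

    // Single-threaded resolution phase: dispatch on the union tag.
    for cmd in w.commands {
        switch c in cmd {
        case CmdAddComponent:
            fmt.println("add", c.type_name, "to entity", c.target)
        case CmdEmitMessage:
            fmt.println("emit", c.topic, "with radius", c.radius)
        }
    }
}
```

After the arena is reset at the end of the frame, the array would need to be created again before the next tick, since its backing storage is gone.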

6. Content Pipeline and Asset Management

The engine uses a Blender-to-Odin pipeline, treating standard 3D modeling software as the primary level editor to avoid building custom editor tooling.

6.1 The Formats

  • Models and Scenes: .gltf (GL Transmission Format). It is an open standard, deeply supported by Blender, and cleanly parsed into JSON/Binary chunks via cgltf.
  • Textures: .png files. They provide adequate visual fidelity for the scoped visual style without requiring complex texture compression pipelines.

6.2 Scenes as Data (The extras Property)

Instead of a proprietary scene format, the engine uses GLTF's native JSON metadata to define entities and components.

  • In Blender, custom properties can be added to any object. The GLTF exporter embeds these into the extras field of the node.
  • Workflow: A modder places a crate mesh in Blender and adds a custom property: components: {"Core:Physics": {"mass": 50}, "Core:Health": {"hp": 100}}.
  • When the Odin engine loads the GLTF, it intercepts the extras JSON, spawns a new Entity Generational ID, and initializes the specified components with the provided values.
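
To make the extras workflow concrete, here is a small sketch that parses such a payload with core:encoding/json and walks the component map. It assumes the extras arrive as a JSON object with a top-level "components" key, and spawn_from_extras is a hypothetical name. A real loader would allocate an entity and initialize components where this one only prints.

```odin
package main

import "core:encoding/json"
import "core:fmt"

// Hypothetical loader step: returns how many components the extras describe.
spawn_from_extras :: proc(extras: string) -> (count: int, ok: bool) {
    root, err := json.parse(transmute([]byte)extras)
    if err != .None {
        return 0, false
    }
    defer json.destroy_value(root)

    top, is_object := root.(json.Object)
    if !is_object {
        return 0, false
    }
    components, has_components := top["components"].(json.Object)
    if !has_components {
        return 0, false
    }

    for name, fields in components {
        fmt.println("component:", name)
        // Each component maps field names to override values.
        if overrides, is_fields := fields.(json.Object); is_fields {
            for key, value in overrides {
                fmt.println("   ", key, "=", value)
            }
        }
        count += 1
    }
    return count, true
}

main :: proc() {
    extras := `{"components": {"Core:Physics": {"mass": 50}, "Core:Health": {"hp": 100}}}`
    count, ok := spawn_from_extras(extras)
    fmt.println("components found:", count, "ok:", ok)
}
```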

6.3 Collision Authoring (Naming Conventions)

Complex Compound Colliders (Spheres and OBBs) are authored directly alongside the visual mesh in Blender without requiring separate files.

  • Workflow: Modders create primitive shapes (Cubes and IcoSpheres) to map out hitboxes over their high-poly ship models. They name these objects with strict prefixes: COL_OBB_MainHull or COL_SPH_Cockpit.
  • Engine Parsing: During GLTF ingestion, the engine checks node names. If a node starts with a collision prefix, the engine extracts its local transform and bounding dimensions, converts it directly into a physics collider component, and deletes the visual node so Raylib never pushes it to the GPU.
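
The prefix check itself is simple string work. A sketch, assuming only the two prefixes named above (ColliderKind and classify_node are illustrative names):

```odin
package main

import "core:fmt"
import "core:strings"

ColliderKind :: enum { None, OBB, Sphere }

// Returns the collider kind encoded in the node name, plus the name
// with the prefix removed. Nodes without a prefix are ordinary visuals.
classify_node :: proc(name: string) -> (kind: ColliderKind, label: string) {
    switch {
    case strings.has_prefix(name, "COL_OBB_"):
        return .OBB, strings.trim_prefix(name, "COL_OBB_")
    case strings.has_prefix(name, "COL_SPH_"):
        return .Sphere, strings.trim_prefix(name, "COL_SPH_")
    }
    return .None, name
}

main :: proc() {
    for name in ([]string{"COL_OBB_MainHull", "COL_SPH_Cockpit", "HullPlating"}) {
        kind, label := classify_node(name)
        if kind == .None {
            fmt.println(label, "stays a visual node")
        } else {
            fmt.println(label, "becomes a collider of kind", kind)
        }
    }
}
```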

6.4 The Prefab System (Dynamic Spawning & Deep Cloning)

While static levels are loaded directly, dynamic entities (missiles, dropped weapons, spawned AI) use a GLTF-based Prefab system.

  • Ingestion: If a GLTF node is prefixed with PREFAB_ (e.g., PREFAB_Missile), the engine parses it into an Entity but does not inject it into the active game world. Instead, it stores it in a dormant table.
  • Spawning: Components issue a CmdSpawnPrefab{name: "PREFAB_Missile", local_pos: ..., target_grid: ...} command to the Arena. During the single-threaded resolution phase, the engine allocates a new Generational ID and attaches it to the specified physics grid.
  • The Deep Clone Trap: A shallow memory copy of a Prefab would cause all spawned missiles to share the exact same pointers in RAM. Every native component is required to implement an explicit clone procedure to deep-copy its internal arrays/structs, while Lua components rely on the prototype inheritance logic defined in Section 5.1, ensuring spawned entities possess completely unique memory addresses.
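
The trap is easy to reproduce. In the sketch below, Inventory is a hypothetical native component that owns a slice. Plain struct assignment copies the slice header but not the elements, while an explicit clone procedure duplicates the backing memory.

```odin
package main

import "core:fmt"
import "core:slice"

// Hypothetical component with heap-backed state.
Inventory :: struct {
    capacity: int,
    items:    []int, // a plain struct copy aliases this memory
}

// Explicit clone: copy the scalar fields, then duplicate owned memory.
inventory_clone :: proc(src: Inventory, allocator := context.allocator) -> Inventory {
    dst := src
    dst.items = slice.clone(src.items, allocator)
    return dst
}

main :: proc() {
    prefab := Inventory{capacity = 4, items = make([]int, 2)}
    defer delete(prefab.items)

    shallow := prefab
    shallow.items[0] = 99 // also visible through prefab.items[0]

    deep := inventory_clone(prefab)
    defer delete(deep.items)
    deep.items[1] = 7 // prefab.items[1] is untouched

    fmt.println("prefab:", prefab.items, "deep copy:", deep.items)
}
```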

7. Core Engine Systems (Bridging Simulation to Game)

7.1 The Two-Loop Architecture & Frame Arenas

Raylib's windowing, event polling, and rendering must occur on the main OS thread, so the engine splits its work into two loops. Each loop owns a large, independent frame arena (allocated once at startup):

  1. The Main (Render) Thread: Loops continuously, ticking UI, polling inputs, passing unconsumed inputs to the Simulation via a thread-safe queue, and issuing Raylib draw calls (rl.DrawMesh, rl.DrawModel, and so on) based on the Render State. It uses a single large Render Frame Arena for all of its single-threaded operations (building UI draw lists, calculating temporary matrices, etc.). It calls free_all on this arena at the end of every rendered frame, instantly resetting the memory offset for the next frame.
  2. The Simulation Thread: Runs on a fixed-timestep accumulator, completely decoupled from the rendering framerate. It utilizes the giant Simulation Frame Arena for its single-threaded resolution phases, and its worker pools rely strictly on the Thread-Local Arenas.

7.2 Shared Memory Quad Buffering (Lock-Free Interpolation)

If the Render Thread needs to read two states to interpolate between them (Older and Newer) while the Simulation Thread is actively writing the next frame's state, standard double or triple buffering with array rotation creates a race condition. If the Render thread grabs indices 0 and 1, and the Sim thread blindly rotates the array, it finishes index 2, rotates again, and its next write lands on index 0: the exact block the Render thread grabbed as "Older".

To solve this, the engine uses lock-free Quad Buffering with Atomic Exchange: [4]^RenderState.

  • The Setup: The engine pre-allocates 4 RenderState memory blocks at startup.
    • The Render Thread uniquely owns two pointers: render_older and render_newer.
    • The Simulation Thread uniquely owns one pointer: sim_writing.
    • The engine maintains one globally shared, atomic pointer: shared_newest.
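  • The Simulation Write (End of Tick): The Simulation Thread always writes its ECS data to the memory block pointed to by sim_writing. When the tick completes, it publishes that block with an atomic exchange, setting the pointer's low bit (FRESH_BIT) to mark the frame as unseen. shared_newest is therefore stored as a uintptr; RenderState blocks are at least 2-byte aligned, so the low bit is always free:
FRESH_BIT :: uintptr(1)
// Push the finished frame to the shared slot, and instantly yank out whatever
// was sitting there to use as our next blank canvas.
old := sync.atomic_exchange(&shared_newest, uintptr(sim_writing) | FRESH_BIT)
sim_writing = (^RenderState)(old &~ FRESH_BIT)

(Because the Render thread exclusively owns the pointers it is actively reading, the Simulation thread can never obtain, and so never overwrite, a frame that is being rendered.)

  • The Render Read (Start of Frame): At the start of its frame, the Render thread checks whether shared_newest holds a new, unseen frame. A plain pointer comparison cannot detect this, because the shared slot never holds a block the Render thread owns, and after a recycle it holds a stale one. If the tag is set, the Render thread swaps its oldest, no longer needed frame into the shared slot (untagged), to be recycled by the simulation thread:
if (sync.atomic_load(&shared_newest) & FRESH_BIT) != 0 {
    // Grab the new frame, and throw our oldest frame into the shared slot.
    // Only the Simulation thread writes tagged values, so anything it swapped
    // in since the load is fresher still.
    pulled := sync.atomic_exchange(&shared_newest, uintptr(render_older))
    render_older = render_newer
    render_newer = (^RenderState)(pulled &~ FRESH_BIT)
}
// Interpolate between render_older and render_newer safely!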
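
The exchange protocol can be exercised deterministically in a single thread. The sketch below is an illustration under assumptions, not the engine's code: the Buffers struct and procedure names are hypothetical, RenderState is reduced to a tick counter, and the shared slot is a uintptr whose low bit marks a frame the Render side has not yet seen, so that a recycled frame is never mistaken for a new one.

```odin
package main

import "core:fmt"
import "core:sync"

RenderState :: struct { tick: u64 } // 8-byte aligned, so bit 0 of its address is free

FRESH :: uintptr(1)

Buffers :: struct {
    blocks:        [4]RenderState,
    shared_newest: uintptr,      // tagged pointer, only touched atomically
    sim_writing:   ^RenderState, // owned by the simulation thread
    render_older:  ^RenderState, // owned by the render thread
    render_newer:  ^RenderState, // owned by the render thread
}

buffers_init :: proc(b: ^Buffers) {
    b.render_older  = &b.blocks[0]
    b.render_newer  = &b.blocks[1]
    b.sim_writing   = &b.blocks[2]
    b.shared_newest = uintptr(&b.blocks[3]) // untagged: nothing published yet
}

sim_publish :: proc(b: ^Buffers, tick: u64) {
    b.sim_writing.tick = tick
    old := sync.atomic_exchange(&b.shared_newest, uintptr(b.sim_writing) | FRESH)
    b.sim_writing = (^RenderState)(old &~ FRESH)
}

// Returns true if a new frame was pulled into render_newer.
render_acquire :: proc(b: ^Buffers) -> bool {
    if (sync.atomic_load(&b.shared_newest) & FRESH) == 0 {
        return false // slot holds a recycled frame, keep the current pair
    }
    pulled := sync.atomic_exchange(&b.shared_newest, uintptr(b.render_older))
    b.render_older = b.render_newer
    b.render_newer = (^RenderState)(pulled &~ FRESH)
    return true
}

main :: proc() {
    b: Buffers
    buffers_init(&b)

    sim_publish(&b, 1)
    sim_publish(&b, 2) // tick 1 was never consumed, so the sim reuses that block
    fmt.println(render_acquire(&b), b.render_newer.tick)
    fmt.println(render_acquire(&b), b.render_newer.tick) // nothing new was published

    sim_publish(&b, 3)
    fmt.println(render_acquire(&b), b.render_older.tick, b.render_newer.tick)
}
```

At every point in this sequence the four blocks are split one each between render_older, render_newer, sim_writing, and the shared slot, which is the invariant that makes the scheme safe.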

7.3 Input & UI Event Streaming

Because UI is entirely decoupled from the ECS logic, input events flow through a strict hierarchy on the Main Thread before reaching the simulation.

  1. The Poll Phase: The Main Thread polls Raylib for hardware state (keyboard, mouse position, clicks).
  2. The UI Intercept Phase: The Main Thread passes these inputs to the Master UI Lua VM. If a player clicks on an IMGUI button, the UI "consumes" that input event so the simulation never sees it (preventing the player from accidentally firing a weapon while clicking a menu).
  3. The Channel Handoff: Unconsumed hardware inputs (like pressing 'W' to move) are packed into standard InputEvent structs. Additionally, the Lua UI scripts can generate custom ECS messages. Both streams are pushed down a thread-safe queue to the Simulation.
  4. The ECS Injection: Before waking up the ECS Worker Pool, the Simulation Thread empties the queue and pushes all events directly onto the Global Message Bus for the Player Entity and other relevant components to process during the tick.

7.4 Main-Thread Immediate Mode UI (Lua + RayGUI)

Placing UI components inside a parallel ECS is an anti-pattern. Instead, the engine manages UI entirely on the Main Thread using a dedicated Master UI Lua VM integrated with Raylib's Immediate Mode GUI (vendor:raylib/raygui).

  • Direct Rendering: The Odin engine exposes native RayGUI binding functions (e.g., raygui.Button, raylib.DrawText) directly to the Main Thread Lua VM.
  • The Draw Loop: After drawing the 3D scene, the Main Thread invokes the Lua UI scripts. Lua issues immediate-mode commands: if raygui.Button(rect, "Engage Warp") then ... end. Odin intercepts these and instantly fires the OpenGL 2D commands over the 3D frame. No slow texture construction is necessary.
  • Reading ECS Data: The Lua UI VM is provided access to the read-only Render Snapshot. It can extract an enemy's absolute 3D coordinate, project it to a 2D screen-space coordinate, and draw a health bar directly over their head using standard Raylib 2D calls.

7.5 The Master Execution Map (Thread Synchronization)

To visualize exactly how and when the two threads communicate, here is the chronological pipeline. The only two points of contact between the concurrent threads are the Input Queue (7.3) and the shared_newest Atomic Pointer (7.2).

Main (Render) Thread: Runs continuously, locked to OS window refresh (or uncapped).

  1. OS Poll: Raylib processes OS window events and grabs raw hardware input.
  2. UI Intercept: Master RayGUI Lua VM runs. If the player interacts with a UI panel, the input is consumed here.
  3. Communication 1 (Send): All unconsumed InputEvents and Messages to put on the bus are pushed into the thread-safe Input Queue.
  4. Communication 2 (Receive): The thread atomically checks the shared_newest pointer. If a new frame exists, it pulls it into render_newer and recycles render_older.
  5. Render Math: Calculates the Alpha blend factor based on the Sim Thread's accumulator. Walks the Physics Grid Hierarchy to calculate camera-relative positions in 64-bit precision.
  6. OpenGL Pipeline: Executes the Deferred Shading pipeline (Shadows, G-Buffer, Lighting, Tone Mapping) using the interpolated state.
  7. Immediate UI: Draws the RayGUI/Lua 2D overlay directly on top of the 3D frame.
  8. Flush: Calls free_all(render_frame_arena) and swaps the window buffers (rl.EndDrawing()). Loops back to Step 1.

Simulation Thread: Loops continuously, but ECS/Physics logic only fires when Accumulator >= FixedDeltaTime.

  1. Time Accumulation: Adds wall-clock time to the accumulator. If time is ready, it begins a tick.
  2. Communication 1 (Receive): Drains the Input Queue. Formats inputs into Messages and pushes them onto the Global Message Bus. Pushes the messages produced by the UI directly onto the Tripartite Message Bus.
  3. Parallel ECS Phase: Wakes up the ECS Worker Pool. Cores evaluate entity logic, run Lua/Odin updates lock-free, and write commands to their specific thread_arenas.
  4. ECS Resolution: Single-threaded loop resolves ECS commands (which are either modifications to an entity’s own components, directives to add a component or remove a component from another entity, or directives to release a message on a bus, which will be seen by other entities) and writes new messages to the Tripartite Message Bus.
  5. Parallel Physics Phase: Wakes up Physics Worker Pool. Cores directly mutate transforms and run CCD/SAT collision sweeps lock-free for the entities in each physics grid. Produces physics grid BVH handoff triggers to a grid handoff array.
  6. Physics Resolution: Single-threaded loop executes explicit grid handoff array swaps and clears physics command buffers.
  7. Memory Construction: Writes the flattened, renderable ECS state directly into the isolated sim_writing Quad Buffer memory block.
  8. Communication 2 (Send): Atomically exchanges sim_writing with the shared_newest pointer.
  9. Flush: Calls free_all(sim_frame_arena) and clears all Worker thread_arenas. Subtracts FixedDeltaTime from the accumulator and loops back to Step 1.
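
The accumulator logic in steps 1 and 9 of the Simulation Thread, and the Alpha blend factor the Render Thread derives from it, can be sketched as follows. The 60 Hz FIXED_DT and the procedure names are assumptions made for illustration.

```odin
package main

import "core:fmt"

FIXED_DT :: 1.0 / 60.0 // assumed tick rate

Sim :: struct {
    accumulator: f64,
    tick:        u64,
}

// Adds elapsed wall-clock time and runs as many fixed ticks as it covers.
sim_advance :: proc(s: ^Sim, elapsed_seconds: f64) -> (steps: int) {
    s.accumulator += elapsed_seconds
    for s.accumulator >= FIXED_DT {
        s.tick += 1 // steps 2 through 8 of the tick would run here
        s.accumulator -= FIXED_DT
        steps += 1
    }
    return
}

// Fraction of a tick left over: the blend factor between the
// two newest published frames.
interpolation_alpha :: proc(s: Sim) -> f64 {
    return s.accumulator / FIXED_DT
}

main :: proc() {
    s: Sim
    steps := sim_advance(&s, 0.040) // a 40 ms gap covers two 16.67 ms ticks
    fmt.println("ticks run:", steps, "alpha:", interpolation_alpha(s))
}
```

A production loop would usually also cap the number of ticks per call, so that a long stall cannot trigger an ever-growing backlog of simulation work.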

7.6 Spatial Audio (The 64-bit Sound Problem)

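Raylib 5.5 has no built-in positional audio call (there is no PlaySound3D()), and the per-sound controls it does provide, such as SetSoundVolume() and SetSoundPan(), take 32-bit floats. Like the renderer, spatial audio therefore needs a custom resolution pipeline.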

  • The Math: Audio components calculate their position relative to the camera in 64-bit space, just like the renderer, and downcast the resulting offset to [3]f32 before deriving the 32-bit values handed to Raylib.
  • The Grid Muffling Trick: Because of the Nested Physics Grids, audio can determine structural obstruction without expensive raycasting. If an explosion occurs, the Audio Emitter checks its GridID against the Listener's GridID. If they differ (e.g., the explosion is in the Space Grid, but the player is inside the Ship Grid), the engine applies a Low-Pass Filter and heavy volume attenuation, instantly creating the immersive "muffled outside noise" effect.
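
Both ideas fit in a few lines. In this sketch the subtraction happens in f64 before any narrowing, and a grid mismatch switches on muffling. The linear falloff, the 0.25 attenuation factor, and all names are placeholder assumptions rather than the engine's tuning.

```odin
package main

import "core:fmt"
import "core:math/linalg"

GridID :: distinct u32

AudioParams :: struct {
    offset:  [3]f32, // listener-relative, safe for 32-bit APIs
    volume:  f32,
    muffled: bool,
}

resolve_audio :: proc(emitter, listener: [3]f64, emitter_grid, listener_grid: GridID, max_range: f64) -> AudioParams {
    delta := emitter - listener // subtract in 64-bit first
    distance := linalg.length(delta)

    p: AudioParams
    p.offset = {f32(delta.x), f32(delta.y), f32(delta.z)} // narrow only the small offset
    p.volume = f32(clamp(1.0 - distance / max_range, 0, 1)) // placeholder linear falloff
    p.muffled = emitter_grid != listener_grid
    if p.muffled {
        p.volume *= 0.25 // placeholder; a low-pass filter would also be enabled here
    }
    return p
}

main :: proc() {
    // Both positions sit about 1e12 units from the origin, far beyond f32 precision.
    listener := [3]f64{1e12, 0, 0}
    emitter := [3]f64{1e12 + 3, 0, 4}

    inside := resolve_audio(emitter, listener, 1, 1, 10)
    outside := resolve_audio(emitter, listener, 0, 1, 10)
    fmt.println(inside.offset, inside.volume, inside.muffled)
    fmt.println(outside.offset, outside.volume, outside.muffled)
}
```

Converting each position to f32 before subtracting would lose the 3 and 4 unit components entirely at this distance, which is why the order of operations matters.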

8. Serialization

The AoS architecture combined with Odin's native reflection allows perfect save-states.

  • The Instantiation: When spawning a prefab or loading a save file, the engine reads the component's string name and maps it to a ComponentTypeID. If the GlobalRegistry flags it as a Lua component, the engine allocates a LuaComponent, executes the metatable inheritance logic on the lua_default_ref, and stores the new table's handle in instance_ref. For native components, it allocates the data via the persistent_allocator as defined in Section 1.1.
  • The Dump: The ECS array is traversed. Entities are saved with their exact string component names alongside their raw struct data. Because the engine strictly serializes at the end of a simulation tick (when the Tripartite Message Bus and all Command Arenas have been completely flushed), there are no in-flight messages or transient events to serialize, maintaining perfect causal simplicity.
  • Lua Serialization: Because we use the native Lua C API, the LuaComponent adapter relies on custom serialization routines that traverse the mod's specific Lua table, extract primitive values (numbers, strings, booleans), and serialize them alongside the native data, allowing mod state to persist across saves.
  • The Pointer Trap: No component uses a raw memory pointer to refer to another entity. All relationships are strictly defined by EntityID, ensuring the game state can be safely dumped to disk and mapped back to RAM.