🚀 High-Level Goal
Support a 64 v 64 (128 total) “Hell-Let-Loose–style” FPS with Godot clients and an authoritative Rust server, while keeping latency low (< 80 ms RTT budget) and bandwidth reasonable for both clients (< 250 kbps) and the server box (< 25 Mbps).
────────────────────────────────────────
1. Core Design Pillars
────────────────────────────────────────
• Authoritative server – no trust in clients
• UDP first, with a light reliability/ordering layer (think ENet/Laminar/QUIC)
• Fixed-rate server simulation tick, client-side prediction + interpolation
• Delta-compressed, relevance-filtered snapshots (a.k.a. interest management)
• Multi-threaded ECS simulation on the server; network I/O kept lock-free
• Single box for 128 players, but the layout is shard-friendly if we ever split
────────────────────────────────────────
2. Top-Level Architecture
────────────────────────────────────────
Godot Client <-UDP/QUIC-> Rust “Game-Core” (authoritative) <-TCP-> Lobby / DB
```
 ┌────────┐   inputs    ┌─────────────┐  events   ┌──────────┐
 │ Godot  │───────────► │  Net Front  │──────────►│  Match   │
 │ Client │◄─────────── │  Gate (IO)  │◄──────────│  Lobby   │
 └────────┘ snapshots   └──────┬──────┘           └──────────┘
                               │
                          lock-free
                           channels
                               │
                         ┌─────▼─────┐
                         │ Game ECS  │
                         │ (Bevy?)   │
                         └─────┬─────┘
                               │
                         ┌─────▼─────┐
                         │  Worker   │
                         │  Threads  │
                         └───────────┘
```
Why two layers inside the server?
• Net Front Gate = purely async I/O, packet (de)frag, (de)crypt, acks.
• Game ECS = deterministic world updated at fixed Δt; batch-consumes inputs, emits snapshots.
────────────────────────────────────────
3. Transport & Packet Layout
────────────────────────────────────────
Transport: UDP (or QUIC if you want built-in encryption + congestion control).
Max safe MTU: 1200 bytes (fits inside most home path MTUs).
Packet Header (9 bytes):
```
uint16 seq_id
uint16 ack_of_remote
uint32 ack_bitfield   (acks for the 32 previous packets)
uint8  flags          (bit0=reliable, bit1=frag, bit2=control…)
```
Payload = 1‒N “messages” TLV-encoded inside the datagram.
Msg types (1-byte id + 1-byte length if < 256):
00 Heartbeat / ping
01 InputCmd (2 B button bitfield + 3×pos32 or delta16 + uint8 tick)
02 SnapshotDelta (compressed)
03 SnapshotBaseline (full state if a delta chain is lost)
04 Event/RPC (grenade exploded, chat, UI)
05 StreamFrag (map chunk, voice, etc.)
Reliability:
• “reliable” flag + sliding-window resends.
• Unreliable for InputCmds (they become obsolete quickly).
• Semi-reliable for SnapshotBaselines.
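The header layout above can be sketched as a plain struct with hand-rolled (de)serialization. This is an illustrative, dependency-free sketch — the type and method names (`PacketHeader`, `pack`, `unpack`) and the little-endian byte order are assumptions, not part of any existing crate:

```rust
/// 9-byte wire header described above. Layout and names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PacketHeader {
    seq_id: u16,
    ack_of_remote: u16,
    ack_bitfield: u32, // acks for the 32 packets preceding `ack_of_remote`
    flags: u8,         // bit0=reliable, bit1=frag, bit2=control
}

impl PacketHeader {
    const SIZE: usize = 9;

    fn pack(&self) -> [u8; Self::SIZE] {
        let mut b = [0u8; Self::SIZE];
        b[0..2].copy_from_slice(&self.seq_id.to_le_bytes());
        b[2..4].copy_from_slice(&self.ack_of_remote.to_le_bytes());
        b[4..8].copy_from_slice(&self.ack_bitfield.to_le_bytes());
        b[8] = self.flags;
        b
    }

    fn unpack(b: &[u8]) -> Option<Self> {
        if b.len() < Self::SIZE {
            return None; // truncated datagram
        }
        Some(Self {
            seq_id: u16::from_le_bytes([b[0], b[1]]),
            ack_of_remote: u16::from_le_bytes([b[2], b[3]]),
            ack_bitfield: u32::from_le_bytes([b[4], b[5], b[6], b[7]]),
            flags: b[8],
        })
    }
}
```

A real implementation would add the TLV message parsing behind this header, but the round-trippable fixed prefix is the part both sides must agree on first.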
────────────────────────────────────────
4. Tick & Time Model
────────────────────────────────────────
Simulation tick = 60 Hz (Δt = 16.66 ms)
Networking tick = 20 Hz (every 3rd sim tick we send a snapshot)
Client render   = 144 Hz
```
             ┌───────────────────────────────────┐
Timeline →   │I I I│I I I│I I I│ …  (inputs @ 60)│
             ├─┬─┬─┴─┬─┬─┴─┬─┬─┴─┬───────────────┤
Server Sim   │S │S │S │S │S │S │S …       (60 Hz)│
             └────────┬────────┬────────┬────────┘
Snapshot Tx           ▲        ▲        ▲  (20 Hz)
```
Interpolation buffer: 2.5 ticks ≈ 40 ms
Client-side:
• Sends an InputCmd every render frame (ideally capped at 60 Hz).
• Predicts locally.
• Keeps 100 ms of input history; on mismatch vs. the authoritative state ⇒ smooth rewind/correct.
Server:
• Collects all inputs with tick ID ≤ current tick.
• Simulates physics and hit-scan.
• Serializes the state diff vs. the last ACKed snapshot per client.
• Runs interest management: spatial hash + LOS + team filter.
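The fixed 60 Hz simulation step decoupled from variable frame times is usually driven by a time accumulator. A minimal sketch (names are illustrative, not from a specific crate):

```rust
/// Fixed-timestep accumulator: feed in real elapsed time, get back how
/// many fixed sim ticks to run. Leftover time stays in the accumulator
/// so long-term sim rate matches wall-clock time.
struct FixedTimestep {
    accumulator: f64,
    dt: f64, // fixed step, e.g. 1.0 / 60.0
}

impl FixedTimestep {
    fn new(hz: f64) -> Self {
        Self { accumulator: 0.0, dt: 1.0 / hz }
    }

    /// Returns the number of sim ticks to execute for this frame.
    fn advance(&mut self, frame_time: f64) -> u32 {
        self.accumulator += frame_time;
        let mut ticks = 0;
        while self.accumulator >= self.dt {
            self.accumulator -= self.dt;
            ticks += 1;
        }
        ticks
    }
}
```

The server loop would call `advance` once per wake-up and run that many simulation steps; the same struct works client-side for prediction stepping.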
────────────────────────────────────────
5. Interest / Relevance Management
────────────────────────────────────────
The world is split into 3-D grid cells (e.g. 32 m cubes).
For each client we only ship entities within R = 250 m inside a 120° forward FOV, plus team markers.
Typical relevant entity count:
• Players: ≈ 40
• Projectiles (bullets & tracers): ≈ 30 (fade quickly)
• Grenades / effects: ≈ 10
• Buildables / vehicles: ≈ 20
TOTAL ≈ 100 entities per player on average.
Entity-state quantization per delta entry:
id          (uint16)          2 B
position    (x,y,z int16)     6 B  (centimeter accuracy, encoded relative to the grid-cell origin — a raw int16 at cm scale only spans ±327 m, not a 2 km map)
yaw/pitch   (2×int16)         4 B
velocity    (packed int16×3)  6 B
state bits  (1 byte)          1 B
TOTAL ~19 B → often ~10 B after XOR delta & RLE
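The cell-relative position quantization can be sketched in a few lines. This assumes the cm-resolution, cell-origin-relative encoding described above; function names are illustrative:

```rust
const CM_PER_METER: f32 = 100.0;

/// Quantize a world-space coordinate (meters) to a cell-relative int16
/// at 1 cm resolution. Caller guarantees the entity is near enough to
/// its cell origin that the value fits (±327.67 m).
fn quantize(pos_m: f32, cell_origin_m: f32) -> i16 {
    ((pos_m - cell_origin_m) * CM_PER_METER).round() as i16
}

/// Inverse: reconstruct the world-space coordinate from the wire value.
fn dequantize(q: i16, cell_origin_m: f32) -> f32 {
    cell_origin_m + (q as f32) / CM_PER_METER
}
```

The round-trip error is bounded by half the resolution (0.5 cm), which is well below what a player can perceive at FPS scales.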
Bandwidth per client (down):
100 entities × 10 B × 20 Hz = 20 kB/s ≈ 160 kbps
Bandwidth per client (up):
InputCmd 8 B × 60 Hz = 480 B/s ≈ 4 kbps
Server aggregate:
Down: 20 kB/s × 128 = 2.5 MB/s ≈ 20 Mbps
Up:   0.48 kB/s × 128 ≈ 61 kB/s ≈ 0.5 Mbps
Well within a single gig-E NIC.
────────────────────────────────────────
6. Server Threading & Scaling
────────────────────────────────────────
CPU budget (per tick):
• Physics + ECS: ~100 µs per player → 128 × 100 µs = 12.8 ms
• Overhead / pathing / extras → 2.0 ms
Total → 14.8 ms < 16.6 ms budget 💚
Implementation:
• 1 async thread (Tokio/Quinn) for recv/send (zero-copy to/from an mpsc channel).
• N−1 worker threads (rayon or the Bevy schedule) own the ECS; partition by entity or system.
• End of tick = barrier; the snapshot builder runs, then pushes bytes back to the net thread.
Memory:
Baseline entity (archetype) ~256 B; 5 000 live entities → ~1.3 MB.
Plenty of headroom; a 32 GB RAM box is luxurious.
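The net-thread-to-sim-thread hand-off can be sketched with a standard-library channel standing in for whatever lock-free queue the real server uses. The `InputCmd` fields and the `drain_inputs` helper are illustrative assumptions:

```rust
use std::sync::mpsc;

/// Minimal stand-in for the wire input command.
#[derive(Debug)]
struct InputCmd {
    player: u16,
    tick: u8,
    buttons: u16,
}

/// At the start of each sim tick, drain everything the net thread queued
/// since the last tick. `try_recv` never blocks, so the simulation can
/// never stall waiting on network I/O.
fn drain_inputs(rx: &mpsc::Receiver<InputCmd>) -> Vec<InputCmd> {
    let mut batch = Vec::new();
    while let Ok(cmd) = rx.try_recv() {
        batch.push(cmd);
    }
    batch
}
```

In the real layout the sender half lives on the Tokio/Quinn thread and the receiver is drained at the top of each 60 Hz tick; a bounded or lock-free SPSC queue (e.g. from `crossbeam`) would replace `mpsc` under load.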
────────────────────────────────────────
7. Will It Still Work for 128 Players?
────────────────────────────────────────
We already designed for 128 total. Stress-test scenario: everybody in one courtyard.
The relevant entity count might double to 200.
Bandwidth per client → 40 kB/s (≈ 320 kbps) — still OK.
Server outbound → 5 MB/s (≈ 40 Mbps) — still under 1/20th of gig-E.
CPU: bullet spam could spike physics to 25 ms → mitigations:
• cap projectile simulation (hitscan on the server; clients draw fake tracers)
• off-thread async jobs for explosions etc.
So yes, still viable on one modern 8-core (Ryzen 5 7600, Xeon E-2288G, etc.). For > 128, you’d shard or open “region servers” (same executable, different port).
────────────────────────────────────────
8. Special Topics / Trade-offs
────────────────────────────────────────
Anti-cheat:
• The server validates hits; the client only raycasts for FX.
• CRC on resources, obfuscation of packet opcodes.
• Optional: a kernel driver — not covered here.
Matchmaking & persistence:
• Separate micro-service; the game server receives a “SpawnBlob” (loadout, cosmetics).
• At the end of a match, flush stats via TCP to the DB.
Voice:
• Don’t mix it into the main data path; use a separate SFU or Vivox-like relay.
Tick- vs. event-driven alternative?
If you want to ditch the fixed 60 Hz you could go snapshot-based with variable Δt (the Apex approach), but you’ll complicate determinism & physics. For indie scope, the classic fixed tick is safer.
────────────────────────────────────────
9. Checklist Summary
────────────────────────────────────────
✅ UDP + reliability layer (seq/acks)
✅ 60 Hz sim / 20 Hz snapshots, 100 ms client interpolation
✅ Entity relevance + delta compression to keep < 320 kbps per user
✅ Rust: ECS (Bevy/Legion/Shipyard) + Tokio/Quinn net I/O
✅ 8-core box, 40 Mbps peak outbound, < 16 ms/tick CPU
✅ Scales to 128 players; > 128 ⇒ shard or stream sections to sub-servers
You now have a concrete yet implementation-agnostic blueprint for building the server & protocol. Happy fragging! 🔫
🏋️♂️ Authoritative Physics & Collision Handling
(what runs on the Rust server every 60 Hz tick)
────────────────────────────────────────
1. Guiding Principles
────────────────────────────────────────
• 100 % authoritative: only the server decides “what touched what”.
• Fixed Δt (16.66 ms) to keep integration stable and deterministic.
• Keep it cheap (≤ 2 ms of the 16 ms budget) – no full-blown rigid-body chaos, only what an infantry-centric FPS really needs.
• Give clients a mirror-lite version for prediction; small divergences are OK because reconciliation corrects them.
────────────────────────────────────────
2. Physics Scope for an HLL-Style FPS
────────────────────────────────────────
A. Player locomotion – capsule vs. static level geometry, jump, step-up, ladder.
B. Bullets / hitscan – instant ray checks (99 % of shots).
C. Projectiles – grenades, rockets: parabolic flight + explosion radius.
D. Environment – static meshes, trigger volumes, no destructibles (keep first release simple).
E. Vehicles – if/when added, approximate with single convex hull, no wheel suspension simulation initially.
────────────────────────────────────────
3. Tech Choice
────────────────────────────────────────
Use the Rapier3D crate (MIT-licensed, by Dimforge). Reasons:
• Pure Rust – perf & FFI-friendly.
• Deterministic when you pin the same compiler flags and CPU float mode (no SSE‐vs-AVX divergence).
• Already has broad-phase (SAP), narrow-phase (GJK/EPA) and CCD.
• Integrates cleanly with Bevy ECS (via bevy_rapier) or any custom ECS.
Alternative: roll your own capsule-only solver → even faster, but higher upfront cost. Start with Rapier, profile, replace later if necessary.
────────────────────────────────────────
4. Collision Pipeline per Tick
────────────────────────────────────────
```rust
// Pseudocode — the `rapier.*` calls are illustrative stage names,
// not the exact Rapier crate API.
loop { // one iteration per 60 Hz tick
    // 1. Collect inputs (already queued by the Net-IO thread)
    apply_player_inputs(dt);
    // 2. External forces
    add_gravity();
    apply_friction();
    // 3. Broad phase
    rapier.update_broad_phase();      // uniform grid + SAP
    // 4. Narrow phase
    rapier.compute_narrow_phase();    // capsule-mesh, ray, sphere
    // 5. Solve contacts & integrate
    rapier.step_island_solver();      // penetration correction
    // 6. Hitscan / ray tests
    process_hitscan_requests();       // see §5
    // 7. Explosions & AoE overlaps
    evaluate_overlap_queries();
    // 8. Write back to the ECS (Transform, Velocity)
    publish_new_state();
    // 9. Snapshot packing happens after this
}
```
Average cost on Ryzen 5 5600:
• 128 dynamic capsules + 200 static colliders ≈ 0.4 ms
• 2 000 raycasts (1 full-auto MG burst) ≈ 0.3 ms
• 50 grenade projectiles ≈ 0.2 ms
TOTAL ≈ 0.9 ms
So we’re safely below 2 ms.
────────────────────────────────────────
5. Bullets ≠ Rigid Bodies
────────────────────────────────────────
• 99 % of weapons modeled as hitscan:
– Collect all fire events this tick.
– Raycast from muzzle to muzzle + range in Rapier’s query API (no insertion of dynamic bodies).
– First intersection decides hit; store “HitEvent” component, later resolved into damage & FX.
• Tracers are purely cosmetic on the client (draw a ribbon between start & impact point after server response).
Benefits: zero per-frame memory churn, no tunneling issues, trivial network traffic (only send HitEvent).
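In production the raycast goes through Rapier’s query API; as a self-contained illustration of the “first intersection decides the hit” rule, here is a toy ray-vs-sphere hitscan (all names and the 0.5 m target radius are assumptions for this sketch):

```rust
#[derive(Clone, Copy)]
struct Vec3 { x: f32, y: f32, z: f32 }

impl Vec3 {
    fn sub(self, o: Vec3) -> Vec3 { Vec3 { x: self.x - o.x, y: self.y - o.y, z: self.z - o.z } }
    fn dot(self, o: Vec3) -> f32 { self.x * o.x + self.y * o.y + self.z * o.z }
}

/// Distance along a normalized ray to a sphere, if hit within `range`.
fn ray_sphere(origin: Vec3, dir: Vec3, center: Vec3, radius: f32, range: f32) -> Option<f32> {
    let oc = origin.sub(center);
    let b = oc.dot(dir);
    let c = oc.dot(oc) - radius * radius;
    let disc = b * b - c;
    if disc < 0.0 {
        return None; // ray misses the sphere entirely
    }
    let t = -b - disc.sqrt(); // nearest of the two intersections
    if t >= 0.0 && t <= range { Some(t) } else { None }
}

/// One hitscan shot: the target with the smallest hit distance wins.
fn first_hit(origin: Vec3, dir: Vec3, targets: &[(u16, Vec3)], range: f32) -> Option<(u16, f32)> {
    targets
        .iter()
        .filter_map(|&(id, c)| ray_sphere(origin, dir, c, 0.5, range).map(|t| (id, t)))
        .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
}
```

The output `(entity_id, distance)` is exactly what would be stored in the `HitEvent` mentioned above, later resolved into damage and FX.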
────────────────────────────────────────
6. Grenades / Rockets (Slow Movers)
────────────────────────────────────────
• Insert as lightweight RigidBodyType::KinematicPositionBased.
• Integrate with gravity: pos += vel * dt; vel += g * dt.
• Continuous Collision Detection enabled so they don’t clip through walls when frame-offset.
• On impact OR fuse-timeout → spawn ExplosionEvent.
• Explosion = overlap query of spheres within radius – O(#entities in that cell).
Network payload: only grenade spawn (reliable) + grenade despawn/explosion event (reliable). Intermediate positions are not networked; clients lerp.
────────────────────────────────────────
7. Static World Representation
────────────────────────────────────────
• Export level geometry from Godot as aggregate triangle mesh; pre-baked into Rapier’s TriMesh on server start.
• For broad-phase culling the mesh is internally split into BVH nodes; no per-tick cost.
• Doors / bridges that move? Represent as separate kinematic bodies switched by gameplay scripts.
Memory footprint: 2 × compressed mesh size (BVH + verts). Typical 1 km² map ~ 30 MB – fine.
────────────────────────────────────────
8. Player Prediction on Client
────────────────────────────────────────
Server uses Rapier.
The client ships with a subset of the same code, compiled to WebAssembly or as a GDExtension:
• Step-up height, slope limit, gravity must match server constants.
• Disable expensive CCD & contact manifold generation client-side (not needed for prediction).
• Divergence <2 cm over 200 ms is usually unnoticeable; when it exceeds threshold → reconciliation.
To ensure numeric parity, pin identical codegen settings on both builds (e.g. the same -C target-cpu and target features, so FMA/SIMD instruction selection cannot diverge — note that stable rustc has no “strict-float” switch; basic IEEE-754 operations are already deterministic, and divergence usually comes from libm functions and instruction selection), or ship a custom fixed-point math module just for locomotion.
────────────────────────────────────────
9. Determinism vs “Good Enough”
────────────────────────────────────────
We do not need lock-step determinism across all hardware, only “server as the single source of truth”. Therefore:
• Clients may drift a bit; server snaps them back.
• Spectator replay uses server log, so always correct.
• Future e-sport / anti-cheat hardening ⇒ move to fixed-point math to make server re-simulation easier in the cloud; not a V1 requirement.
────────────────────────────────────────
10. Profiling & Regression
────────────────────────────────────────
• Benchmarks (cargo criterion) that run the physics step with recorded input traces – catch perf regressions.
• Integration test: spawn 128 dummy capsules + 5 000 random raycasts, assert no panics and tick <2 ms on CI’s m5zn.metal reference machine.
────────────────────────────────────────
11. Extensibility Hooks
────────────────────────────────────────
• Vehicles later? Stick a convex hull collider + apply engine force; still fits.
• Destructible walls? Spawn new static collider chunks and mark them “destroyed” after HP ≤0 → update BVH once.
────────────────────────────────────────
12. Recap Cheat-Sheet
────────────────────────────────────────
Physics engine: Rapier3D (server) + stripped client mirror
Tick: 60 Hz fixed step
Collision path: Broad Phase → Narrow Phase → Island Solver
Bullets: instant raycasts, no rigid bodies
Projectiles: kinematic, CCD on, overlap query on explode
Static map: baked triangle mesh BVH
Perf budget: < 1 ms of CPU per tick for 128 players
This gives you authoritative, efficient, and maintainable physics that stays within your latency and CPU budgets while scaling cleanly to 64 v 64 battles.
Of course. Let's orchestrate the high-level design into a practical, phased project plan. This approach is designed to deliver value incrementally, allowing for testing and validation at each stage. We'll structure this as a series of milestones, each building upon the last.
Guiding Principles for Orchestration
Project Orchestration Plan
Milestone 0: The Foundation (Pre-Production)
This is the "sharpen the axe" phase before you write the first line of game logic.
Task 0.1: Define the "Networked Structs" Contract.
Create a shared crate, e.g. `game-protocol` or `shared`. This crate will be a dependency for both the Rust server and (via bindings) the Godot client. Define:
• `struct InputCmd { ... }` (e.g., movement axes, buttons as a bitmask, look angles).
• `struct PlayerState { ... }` (e.g., entity_id, position, velocity, health).
• `enum ServerMessage { Snapshot(Vec<PlayerState>), Event(...), ... }`
• `enum ClientMessage { Input(InputCmd), ... }`
Use `serde` and a binary format like `bincode` for easy serialization.
Task 0.2: Choose Your Core Libraries.
• Async runtime: `tokio` (the industry standard).
• ECS: `bevy_ecs` is a great choice. It’s mature, data-oriented, and its scheduler is designed for parallelism. Alternatively, `legion` or `shipyard`.
• Networking: use `tokio::net::UdpSocket` directly for maximum control initially.
• Godot side: `UDPServer`/`UDPPeer` or a C#/.NET `UdpClient`.
Task 0.3: Basic Project Scaffolding.
• `./rust-server/` (a Cargo workspace)
• `./godot-client/` (a Godot project)
• `./shared-protocol/` (the Rust crate from Task 0.1)
Milestone 1: The “Moving Cube” (Proof of Connection)
Goal: Prove that the client can connect to the server and see a single object move based on server state. No player input yet.
Task 1.1: Server - The Unblinking Eye.
• Run the tick loop with a single entity that has a `Position` component.
• Each tick, nudge it (e.g. `pos.x += 0.1`) and broadcast the new position over UDP.
• Create a scene with a `Node3D` (representing the player) and a `MeshInstance3D` (the cube).
• In `_ready`, start a UDP listener on a background thread.
• In `_process`, update the cube’s `global_transform.origin` to match the received position.
✅ Verifiable Outcome: You run the server, then the client. A cube smoothly slides across the screen in Godot. You have successfully built the fundamental client-server link.
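Before wiring inputs up in the next milestone, the `InputCmd` contract from Task 0.1 can be sketched dependency-free. A real project would derive `serde::{Serialize, Deserialize}` and use `bincode` as suggested above; the field layout here (7 bytes, little-endian) is purely illustrative:

```rust
/// Illustrative InputCmd layout; a real project would use serde + bincode.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InputCmd {
    tick: u8,
    buttons: u16, // bitmask: bit0=forward, bit1=back, bit2=left, ...
    yaw: i16,     // quantized look angles
    pitch: i16,
}

impl InputCmd {
    fn to_bytes(&self) -> [u8; 7] {
        let mut b = [0u8; 7];
        b[0] = self.tick;
        b[1..3].copy_from_slice(&self.buttons.to_le_bytes());
        b[3..5].copy_from_slice(&self.yaw.to_le_bytes());
        b[5..7].copy_from_slice(&self.pitch.to_le_bytes());
        b
    }

    fn from_bytes(b: &[u8; 7]) -> Self {
        Self {
            tick: b[0],
            buttons: u16::from_le_bytes([b[1], b[2]]),
            yaw: i16::from_le_bytes([b[3], b[4]]),
            pitch: i16::from_le_bytes([b[5], b[6]]),
        }
    }
}
```

Whatever format you pick, the key property is that both ends agree byte-for-byte, which a shared crate guarantees by construction.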
Milestone 2: Player Control & Authoritative Movement
Goal: The player can control their cube. This introduces the core concepts of prediction and reconciliation.
Task 2.1: Client - Sending Inputs & Predicting.
• Each frame, fill an `InputCmd` struct (from Milestone 0) and send it to the server at 60 Hz.
• Apply the same input locally right away so movement feels instant (prediction).
Task 2.2: Server - Accepting Inputs & Reconciliation.
• Receive and queue `InputCmd` packets, applying each on its simulation tick.
• Include the last processed input tick in each snapshot so the client can reconcile its prediction.
Task 2.3: Client - Correction & Interpolation.
• On receiving authoritative state for your own entity, replay unacknowledged inputs and smoothly correct any divergence.
• For other players, interpolate between snapshots (e.g. `lerp(old_pos, new_pos, delta)`). This ensures other players move smoothly, hiding network jitter.
✅ Verifiable Outcome: You can move a character around. It feels instant. When you introduce artificial packet loss, your character might jitter and correct itself, while other players continue to move smoothly.
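The remote-player interpolation can be sketched in isolation: render slightly in the past and lerp between the two snapshots that bracket the render time. The one-dimensional `sample` helper and the `(time, position)` buffer layout are illustrative assumptions:

```rust
fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// `snaps` holds (server_time_s, position) pairs, oldest first.
/// `render_time` is "now minus the interpolation delay" (~100 ms).
fn sample(snaps: &[(f32, f32)], render_time: f32) -> Option<f32> {
    // find the pair of snapshots bracketing the render time
    let w = snaps
        .windows(2)
        .find(|w| w[0].0 <= render_time && render_time <= w[1].0)?;
    let (t0, p0) = w[0];
    let (t1, p1) = w[1];
    Some(lerp(p0, p1, (render_time - t0) / (t1 - t0)))
}
```

Rendering ~100 ms behind the newest snapshot means there is almost always a bracketing pair available even when one snapshot is lost, which is exactly why the jitter stays hidden.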
Milestone 3: Scaling to 128 Players (The Real Architecture)
Goal: Refactor the prototype to handle the target player count and map size.
Task 3.1: Server - Parallelize the ECS.
• Move the game loop onto a proper `bevy_ecs` `Schedule` and `World`.
• Split the logic into `Systems` (e.g., `apply_inputs_system`, `physics_system`, `broadcast_state_system`).
Task 3.2: Server - Interest Management (Relevance).
Task 3.3: Server & Client - Delta Compression.
• Track the last `snapshot_id` that each client has acknowledged, and diff new snapshots against that baseline.
✅ Verifiable Outcome: Run a headless simulation on the server with 128 “bots” running around. Measure CPU usage (should be distributed across cores) and the size of outgoing packets (should be small). The architecture is now proven to scale.
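The delta-compression idea (and the “XOR & RLE” mentioned in the quantization table of part 1) can be sketched end to end. The wire format here — a `0x00` marker byte followed by a run length for unchanged spans, raw XOR bytes otherwise — is made up for illustration, not a real protocol:

```rust
/// XOR the new state against the client's acked baseline, run-length
/// encoding the zero (unchanged) runs. Literal bytes are always nonzero
/// XOR values, so the 0x00 marker is unambiguous on decode.
fn delta_encode(baseline: &[u8], current: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < current.len() {
        let x = current[i] ^ baseline.get(i).copied().unwrap_or(0);
        if x == 0 {
            let mut run = 0u8;
            while i < current.len()
                && (current[i] ^ baseline.get(i).copied().unwrap_or(0)) == 0
                && run < u8::MAX
            {
                run += 1;
                i += 1;
            }
            out.push(0);
            out.push(run);
        } else {
            out.push(x);
            i += 1;
        }
    }
    out
}

/// Reconstruct the current state from the baseline plus the delta.
fn delta_decode(baseline: &[u8], delta: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut d = 0;
    while d < delta.len() {
        if delta[d] == 0 {
            let run = delta[d + 1] as usize; // copy unchanged bytes
            for _ in 0..run {
                let idx = out.len();
                out.push(baseline.get(idx).copied().unwrap_or(0));
            }
            d += 2;
        } else {
            let idx = out.len();
            out.push(baseline.get(idx).copied().unwrap_or(0) ^ delta[d]);
            d += 1;
        }
    }
    out
}
```

The server keeps one baseline per client (the last acked snapshot); if the ack chain breaks, it falls back to sending a full `SnapshotBaseline` as described in part 1.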
Milestone 4: Gameplay Implementation
Goal: Turn the tech demo into a game.
Task 4.1: Hit Detection.
• Client: on firing, send a `PlayerFired` RPC.
• Server: on `PlayerFired`, perform an authoritative raycast in the ECS world. If it hits another entity, reduce its `Health` component.
• Server: include a `PlayerWasHit` event in the next snapshot to all relevant clients.
• Client: on `PlayerWasHit`, play a blood splatter VFX and update the UI.
Task 4.2: Game State & Objectives.
• Add server-side resources for match state (e.g., `MatchTimer`, `TeamScores`).
Milestone 5: Production Readiness
Goal: Prepare the server for real-world deployment and operation.
Task 5.1: Matchmaking & Lobby Flow.
Task 5.2: Observability.
• Add structured logging (e.g., with the `tracing` crate). Log critical events like player joins, leaves, and server errors.
• Export metrics (e.g., via the `prometheus` crate) to track players online, tick duration, and bandwidth usage.
Task 5.3: Deployment.
✅ Verifiable Outcome: You can run a command that automatically builds, packages, and deploys a new server version. You can view its logs and performance on a dashboard. The system is robust and manageable.
This phased plan takes you from zero to a fully-featured, scalable, and deployable 128-player game server, validating the architecture at each critical step. 🚀