C++ game engine built to explore high-performance architecture.
Currently under active development, it serves as both a learning platform and a research project.
Or it might just be a playground to test my sanity.
Important: My original Bachelor's Thesis version is archived in the thesis branch.
Honestly? I just really love this stuff.
It started with my Bachelor's Thesis, where I designed a dual-renderer engine to benchmark Vulkan path tracing against traditional OpenGL PBR. The focus was purely on real-time graphics, so the underlying architecture was single-threaded. It worked, and I had a blast building it!
Then I watched Christian Gyrling’s GDC talk on Parallelizing the Naughty Dog Engine Using Fibers. Seeing how they saturated every single CPU core made me realize how much was left to explore.
So, I started Luth from scratch to explore high-performance architecture: fiber-based job systems, lock-free memory models, and bindless Vulkan rendering. It is absolutely over-engineered for a solo project, but that’s the point.
Prerequisites:
- OS: Windows 10 / 11
- Compiler: MSVC (v143+) or Clang (C++20-compliant)
- GPU: Hardware ray tracing required (VK_KHR_ray_query + acceleration structures): NVIDIA RTX 20-series+, AMD RX 6000+, or Intel Arc
- SDK: Vulkan SDK 1.3+. Needs dynamicRendering, timelineSemaphore, descriptor indexing with UBO update-after-bind, and the KHR ray-tracing extensions
Steps:
- Clone with submodules
git clone --recursive https://github.com/Hekbas/Luth.git
- Generate the VS solution
scripts/setup/setup_windows.bat
- Build: either open Luth.sln in Visual Studio 2022, or run the headless script:
scripts/build/build_windows.bat
The editor binary lands at bin/windows-x86_64/Debug/Runtime/Luthien.exe.
Instead of dedicated OS threads per task ("Render Thread", "Audio Thread"), Luth treats the CPU as a generic worker pool.
- N:M Threading: One Worker Thread per CPU core. Logical tasks are wrapped in Fibers aka lightweight user-mode stacks that migrate freely between workers.
- Zero Blocking: When a job waits on a dependency (or the GPU), it yields to the scheduler, which swaps in another fiber. CPU saturation stays near 100%.
- Synchronization: SpinLocks (test-and-set + _mm_pause()) and Atomic Counters keep critical sections short and never block in the OS.
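The synchronization bullet can be sketched in a few lines. This is a minimal illustration, not Luth's actual code: `SpinLock`, `JobCounter`, `RunContended`, and the `SKETCH_CPU_RELAX` macro are hypothetical names, and plain `std::thread` workers stand in for the fiber scheduler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>
#if defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // _mm_pause
#define SKETCH_CPU_RELAX() _mm_pause()
#else
#define SKETCH_CPU_RELAX() std::this_thread::yield()  // fallback on non-x86 targets
#endif

// Test-and-set spin lock: busy-waits instead of parking the thread in the OS.
class SpinLock {
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            SKETCH_CPU_RELAX();  // tell the core we are spinning
        }
    }
    void unlock() noexcept { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical job counter: set to the number of pending jobs, decremented
// by each finished job; dependents may resume once it reaches zero.
struct JobCounter {
    std::atomic<std::uint32_t> pending{0};
    void Add(std::uint32_t n) noexcept { pending.fetch_add(n, std::memory_order_relaxed); }
    void Done() noexcept { pending.fetch_sub(1, std::memory_order_acq_rel); }
    bool Finished() const noexcept { return pending.load(std::memory_order_acquire) == 0; }
};

// Runs `workers` threads that each increment a shared value `iterations` times.
inline std::uint64_t RunContended(unsigned workers, unsigned iterations) {
    SpinLock lock;
    JobCounter counter;
    std::uint64_t shared = 0;
    counter.Add(workers);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (unsigned i = 0; i < iterations; ++i) {
                lock.lock();
                ++shared;  // critical section kept as short as possible
                lock.unlock();
            }
            counter.Done();
        });
    }
    for (auto& t : pool) t.join();
    assert(counter.Finished());
    return shared;
}
```

In a fiber scheduler, a job waiting on `JobCounter` would presumably yield to another fiber rather than spin; the spin lock only guards the few instructions that touch shared state.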
Three stages overlap. At any frame T, the engine is processing three frames at once:
time ──►
┌──────────┬──────────┬──────────┬──────────┐
CPU game │ N │ N+1 │ N+2 │ N+3 │
├──────────┼──────────┼──────────┼──────────┤
CPU render │ N-1 │ N │ N+1 │ N+2 │
├──────────┼──────────┼──────────┼──────────┤
GPU exec │ N-2 │ N-1 │ N │ N+1 │
└──────────┴──────────┴──────────┴──────────┘
- Game (N): Runs transform / animation updates, then captures a RenderSnapshot POD into the frame's LogicMemory arena, the immutable handoff to the next stage.
- Render (N-1): Reads frame N-1's snapshot, builds the render graph, dispatches per-pass secondary command buffer recording in parallel, and submits.
- GPU (N-2): Executes the commands submitted previously.
Game and render run concurrently on worker fibers from frame 2 onward (frames 0/1 are a sync warm-up against the current frame). The frame boundary is the snapshot, not shared mutable state — Game writes to one FrameContext slot, Render reads from another. Stage-isolated subsystems that retain mutexes (MaterialSystem, BoneMatrixBuffer) assert they're only mutated from the game stage.
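One way to picture the slot handoff is a three-entry ring indexed by frame number. The sketch below is an assumption about how such a ring could work, not the engine's real `FrameContext`; `kFramesInFlight`, `StageSlots`, `SlotsForFrame`, and `SimulateHandOff` are made-up names for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFramesInFlight = 3;  // game, render, GPU

// Hypothetical stand-in for the per-frame data handed between stages.
struct FrameContext {
    std::uint64_t frameNumber = 0;
    int snapshotValue = 0;  // placeholder for the RenderSnapshot POD
};

struct StageSlots {
    std::size_t game;    // written by the game stage (frame N)
    std::size_t render;  // read by the render stage (frame N-1)
    std::size_t gpu;     // still owned by the GPU (frame N-2)
};

// Maps frame number n to ring slots. Only meaningful for n >= 2,
// matching the two warm-up frames described above.
constexpr StageSlots SlotsForFrame(std::uint64_t n) {
    return StageSlots{static_cast<std::size_t>(n % kFramesInFlight),
                      static_cast<std::size_t>((n - 1) % kFramesInFlight),
                      static_cast<std::size_t>((n - 2) % kFramesInFlight)};
}

// Simulates `frames` ticks and counts how often the render stage found
// the snapshot captured by the previous game frame in its slot.
inline int SimulateHandOff(std::uint64_t frames) {
    std::array<FrameContext, kFramesInFlight> ring{};
    int correctReads = 0;
    for (std::uint64_t n = 0; n < frames; ++n) {
        if (n >= 2) {
            const StageSlots s = SlotsForFrame(n);
            // The three stages never touch the same slot in one tick.
            assert(s.game != s.render && s.render != s.gpu && s.game != s.gpu);
            if (ring[s.render].frameNumber == n - 1) ++correctReads;
        }
        // Game stage captures its snapshot for frame n.
        ring[n % kFramesInFlight] = FrameContext{n, static_cast<int>(n) * 10};
    }
    return correctReads;
}
```

Because the three indices are always distinct, the game stage can write its slot while the render stage reads the previous one, without a lock on the snapshot itself.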
new / delete are forbidden in the hot path. Two allocators handle everything that churns:
Page Pool (2 MB virtual pages)
├── TaggedPageAllocator — CPU side, tagged lifetime, bulk free
│ └── per-thread cache — lock-free hot-path allocations
├── GPUTaggedPageAllocator — host-mapped device pages, freed when GPU N-2 retires
│ └── per-frame UBO/SSBO regions, descriptors rebind via UPDATE_AFTER_BIND
└── LinearAllocator — per-frame, reset on Begin()
- Tagged Page Allocator: Naughty Dog-style. Allocations carry a tag (LevelGeometry, Frame_N, …) and are freed in bulk by tag.
- GPU Tagged Page Allocator: sibling of the CPU side. Vends 2 MB pages from host-mapped device backings; bulk-freed when the GPU N-2 timeline value retires.
- Linear Allocator: bump-allocates transient frame data (command lists, UI state); resets each frame, no per-object destructors.
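The tag-then-bulk-free idea can be illustrated with a small CPU-side sketch. It is a simplification under stated assumptions: `TaggedPageArena` and `MemTag` are hypothetical names, pages come from `std::malloc` rather than a virtual-memory page pool, there is no per-thread cache, and nothing here is thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical tags, modeled on the LevelGeometry / Frame_N examples above.
enum class MemTag : std::uint32_t { LevelGeometry, Frame0, Frame1, Frame2 };

class TaggedPageArena {
public:
    explicit TaggedPageArena(std::size_t pageSize = 2u * 1024u * 1024u) : pageSize_(pageSize) {}
    ~TaggedPageArena() {
        for (auto& entry : tags_) ReleasePages(entry.second);
    }
    TaggedPageArena(const TaggedPageArena&) = delete;
    TaggedPageArena& operator=(const TaggedPageArena&) = delete;

    // Bump-allocates `size` bytes under `tag`; `align` must be a power of two.
    // Returns nullptr for requests that cannot fit in a single page.
    void* Allocate(MemTag tag, std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        if (size + align > pageSize_) return nullptr;
        TagState& state = tags_[Key(tag)];
        if (!state.pages.empty()) {
            const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(state.pages.back());
            const std::uintptr_t ptr = AlignUp(base + state.used, align);
            if (ptr + size <= base + pageSize_) {
                state.used = static_cast<std::size_t>(ptr + size - base);
                return reinterpret_cast<void*>(ptr);
            }
        }
        // Current page is full (or there is none yet): grab a fresh page.
        void* page = std::malloc(pageSize_);
        if (page == nullptr) return nullptr;
        state.pages.push_back(page);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(page);
        const std::uintptr_t ptr = AlignUp(base, align);
        state.used = static_cast<std::size_t>(ptr + size - base);
        return reinterpret_cast<void*>(ptr);
    }

    // Releases every page owned by `tag` in one go; no per-object destructors run.
    void FreeTag(MemTag tag) {
        auto it = tags_.find(Key(tag));
        if (it == tags_.end()) return;
        ReleasePages(it->second);
        tags_.erase(it);
    }

    std::size_t PageCount(MemTag tag) const {
        auto it = tags_.find(Key(tag));
        return it == tags_.end() ? 0 : it->second.pages.size();
    }

private:
    struct TagState {
        std::vector<void*> pages;
        std::size_t used = 0;  // bytes consumed in the newest page
    };
    static std::uint32_t Key(MemTag tag) { return static_cast<std::uint32_t>(tag); }
    static std::uintptr_t AlignUp(std::uintptr_t p, std::size_t align) {
        return (p + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    }
    static void ReleasePages(TagState& state) {
        for (void* page : state.pages) std::free(page);
        state.pages.clear();
        state.used = 0;
    }
    std::size_t pageSize_;
    std::unordered_map<std::uint32_t, TagState> tags_;
};
```

Freeing by tag means the cost of releasing a frame's memory depends on the number of pages, not on the number of objects allocated into them.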
Persistent SSBOs (Material Set 2, Light Set 3, Object Set 5) are triple-buffered so frame N writes never overlap frame N-1 GPU reads.
Modern hardware, minimal driver overhead.
- Bindless Descriptors: VK_EXT_descriptor_indexing binds all engine textures to one global array (Set 1), alongside a 32-slot sampler array and buffer device addresses for the RT geometry table. Materials store an integer index, so any draw call can sample any texture without rebinding.
- Dynamic Rendering: No VkRenderPass / VkFramebuffer; passes use vkCmdBeginRendering directly.
- Timeline Semaphores: Replace vkWaitForFences. A dedicated Poller Job queries semaphore values and wakes dependent fibers only when the GPU finishes their workload.
- Update-After-Bind: Per-frame UBO/SSBO descriptor sets are rewritten each frame as their backing GPU pages cycle, eliminating CPU-GPU sync on those bindings.
- VMA: Vulkan Memory Allocator handles all device-memory placement (buffers, images, staging).
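The timeline-semaphore polling pattern from the list above can be sketched without a GPU. In this illustration the counter read is an injected callback. In a Vulkan build it would presumably wrap `vkGetSemaphoreCounterValue`, and the wake callback would hand a fiber back to the scheduler. `TimelinePoller` and its members are hypothetical names, not Luth's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Waiters register the timeline value they need. Each Poll() reads the
// current value once and wakes every waiter whose value has been reached.
class TimelinePoller {
public:
    using ReadCounter = std::function<std::uint64_t()>;  // stand-in for the semaphore query
    using Wake = std::function<void()>;                  // stand-in for "resume this fiber"

    explicit TimelinePoller(ReadCounter read) : read_(std::move(read)) {}

    void WaitFor(std::uint64_t value, Wake wake) {
        waiters_.push_back(Waiter{value, std::move(wake)});
    }

    // Returns how many waiters were woken by this poll.
    std::size_t Poll() {
        const std::uint64_t reached = read_();
        // Take the list first so a wake callback may safely call WaitFor().
        std::vector<Waiter> current;
        current.swap(waiters_);
        std::size_t woken = 0;
        for (auto& w : current) {
            if (w.value <= reached) {
                w.wake();
                ++woken;
            } else {
                waiters_.push_back(std::move(w));  // not ready yet, keep waiting
            }
        }
        return woken;
    }

    std::size_t Pending() const { return waiters_.size(); }

private:
    struct Waiter {
        std::uint64_t value;
        Wake wake;
    };
    ReadCounter read_;
    std::vector<Waiter> waiters_;
};
```

Because timeline values only increase, one query per poll can release any number of waiters, so the number of driver calls does not grow with the number of waiting jobs.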
Each frame, Luth builds a DAG of render passes. Passes declare reads and writes through a RenderPassBuilder; the graph solves pipeline barriers, culls unused passes, and computes resource lifetimes automatically.
graph.AddPass<GeometryPassData>("GeometryPass",
    [&](GeometryPassData& data, RG::RenderPassBuilder& builder) {
        data.depthTex  = builder.WriteDepth(sceneDepth, ...);
        data.outputTex = builder.Write(sceneColor);
        data.indirect  = builder.ReadIndirectBuffer(indirectBuffer);
    },
    [=](GeometryPassData& data, RG::RenderPassContext& ctx) {
        // record draw commands on ctx.commandBuffer
    });

Passes execute in topological order; command-buffer recording inside each pass parallelizes across worker threads.
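Culling and ordering fall out of the declared reads and writes. The sketch below shows one possible approach and is not the real RG implementation: `PassDesc`, `ResourceId`, and `CompileGraph` are hypothetical. It assumes each resource has at most one writer and that the graph is acyclic, and it leaves out barrier and lifetime computation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using ResourceId = int;

struct PassDesc {
    std::string name;
    std::vector<ResourceId> reads;
    std::vector<ResourceId> writes;
};

namespace detail {
// Depth-first walk: visit the producers of everything a pass reads, then
// emit the pass itself. Post-order emission yields a topological order.
inline void Visit(std::size_t pass, const std::vector<PassDesc>& passes,
                  const std::unordered_map<ResourceId, std::size_t>& writer,
                  std::vector<char>& visited, std::vector<std::string>& order) {
    if (visited[pass]) return;
    visited[pass] = 1;
    for (ResourceId r : passes[pass].reads) {
        auto it = writer.find(r);
        if (it != writer.end()) Visit(it->second, passes, writer, visited, order);
    }
    order.push_back(passes[pass].name);
}
}  // namespace detail

// Returns the passes needed to produce `finalOutput`, in execution order.
// Passes that do not contribute to it are never visited, i.e. culled.
inline std::vector<std::string> CompileGraph(const std::vector<PassDesc>& passes,
                                             ResourceId finalOutput) {
    std::unordered_map<ResourceId, std::size_t> writer;  // resource -> producing pass
    for (std::size_t i = 0; i < passes.size(); ++i) {
        for (ResourceId r : passes[i].writes) writer[r] = i;
    }
    std::vector<std::string> order;
    auto root = writer.find(finalOutput);
    if (root == writer.end()) return order;  // nothing produces the output
    std::vector<char> visited(passes.size(), 0);
    detail::Visit(root->second, passes, writer, visited, order);
    return order;
}
```

Walking backward from the final output means an unused pass costs nothing beyond its declaration, which is what makes it cheap to leave debug or optional passes registered every frame.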
| Real-Time Ray Tracing | Hardware KHR ray query — RT sun shadows, ReSTIR DI + GI (Bitterli 2020 / Ouyang 2021), stochastic RT reflections; per-frame TLAS, bindless geometry table |
| Path-Traced Reference | rayQuery megakernel — multi-bounce NEE + GGX-VNDF lobe MIS, progressive fp32 accumulation; ground-truth A/B against the raster path |
| Denoising | SVGF (Schied 2017) — three channels (diffuse DI / indirect GI / specular) behind an IDenoiser interface |
| Clustered Forward+ | Olsson log-slice clusters, slim G-buffer prepass; 1 directional + clustered point lights, ECS-driven |
| PBR | Cook-Torrance BRDF, metallic/roughness, render-mode variants (Opaque/Cutout/Transparent) |
| Shadows | RT ray-query sun shadows (default); 4-cascade PSSM CSM retained as an A/B toggle (per-cascade GPU cull, PCF) |
| Volumetric Fog | Wronski froxel grid — light injection → integrate → temporal resolve; optional per-froxel RT fog shadows |
| Ambient Occlusion | GTAO half-res compute (prefilter → integrate → bilateral denoise) |
| IBL | HDR skybox, diffuse irradiance + pre-filtered specular + BRDF LUT, split-sum ambient |
| Anti-Aliasing | TAA (Karis14 YCoCg-clip recipe) + specular AA (Tokuyoshi 2019) |
| GPU Culling | Compute frustum cull per cascade + main scene, indirect draws everywhere |
| Bindless | Buffer device address + one global 16384-texture array + 32-slot sampler array; integer material/texture indices |
| Post-Processing | HDR pipeline, bloom, ACES + AgX / AgX Punchy tonemap operators, vignette, grain, chromatic aberration |
| Shaders | Single-stage SPIR-V asset pipeline with UUIDs, hot-reload, SPIRV-Cross reflection |
| Pipeline Cache | Disk-persisted, lazy variant creation, targeted hot-reload invalidation |
| Mipmaps | Per-texture pipeline with sampler maxLod control |
| Animation Sampling | Fiber-parallel keyframe evaluation |
| GPU Skinning | Bone matrix SSBO, vertex shader skinning |
| Blending | SQT interpolation, crossfade transitions, layered override with bone masks |
| Root Motion | Automatic extraction and application to entity transform |
| Debug | Bone overlay visualization in editor viewport |
| Backend | Jolt Physics 5.5.0, jobified onto the fiber scheduler |
| Rigid Bodies | Static / Kinematic / Dynamic with CCD, primitive + ConvexHull + Mesh shapes |
| Physics Materials | UUID-keyed friction / restitution / density with hot-reload |
| Character Controller | Kinematic capsule via JPH::CharacterVirtual, default stair + stick-to-floor |
| Queries | Raycast + Overlap (box / sphere / capsule), layer-mask filtered |
| Events | Contact + trigger Add / Remove, drained per frame |
| Debug Draw | Wire colliders colored by motion state or character ground state |
| Asset Database | UUID-based registry with .meta sidecars, importers for shaders/textures/models/materials/animations |
| Smart Import | Multi-strategy texture discovery, drag-and-drop with eager import, texture remap dialog |
| Hot Reload | FileWatcher-based live reload for shaders, textures, and project files |
| Scene Format | Custom JSON .luth format with dirty tracking and native file dialogs |
| Scene Interaction | Mouse picking (ID buffer), selection outlines with occluded fade, shade modes (Lit/Wireframe/Unlit) |
| Inspector | Material editor, animation controls, light/shadow settings, Add Component workflow |
| Inspector Preview | Live orbit-camera 3D preview for Material/Model assets |
| Play Mode | Editing/Playing/Paused state machine, JSON scene snapshot, animation gating, transport bar |
| Game Panel | Dedicated camera-driven runtime view with letterbox, no overlays |
| Project Panel | Folder navigation, search, hot reload, context menus for entity/primitive creation |
| Thumbnails | Rendered previews for textures/meshes/materials in Project panel |
| Undo / Redo | Command pattern with UUID-based entity resolution, gizmo drag coalescing, compound commands, material snapshot undo |
| Frame Debugger | Freeze a frame, scrub through every draw, replay any single one to see what it did |
| Profiler | Per-system timing breakdown with fiber-aware instrumentation |
| Persistence | Window layouts, editor settings, and panel state saved across sessions |
See the full development roadmap for completed phases and version history.
Rendering — Material system overhaul (transparency, cutout, emissive, unified raster/RT eval), GPU particle system
Gameplay — Scripting (C#/Lua), prefab system, ragdoll, animation blend trees & IK
Editor — Asset streaming, node-based material editor
LUTH Engine is built on the shoulders of giants:
| Vulkan SDK | Rendering backend |
| VMA | Vulkan memory allocator |
| shaderc | Runtime GLSL → SPIR-V compilation (ships with Vulkan SDK) |
| SPIRV-Cross | Shader reflection |
| EnTT | Entity-Component-System |
| ImGui | Editor GUI |
| ImGuizmo | Translate / rotate / scale gizmos |
| Tracy | Frame profiler |
| GLFW | Windowing + input |
| GLM | Math |
| spdlog | Logging |
| assimp | Model importing |
| stb_image | Image loading |
| nlohmann/json | JSON serialization |
| Jolt Physics | Rigid body physics, jobified onto the fiber scheduler |
Released under the MIT License.