Real time path tracer in DirectX 12 with unbiased ReSTIR PT on a unified reservoir, a four lobe layered BXDF, light tree importance sampling, and upscaling/denoising with NVIDIA DLSS. All images shown are rendered in real time using DLSS Ray Reconstruction for denoising.
Supports OBJ and glTF/GLB formats via tinyobjloader and tinygltf. Textures are loaded through stb_image, with DDS decompression via DirectXTex. Models, materials, and per instance transforms are unified into a single scene representation.
A four lobe, energy conserving BXDF with layered evaluation:
- Sheen: Charlie NDF for fabric like surfaces
- Clearcoat: Dielectric GGX with independent roughness and Fresnel
- GGX Specular/Transmission: Anisotropic microfacet model with VNDF importance sampling, supporting reflection and refraction through nested dielectrics with per bounce IOR stack tracking
- Lambertian Diffuse: Cosine weighted base layer
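As an illustration of the sheen lobe, here is a minimal C++ sketch of the Charlie distribution term from Estevez and Kulla. The function name `charlieD` and the parameter names are made up for this example and do not come from the renderer's shader code; the visibility term and the layering with the other lobes are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Charlie sheen NDF (Estevez and Kulla 2017):
//   D(h) = (2 + 1/alpha) * sin(theta_h)^(1/alpha) / (2 * pi)
// alpha is the sheen roughness in (0, 1]; cosThetaH is dot(n, h).
// Name and signature are illustrative, not the renderer's actual API.
inline float charlieD(float alpha, float cosThetaH) {
    assert(alpha > 0.0f);
    const float kPi = 3.14159265358979f;
    const float invAlpha = 1.0f / alpha;
    const float sin2 = std::max(0.0f, 1.0f - cosThetaH * cosThetaH);
    // sin^(1/alpha) is written as (sin^2)^(1/(2*alpha)) to avoid a sqrt.
    return (2.0f + invAlpha) * std::pow(sin2, 0.5f * invAlpha) / (2.0f * kPi);
}
```

The distribution peaks at grazing half vectors and vanishes when the half vector aligns with the normal, which is what gives fabric its soft rim highlight.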
A light tree built over the scene's emissive triangles provides efficient importance sampling in scenes with many lights. The tree uses a TLAS/BLAS hierarchy with precomputed visibility cones and geometric importance weights (receiver cosine, distance attenuation) to guide traversal. This allows the renderer to handle scenes with hundreds of emissive primitives without per light overhead.
The light tree is built on the CPU. When lights move or their brightness changes, the tree is refit or rebuilt asynchronously.
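The traversal idea can be sketched in a few lines of C++. This is a simplified stand-in, not the renderer's code: `LightNode`, `nodeImportance`, and `pickChild` are hypothetical names, each node is reduced to a centre point and a flux, and the cone bounds are omitted, so the importance uses only the receiver cosine and distance attenuation mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical node summary: cluster centre and total emitted flux.
struct LightNode { Vec3 center; float flux; };

// Importance = flux * receiver cosine / squared distance (cone terms omitted).
inline float nodeImportance(const LightNode& n, Vec3 p, Vec3 normal) {
    const Vec3 d = sub(n.center, p);
    const float dist2 = std::max(dot(d, d), 1e-6f);
    const float cosR = std::max(dot(normal, d) / std::sqrt(dist2), 0.0f);
    return n.flux * cosR / dist2;
}

struct ChildPick { int child; float prob; };  // child == -1: both children have zero importance

// One traversal step: choose a child proportionally to its importance and
// return the selection probability so the caller can accumulate the path pmf.
inline ChildPick pickChild(const LightNode& left, const LightNode& right,
                           Vec3 p, Vec3 normal, float u) {
    assert(u >= 0.0f && u < 1.0f);
    const float wl = nodeImportance(left, p, normal);
    const float wr = nodeImportance(right, p, normal);
    const float sum = wl + wr;
    if (sum <= 0.0f) return {-1, 0.0f};
    const float pl = wl / sum;
    return (u < pl) ? ChildPick{0, pl} : ChildPick{1, 1.0f - pl};
}
```

Multiplying the returned probabilities along the root to leaf path gives the probability of the selected light, which the estimator divides by.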
Path tracing, no ReSTIR (1 spp)
ReSTIR + DLSS Ray Reconstruction
Unbiased ReSTIR PT (reconnection shift only) on a unified DI/GI reservoir: NEE, environment miss, and path integrand candidates all feed one reservoir stream, with sentinel matIDs discriminating direct samples from indirect ones. Each path uses temporal and spatial reservoir resampling with pairwise MIS for unbiased combination of canonical and neighbor samples. Temporal permutation sampling decorrelates reuse patterns across frames, and the temporal mCap is modulated by a per pixel duplication map so highly shared samples refresh quickly instead of creating correlation artifacts (Lin et al. 2026 §5).
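The unified reservoir can be pictured as ordinary streaming weighted reservoir sampling in which every candidate type goes through the same update. The C++ sketch below is an assumption-laden illustration: the sentinel values, the `Sample` layout, and the function names are invented here, and the resampling weights are assumed to already include their MIS factor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel material IDs marking direct-lighting samples.
constexpr std::uint32_t kMatIdNee     = 0xFFFFFFFEu;
constexpr std::uint32_t kMatIdEnvMiss = 0xFFFFFFFFu;

struct Sample {
    std::uint32_t matId;  // real material ID, or one of the sentinels above
    float targetPdf;      // target function value p-hat of this sample
};

inline bool isDirect(const Sample& s) {
    return s.matId == kMatIdNee || s.matId == kMatIdEnvMiss;
}

struct Reservoir {
    Sample y{0u, 0.0f};  // currently selected sample
    float wSum = 0.0f;   // running sum of resampling weights
    float M = 0.0f;      // confidence weight (number of candidates seen)
    float W = 0.0f;      // unbiased contribution weight of y

    // Streaming update: the new candidate replaces y with probability w / wSum.
    void update(const Sample& s, float w, float u) {
        wSum += w;
        M += 1.0f;
        if (w > 0.0f && u * wSum < w) y = s;
    }

    // W = wSum / p-hat(y); weights are assumed to carry their MIS factor.
    void finalize() { W = (y.targetPdf > 0.0f) ? wSum / y.targetPdf : 0.0f; }
};

// Clamp the confidence before temporal reuse; a lower cap refreshes history sooner.
inline void capHistory(Reservoir& r, float mCap) {
    assert(mCap > 0.0f);
    if (r.M > mCap) r.M = mCap;
}
```

Because NEE, environment miss, and path candidates share one stream, the later reuse passes need a single code path and only consult `isDirect` where direct and indirect samples must be treated differently.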
The path tracer uses the DXR HitObject API with Shader Execution Reordering (SER) for wavefront like coherence without an explicit wavefront architecture. The pipeline is split into discrete passes:
- Raygen: Primary rays, multi bounce path tracing with NEE. All candidates (NEE, env miss, path integrand) are written into a single unified DI/GI reservoir, keyed by sentinel matIDs. With SER keeping execution coherent, aggressive Russian roulette allows for 30+ bounces with barely any performance impact
- Temporal reuse: Pairwise MIS temporal resampling on the unified reservoir. Permutation sampling breaks up temporal correlations that become very apparent in the denoiser; the temporal mCap is adaptively lowered where the previous frame's duplication map shows high sample reuse
- Reuse texture partner select: A stack of three precomputed self inverting reuse textures (Lin, Kettunen, Wyman 2026 §3) gives every pixel a guaranteed symmetric spatial partner in a single texture load, replacing the usual neighbor search pass
- Spatial shift (raygen): Performs the reconnection shift and visibility rays for each partner slot, caching the shifted contribution F and Jacobian in per pixel scratch
- Spatial merge (compute): Pairwise MIS over the cached shifts
- Duplication map: Compute pass that scans each pixel's 17×17 neighborhood and counts matches of the packed V2 (reconnection vertex) identifier. The map was initially presented for the hybrid shift, where it uses the path seed; here V2 proved to be a cheap and simple proxy for distinguishing samples when the hybrid shift is not used
- Shading: Final accumulation, motion vectors, and DLSS input preparation
- Postprocess: Tone mapping (PBR Neutral, currently disabled) and sRGB gamma correction
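The duplication map pass is easy to state as a CPU reference. The sketch below assumes one packed 32 bit V2 identifier per pixel in row major order and excludes the centre pixel from the count, so a value of 0 means the sample is unique in its neighborhood; both choices, and the name `buildDuplicationMap`, are assumptions of this example rather than details of the actual compute shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For every pixel, count how many pixels in the (2*radius+1)^2 window carry
// the same packed V2 identifier. radius = 8 gives the 17x17 neighborhood.
inline std::vector<std::uint16_t> buildDuplicationMap(
        const std::vector<std::uint32_t>& ids, int width, int height, int radius = 8) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(ids.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint16_t> counts(ids.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t self = static_cast<std::size_t>(y) * width + x;
            std::uint16_t matches = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if ((dx == 0 && dy == 0) || nx < 0 || ny < 0 ||
                        nx >= width || ny >= height) continue;  // skip self and off-screen
                    const std::size_t other = static_cast<std::size_t>(ny) * width + nx;
                    if (ids[other] == ids[self]) ++matches;
                }
            }
            counts[self] = matches;
        }
    }
    return counts;
}
```

A high count marks a sample that has spread to many neighbors, which is the signal the temporal pass uses to lower its mCap for that pixel.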
Alpha tested geometry (foliage, fences, etc.) uses Opacity Micromaps (OMMs) built with the NVIDIA OMM SDK. OMMs encode opacity per microtriangle into the BVH, allowing the hardware to skip transparent regions during traversal without invoking any hit shaders, significantly improving ray tracing performance on scenes with heavy alpha tested content.
NVIDIA DLSS Ray Reconstruction is used for denoising, integrated through NVIDIA Streamline. On supported GPUs, DLSS frame generation can be used to improve performance.
Following Müller et al. 2021: a small MLP trained online every frame predicts residual radiance from short path prefixes. Paths terminate at a fixed depth and query the cache for the remainder of the integral. The network runs through tiny-cuda-nn (tcnn) on a separate CUDA stream alongside the DX12 hybrid pipeline, sharing GPU memory through CUDA/D3D12 interop.
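In estimator terms, a terminated path adds the cache prediction, scaled by the path throughput, to whatever the traced prefix already gathered. A minimal C++ sketch under that reading follows; `PathPrefix` and `resolveWithCache` are hypothetical names, and the cache query itself is represented only by its RGB result.

```cpp
#include <cassert>

struct Rgb { float r, g, b; };

// State of a path at the vertex where tracing stops (illustrative layout).
struct PathPrefix {
    Rgb radiance;    // emission and NEE gathered along the traced prefix
    Rgb throughput;  // product of BSDF * cosine / pdf up to the terminal vertex
};

// Final estimate = prefix radiance + throughput * cached radiance, where
// `cached` is the network's prediction for the remainder of the integral.
inline Rgb resolveWithCache(const PathPrefix& p, Rgb cached) {
    assert(p.throughput.r >= 0.0f && p.throughput.g >= 0.0f && p.throughput.b >= 0.0f);
    return { p.radiance.r + p.throughput.r * cached.r,
             p.radiance.g + p.throughput.g * cached.g,
             p.radiance.b + p.throughput.b * cached.b };
}
```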
Network: fully fused MLP, 5 hidden × 64 neurons, ReLU activations, linear 3 channel output. 16 raw inputs (position, scattered direction, surface normal, roughness, diffuse + specular reflectance) expand to 74 dims through a composite encoding:
- Position -> HashGrid, 16 levels × 2 features, log₂ hashmap = 21, smoothstep interpolation
- Direction & normal -> Spherical Harmonics, degree 4
- Roughness -> OneBlob, 4 bins
- Reflectance -> Identity passthrough
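The 16 to 74 expansion can be checked with compile time arithmetic. The sketch assumes 3 component position, direction, and normal vectors, RGB diffuse and specular reflectance, and the tcnn convention that a spherical harmonics encoding of degree 4 emits 4² = 16 coefficients per vector; the constant names are invented for this example.

```cpp
// Raw input widths (assumed: xyz vectors, RGB reflectances).
constexpr int kPositionIn    = 3;
constexpr int kDirectionIn   = 3;
constexpr int kNormalIn      = 3;
constexpr int kRoughnessIn   = 1;
constexpr int kReflectanceIn = 3 + 3;  // diffuse + specular

// Encoded widths per input group.
constexpr int kHashGridOut = 16 * 2;            // 16 levels x 2 features
constexpr int kShOut       = 4 * 4;             // degree 4 -> 16 coefficients per vector
constexpr int kOneBlobOut  = kRoughnessIn * 4;  // 4 bins per scalar
constexpr int kIdentityOut = kReflectanceIn;    // passthrough

constexpr int kRawInputs = kPositionIn + kDirectionIn + kNormalIn +
                           kRoughnessIn + kReflectanceIn;
constexpr int kEncoded   = kHashGridOut + 2 * kShOut + kOneBlobOut + kIdentityOut;

static_assert(kRawInputs == 16, "3 + 3 + 3 + 1 + 6 raw inputs");
static_assert(kEncoded == 74, "32 + 32 + 4 + 6 encoded dimensions");
```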
Training: RelativeL2 loss (Müller 2021 §5) on a linear target. Sqrt/log target transforms produce systematic darkening through Jensen's inequality. Adam (lr 1e-3, β = 0.9 / 0.99, L2 reg 1e-6), 4 batches × 8192 records per frame trained asynchronously during the ReSTIR reuse passes. One row per path at a randomized depth, since multi row per path emission produces intra path correlated gradients that Adam's 2nd moment EMA absorbs, collapsing the effective learning rate on shared parameters. 1/16 of training pixels take long RR terminated paths to anchor emitter/miss radiance; the remainder use cache recursive multi bounce targets.
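Two of these points are easy to show numerically. The C++ sketch below gives a scalar relative L2 loss, with an epsilon of 0.01 chosen as an assumption for the example, and a tiny demonstration of the Jensen bias: regressing on sqrt(X) and squaring the result yields (E[sqrt X])², which is never larger than E[X]. All names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Relative L2: squared error normalised by the squared prediction. During
// training the denominator is treated as a constant (no gradient through it).
inline float relativeL2(float pred, float target, float eps = 0.01f) {
    assert(eps > 0.0f);
    const float d = pred - target;
    return d * d / (pred * pred + eps);
}

// What a mean-seeking regressor learns from a linear target: E[X].
inline float meanLinear(const float* x, int n) {
    assert(n > 0);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += x[i];
    return sum / static_cast<float>(n);
}

// What it learns from a sqrt-transformed target after undoing the transform:
// (E[sqrt X])^2, which Jensen's inequality bounds from above by E[X].
inline float meanViaSqrtTransform(const float* x, int n) {
    assert(n > 0);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += std::sqrt(x[i]);
    const float m = sum / static_cast<float>(n);
    return m * m;
}
```

For a noisy one sample target that is 0 half the time and 4 the other half, the linear target recovers the true mean of 2 while the sqrt route recovers 1, which is the systematic darkening described above.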
Engineering: tcnn lives behind a thin C++/CUDA wall in Pathtracer/rdn/NRC/; DXR/HLSL only ever sees the byte for byte buffer layout in NrcLayout.h, mirrored in Nrc_v8.hlsli. An auxiliary CUDA stream + events keep training off the render critical path, and an adaptive training tile size (4×4 to 32×32 per frame) keeps the trainer saturated independent of resolution.
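One plausible way to pick the training tile size is to aim for roughly one training record per tile and clamp the side length to the supported range. The heuristic below is a guess at the idea, not the renderer's actual rule; `chooseTrainingTileSize` and its default of 4 × 8192 records per frame are assumptions taken from the numbers above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical heuristic: one training record per tile, so the tile side is
// sqrt(pixels / records), rounded down so there are at least as many tiles
// as requested records, then clamped to the 4..32 range.
inline int chooseTrainingTileSize(int width, int height,
                                  int recordsPerFrame = 4 * 8192) {
    assert(width > 0 && height > 0 && recordsPerFrame > 0);
    const double pixelsPerRecord =
        static_cast<double>(width) * height / recordsPerFrame;
    const int side = static_cast<int>(std::floor(std::sqrt(pixelsPerRecord)));
    return std::min(32, std::max(4, side));
}
```

Under this rule a 1920×1080 frame gets 7×7 tiles and a 3840×2160 frame gets 15×15 tiles, so the number of records stays close to the batch budget as resolution changes.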
What started as a port of the RoyalTracer university project to DirectX quickly became a standalone rendering engine. In my Bachelor's Thesis, I implemented and optimized ReSTIR to enhance the renderer's real time capabilities. Since then, the focus has shifted to implementing and evaluating state of the art algorithms for improving unbiased sampling efficiency.
- Windows 11 (recent version for DirectX Agility SDK support)
- NVIDIA RTX 40 series GPU or newer. Frame generation requires 40 series; core rendering may work on earlier RTX cards but is untested.
- Visual Studio 2022 build tools
cmake -B build -G "Visual Studio 17 2022" -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
CLion (2024.3.2+) setup notes
- Set up the toolchain: select Visual Studio (should be detected automatically). Delete any other toolchain.
- Configure the CMake project: select Visual Studio as the toolchain. Name the build directory cmake-build-debug-visual-studio. Select "use default" for the generator and "Release" for build type.
- Delete the existing cmake-build-debug-visual-studio directory if it exists.
- Reload the CMake project (File > Reload CMake Project).
- Build and run. Includes are automatically placed in the build directory.
Visual Studio (2022+) setup notes
- Open the project folder.
- VS should automatically run CMake.
- Build and run.
| Input | Action |
|---|---|
| W / A / S / D | Move forward / left / back / right |
| Space | Ascend |
| Left Ctrl | Descend |
| Left mouse drag | Look around |
- Port NRC to DX12 cooperative vector intrinsics (latest Agility SDK preview) to remove the CUDA dependency
- Modular material system and light sampling for reduced register pressure in callable shaders
- Modular resampling for better performance
- Volume rendering
- Conty Estevez, A., Kulla, C. Importance Sampling of Many Lights with Adaptive Tree Splitting. HPG 2018. [PDF]
- Conty Estevez, A., Kulla, C. Production Friendly Microfacet Sheen BRDF. SIGGRAPH 2017 Course. [PDF]
- Walter, B., Marschner, S. R., Li, H., Torrance, K. E. Microfacet Models for Refraction through Rough Surfaces. EGSR 2007. [PDF]
- Lin, D., Kettunen, M., Bitterli, B., Pantaleoni, J., Yuksel, C., Wyman, C. Generalized Resampled Importance Sampling: Foundations of ReSTIR. ACM TOG 2022. [Project]
- Wyman, C. et al. A Gentle Introduction to ReSTIR. SIGGRAPH 2023 Course. [Web]
- Lin, D., Kettunen, M., Wyman, C. ReSTIR PT Enhanced. 2026. (§3: paired reuse textures; §5: duplication map correlation reduction.)
- Müller, T., Rousselle, F., Novák, J., Keller, A. Real-time Neural Radiance Caching for Path Tracing. SIGGRAPH 2021. [Project]
- Lanz, M. Real-Time Path Tracing with ReSTIR. Bachelor's Thesis, 2025. [Writeup]
- Scenes: Amazon Lumberyard Bistro (NVIDIA ORCA), Crytek Sponza
- NVIDIA libraries: DLSS Streamline, OMM SDK
- Asset loaders & texturing: tinyobjloader, tinygltf, stb_image, DirectXTex
- UI: Dear ImGui