This is Hybrid 3D.
Built from AI depth + custom stereo logic —
Designed for cinema in VR.
Click to download or support the project 💙
- Key Features
- Guide Sheet: Install
- Guide Sheet: GUI Inputs
- Troubleshooting
- Dev Notes
- Acknowledgments & Credits
- CUDA + PyTorch accelerated parallax shifting — per-pixel, depth-aware stereo warping
- Built on the VisionDepth3D Method, featuring:
- Pop-Control depth shaping — percentile stretch → subject recenter → signed-gamma curve
- Subject-anchored zero-parallax (EMA stabilized) — histogram/percentile subject tracking
- Dynamic parallax scaling — central depth variance → auto stereo intensity
- Edge-aware shift suppression — gradient → sigmoid feather masking prevents halos
- Floating window stabilization — auto side masks with jitter-clamped offsets
- Matte sculpting + temporal smoothing — distance transform + EMA to round/steady subjects
- Motion-aware DOF — subject-locked Gaussian pyramid for smooth bokeh
- Gradient-based occlusion healing — fills stereo gaps by blending warped + blurred data
- Export formats: Half-SBS, Full-SBS, VR (equirectangular), Anaglyph, Passive Interlaced
- Live preview overlays: shift heatmaps, edge masks, stereo difference maps
- Workflow upgrades in v3.6:
- Direct Left/Right eye export (no split step needed)
- Clip-range rendering (process just the section you need)
- Extra padding + edge reflection reduce bleed-through
- Fixed NVENC presets (no more forced `-preset slow`)
- Fully interactive: dynamic sliders, hotkey presets, real-time 3D preview, batch-ready pipeline
- FFmpeg streaming pipeline: NVENC / AMF / QSV / CPU with CRF/CQ control — no temp files
Result: The Stereo Composer has matured into a production-ready 3D engine — blending pixel-accurate warping with real-time preview, advanced parallax controls, and streamlined export for cinema, VR, or streaming.
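For intuition, here is a minimal NumPy sketch of the Pop-Control shaping chain listed above (percentile stretch, subject recenter, signed-gamma curve). The function name, defaults, and the center-crop subject estimate are illustrative assumptions, not VD3D's actual API:

```python
import numpy as np

def pop_control(depth, low_pct=2.0, high_pct=98.0, gamma=1.4):
    """Illustrative Pop-Control shaping (hypothetical helper, not VD3D's API)."""
    # 1) Percentile stretch: clip outliers, normalize to [0, 1].
    lo, hi = np.percentile(depth, [low_pct, high_pct])
    d = np.clip((depth - lo) / max(hi - lo, 1e-6), 0.0, 1.0)

    # 2) Subject recenter: assume the subject sits in the central crop and
    #    shift its median depth onto the 0.5 zero-parallax midpoint.
    h, w = d.shape
    subject = np.median(d[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4])
    d = np.clip(d - subject + 0.5, 0.0, 1.0)

    # 3) Signed-gamma curve around the midpoint: gamma > 1 strengthens pop,
    #    gamma < 1 flattens the depth distribution.
    s = d * 2.0 - 1.0                      # [-1, 1], 0 = subject plane
    shaped = np.sign(s) * np.abs(s) ** (1.0 / gamma)
    return (shaped + 1.0) * 0.5            # back to [0, 1]

shaped = pop_control(np.random.rand(270, 480).astype(np.float32))
```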
- Supports 25+ depth models including: `ZoeDepth`, `Depth Anything V1/V2`, `MiDaS`, `DPT (BEiT)`, `DepthPro`, `DINOv2`, `Distill-Any-Depth`, Marigold Diffusion, and new additions like Depth Anything V2 Giant.
- One-click model switching with auto-download + local caching — no CLI setup required.
- GPU-accelerated inference backends:
- PyTorch (Transformers / TorchHub)
- ONNXRuntime with CUDA / TensorRT
- Diffusers (FP16) for Stable Diffusion–based models (Marigold, etc.)
- Batch-ready pipeline:
- Process image folders
- Process full videos (frame extraction → depth inference → encode)
- 16-bit depth output support for richer disparity maps (Marigold / Diffusers)
- Preserves high precision for inversion and HDR workflows
- FFmpeg-based MKV/PNG export
- Depth Blender integration (new in v3.6):
- Blend outputs from multiple models in real time for cleaner separation and smoother parallax
- Built-in colormaps & preview modes: Viridis, Inferno, Magma, Plasma, Grayscale.
- Smart batching with `get_dynamic_batch_size()` adapts to your GPU VRAM automatically.
- Resolution-safe ONNX engine:
  - Detects static input shapes (e.g., `518x518`)
  - Warm-up patch avoids shape mismatch crashes
- Video frame interpolation (RIFE) supported for smoother previews and exports.
- AV1 safeguard: auto-detects unsupported codecs with ffprobe fallback + warning.
Result: Depth Estimation in VD3D is now faster, more stable, and more flexible — with expanded model support, precision 16-bit outputs, and the new Depth Blender pipeline for professional-quality depth maps.
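As a flavor of how `get_dynamic_batch_size()` (mentioned above) can adapt to VRAM, here is a hedged sketch; the 70% headroom factor and per-frame cost model are assumptions, not the shipped logic:

```python
import torch

def get_dynamic_batch_size(frame_bytes: int, max_batch: int = 32) -> int:
    """Pick a batch size from currently free VRAM (illustrative heuristic)."""
    if not torch.cuda.is_available():
        return 1  # CPU fallback: keep memory pressure minimal
    free_bytes, _total = torch.cuda.mem_get_info()
    budget = int(free_bytes * 0.7)            # leave ~30% headroom for activations
    return max(1, min(budget // max(frame_bytes, 1), max_batch))

# e.g., a 518x518 RGB float32 input tensor:
print(get_dynamic_batch_size(frame_bytes=518 * 518 * 3 * 4))
```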
- Integrated RIFE (ONNX) for frame interpolation
- PyTorch-free, CUDA-accelerated
- Supports 2x, 4x, and 8x FPS interpolation
- Preview or export directly from GUI
- Integrated Real-ESRGAN x4 (ONNX) for super-resolution
- GPU accelerated with fp16 inference
- Upscale 720p → 1080p, 1080p → 4K, or custom targets
- Matches resolution of depth/interpolated frames automatically
- Massive speed boost in v3.6:
- RIFE, ESRGAN, and FFmpeg writing now run concurrently
- Render times dropped from 10h → ~1h on long clips
- Intelligent frame indexing and buffering keep exact sync
- Batch-ready pipeline:
- Works on raw image folders or full videos
- Auto-reassembles videos with original frame count, resolution, audio, and aspect ratio
- VRAM-aware batching: dynamically adjusts batch size (1–8 frames) for stability
- FFmpeg NVENC integration:
- GPU codec support with proper presets
- AV1/H.264/H.265 export with faststart flags
- Live feedback in GUI:
- Progress bar, FPS, ETA, and logging for long renders
- Cancel/resume safe
Result: Upscaling & interpolation in VD3D are now faster, cleaner, and more flexible — letting you tackle full-length projects in a fraction of the time, without sacrificing visual quality.
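Conceptually, running RIFE, ESRGAN, and the FFmpeg writer at the same time is a bounded-queue pipeline: each stage consumes frames from the previous one while the queues cap memory and preserve frame order. This is a pattern sketch with stand-in stage functions, not VD3D's actual code:

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """Pull frames from inbox, transform, push downstream; None = shutdown."""
    while (item := inbox.get()) is not None:
        outbox.put(fn(item))
    outbox.put(None)  # propagate shutdown downstream

def run_pipeline(frames, interpolate, upscale, write):
    # Bounded FIFO queues cap RAM use and keep frames in exact order.
    q1, q2, q3 = (queue.Queue(maxsize=8) for _ in range(3))
    workers = [
        threading.Thread(target=stage, args=(interpolate, q1, q2)),
        threading.Thread(target=stage, args=(upscale, q2, q3)),
    ]
    for w in workers:
        w.start()

    def feed():
        for f in frames:
            q1.put(f)
        q1.put(None)

    feeder = threading.Thread(target=feed)
    feeder.start()
    while (out := q3.get()) is not None:  # the "FFmpeg writer" runs here
        write(out)
    feeder.join()
    for w in workers:
        w.join()

# Stand-in stages; real ones would call the RIFE/ESRGAN sessions.
run_pipeline(range(10), interpolate=lambda f: f, upscale=lambda f: f, write=print)
```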
- Blend depth maps from multiple models (e.g., DA2, ZoeDepth, Marigold) into a single, cleaner map.
- Frame or video mode:
- Batch process paired frame folders (PNG)
- Or process two full video files side by side
- Live Preview & Frame Scrubber:
- Side-by-side preview (`V2 Base | Blended Output`)
- Scrubbable timeline with Next/Prev buttons
- Hot-reload when adjusting sliders
- GPU-accelerated blending (PyTorch CUDA) with CPU fallback.
- Adjustable parameters (all sliders update preview live):
- White Strength
- Feather Blur (kernel size)
- CLAHE Clip Limit & Tile Grid
- Bilateral Filter (d, sigmaColor, sigmaSpace)
- Smart normalization: matches brightness/contrast of blended output back to the base map.
- Whites boosting & outlier suppression: reduces halos and preserves fine detail.
- Batch mode options:
- Overwrite V2 (frames mode)
- Write to new output folder or video file
- Scrubber & hotkeys: Left/Right arrows nudge frame index for testing blends quickly.
- Output scaling: optional width/height override with high-quality Lanczos resize.
- Robust logging & progress bar with stop/resume support.
Result: Depth Blender combines strengths of two depth models, smoothing out fuzz, suppressing white-edge artifacts, and giving creators more consistent 3D parallax across full sequences.
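The core of such a blend is easy to sketch with OpenCV. This hedged example mirrors the sliders listed above (White Strength, CLAHE, bilateral filter) plus the normalization pass; defaults and file names are invented, and VD3D's real pipeline may differ:

```python
import cv2
import numpy as np

def blend_depth(base, aux, white_strength=0.3, clip_limit=2.0, tile=8):
    """Blend two 8-bit grayscale depth maps (illustrative, not VD3D's code)."""
    # Local-contrast boost on the auxiliary map before mixing.
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(tile, tile))
    aux_eq = clahe.apply(aux)

    # Weighted mix, then edge-preserving smoothing to tame fuzz.
    blended = cv2.addWeighted(base, 1.0 - white_strength, aux_eq, white_strength, 0)
    blended = cv2.bilateralFilter(blended, d=9, sigmaColor=75, sigmaSpace=75)

    # "Smart normalization": match brightness/contrast back to the base map.
    b_mean, b_std = base.mean(), base.std() + 1e-6
    o_mean, o_std = blended.mean(), blended.std() + 1e-6
    matched = (blended.astype(np.float32) - o_mean) * (b_std / o_std) + b_mean
    return np.clip(matched, 0, 255).astype(np.uint8)

# Hypothetical input paths for two per-frame depth maps.
base = cv2.imread("depth_v2.png", cv2.IMREAD_GRAYSCALE)
aux = cv2.imread("depth_marigold.png", cv2.IMREAD_GRAYSCALE)
out = blend_depth(base, aux)
```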
- Three modes:
- Rip → extract audio tracks from videos
- Attach → re-mux or re-encode audio back into matching videos
- Attach + Stitch → auto-attach per-clip audio and stitch final video in one step
- Flexible audio matching:
- Auto-match audio files from a folder to videos by filename/index
- Or choose a single audio file for all videos
- Sync offset control:
- GUI slider to shift audio ±10s for real-time sync correction
- Codec and format options:
- Rip: `copy`, `aac`, `mp3`, `opus`, `flac`, `wav`, `ac3`, `eac3` (bitrate configurable)
- Attach: choose re-encode or fast copy for both video and audio
- Final encode: full control over vcodec (`libx264`, `libx265`, `h264_nvenc`, `hevc_nvenc`), CRF/CQ, preset, acodec, and bitrate
- Batch processing:
- Add multiple files or entire folders of videos
- Auto-naming and folder output for ripped/attached files
- Per-clip outputs + final stitched output
- Gapless stitching:
- Normalizes fps, resolution, pixel format, and audio sample rate across clips
- Ensures seamless concatenation (no desync, no black frames)
- Live progress + logging:
- Async FFmpeg runner with progress window
- Logs visible inside GUI while running
- Cancel/resume safe
Result: The Audio Tool has grown from a simple rip/attach utility into a pro-level sync and mux suite — with full codec control, offset correction, and reliable batch stitching, all inside the VD3D workflow.
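For a sense of what the tool drives under the hood, here is a hedged sketch of the kind of FFmpeg invocations Rip and Attach map to; paths, codecs, and the 0.5 s offset are placeholders, and the flags shown are standard FFmpeg options rather than VD3D's exact command lines:

```python
import subprocess

# Rip: extract the audio track to AAC at a configurable bitrate.
subprocess.run([
    "ffmpeg", "-y", "-i", "clip.mp4",
    "-vn", "-c:a", "aac", "-b:a", "192k", "clip_audio.m4a",
], check=True)

# Attach with a sync offset: delay the audio by 0.5 s, copy the video stream.
subprocess.run([
    "ffmpeg", "-y", "-i", "clip.mp4",
    "-itsoffset", "0.5", "-i", "clip_audio.m4a",
    "-map", "0:v:0", "-map", "1:a:0",
    "-c:v", "copy", "-c:a", "aac", "clip_synced.mp4",
], check=True)
```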
(Live 3D Preview with Anaglyph and Parallax Controls)
- Real-time preview: Interlaced, HSBS, Depth Heatmap
- On-frame previews with convergence + parallax tuning
- Preview exports as images – no temp videos needed
- Save Preview Frames to show off effects with different settings
- Language support: EN, FR, ES, DE, JA
- New Help Menu bar and Hotkeys
- Responsive multi-tab Tkinter interface with persistent settings
- Full GPU render control: pause, resume, cancel
- Codec selector with NVENC options (H.264, HEVC, AV1-ready)
- Formats: Half-SBS, Full-SBS, VR180, Anaglyph, Passive Interlaced
- Aspect Ratios: 16:9, 2.39:1, 2.76:1, 4:3, 21:9, 1:1, 2.35:1
- Export formats: MP4, MKV, AVI
- Codec support: XVID, MP4V, MJPG, DIVX, FFmpeg NVENC
- ✔️ This program runs on Python 3.12
- ✔️ This program has been tested on CUDA 12.8
- ✔️ Conda (Optional, Recommended for Simplicity)
- 1️⃣ Download the VisionDepth3D zip file from the official download source. (green button)
- 2️⃣ Extract the zip file to your desired folder (e.g., c:\user\VisionDepth3D).
- 3️⃣ Download models Here and extract the weights folder into the VisionDepth3D main folder
- 4️⃣ Download the Distill Any Depth ONNX models here (if you want to use them) and put the Distill Any Depth folder into the weights folder
- 1️⃣ Press Win + R, type cmd, and hit Enter.
- 2️⃣ Clone the repository (skip the `git clone` if you downloaded the ZIP and start from `cd`):
```
git clone https://github.com/VisionDepth/VisionDepth3D.git
cd C:\VisionDepth3D-main
pip install -r requirements.txt
```
- Continue to step 3: installing PyTorch with CUDA
- Update the `Start_VD3D_Windows.bat` script file
- Double-click the script to launch VD3D
(Automatically manages dependencies & isolates environment.)
- 1. Clone the Repository (Skip the git clone if you downloaded the ZIP and start from cd)
- 2. Create the Conda Environment
To create the environment, copy and paste this into conda to run:
```
git clone https://github.com/VisionDepth/VisionDepth3D.git
cd VisionDepth3D-main
conda create -n VD3D python=3.12
conda activate VD3D
pip install -r requirements.txt
```
🔍 Find Your CUDA Version: Before installing PyTorch, check which CUDA version your GPU supports:
- 1️⃣ Open Command Prompt (Win + R, type cmd, hit Enter)
- 2️⃣ Run the following command:
```
nvcc --version
# or
nvidia-smi
```
- 3️⃣ Look for the CUDA version (e.g., CUDA 11.8, 12.1, etc.)
Go to the official PyTorch website to find the best install command for your setup: 🔗 https://pytorch.org/get-started/locally/
Install PyTorch with CUDA 12.8, or whichever CUDA version you are running.
If you are running an AMD GPU, select the CPU build.
- Once all dependencies are installed, update the batch script for the system you are running and run the following command:
```
Start_VD3D_Conda.bat
# or
Start_VD3D_Linux.bat
# or
Start_VD3D_Windows.bat
```
Congrats, you have successfully installed VisionDepth3D! This quick setup ensures you clone the repository, configure your environment, and launch the app — all in just a few simple steps.
- **Backup Your Weights**
  Move your `weights` folder out of the old `VisionDepth3D-main` directory.
- **Download the Latest Version**
  Delete the old folder and extract or clone the updated version of `VisionDepth3D-main`.
- **Restore Weights Folder**
  Place your `weights` folder back inside the newly downloaded main directory: `VisionDepth3D-main/weights`
- **Update the Path in Startup Scripts**
  Open the startup script matching your platform: `Start_VD3D_Windows.bat`, `Start_VD3D_Conda.bat`, or `Start_VD3D_Linux.sh`.
  Edit the script and replace any old folder path with the new path to your updated `VisionDepth3D-main`.
- **Activate Conda Environment (if needed)**
  If you are using the Conda starter script, open a terminal or Anaconda Prompt and run:
  ```
  cd path/to/updated/VisionDepth3D-main
  Start_VD3D_Conda.bat
  ```
- **Launch the App**
  Once everything is in place, run the appropriate script or shortcut to launch VisionDepth3D with your latest settings.
Note: If you customized any configuration, back up those files before replacing folders. If you run into import errors, run `pip install -r requirements.txt` in the opened terminal to fix any dependency errors.
Use the GUI to fine-tune your 3D conversion settings.
- Description: Sets the output video encoder.
- Default: `mp4v` (CPU)
- Options:
  - `mp4v`, `XVID`, `DIVX` – CPU-based
  - `libx264`, `libx265` – high-quality software (CPU)
  - `h264_nvenc`, `hevc_nvenc` – GPU-accelerated (NVIDIA)
- Description: Pops foreground objects out of the screen.
- Default: `6.5`
- Range: `3.0` to `8.0`
- Effect: Strong values create noticeable 3D "pop" in close objects.
- Description: Depth for mid-layer transition between foreground and background.
- Default: `1.5`
- Range: `-3.0` to `5.0`
- Effect: Smooths the 3D transition — higher values exaggerate depth between layers.
- Description: Shift depth for background layers (far away).
- Default: `-6.0`
- Range: `-10.0` to `0.0`
- Effect: More negative pushes content into the screen (deeper background).
- Description: Applies a sharpening filter to the output.
- Default: `0.2`
- Range: `-1.0` (softer) to `1.0` (sharper)
- Effect: Brings clarity to 3D edges; avoid over-sharpening to reduce halos.
- Description: Shifts the entire stereo image inward or outward to adjust the overall convergence point (zero-parallax plane).
- Default: `0.000`
- Range: `-0.050` to `+0.050`
- Effect:
- Positive values push the image deeper into the screen (stronger positive parallax).
- Negative values pull the scene forward (increased pop-out effect).
- Tip: Use small increments like `±0.010` for subtle depth balancing.
- Description: Limits the maximum pixel displacement caused by stereo shifting, expressed as a percentage of video width.
- Default: `0.020` (2%)
- Range: `0.005` to `0.100`
- Effect:
- Low values reduce eye strain but can flatten the 3D effect.
- High values create more dramatic depth but may introduce ghosting or artifacts.
- Best Use: Keep between `0.015`–`0.030` for clean results.
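Taken together, Convergence Offset and Max Pixel Shift act on per-pixel disparity roughly like this simplified model (not VD3D's exact formula; the linear mapping from depth to disparity is an assumption):

```python
import numpy as np

def stereo_shift(depth, width, fg_shift=6.5, convergence=0.0, max_shift=0.020):
    """depth in [0, 1] (1 = near); returns per-pixel shift in pixels."""
    disparity = (depth - 0.5) * fg_shift   # signed parallax around the mid-plane
    disparity += convergence * width       # convergence slides the whole plane
    limit = max_shift * width              # Max Pixel Shift, % of frame width
    return np.clip(disparity, -limit, limit)

shift = stereo_shift(np.random.rand(1080, 1920), width=1920, convergence=0.010)
```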
- Description: Adjusts how strongly the 3D effect favors the subject's depth versus full-scene stereo balance.
- Default: `0.80`
- Range: `0.00` to `1.00`
- Effect:
  - `1.0` = Full parallax (strong 3D depth everywhere).
  - `0.0` = Subject stays fixed, depth minimized elsewhere.
- Use For: Tuning stereo focus around people or central motion while avoiding exaggerated background distortion.
- Codec: Choose GPU-accelerated encoders (`h264_nvenc`, `hevc_nvenc`) for faster renders.
- CRF (Constant Rate Factor):
  - Default: `23`
  - Range: `0` (lossless) to `51` (worst)
  - Lower values = better visual quality.
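Note that NVENC expresses this quality target as `-cq` rather than x264's `-crf`. A minimal render call might look like this (file names and preset are illustrative; the flags are standard FFmpeg options):

```python
import subprocess

subprocess.run([
    "ffmpeg", "-y", "-i", "input_sbs.mp4",
    "-c:v", "h264_nvenc", "-preset", "p5", "-cq", "23",
    "-c:a", "copy", "-movflags", "+faststart", "output_sbs.mp4",
], check=True)
```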
- Checkbox: Stabilize Zero-Parallax (center-depth)
- Effect: Enables Dynamic Zero Parallax Tracking — the depth plane will automatically follow the subject’s depth to minimize excessive 3D warping.
- Function: Dynamically adjusts the zero-parallax plane to follow the estimated subject depth (typically the central object or character). This keeps key elements at screen depth, reducing eye strain and excessive parallax.
- Effect: Helps stabilize the 3D effect by anchoring the subject at screen level, especially useful for scenes with depth jumps or fast movement.
- Recommended for: Dialogue scenes, human-centric content, or anything where central focus should feel "on screen" rather than floating in depth.
- Description: Controls the inter-pupillary distance (IPD) scaling, effectively adjusting how strong the stereo separation feels.
- Default: `1.15`
- Range: `0.50` to `2.00`
- Effect:
- Higher values exaggerate stereo depth (more 3D).
- Lower values flatten depth (safer for long viewing).
- Tip: Keep near `1.0–1.3` for natural results.
- Description: Adjusts the gamma curve for depth, controlling how depth “pops” across the scene.
- Default: `1.0`
- Range: `0.5` to `2.0`
- Effect:
- Higher = stronger pop, can over-accentuate close objects.
- Lower = smoother, flatter depth distribution.
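One way to model that behavior is a symmetric power curve around the subject plane; a sketch consistent with the description above, though not necessarily VD3D's exact curve:

```python
import numpy as np

def depth_gamma(d, g=1.0):
    """d in [0, 1] with 0.5 = subject plane; g > 1 pops, g < 1 flattens."""
    s = d * 2.0 - 1.0  # signed distance from the subject plane
    return (np.sign(s) * np.abs(s) ** (1.0 / g) + 1.0) * 0.5

print(depth_gamma(np.array([0.25, 0.5, 0.75]), g=2.0))  # ~[0.15, 0.5, 0.85]
```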
- Description: Locks the subject’s depth position relative to the zero-parallax plane.
- Default: `1.30`
- Range: `1.0` to `2.0`
- Effect:
- Prevents subject from drifting too deep or too far out.
- Useful for keeping faces/characters consistently anchored.
- Description: Extra multipliers for pushing foreground or background layers.
- Default: FG: `1.20`, BG: `1.10`
- Range: `0.5` to `2.0`
- Effect:
- FG Push emphasizes pop-out.
- BG Push exaggerates scene depth.
- Best Use: Subtle tweaks to fine-tune stereo balance.
- Saturation:
  - Default: `1.35`
  - Adjusts color intensity. >1 = more vivid, <1 = muted.
- Brightness:
  - Default: `0.04`
  - Fine-tunes exposure; small values recommended.
- Contrast:
  - Default: `1.10`
  - Enhances separation between light/dark regions.
- Effect: These adjustments let you preview and render with tuned color grading before upscaling or final encoding.
- Tip: Avoid extreme values to prevent clipping or oversaturation.
- Checkbox: Enable Floating Window
- Effect: Shifts the visible “window” edges of the stereo image inward.
- Purpose: Prevents window violations (objects being cut off by screen edges).
- Recommended for: Full-screen 3D playback, cinema, or VR headsets.
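Mechanically, a floating window is just a pair of opposing edge masks on the two eye views. A simplified static sketch (the offset value is invented, and per the feature list VD3D additionally clamps jitter over time):

```python
import numpy as np

def floating_window(left, right, offset_px=24):
    """Mask the right edge of the left eye and the left edge of the right
    eye, which pulls the stereo window toward the viewer."""
    left, right = left.copy(), right.copy()
    left[:, -offset_px:] = 0
    right[:, :offset_px] = 0
    return left, right

L = np.full((1080, 1920, 3), 255, np.uint8)  # stand-in eye views
R = np.full((1080, 1920, 3), 255, np.uint8)
L_masked, R_masked = floating_window(L, R)
```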
- Match resolution and FPS between your input video and depth map.
- Use the Inverse Depth checkbox if bright = far instead of close.
- Recommended depth models: `Distill Any Depth`, `Depth Anything V2`, `MiDaS`, `DPT-Large`, etc.
- Choose Large models for better fidelity.
Clip Length | Estimated Time (with GPU) |
---|---|
30 seconds | 1–4 mins |
5 minutes | 10–25 mins |
Full Movie | 6–24+ hours |
- Select your depth model from the dropdown.
- Choose an output directory for saving results.
- Enable your preferred settings (invert, colormap, etc.).
- Set batch size depending on GPU/VRAM capacity.
  (Tip: Resize your video or switch to a lighter model if memory is limited.)
- Select your image / video / folder and start processing.
- Once the depth map video is generated, head over to the 3D tab.
- Input your original video and the newly created depth map.
- Adjust 3D settings for the preferred stereo effect.
- Hit "Generate 3D Video" and let it roll!
Use these models to clean up and enhance 3D videos:
- In the Upscale tab, load your 3D video and enable “Save Frames Only”.
- Input the width × height of the 3D video.
  (No need to set FPS or codec when saving frames.)
- Set batch size to `1` — batch processing is unsupported by some AI models.
- Select AI Blend Mode and Input Resolution (the blend math is sketched after these steps):
Mode | Blend Ratio (AI : Original) | Description |
---|---|---|
OFF | 100% : 0% | Full AI effect (only the ESRGAN result is used). |
LOW | 85% : 15% | Strong AI enhancement with mild natural tone retention. |
MEDIUM | 50% : 50% | Balanced mix for natural image quality. |
HIGH | 25% : 75% | Subtle upscale; mostly original with a hint of enhancement. |
Input Resolution | Processing Behavior | Performance & Quality Impact |
---|---|---|
100% | Uses full-resolution frames for AI upscaling. | ✅ Best quality. ❌ Highest GPU usage. |
75% | Slightly downsamples before feeding into AI. | ⚖️ Good balance. Minimal quality loss. |
50% | Halves frame size before AI. | ⚡ 2× faster. Some detail loss possible. |
25% | Very low-resolution input. | 🚀 Fastest speed. Noticeable softness — best for previews/tests. |
- Select your Upscale Model and start the process.
- Once done, open the VDStitch tab:
- Input the upscaled frame folder.
- Set the video output directory and filename.
- Enter the same resolution and FPS as your original 3D video.
- Enable RIFE FPS Interpolation.
- Set the RIFE multiplier to ×2 for smooth results.
  (⚠️ Higher multipliers like ×4 may cause artifacts on scene cuts.)
- Start processing — you now have an enhanced 3D video with upscaled clarity and smoother motion!
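The AI Blend Mode ratios in the table above reduce to a linear mix between the ESRGAN result and the original frame. A minimal sketch (`cv2.addWeighted` stands in for whatever VD3D uses internally; the original frame is assumed already resized to match the upscaled one):

```python
import cv2

BLEND = {"OFF": 1.00, "LOW": 0.85, "MEDIUM": 0.50, "HIGH": 0.25}  # AI share per mode

def blend_ai(ai_frame, original, mode="MEDIUM"):
    """output = ratio * AI + (1 - ratio) * original, per the table above."""
    r = BLEND[mode]
    return cv2.addWeighted(ai_frame, r, original, 1.0 - r, 0)
```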
- Black/Empty Output: Wrong depth map resolution or mismatch with input FPS.
- Halo/Artifacts:
- Increase feather strength and blur size.
- Enable subject tracking and clamp the zero parallax offset.
- Out of Memory (OOM):
- Enable FFmpeg rendering for better memory usage.
- Use `libx264` or `h264_nvenc` and avoid long clips in one go.
This tool is being developed by a solo dev with nightly grind energy (🕐 ~4 hours a night). If you find it helpful, let me know — feedback, bug reports, and feature ideas are always welcome!
Thank You!
A heartfelt thank you to all the researchers, developers, and contributors behind the incredible depth estimation models and open-source tools used in this project. Your dedication, innovation, and generosity have made it possible to explore the frontiers of 3D rendering and video processing. Your work continues to inspire and empower developers like me to build transformative, creative applications.
Model Name | Creator / Organization | Hugging Face Repository |
---|---|---|
Distill-Any-Depth-Large | xingyang1 | Distill-Any-Depth-Large-hf |
Distill-Any-Depth-Small | xingyang1 | Distill-Any-Depth-Small-hf |
Depth Anything V2 Large | Depth Anything Team | Depth-Anything-V2-Large-hf |
Depth Anything V2 Base | Depth Anything Team | Depth-Anything-V2-Base-hf |
Depth Anything V2 Small | Depth Anything Team | Depth-Anything-V2-Small-hf |
Depth Anything V1 Large | LiheYoung | depth-anything-large-hf |
Depth Anything V1 Base | LiheYoung | depth-anything-base-hf |
Depth Anything V1 Small | LiheYoung | depth-anything-small-hf |
V2-Metric-Indoor-Large | Depth Anything Team | Depth-Anything-V2-Metric-Indoor-Large-hf |
V2-Metric-Outdoor-Large | Depth Anything Team | Depth-Anything-V2-Metric-Outdoor-Large-hf |
DA_vitl14 | LiheYoung | depth_anything_vitl14 |
DA_vits14 | LiheYoung | depth_anything_vits14 |
DepthPro | Apple | DepthPro-hf |
ZoeDepth | Intel | zoedepth-nyu-kitti |
MiDaS 3.0 | Intel | dpt-hybrid-midas |
DPT-Large | Intel | dpt-large |
DinoV2 | Meta | dpt-dinov2-small-kitti |
dpt-beit-large-512 | Intel | dpt-beit-large-512 |
This project utilizes the FFmpeg multimedia framework for video/audio processing via subprocess invocation. FFmpeg is licensed under the GNU GPL or LGPL, depending on how it was built. No modifications were made to the FFmpeg source or binaries — the software simply executes FFmpeg as an external process.
You may obtain a copy of the FFmpeg license at: https://www.gnu.org/licenses/
VisionDepth3D calls FFmpeg strictly for encoding, muxing, audio extraction, and frame rendering operations, in accordance with license requirements.