
VisionDepth3D

The All-in-One 3D Suite for Creators

This is Hybrid 3D.
Built from AI depth + custom stereo logic —
Designed for cinema in VR.

Download VisionDepth3D on Itch.io
Click to download or support the project 💙


All-in-One 3D Suite

Real-Time 3D Stereo Composer

(3D Tab)

  • CUDA + PyTorch-powered depth parallax shifting (per-pixel accurate)
  • Based on the proprietary VisionDepth3D Method:
    • Depth-weighted continuous parallax (FG/MG/BG zones blended via soft masks)
    • Subject-aware zero parallax tracking (histogram mode-based convergence)
    • Edge-aware shift suppression (gradient-based feather masking)
    • Floating window stabilization (momentum-smoothed convergence)
    • Scene-adaptive parallax dampening (variance-based intensity control)
    • Real-time CUDA grid_sample stereo warping (left/right in one pass; see the sketch after this list)
    • Depth-of-field simulation + occlusion healing (multi-pass Gaussian blending)
  • Export formats: Half-SBS, Full-SBS, VR180, Anaglyph, Passive Interlaced
  • Live preview overlays: shift heatmaps, edge masks, stereo diff tools
  • Fully interactive: dynamic sliders, real-time 3D preview, and batch-ready pipeline
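
To make the idea concrete, here is a minimal sketch of depth-driven parallax warping with PyTorch's grid_sample. It is illustrative only: the function name, the linear disparity model, and all constants are assumptions, not the proprietary VisionDepth3D method.

import torch
import torch.nn.functional as F

# Illustrative sketch only -- not the actual VisionDepth3D implementation.
def warp_stereo(frame, depth, max_shift_px=30.0, convergence=0.5):
    """frame: (1, 3, H, W) float tensor; depth: (1, 1, H, W) in [0, 1], 1 = near."""
    _, _, h, w = frame.shape
    disparity = (depth - convergence) * max_shift_px   # signed shift in pixels
    disp_norm = disparity * (2.0 / w)                  # grid_sample units (-1..1)

    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=frame.device),
        torch.linspace(-1, 1, w, device=frame.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (1, H, W, 2)

    left_grid, right_grid = base.clone(), base.clone()
    left_grid[..., 0] += disp_norm[:, 0]               # shift x coordinates only
    right_grid[..., 0] -= disp_norm[:, 0]

    left = F.grid_sample(frame, left_grid, align_corners=True)
    right = F.grid_sample(frame, right_grid, align_corners=True)
    return left, right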

AI-Powered Depth Estimation (GPU Accelerated)

(Depth Estimation Tab)

  • Supports 25+ models including: ZoeDepth, Depth Anything V1/V2, MiDaS, DPT (BEiT), DepthPro, DINOv2, Distill-Any-Depth, and Marigold Diffusion.
  • One-click model switching with auto-downloading and local caching — no CLI or manual configs required.
  • GPU-accelerated inference via:
    • PyTorch (Transformers)
    • ONNXRuntime + CUDA/TensorRT
    • Diffusers (FP16) for Stable Diffusion-based depth like Marigold
  • Batch-ready pipeline for:
    • Image folders
    • Video files (frame extraction + depth + encode)
  • New 16-bit depth export path for Diffusers (Marigold) — supports inversion and FFmpeg-encoded MKV output.
  • Built-in colormaps (e.g., Viridis, Inferno, Magma, Plasma) + grayscale preview modes.
  • Smart batching with get_dynamic_batch_size() — adapts to your GPU VRAM automatically.
  • Resolution-safe ONNX engine:
    • Auto-detects static input shapes (e.g., 518x518)
    • Patches dummy warm-up tensors to avoid shape-mismatch crashes
  • Supports video frame interpolation (RIFE) for smoother previews and export.
  • AV1 safeguard: auto-detects unsupported codecs with ffprobe fallback warning.
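
Under the hood, the PyTorch path corresponds to the standard Hugging Face transformers depth-estimation pipeline. A minimal standalone sketch (the model id and file names are illustrative picks, not necessarily the GUI's defaults):

import torch
from PIL import Image
from transformers import pipeline

# Illustrative model id from the supported list; the GUI handles
# auto-download and local caching of whichever model you select.
pipe = pipeline(
    "depth-estimation",
    model="depth-anything/Depth-Anything-V2-Small-hf",
    device=0 if torch.cuda.is_available() else -1,
)

result = pipe(Image.open("frame_0001.png"))      # hypothetical input frame
result["depth"].save("frame_0001_depth.png")     # PIL image of the depth map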

AI Upscaling Functions

(Frame Tools Tab)

  • Integrated RIFE ONNX model – PyTorch-free, real-time frame doubling
  • Supports 2x, 4x, 8x FPS interpolation
  • Processes raw image folders + auto video reassembly
  • Maintains frame count, resolution, audio sync, and aspect ratio
  • Preview and export using FFmpeg codecs (GUI-integrated)
  • Real-time progress, FPS, ETA feedback
  • Uses Real-ESRGAN x4, exported to ONNX with full CUDA acceleration (see the sketch after this list)
  • Intelligent VRAM-aware batching for 1–8 frames
  • Upscaling: 720p → 1080p, 1080p → 4K, or custom targets
  • Auto-scaling to match 3D or interpolated frame resolutions
  • Uses fp16 inference for clean, artifact-free output
  • Fully integrated into pipeline with FFmpeg NVENC export
  • GUI includes progress bar, FPS, ETA tracking
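
As a rough sketch of how an ONNX upscaler is driven with onnxruntime (the model path, input shape, and dtype are assumptions; the real pipeline also handles batching and color conversion):

import numpy as np
import onnxruntime as ort

# Hypothetical model path; shapes and dtype are illustrative assumptions.
session = ort.InferenceSession(
    "weights/realesrgan_x4.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

frame = np.random.rand(1, 3, 270, 480).astype(np.float32)  # NCHW in [0, 1]
input_name = session.get_inputs()[0].name
(upscaled,) = session.run(None, {input_name: frame})       # 4x: (1, 3, 1080, 1920)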

Audio to Video Sync

(Audio Tool)

  • Extract + reattach source audio using FFmpeg (GUI-based; see the sketch after this list)
  • Format options: AAC, MP3, WAV (bitrate adjustable)
  • No shell access needed – fully built into GUI
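
The GUI builds FFmpeg commands along these lines; a hedged sketch with illustrative filenames (the exact flags the tool emits may differ):

import subprocess

# 1) Extract the source audio track (AAC at an adjustable bitrate).
subprocess.run(["ffmpeg", "-y", "-i", "source.mp4",
                "-vn", "-c:a", "aac", "-b:a", "192k", "audio.aac"], check=True)

# 2) Mux it back onto the rendered 3D video without re-encoding the video.
subprocess.run(["ffmpeg", "-y", "-i", "render_3d.mp4", "-i", "audio.aac",
                "-map", "0:v:0", "-map", "1:a:0",
                "-c:v", "copy", "-c:a", "copy", "output_3d.mp4"], check=True)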

Preview + Format Testing

(Live 3D Preview with Anaglyph and Parallax Controls)

  • Real-time preview: Interlaced, HSBS, Depth Heatmap
  • On-frame previews with convergence + parallax tuning
  • Preview exports as images – no temp videos needed
  • Save Preview Frames to show off effects with different settings
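
For reference, a red-cyan anaglyph preview reduces to a channel swap between the two eye views; a minimal sketch, not the tool's exact compositing code:

import numpy as np

def anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """left/right: (H, W, 3) RGB uint8 frames of equal size."""
    out = right.copy()
    out[..., 0] = left[..., 0]   # red channel from the left eye
    return out                   # green/blue (cyan) stay from the right eye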

Smart GUI + Workflow


  • Language support: EN, FR, ES, DE
  • Responsive multi-tab Tkinter interface with persistent settings
  • Full GPU render control: pause, resume, cancel
  • Codec selector with NVENC options (H.264, HEVC, AV1-ready)
  • One-click launch – no pip or scripting required

Output Formats & Aspect Ratios

  • Formats: Half-SBS, Full-SBS, VR180, Anaglyph, Passive Interlaced
  • Aspect Ratios: 16:9, 2.39:1, 2.76:1, 4:3, 21:9, 1:1, 2.35:1
  • Export formats: MP4, MKV, AVI
  • Codec support: XVID, MP4V, MJPG, DIVX, FFmpeg NVENC
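
The SBS formats come down to simple frame packing. A hedged sketch of Full-SBS versus Half-SBS composition (helper names are illustrative; OpenCV is used for the resize):

import cv2
import numpy as np

def full_sbs(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    return np.hstack([left, right])        # output is 2W x H

def half_sbs(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    h, w = left.shape[:2]
    # Squeeze each eye to half width so the pair fits the original W x H.
    l = cv2.resize(left, (w // 2, h))
    r = cv2.resize(right, (w // 2, h))
    return np.hstack([l, r])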

Guide Sheet: Install

📌 System Requirements

  • ✔️ This program runs on Python 3.12
  • ✔️ This program has been tested with CUDA 12.8
  • ✔️ Conda (optional, recommended for simplicity)

📌 Step 1: Download the VisionDepth3D Program

  • 1️⃣ Download the VisionDepth3D zip file from the official download source (green button).
  • 2️⃣ Extract the zip file to your desired folder (e.g., C:\user\VisionDepth3D).
  • 3️⃣ Download the models here and extract the weights folder into the VisionDepth3D main folder.
  • 4️⃣ Download the Distill Any Depth ONNX models here (if you want to use them) and place the Distill Any Depth folder inside the weights folder.

📌 Step 2: Create Env and Install Required Dependencies

🟢 Option 1: Install via pip (Standard CMD Method)

  • 1️⃣ Press Win + R, type cmd, and hit Enter.
  • 2️⃣ Clone the repository (skip the git clone if you downloaded the ZIP and start from cd):
    git clone https://github.com/VisionDepth/VisionDepth3D.git
    cd C:\VisionDepth3D-main
    pip install -r requirements.txt

  • 3️⃣ Continue to installing PyTorch with CUDA (Step 3 below), then run VisionDepth3D.bat.

🔵 Option 2: Install via Conda (Recommended)

(Automatically manages dependencies & isolates environment.)

  • 1. Clone the repository (skip the git clone if you downloaded the ZIP and start from cd).
  • 2. Create the Conda environment: copy and paste the following into your Conda prompt to run:
    git clone https://github.com/VisionDepth/VisionDepth3D.git
    cd VisionDepth3D-main
    conda create -n VD3D python=3.12
    conda activate VD3D
    pip install -r requirements.txt
    

📌 Step 3: Check if CUDA is installed

🔍 Find Your CUDA Version: Before installing PyTorch, check which CUDA version your GPU supports:

  • 1️⃣ Open Command Prompt (Win + R, type cmd, hit Enter)
  • 2️⃣ Run one of the following commands:
    nvcc --version
    nvidia-smi
  • 3️⃣ Look for the CUDA version (e.g., CUDA 11.8, 12.1, etc.)

📌 Install PyTorch with the Correct CUDA Version

Go to the official PyTorch website to find the best install command for your setup: 🔗 https://pytorch.org/get-started/locally/

Install the PyTorch build for CUDA 12.8, or for whichever CUDA version you are running.

If you are running an AMD GPU, select the CPU build.
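
Once PyTorch is installed, you can quickly confirm that it sees your GPU:

import torch

print(torch.__version__)           # e.g. a +cu128 build
print(torch.cuda.is_available())   # should print True on a working CUDA setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    print(torch.version.cuda)      # CUDA version PyTorch was built against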

🧰 TensorRT Setup Instructions (for VisionDepth3D on Windows)

  1. Download TensorRT

    • Go to the NVIDIA TensorRT download page, log in or register, and download the appropriate .zip version for your platform (Windows, CUDA-compatible).
  2. Create the TensorRT folder

    • Inside your VisionDepth3D directory, create a new folder:
      VisionDepth3D\TensorRT
      
  3. Extract the TensorRT ZIP

    • Unzip the downloaded TensorRT package.
    • From the extracted contents, copy the entire lib folder into your new TensorRT directory so the structure looks like:
      VisionDepth3D\TensorRT\lib\*.dll
      
  4. Edit the batch file to set the DLL path

    • Open Start_VD3D_Windows.bat in a text editor.
    • On the second line (just after the cd command), add the following:
      SET "PATH=%~dp0TensorRT/lib;%PATH%"
    • Save the file.

✅ This step temporarily adds the TensorRT DLL folder to your system PATH — just for the session started by the batch script.
No system-wide changes are made, keeping your environment clean.
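
To verify that the DLLs are picked up, you can ask ONNX Runtime which execution providers it can see (run this from a session started by the batch script, with onnxruntime-gpu installed):

import onnxruntime as ort

print(ort.get_available_providers())
# Expect 'TensorrtExecutionProvider' and 'CUDAExecutionProvider' in the list
# when the TensorRT DLLs are on PATH and onnxruntime-gpu is installed.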

  • Once PyTorch, TensorRT, and all dependencies are installed, update the batch script for the system you are running, then launch with one of the following:
Start_VD3D_Conda.bat
# or
Start_VD3D_Linux.sh
# or
Start_VD3D_Windows.bat

Congrats, you have successfully installed VisionDepth3D! This quick setup ensures you clone the repository, configure your environment, and launch the app, all in just a few simple steps.


🛠️ Guide Sheet: Updating VisionDepth3D

When a new version of VisionDepth3D is released, follow these steps to ensure a smooth transition:

Update Instructions

  1. Backup Your Weights
    Move your weights folder out of the old VisionDepth3D-main directory.

  2. Download the Latest Version
    Delete the old folder and extract or clone the updated version of VisionDepth3D-main.

  3. Restore Weights Folder
    Place your weights folder back inside the newly downloaded main directory:
    VisionDepth3D-main/weights

  4. Update the Path in Startup Scripts
    Open the startup script matching your platform:

    • Start_VD3D_Windows.bat
    • Start_VD3D_Conda.bat
    • Start_VD3D_Linux.sh

    Edit the script and replace any old folder path with the new path to your updated VisionDepth3D-main.

  5. Activate Conda Environment (if needed)
    If you are using the Conda starter script:

    • Open a terminal or Anaconda Prompt.
    • Run:
      cd path/to/updated/VisionDepth3D-main
      Start_VD3D_Conda.bat
  6. Launch the App
    Once everything is in place, run the appropriate script or shortcut to launch VisionDepth3D with your latest settings.


Note: If you customized any configuration, back up those files before replacing folders. If you run into import errors, run

pip install -r requirements.txt

inside the opened terminal; that will fix any dependency errors.


Guide Sheet: GUI Inputs

Use the GUI to fine-tune your 3D conversion settings.

1. Codec

  • Description: Sets the output video encoder.
  • Default: mp4v (CPU)
  • Options:
    • mp4v, XVID, DIVX – CPU-based
    • libx264, libx265 – High-quality software (CPU)
    • h264_nvenc, hevc_nvenc – GPU-accelerated (NVIDIA)

2. Foreground Shift

  • Description: Pops foreground objects out of the screen.
  • Default: 6.5
  • Range: 3.0 to 8.0
  • Effect: Strong values create noticeable 3D "pop" in close objects.

3. Midground Shift

  • Description: Depth for mid-layer transition between foreground and background.
  • Default: 1.5
  • Range: -3.0 to 5.0
  • Effect: Smooths the 3D transition — higher values exaggerate depth between layers.

4. Background Shift

  • Description: Shift depth for background layers (far away).
  • Default: -6.0
  • Range: -10.0 to 0.0
  • Effect: More negative pushes content into the screen (deeper background).

5. Sharpness Factor

  • Description: Applies a sharpening filter to the output.
  • Default: 0.2
  • Range: -1.0 (softer) to 1.0 (sharper)
  • Effect: Brings clarity to 3D edges; avoid over-sharpening to reduce halos.

6. Convergence Offset

  • Description: Shifts the entire stereo image inward or outward to adjust the overall convergence point (zero-parallax plane).
  • Default: 0.000
  • Range: -0.050 to +0.050
  • Effect:
    • Positive values push the image deeper into the screen (stronger positive parallax).
    • Negative values pull the scene forward (increased pop-out effect).
  • Tip: Use small increments like ±0.010 for subtle depth balancing.

7. Max Pixel Shift (%)

  • Description: Limits the maximum pixel displacement caused by stereo shifting, expressed as a percentage of video width.
  • Default: 0.020 (2%)
  • Range: 0.005 to 0.100
  • Effect:
    • Low values reduce eye strain but can flatten the 3D effect.
    • High values create more dramatic depth but may introduce ghosting or artifacts.
  • Best Use: Keep between 0.015–0.030 for clean results.
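
In code terms, this setting acts as a width-relative cap on disparity. A hedged sketch of the idea (the function name and exact clamping strategy are assumptions):

import numpy as np

def clamp_shift(disparity_px: np.ndarray, width: int,
                max_shift_pct: float = 0.02) -> np.ndarray:
    limit = max_shift_pct * width              # e.g. 0.02 * 1920 = ~38 px
    return np.clip(disparity_px, -limit, limit)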

8. Parallax Balance

  • Description: Adjusts how strongly the 3D effect favors the subject's depth versus full-scene stereo balance.
  • Default: 0.80
  • Range: 0.00 to 1.00
  • Effect:
    • 1.0 = Full parallax (strong 3D depth everywhere).
    • 0.0 = Subject stays fixed, depth minimized elsewhere.
  • Use For: Tuning stereo focus around people or central motion while avoiding exaggerated background distortion.

9. FFmpeg Codec & CRF Quality

  • Codec: Choose GPU-accelerated encoders (h264_nvenc, hevc_nvenc) for faster renders.
  • CRF (Constant Rate Factor):
    • Default: 23
    • Range: 0 (lossless) to 51 (worst)
    • Lower values = better visual quality.
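
These settings map onto an FFmpeg encode roughly like the following (an illustrative sketch; the GUI assembles the actual command, and NVENC encoders use their own quality flags rather than -crf):

import subprocess

subprocess.run([
    "ffmpeg", "-y", "-framerate", "23.976", "-i", "frames/%06d.png",
    "-c:v", "libx264", "-crf", "23",     # lower CRF = higher quality
    "-pix_fmt", "yuv420p", "output_3d.mp4",
], check=True)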

10. Stabilize Zero-Parallax (center-depth)

  • Checkbox: Stabilize Zero-Parallax (center-depth)
  • Function: Enables Dynamic Zero Parallax Tracking: the zero-parallax plane automatically follows the estimated subject depth (typically the central object or character), keeping key elements at screen depth and minimizing excessive 3D warping.
  • Effect: Stabilizes the 3D effect by anchoring the subject at screen level and reducing eye strain; especially useful for scenes with depth jumps or fast movement.
  • Recommended for: Dialogue scenes, human-centric content, or anything where the central focus should feel "on screen" rather than floating in depth.
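
One plausible reading of the momentum-smoothed tracking described above is an exponential moving average over a per-frame subject-depth estimate. A hedged sketch (the class, the center crop, and the median heuristic are all assumptions, not the shipped algorithm):

import numpy as np

class ConvergenceTracker:
    """Momentum-smoothed estimate of the zero-parallax depth plane."""

    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.plane = None                    # current zero-parallax depth

    def update(self, depth: np.ndarray) -> float:
        """depth: (H, W) map in [0, 1]; returns the smoothed subject depth."""
        h, w = depth.shape
        center = depth[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
        subject = float(np.median(center))   # crude proxy for subject depth
        if self.plane is None:
            self.plane = subject
        else:
            self.plane = (self.momentum * self.plane
                          + (1.0 - self.momentum) * subject)
        return self.plane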

Depth Map Tips

  • Match resolution and FPS between your input video and depth map.
  • Use the Inverse Depth checkbox if bright = far instead of close.
  • Recommended depth models:
    • Distill Any Depth, Depth Anything V2, MiDaS, DPT-Large, etc.
    • Choose Large models for better fidelity.
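
For reference, the Inverse Depth option effectively flips a normalized depth map; a minimal sketch:

import numpy as np

def invert_depth(depth: np.ndarray) -> np.ndarray:
    """Flip a depth map so bright = near instead of far."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-6)  # normalize to [0, 1]
    return 1.0 - d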

Rendering Time Estimates

Clip Length    Estimated Time (with GPU)
30 seconds     1–4 mins
5 minutes      10–25 mins
Full movie     6–24+ hours

Example Workflow

  1. Select your depth model from the dropdown.
  2. Choose an output directory for saving results.
  3. Enable your preferred settings (invert, colormap, etc.).
  4. Set batch size depending on GPU/VRAM capacity.
    (Tip: Resize your video or switch to a lighter model if memory is limited.)
  5. Select your image / video / folder and start processing.
  6. Once the depth map video is generated, head over to the 3D tab.
  7. Input your original video and the newly created depth map.
  8. Adjust 3D settings for the preferred stereo effect.
  9. Hit "Generate 3D Video" and let it roll!

Post-Processing: RIFE + Real-ESRGAN (FPS + Upscale)

Use these models to clean up and enhance 3D videos:

  1. In the Upscale tab, load your 3D video and enable “Save Frames Only”.
  2. Input the width × height of the 3D video.
    (No need to set FPS or codec when saving frames.)
  3. Set batch size to 1 — batch processing is unsupported by some AI models.
  4. Select AI Blend Mode and Input Resolution:

AI Blend Mode

Mode     Blend Ratio (AI : Original)   Description
OFF      100% : 0%                     Full AI effect (only the ESRGAN result is used).
LOW      85% : 15%                     Strong AI enhancement with mild natural tone retention.
MEDIUM   50% : 50%                     Balanced mix for natural image quality.
HIGH     25% : 75%                     Subtle upscale; mostly original with a hint of enhancement.

Input Resolution Setting

Input Resolution   Processing Behavior                            Performance & Quality Impact
100%               Uses full-resolution frames for AI upscaling.  ✅ Best quality. ❌ Highest GPU usage.
75%                Slightly downsamples before feeding into AI.   ⚖️ Good balance. Minimal quality loss.
50%                Halves frame size before AI.                   ⚡ 2× faster. Some detail loss possible.
25%                Very low-resolution input.                     🚀 Fastest speed. Noticeable softness; best for previews/tests.
  5. Select your Upscale Model and start the process.
  6. Once done, open the VDStitch tab:
    • Input the upscaled frame folder.
    • Set the video output directory and filename.
    • Enter the same resolution and FPS as your original 3D video.
    • Enable RIFE FPS Interpolation.
  7. Set the RIFE multiplier to ×2 for smooth results.
    (⚠️ Higher multipliers like ×4 may cause artifacts on scene cuts.)
  8. Start processing — you now have an enhanced 3D video with upscaled clarity and smoother motion!
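
The blend ratios in the AI Blend Mode table above correspond to a simple per-pixel weighted mix; a hedged sketch (the dictionary and function signature are illustrative):

import numpy as np

# Illustrative mapping of the AI Blend Mode table above.
BLEND_RATIOS = {"OFF": 1.00, "LOW": 0.85, "MEDIUM": 0.50, "HIGH": 0.25}

def blend(ai_frame: np.ndarray, original: np.ndarray, mode: str = "MEDIUM"):
    """Both frames: (H, W, 3) float arrays at the same resolution."""
    a = BLEND_RATIOS[mode]
    return a * ai_frame + (1.0 - a) * original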

Troubleshooting

  • Black/Empty Output: Wrong depth map resolution or mismatch with input FPS.
  • Halo/Artifacts:
    • Increase feather strength and blur size.
    • Enable subject tracking and clamp the zero parallax offset.
  • Out of Memory (OOM):
    • Enable FFmpeg rendering for better memory usage.
    • Use libx264 or h264_nvenc and avoid long clips in one go.

Dev Notes

This tool is being developed by a solo dev with nightly grind energy (🕐 ~4 hours a night). If you find it helpful, let me know — feedback, bug reports, and feature ideas are always welcome!

Acknowledgments & Credits

Thank You!

A heartfelt thank you to all the researchers, developers, and contributors behind the incredible depth estimation models and open-source tools used in this project. Your dedication, innovation, and generosity have made it possible to explore the frontiers of 3D rendering and video processing. Your work continues to inspire and empower developers like me to build transformative, creative applications.

Supported Depth Models

Model Name                 Creator / Organization   Hugging Face Repository
Distill-Any-Depth-Large    xingyang1                Distill-Any-Depth-Large-hf
Distill-Any-Depth-Small    xingyang1                Distill-Any-Depth-Small-hf
Depth Anything V2 Large    Depth Anything Team      Depth-Anything-V2-Large-hf
Depth Anything V2 Base     Depth Anything Team      Depth-Anything-V2-Base-hf
Depth Anything V2 Small    Depth Anything Team      Depth-Anything-V2-Small-hf
Depth Anything V1 Large    LiheYoung                Depth-Anything-V2-Large
Depth Anything V1 Base     LiheYoung                depth-anything-base-hf
Depth Anything V1 Small    LiheYoung                depth-anything-small-hf
V2-Metric-Indoor-Large     Depth Anything Team      Depth-Anything-V2-Metric-Indoor-Large-hf
V2-Metric-Outdoor-Large    Depth Anything Team      Depth-Anything-V2-Metric-Outdoor-Large-hf
DA_vitl14                  LiheYoung                depth_anything_vitl14
DA_vits14                  LiheYoung                depth_anything_vits14
DepthPro                   Apple                    DepthPro-hf
ZoeDepth                   Intel                    zoedepth-nyu-kitti
MiDaS 3.0                  Intel                    dpt-hybrid-midas
DPT-Large                  Intel                    dpt-large
DinoV2                     Facebook                 dpt-dinov2-small-kitti
dpt-beit-large-512         Intel                    dpt-beit-large-512

Multimedia Framework

This project utilizes the FFmpeg multimedia framework for video/audio processing via subprocess invocation. FFmpeg is licensed under the GNU GPL v3 or LGPL, depending on how it was built. No modifications were made to the FFmpeg source or binaries — the software simply executes FFmpeg as an external process.

You may obtain a copy of the FFmpeg license at: https://www.gnu.org/licenses/

VisionDepth3D calls FFmpeg strictly for encoding, muxing, audio extraction, and frame rendering operations, in accordance with license requirements.
