A comprehensive stereo matching toolbox for efficient development and research.
pip install stereo_toolbox
| Status | Identifier | Train | Val | Test | Noc. Mask | Description |
|---|---|---|---|---|---|---|
| β | SceneFlow_Dataset | 35K+ | 4.3K+ | - | β | The most famous synthetic dataset for stereo matching pre-training. |
| β | KITTI2015_Dataset | 200 | - | 200 | β | Driving scene dataset. |
| β | KITTI2012_Dataset | 192 | - | 195 | β | Driving scene dataset. |
| β | MiddleburyEval3_Dataset | 15 | - | 15 | β | Indoor and outdoor scene dataset. |
| β | ETH3D_Dataset | 27 | - | 20 | β | Indoor scene dataset with grayscale images. |
| β | DrivingStereo_Dataset | 174K+ | 7.7K+ | - | β | Driving scene dataset with diverse weathers (sunny, cloudy, foggy, rainy). |
| β | Middlebury2021_Dataset | 24 | - | - | β | Indoor scene dataset. The non-occlusion masks are obtained using LRC by StereoAnywhere. |
| β | Sintel_Dataset | 1.0K+ | - | - | β | Synthetic dataset derived from the open source 3D animated short film, Sintel. |
| β | HR_VS_Dataset | 780 | - | - | β | Synthetic dataset rendered by Carla simulator. |
| β | Booster_Dataset | 228 | - | - | β | Indoor dataset with specular and transparent surfaces. |
| β | CREStereo_Dataset | 200K | - | - | β | Synthetic dataset rendered by Blender with different shapes, lighting, texture, and smooth disparity distribution. |
| β | InStereo2k_Dataset | 2.0K+ | 50 | - | β | Indoor dataset with high accuracy disparity maps. |
| β | Argoverse_Dataset | 4.0K+ | 1.5K+ | 1.0K+ | β | Driving scene dataset with details at the near and far range. |
| β | MonoTrap_Dataset | - | 26 | - | β | Perspective illusion dataset specifically designed to challenge monocular depth estimation. |
| β | Holopix50k_Dataset | 41K+ | 4.9K+ | 2.4K+ | β | In-the-wild dataset contributed by users of the Holopix™ mobile social platform. |
| β | FallingThings_Dataset | 61K+ | 0 | 0 | β | Synthetic dataset with object models and backgrounds of complex composition and high graphical quality. |
| β | VirtualKITTI2 | 21K+ | 0 | 0 | β | A more photo-realistic and better-featured version of the original virtual KITTI dataset. |
| β | LayeredFlow | β | 400 | 600 | β | Indoor and outdoor dataset with non-Lambertian objects. |
Dataloader Return:
- left image (color jitter if training)
- right image (color jitter and random mask if training)
- disparity ground-truth (nan if not available)
- non-occluded mask (nan if not available)
- raw left image (not normalized)
- raw right image (not normalized)
| Status | Identifier | Train | Val | Test | Noc. Mask | Description |
|---|---|---|---|---|---|---|
| β | SceneFlow_Dataset | 35K+ | 4.3K+ | - | β | The most famous synthetic dataset for stereo matching pre-training. |
| β | KITTI2015_Dataset | 200 | - | 200 | β | Driving scene dataset. |
| β | KITTI2012_Dataset | 192 | - | 195 | β | Driving scene dataset. |
| β | MiddleburyEval3_Dataset | 15 | - | 15 | β | Indoor and outdoor scene dataset. |
| β | SCARED | 35 | - | β | Endoscopic dataset (porcine cadavers) with structured-light data. | |
| β | ETH3D_Dataset | 27 | - | 20 | β | Indoor scene dataset with grayscale images. |
| β | DrivingStereo_Dataset | 174K+ | 7.7K+ | - | β | Driving scene dataset with diverse weathers (sunny, cloudy, foggy, rainy). |
Dataloader Return: Returns a dictionary containing the requested data types:
- ref (torch.Tensor): Reference image in CHW format, values in [0, 255].
- tgt (torch.Tensor): Target image in CHW format, values in [0, 255].
- gt_disp (torch.Tensor): Ground truth disparity map in H*W format, with 0 indicating invalid pixels.
- noc_mask (torch.Tensor): Non-occluded mask in H*W format, with False for occluded and True for non-occluded pixels.
- raw_ref (torch.Tensor): Unaugmented reference image in CHW format, values in [0, 255].
- raw_tgt (torch.Tensor): Unaugmented target image in CHW format, values in [0, 255].
- ref_filename (str): Filename of the reference image.
- top_pad (int): Number of pixels padded at the top during testing.
- right_pad (int): Number of pixels padded on the right during testing.
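As a rough illustration of how the returned dictionary might be consumed at test time, the sketch below crops the padding recorded in `top_pad` and `right_pad` from a prediction and builds a validity mask from `gt_disp`. NumPy arrays stand in for the `torch.Tensor` values, the sample values are made up, and `remove_padding` is a hypothetical helper, not part of the toolbox.

```python
import numpy as np


def remove_padding(pred, top_pad, right_pad):
    """Crop a padded (..., H, W) prediction back to the original image size.

    Hypothetical helper: assumes rows were padded at the top and columns on
    the right, as described by the `top_pad` and `right_pad` entries.
    """
    width = pred.shape[-1]
    return pred[..., top_pad:, : width - right_pad]


# Toy sample mimicking the documented keys (values are illustrative only).
sample = {
    "gt_disp": np.array([[0.0, 12.5], [30.0, 0.0]]),  # 0 marks invalid pixels
    "top_pad": 2,
    "right_pad": 1,
}

padded_pred = np.zeros((4, 3))  # a 2 x 2 image padded to 4 x 3
pred = remove_padding(padded_pred, sample["top_pad"], sample["right_pad"])
valid = sample["gt_disp"] > 0  # pixels that carry ground truth
```

Cropping before computing metrics keeps the prediction aligned with `gt_disp`, which is stored at the original resolution in this sketch.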
| Status | Identifier | Architecture | Description |
|---|---|---|---|
| β | PSMNet | 3D Conv. | CVPR 2018, change .cuda() to .to(x.device), optimize the cost volume building. |
| β | GwcNet | 3D Conv. | CVPR 2019, two models GwcNet_G and GwcNet_GC. |
| β | GANet | 3D Conv. | CVPR 2019, need to compile |
| β | AANet | 2D Conv. | CVPR 2020, need to compile. |
| β | DSMNet | 3D Conv. | ECCV 2020, need to compile. |
| β | CFNet | 3D Conv. | CVPR 2021, mish activation function only, returns only pred1_s2 during evaluation. |
| β | STTR | Transformer | ICCV 2021, returns only output['disp_pred'] during evaluation. |
| β | RaftStereo | Iterative | 3DV 2021, add default self.args in __init__(), reset left as positive direction (i.e. invert all outputs). |
| β | ACVNet | 3D Conv. | CVPR 2022. |
| β | CREStereo | Iterative | CVPR 2022, implemented by MegEngine. |
| β | PCWNet | 3D Conv. | ECCV 2022, rename class PWCNet as PCWNet, two models PCWNet_G and PCWNet_GC, mish activation function only, returns only disp_finetune during evaluation. |
| β | IGEVStereo | 3D Conv. + Iterative | CVPR 2023, timm==0.5.4. |
| β | GMStereo(UniMatch) | Transformer | T-PAMI 2023, returns only results_dict['flow_preds'][-1] during evaluation. |
| β | CroCoStereo(CroCov2) | Transformer | ICCV 2023, set overlap=0.7 |
| β | SelectiveStereo | 3D Conv. + Iterative | CVPR 2024, two models SelectiveRAFT and SelectiveIGEV, add default self.args in __init__(), timm==0.5.4. |
| β | MoChaStereo | Iterative | CVPR 2024. |
| β | NMRF | MRF | CVPR 2024. |
| β | MonSter | 3D Conv. + Iterative | CVPR 2025, add default self.args in __init__(), timm==0.5.4. |
| β | DEFOM-Stereo | Iterative | CVPR 2025, add default self.args in __init__(), note that the used depthanythingv2 has additional interpolation step, timm<=0.6.5 |
| β | FoundationStereo | 3D Conv. + Iterative | CVPR 2025 Best Paper Nomination, add default self.args in __init__(), timm==0.6.5. |
| β | StereoAnywhere | Iterative | CVPR 2025, integrate depthanythingv2 into the forward process. |
| β | IGEVStereoPlusPluss (IGEV++) | 3D Conv. + Iterative | T-PAMI 2025, timm==0.5.4. |
- Unless specified, the maximum search disparity for cost volume filtering methods is set to 192.
- All predictions are output as a list during training, and only the final disparity map is output during inference.
- For all iterative methods, the default training and validation iterations are set to 22 and 32, respectively.
- Due to version dependency, please additionally install timm==0.5.4 and rename it to timm_0_5_4:

      wget https://github.com/huggingface/pytorch-image-models/archive/refs/tags/v0.5.4.zip
      # unzip the zip file
      cd pytorch-image-models-0.5.4
      # replace 'timm' in 'setup.py' with 'timm_0_5_4'
      # replace all the 'import timm' and 'from timm' with 'import timm_0_5_4' and 'from timm_0_5_4', respectively
      pip install .
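The import-renaming step above can be tedious by hand. The following is a minimal sketch of one way to automate it with the standard library. `rename_timm_imports` is a hypothetical helper, not something shipped with the toolbox or with timm, and it only rewrites `import timm` / `from timm` statements; the `setup.py` edit still has to be done separately.

```python
import re
from pathlib import Path

# Matches `import timm`, `from timm`, `import timm.x` and `from timm.x`,
# but not names that merely start with "timm" (such as `timm_0_5_4`).
_TIMM_IMPORT = re.compile(r"^([ \t]*)(import|from)[ \t]+timm(?!\w)", re.MULTILINE)


def rename_timm_imports(root, new_name="timm_0_5_4"):
    """Rewrite timm imports in every .py file under `root`.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in Path(root).rglob("*.py"):
        source = path.read_text(encoding="utf-8")
        updated = _TIMM_IMPORT.sub(rf"\1\2 {new_name}", source)
        if updated != source:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling `rename_timm_imports("pytorch-image-models-0.5.4")` before `pip install .` would cover the import statements. Code that refers to the bare name `timm` after a plain `import timm` is not touched by this sketch and may need manual edits.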
| Status | Identifier | Description |
|---|---|---|
| β | photometric_loss | |
| β | smoothness_loss | |
| β | triplet_photometric_loss | CVPR 2023, NerfStereo. |
| β | single_modal_cross_entropy_loss | |
| β | multi_modal_cross_entropy_loss | |
| Status | Identifier | Description |
|---|---|---|
| β | softargmax_disparity_estimator | ICCV 2017. |
| β | argmax_disparity_estimator | |
| β | unimodal_disparity_estimator | ICCV 2019. |
| β | dominant_modal_disparity_estimator | CVPR 2024. |
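To make the difference between the first two estimators concrete, here is a minimal NumPy sketch of soft-argmax and argmax disparity estimation over a matching-score volume. It illustrates the general technique only; the function names, the `(D, H, W)` layout, and the "higher score is better" convention are assumptions and may differ from the toolbox's own estimators.

```python
import numpy as np


def softargmax_disparity(scores):
    """Expected disparity under a softmax over the disparity axis.

    `scores` has shape (D, H, W) and holds matching scores (higher is better);
    negate the volume first if it stores costs instead.
    """
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(shifted)
    prob = prob / prob.sum(axis=0, keepdims=True)
    candidates = np.arange(scores.shape[0]).reshape(-1, 1, 1)
    return (prob * candidates).sum(axis=0)


def argmax_disparity(scores):
    """Index of the best-scoring disparity candidate per pixel."""
    return scores.argmax(axis=0)


# One pixel, five disparity candidates, with a sharp peak at d = 3.
peaked = np.zeros((5, 1, 1))
peaked[3] = 50.0
soft = softargmax_disparity(peaked)
hard = argmax_disparity(peaked)

# A flat distribution: soft-argmax falls back to the mean candidate.
flat = softargmax_disparity(np.zeros((5, 1, 1)))
```

Soft-argmax is differentiable and gives sub-pixel estimates, but on multi-modal distributions it averages across modes, which is the motivation usually given for the unimodal and dominant-modal variants.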
| Status | Identifier | Description |
|---|---|---|
| β | colored_disparity_map_Spectral_r | Disparity map pseudocolor visualization with Spectral_r colorbar. |
| β | colored_disparity_map_KITTI | Disparity map pseudocolor visualization with KITTI colorbar. |
| β | colored_error_map_KITTI | Error map pseudocolor visualization with KITTI colorbar. |
| β | colored_pointcloud | Point cloud visualization with real color derived from left image. |
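For the point cloud visualization, the underlying geometry is the standard rectified-stereo relation Z = f * B / d. The sketch below back-projects a disparity map and attaches colors from the left image. It is an illustration under assumed pinhole intrinsics (`focal`, `cx`, `cy`) and baseline, and `disparity_to_pointcloud` is a hypothetical name rather than the toolbox's `colored_pointcloud` signature.

```python
import numpy as np


def disparity_to_pointcloud(disp, left_image, focal, baseline, cx, cy):
    """Back-project an H x W disparity map to an N x 6 array (X, Y, Z, R, G, B).

    Pixels with non-positive disparity are skipped. `focal`, `cx`, `cy` are in
    pixels and `baseline` is in the unit wanted for the output coordinates.
    """
    height, width = disp.shape
    v, u = np.mgrid[0:height, 0:width]  # row and column index grids
    valid = disp > 0
    z = focal * baseline / disp[valid]
    x = (u[valid] - cx) * z / focal
    y = (v[valid] - cy) * z / focal
    colors = left_image[valid]  # H x W x 3 indexed by an H x W mask gives N x 3
    return np.column_stack([x, y, z, colors])


disp = np.array([[0.0, 10.0], [20.0, 0.0]])
image = np.arange(12).reshape(2, 2, 3)
cloud = disparity_to_pointcloud(disp, image, focal=100.0, baseline=0.5, cx=0.5, cy=0.5)
```

The resulting array can be handed to any point cloud viewer that accepts per-point colors.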
| Status | Identifier | Description |
|---|---|---|
| β | sceneflow_test | Evaluation on SceneFlow finalpass test set. EPE and outliers are reported. Valid disparity range 0~maxdisp-1, default 0~191. |
| β | generalization_eval | Test generalization performance on the training sets of KITTI 2015/2012, Middlebury Eval3, and ETH3D. Outliers in the occ, noc, and all regions are reported. Valid disparity range 0~maxdisp-1, default 0~191. |
| β | speed_and_memery_test | Test inference speed and memory usage. |
| β | drivingstereo_weather_test | Test generalization performance on different weathers of DrivingStereo test sets (half). |
| β | benchmark_submission | Generate zip files for submitting to benchmarks (KITTI 2015 (verified), KITTI 2012 (verified), MiddEval3, and ETH3D). |
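The metrics reported by these scripts can be sketched as follows: EPE is the mean absolute disparity error over valid pixels, and the outlier rate is the percentage of valid pixels whose error exceeds a threshold. This is a simplified illustration, not the toolbox's implementation. It assumes that a ground-truth value of 0 marks an invalid pixel, that a plain absolute threshold is used, and that `region` is an optional mask such as the non-occluded mask.

```python
import numpy as np


def epe_and_outliers(pred, gt, threshold=3.0, maxdisp=192, region=None):
    """Return (EPE, outlier rate in percent) over valid pixels.

    Valid pixels have ground truth in (0, maxdisp - 1]; `region` is an optional
    boolean mask that further restricts the evaluated pixels.
    """
    valid = (gt > 0) & (gt <= maxdisp - 1)
    if region is not None:
        valid = valid & region
    if not valid.any():
        return float("nan"), float("nan")
    error = np.abs(pred[valid] - gt[valid])
    return float(error.mean()), float((error > threshold).mean() * 100.0)


gt = np.array([[10.0, 0.0], [50.0, 200.0]])
pred = np.array([[11.0, 5.0], [45.0, 100.0]])
noc = np.array([[True, False], [False, False]])

epe_all, out_all = epe_and_outliers(pred, gt)
epe_noc, out_noc = epe_and_outliers(pred, gt, region=noc)
```

Passing the inverted non-occluded mask as `region` would give the occluded-region numbers in the same way.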
Table 1: Evaluation on SceneFlow finalpass test set.
| Model | Checkpoint | EPE | 1px | 2px | 3px |
|---|---|---|---|---|---|
| PSMNet | pretrained_sceneflow_new.tar | 1.1572 | 11.2908 | 6.4028 | 4.7803 |
| GwcNet_GC | checkpoint_000015.ckpt | 0.9514 | 8.1138 | 4.6241 | 3.4730 |
| CFNet | sceneflow_pretraining.ckpt | 1.2879 | 10.7195 | 7.3116 | 5.9251 |
| STTR† | sceneflow_pretrained_model.pth.tar | 4.5613 | 15.6220 | 12.3084 | 11.3189 |
| RAFTStereo | raftstereo-sceneflow.pth | 0.7863 | 7.7104 | 4.8658 | 3.7327 |
| ACVNet | sceneflow.ckpt | 0.6860 | 5.1409 | 2.9201 | 2.1832 |
| PCWNet_GC | PCWNet_sceneflow_pretrain.ckpt | 1.0391 | 8.1380 | 4.6462 | 3.5443 |
| IGEVStereo | sceneflow.pth | 0.6790 | 5.7491 | 3.7320 | 2.9069 |
| GMStereo | GMStereo-scale2-regrefine3-resumeflowthings-sceneflow | 0.6355 | 6.1353 | 3.4315 | 2.5237 |
| CroCoStereo | crocostereo.pth | 0.6822 | 5.1854 | 3.3273 | 2.6104 |
| SelectiveRAFT | sceneflow.pth | 0.6956 | 5.7341 | 3.7000 | 2.8816 |
| SelectiveIGEV | sceneflow.pth | 0.6048 | 5.3667 | 3.4717 | 2.6904 |
| MonSter‡ | sceneflow.pth | 0.5201 | 4.5608 | 2.9705 | 2.3052 |
| DEFOMStereo-S‡ | defomstereo_vits_sceneflow.pth | 0.5592 | 5.9396 | 3.7223 | 2.8441 |
| DEFOMStereo-L‡ | defomstereo_vitl_sceneflow.pth | 0.4832 | 5.4918 | 3.4421 | 2.6136 |
| FoundationStereo-S‡ | 11-33-40/model_best_bp2.pth | 0.5165 | 4.0213 | 2.4983 | 1.9194 |
| FoundationStereo-L‡ | 23-51-11/model_best_bp2.pth | 0.4966 | 3.6243 | 2.2180 | 1.7123 |
| StereoAnywhere‡ | sceneflow.tar | 0.9109 | 7.9459 | 5.0610 | 4.0071 |
| IGEV++ | sceneflow.pth | 0.6269 | 4.7347 | 2.8433 | 2.1624 |
- † without occluded mask input.
- ‡ employed the foundation model (DepthAnything v2).
Table 2: Generalization evaluation on four real-world training sets. For all datasets, we report the average end-point error (EPE) and the outlier rates in occluded, non-occluded, and all regions. The outlier thresholds are set to 3, 3, 2, and 1 for KITTI 2015, KITTI 2012, Middlebury Eval3, and ETH3D, respectively.
| Model | Checkpoint | KITTI 2015 EPE | KITTI 2015 Occ | KITTI 2015 Noc | KITTI 2015 All | KITTI 2012 EPE | KITTI 2012 Occ | KITTI 2012 Noc | KITTI 2012 All | MiddEval3 EPE | MiddEval3 Occ | MiddEval3 Noc | MiddEval3 All | ETH3D EPE | ETH3D Occ | ETH3D Noc | ETH3D All |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PSMNet | pretrained_sceneflow_new.tar | 4.0584 | 47.6432 | 28.1250 | 28.4160 | 3.8022 | 63.1951 | 26.5022 | 27.3239 | 9.8662 | 62.2950 | 30.1842 | 34.5084 | 2.3997 | 28.5613 | 14.7393 | 15.3888 |
| GwcNet_GC | checkpoint_000015.ckpt | 2.3801 | 29.0696 | 12.1746 | 12.5331 | 1.7062 | 45.6458 | 11.9081 | 12.6712 | 6.0044 | 47.1304 | 20.4144 | 24.1094 | 1.9213 | 21.3749 | 10.4911 | 11.0878 |
| CFNet | sceneflow_pretraining.ckpt | 1.9798 | 16.4189 | 5.8712 | 6.0967 | 1.0334 | 30.2510 | 4.5758 | 5.1527 | 5.7162 | 44.5492 | 16.3307 | 20.2219 | 0.5862 | 11.8926 | 5.5666 | 5.8700 |
| STTR | sceneflow_pretrained_model.pth.tar | 2.1786 | 90.9327 | 6.8101 | 8.3029 | 2.8117 | 94.3034 | 7.1706 | 9.1719 | 8.6700 | 88.7300 | 19.3553 | 28.1827 | 2.2964 | 50.0450 | 15.8716 | 17.5654 |
| RAFTStereo | raftstereo-sceneflow.pth | 1.1283 | 12.6979 | 5.3413 | 5.5269 | 0.9098 | 28.3453 | 4.2900 | 4.8351 | 1.5231 | 27.9966 | 9.0575 | 11.9563 | 0.3614 | 6.0158 | 2.8471 | 3.0412 |
| ACVNet | sceneflow.ckpt | 2.5105 | 32.8509 | 11.2934 | 11.7108 | 2.0233 | 54.4658 | 12.9433 | 13.8876 | 6.2429 | 47.3617 | 22.0709 | 25.6607 | 2.4436 | 19.6435 | 8.6531 | 9.1933 |
| PCWNet_GC | PCWNet_sceneflow_pretrain.ckpt | 1.7777 | 14.9532 | 5.5273 | 5.7416 | 0.9589 | 30.2184 | 4.0734 | 4.6669 | 3.1463 | 37.9880 | 12.1703 | 15.8633 | 0.5284 | 11.6673 | 5.2792 | 5.5360 |
| IGEVStereo | sceneflow.pth | 1.1868 | 14.2606 | 5.5951 | 5.7924 | 1.0131 | 33.6624 | 4.9248 | 5.5936 | 1.5491 | 24.2787 | 7.2518 | 9.9079 | 0.7400 | 9.7601 | 4.0635 | 4.3856 |
| GMStereo | GMStereo-scale2-regrefine3-resumeflowthings-sceneflow | 1.1957 | 19.4742 | 5.3210 | 5.6365 | 1.1021 | 36.7635 | 4.7391 | 5.4961 | 2.1188 | 41.0546 | 12.7984 | 16.6042 | 0.4013 | 14.3825 | 5.5249 | 5.8759 |
| CroCoStereo | crocostereo.pth | 5.3563 | 44.5432 | 19.8355 | 20.2043 | 3.3760 | 41.0264 | 13.8920 | 14.4562 | - | - | - | - | - | - | - | - |
| NerfStereo-RAFT† | raftstereo-NS.tar | 1.1330 | 14.6178 | 5.2269 | 5.4257 | 0.8592 | 26.9731 | 3.5119 | 4.0440 | 1.6247 | 31.0983 | 6.7877 | 10.3770 | 0.2992 | 8.3545 | 2.7778 | 3.0729 |
| SelectiveRAFT | sceneflow.pth | 1.2629 | 17.7190 | 6.0989 | 6.3532 | 1.0889 | 28.0310 | 4.9432 | 5.4576 | 1.6684 | 26.7379 | 7.5835 | 10.5572 | 0.3958 | 8.8286 | 3.8131 | 4.2670 |
| SelectiveIGEV | sceneflow.pth | 1.2124 | 13.8184 | 5.7032 | 5.8859 | 1.0068 | 31.8457 | 5.0626 | 5.6780 | 1.3974 | 22.5942 | 6.7270 | 9.1742 | 0.4373 | 9.8115 | 4.0689 | 4.4284 |
| MonSter‡ | sceneflow.pth | 0.8884 | 9.6433 | 3.3003 | 3.4495 | 0.7334 | 18.8246 | 3.0310 | 3.3710 | 0.9325 | 18.4153 | 5.8567 | 7.6997 | 0.2724 | 3.5259 | 1.3234 | 1.4525 |
| DEFOMStereo-S‡ | defomstereo_vits_sceneflow.pth | 1.0819 | 13.6233 | 4.9982 | 5.1943 | 0.9024 | 23.5715 | 4.3982 | 4.8102 | 1.9487 | 23.8614 | 6.0614 | 8.7609 | 0.2733 | 4.9148 | 2.0263 | 2.1937 |
| DEFOMStereo-L‡ | defomstereo_vitl_sceneflow.pth | 1.0725 | 12.5722 | 4.7921 | 4.9853 | 0.8433 | 21.9474 | 3.8260 | 4.2137 | 0.8884 | 20.6396 | 4.3891 | 6.9092 | 0.2533 | 5.1446 | 2.0820 | 2.2437 |
| ZeroStereo-RAFT† | model.safetensors | 1.0306 | 11.1673 | 4.4509 | 4.6312 | 0.7484 | 20.5038 | 3.1816 | 3.5517 | 1.3451 | 23.8572 | 4.6174 | 7.5843 | 0.2346 | 6.3722 | 1.9073 | 2.2238 |
| ZeroStereo-IGEV† | model_192.safetensors | 1.0061 | 10.5266 | 4.3593 | 4.5312 | 0.7394 | 19.4140 | 3.1647 | 3.5043 | 1.1126 | 21.2663 | 4.8955 | 7.3997 | 0.2297 | 6.2541 | 1.9331 | 2.1894 |
| FoundationStereo-S‡ | 11-33-40/model_best_bp2.pth | 0.8812 | 9.3458 | 3.1114 | 3.2651 | 0.6646 | 16.5310 | 2.5713 | 2.8640 | 0.5377 | 10.6189 | 1.3074 | 2.7566 | 0.1612 | 2.5606 | 0.7094 | 0.7776 |
| FoundationStereo-L‡ | 23-51-11/model_best_bp2.pth | 0.8746 | 8.1694 | 3.1106 | 3.2388 | 0.6692 | 15.3932 | 2.6666 | 2.9389 | 0.5060 | 8.5997 | 1.1179 | 2.2698 | 0.1517 | 2.2747 | 0.5694 | 0.6525 |
| StereoAnywhere‡ | sceneflow.tar | 0.9531 | 11.2782 | 3.6955 | 3.8666 | 0.8347 | 21.6785 | 3.5198 | 3.9184 | 1.2584 | 20.7357 | 5.3881 | 7.629 | 0.2362 | 4.2515 | 1.5314 | 1.7172 |
| IGEV++ | sceneflow.pth | 1.2942 | 16.6714 | 6.2537 | 6.4695 | 1.1315 | 34.6407 | 5.7610 | 6.4504 | 2.3679 | 26.2245 | 7.2190 | 10.0922 | 0.4844 | 5.8613 | 4.2269 | 4.2871 |
- † trained on extra synthetic/real data.
- ‡ employed the foundation model (DepthAnything v2).
- CroCoStereo was trained with Middlebury and ETH3D datasets.
Table 3: Inference speed (s) and memory (MB) usage. Device: NVIDIA GeForce RTX 4090.
| Model | (480, 640) Speed | (480, 640) Memory | (736, 1280) Speed | (736, 1280) Memory | (1088, 1920) Speed | (1088, 1920) Memory | #Param. (M) | #Learnable param. (M) |
|---|---|---|---|---|---|---|---|---|
| PSMNet | 0.0396 | 1787.69 | 0.1245 | 4956.50 | 0.2866 | 10687.22 | 5.22 | 5.22 |
| GwcNet_GC | 0.0386 | 1882.58 | 0.1326 | 5251.74 | 0.3093 | 11326.84 | 6.91 | 6.91 |
| CFNet | 0.0481 | 1966.13 | 0.1434 | 5374.05 | 0.3343 | 11526.54 | 23.05 | 23.05 |
| STTR | 0.1556 | 3036.80 | 0.8468 | 16588.08 | OOM | OOM | 2.51 | 2.51 |
| RAFTStereo | 0.1967 | 914.25 | 0.3624 | 2227.85 | 0.7613 | 4598.91 | 11.12 | 11.12 |
| ACVNet | 0.0494 | 2098.31 | 0.1664 | 6344.20 | 0.3848 | 14021.82 | 7.17 | 7.17 |
| PCWNet_GC | 0.0888 | 3067.07 | 0.2769 | 8629.70 | 0.6419 | 18680.02 | 35.94 | 35.94 |
| IGEVStereo | 0.2363 | 686.43 | 0.3501 | 1504.02 | 0.6741 | 2988.35 | 12.60 | 12.60 |
| GMStereo | 0.0571 | 937.78 | 0.2011 | 2792.14 | 0.6261 | 8412.21 | 7.35 | 7.35 |
| CroCoStereo | 0.1710 | 2292.61 | 1.1333 | 2319.24 | 3.1700 | 2370.83 | 437.42 | 437.42 |
| SelectiveRAFT | 0.1776 | 731.03 | 0.4253 | 1559.72 | 0.9899 | 3171.54 | 11.65 | 11.65 |
| SelectiveIGEV | 0.1853 | 600.90 | 0.3843 | 1406.60 | 0.8850 | 2895.57 | 13.14 | 13.14 |
| MonSter | 0.3375 | 2399.86 | 0.7188 | 3841.63 | 1.8735 | 6537.50 | 388.69 | 53.38 |
| DEFOMStereo-S | 0.1957 | 1062.00 | 0.3423 | 2424.38 | 0.8829 | 4886.10 | 43.29 | 18.51 |
| DEFOMStereo-L | 0.2483 | 2451.85 | 0.5966 | 4005.69 | 1.7410 | 6816.45 | 382.62 | 47.30 |
| FoundationStereo-S | 0.2792 | 4522.09 | 0.6896 | 7237.75 | 1.5627 | 12145.71 | 62.34 | 37.55 |
| FoundationStereo-L | 0.3327 | 2811.62 | 0.8369 | 5569.83 | 1.7758 | 10555.90 | 374.52 | 39.20 |
| StereoAnywhere | 0.4172 | 2785.66 | 0.8389 | 7810.07 | 2.6662 | 22433.76 | 346.75 | 11.43 |
| IGEV++ | 0.2881 | 741.00 | 0.4362 | 2066.62 | 0.8782 | 4619.21 | 14.53 | 14.53 |
Table 4: Generalization across different weathers. The outlier threshold is set to 3.
| Model | Checkpoint | Sunny EPE | Sunny Outliers | Cloudy EPE | Cloudy Outliers | Rainy EPE | Rainy Outliers | Foggy EPE | Foggy Outliers |
|---|---|---|---|---|---|---|---|---|---|
| PSMNet | pretrained_sceneflow_new.tar | 7.9699 | 40.1363 | 12.8784 | 43.9466 | 24.5091 | 56.187 | 31.558 | 69.6891 |
| GwcNet_GC | checkpoint_000015.ckpt | 2.2694 | 17.1220 | 3.5672 | 25.5583 | 4.9620 | 28.1909 | 3.3859 | 29.2295 |
| CFNet | sceneflow_pretraining.ckpt | 1.1168 | 4.6957 | 1.0915 | 5.3006 | 1.8753 | 12.4819 | 1.1242 | 5.5388 |
| STTR | sceneflow_pretrained_model.pth.tar | 2.6073 | 6.9961 | 2.4905 | 7.9241 | 7.8698 | 23.7624 | 2.2568 | 8.3199 |
| RAFTStereo | raftstereo-sceneflow.pth | 1.1015 | 4.2288 | 1.0457 | 4.1902 | 2.0409 | 12.7736 | 0.9909 | 3.0875 |
| ACVNet | sceneflow.ckpt | 2.5432 | 19.6405 | 4.1897 | 29.8733 | 12.3508 | 41.3112 | 5.8133 | 38.0457 |
| PCWNet_GC | PCWNet_sceneflow_pretrain.ckpt | 0.9841 | 3.5835 | 1.0074 | 3.6724 | 1.9833 | 10.5247 | 1.1282 | 5.1968 |
| IGEVStereo | sceneflow.pth | 1.0485 | 4.5893 | 1.1052 | 5.1544 | 2.2975 | 15.4724 | 1.0657 | 4.4922 |
| GMStereo | GMStereo-scale2-regrefine3-resumeflowthings-sceneflow | 1.3744 | 6.8031 | 1.3299 | 7.0328 | 2.9797 | 16.7326 | 1.5465 | 9.7642 |
| CroCoStereo | crocostereo.pth | 2.0420 | 8.2903 | 1.4104 | 5.7141 | 2.3852 | 16.8024 | 1.6361 | 6.3033 |
| NerfStereo-RAFT† | raftstereo-NS.tar | 0.9003 | 2.8822 | 0.9145 | 2.9105 | 1.7485 | 10.2047 | 1.0682 | 3.9268 |
| SelectiveRAFT | sceneflow.pth | 1.1099 | 4.8376 | 1.0555 | 4.4836 | 1.8238 | 13.9435 | 0.9648 | 3.4256 |
| SelectiveIGEV | sceneflow.pth | 1.1242 | 5.0513 | 1.1139 | 5.2406 | 2.0507 | 13.5095 | 1.0679 | 4.1028 |
| MonSter‡ | sceneflow.pth | 0.9857 | 3.4775 | 0.9318 | 3.1687 | 1.1267 | 5.2665 | 1.1023 | 5.0289 |
| DEFOMStereo-S‡ | defomstereo_vits_sceneflow.pth | 0.9678 | 3.7935 | 0.9836 | 4.0202 | 1.4416 | 12.9997 | 0.9787 | 3.4861 |
| DEFOMStereo-L‡ | defomstereo_vitl_sceneflow.pth | 0.9740 | 3.6134 | 0.9970 | 3.7463 | 1.5175 | 13.5251 | 0.9330 | 2.8767 |
| ZeroStereo-RAFT† | model.safetensors | 0.8423 | 2.6192 | 0.8457 | 2.4586 | 2.2867 | 13.6386 | 0.8388 | 1.7887 |
| ZeroStereo-IGEV† | model_192.safetensors | 0.8487 | 2.5887 | 0.8379 | 2.3636 | 1.7912 | 13.0688 | 0.8483 | 1.8949 |
| FoundationStereo-S‡ | 11-33-40/model_best_bp2.pth | 0.8651 | 2.7184 | 0.8791 | 2.5519 | 1.5783 | 13.4365 | 1.0134 | 3.3486 |
| FoundationStereo-L‡ | 23-51-11/model_best_bp2.pth | 0.9427 | 3.1979 | 0.8918 | 2.6189 | 5.3590 | 26.9548 | 2.1145 | 5.6173 |
| StereoAnywhere‡ | sceneflow.tar | 0.9713 | 3.5070 | 0.9068 | 2.9285 | 1.3656 | 10.8506 | 0.9408 | 2.8968 |
| IGEV++ | sceneflow.pth | 1.1330 | 4.9870 | 1.1195 | 5.1955 | 3.3631 | 17.1510 | 1.2028 | 5.9750 |
We sincerely thank the authors of the models and datasets mentioned above.