Recovering Dynamic 3D Sketches from Videos

CVPR 2025
Seoul National University

Liv3Stroke🖊️ reconstructs dynamic sketches with deformable 3D strokes directly from video frames.

Abstract

Understanding 3D motion from videos presents inherent challenges due to the diverse types of movement, ranging from rigid and deformable objects to articulated structures. To overcome this, we propose Liv3Stroke, a novel approach for abstracting objects in motion with deformable 3D strokes. The detailed movements of an object may be represented by unstructured motion vectors or by a set of motion primitives using a pre-defined articulation from a template model. Just as a free-hand sketch can intuitively visualize scenes or intentions with a sparse set of lines, we utilize a set of parametric 3D curves to capture spatially smooth motion elements for general objects with unknown structures. We first extract noisy 3D point cloud motion guidance from video frames using semantic features, and our approach deforms a set of curves to abstract essential motion features as a set of explicit 3D representations. Such abstraction enables an understanding of prominent components of motion while maintaining robustness to environmental factors. Our approach allows direct analysis of 3D object movements from video, tackling the uncertainty that typically occurs when translating real-world motion into recorded footage.

Method

To align sketches with dynamic scenes, we define a sketch as a set of deformable 3D strokes and represent each stroke as a vectorized curve (a cubic Bézier curve). Using the editing capabilities of vector graphics, we can effectively convey movements by shifting stroke positions and their control points. Before reconstructing moving sketches, we compute a dense guiding motion field, since it is challenging to directly align strokes with input video frames. We first reconstruct 3D motion guidance, a dynamic point cloud, by using an MLP as a deformation network. This guidance serves as a rough initial 3D motion and location for the dynamic 3D strokes. Based on this, we fit 3D strokes to the movements. We model stroke movements as a composition of two components: (1) per-stroke rigid transformations controlling position and orientation, and (2) control point adjustments that manage shape changes. We jointly optimize the sketch and the motion guidance using a rendering loss defined in perceptual space.
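The two stroke components described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names and the simple "rigid transform plus per-control-point offsets" parameterization are our assumptions, and the actual method optimizes these parameters with a perceptual rendering loss.

```python
import numpy as np

def cubic_bezier(ctrl, t):
    """Evaluate a cubic Bezier stroke at parameters t.

    ctrl: (4, 3) array of 3D control points.
    t:    (n,) array of parameters in [0, 1].
    Returns (n, 3) points on the curve.
    """
    t = np.asarray(t, dtype=float)[:, None]
    # Bernstein basis weights for a cubic curve.
    basis = [(1 - t) ** 3,
             3 * (1 - t) ** 2 * t,
             3 * (1 - t) * t ** 2,
             t ** 3]
    return sum(w * p for w, p in zip(basis, ctrl))

def deform_stroke(ctrl, R, trans, offsets):
    """Apply the two motion components to one stroke's control points.

    R, trans: per-stroke rigid transform (3x3 rotation, 3-vector translation)
              controlling position and orientation.
    offsets:  (4, 3) per-control-point adjustments managing shape changes.
    """
    return ctrl @ R.T + trans + offsets
```

For example, evaluating the curve at `t = 0` and `t = 1` returns the first and last control points, and an identity rotation with a pure translation shifts the whole stroke without changing its shape.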

Results

Comparison with baseline methods under a moving camera trajectory (left to right: Ref. views, CLIPasso, SketchVideo, Sugg. Contour, Ours).

Comparison with baseline methods under a fixed camera viewpoint (left to right: Ref. view, CLIPasso, SketchVideo, Sugg. Contour, Ours).

Liv3Stroke can also capture 3D movements from scenes with changing lighting conditions.

Results of real-world scenarios.

Ablation Study

Comparison of design choices in sketch reconstruction (left: Ref. views, w/o guidance, w/o coarse, Ours (Full); right: Ref. view, w/o Ltemp, w/o Lreg, Ours (Full)).

BibTeX

@inproceedings{lee2025recovering,
    title={Recovering Dynamic 3D Sketches from Videos},
    author={Lee, Jaeah and Choi, Changwoon and Kim, Young Min and Park, Jaesik},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2025}
}