Chongjie Ye5 · Bohan Li6,7 · Zhiguo Cao3 · Wei Li1 · Hao Zhao4,2,* · Ziwei Liu1,*
TL;DR: Light-X is a video generation framework that jointly controls camera trajectory and illumination from monocular videos.
(Teaser video: teaser_compressed.mp4)
Recent advances in illumination control extend image-based methods to video, yet they still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illumination, since visual dynamics are inherently shaped by both geometry and lighting. To this end, we present Light-X, a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. 1) We propose a disentangled design that decouples geometry and lighting signals: geometry and motion are captured via dynamic point clouds projected along user-defined camera trajectories, while illumination cues are provided by a relit frame consistently projected into the same geometry. These explicit, fine-grained cues enable effective disentanglement and guide high-quality illumination. 2) To address the lack of paired multi-view and multi-illumination videos, we introduce Light-Syn, a degradation-based pipeline with inverse mapping that synthesizes training pairs from in-the-wild monocular footage. This strategy yields a dataset covering static, dynamic, and AI-generated scenes, ensuring robust training. Extensive experiments show that Light-X outperforms baseline methods in joint camera-illumination control and surpasses prior video relighting methods under both text- and background-conditioned settings.
git clone https://github.com/TQTQliu/Light-X.git
cd Light-X
conda create -n lightx python=3.10
conda activate lightx
pip install -r requirements.txt
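As an optional sanity check before running anything heavy, you can confirm that the environment resolves and that a CUDA device is visible. This assumes PyTorch is installed via requirements.txt, which is not stated explicitly above:

```bash
# Optional sanity check. Assumes PyTorch is pulled in by requirements.txt;
# prints the installed torch version and whether a CUDA GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```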
Pretrained models are hosted on Hugging Face and load automatically during inference.
If your environment cannot access Hugging Face, you may download them manually:
- Text-based / background-image lighting: tqliu/Light-X
- HDR / reference-image lighting (also supports text/bg): tqliu/Light-X-Uni
After downloading, specify the local model directory using --transformer_path in inference.py.
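For instance, a manual download via the Hugging Face CLI could look like the sketch below; the local directory names are placeholders, not paths required by the repo:

```bash
# Illustrative manual download; ./ckpts/... are arbitrary local directories.
huggingface-cli download tqliu/Light-X --local-dir ./ckpts/Light-X
huggingface-cli download tqliu/Light-X-Uni --local-dir ./ckpts/Light-X-Uni

# Then point inference at the local copy:
python inference.py --transformer_path ./ckpts/Light-X [other arguments]
```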
Run inference using the following script:
bash run.sh
All required models will be downloaded automatically.
We also provide EXAMPLE.md with commonly used commands and their corresponding visual outputs. Please refer to this file to better understand the purpose and effect of each argument.
The run.sh script executes inference.py with the following arguments:
python inference.py \
--video_path [INPUT_VIDEO_PATH] \
--stride [VIDEO_STRIDE] \
--out_dir [OUTPUT_DIR] \
--camera ['traj' | 'target'] \
--mode ['gradual' | 'bullet' | 'direct' | 'dolly-zoom'] \
--mask \
--target_pose [THETA PHI RADIUS X Y] \
--traj_txt [TRAJECTORY_TXT] \
--relit_txt [RELIGHTING_TXT] \
--relit_cond_type ['ic' | 'ref' | 'hdr' | 'bg'] \
[--relit_vd] \
[--relit_cond_img CONDITION_IMAGE] \
[--recam_vd]

🎥 Camera
- --camera: Camera control mode:
  - traj: Move the camera along a trajectory
  - target: Render from a fixed target view
- --mode: Style of camera motion when rendering along a trajectory:
  - gradual: Smooth and continuous viewpoint transition; suitable for natural, cinematic motion
  - bullet: Fast forward-shifting / orbit-like motion with stronger parallax
  - direct: Minimal smoothing; quickly moves from start to end pose
  - dolly-zoom: Hitchcock-style effect where the camera moves while adjusting the radius; the subject stays the same size while the background expands/compresses
- --traj_txt: Path to a trajectory text file (required when --camera traj is used)
- --target_pose: Target view <theta phi r x y> (required when --camera target is used)
- --recam_vd: Enable video re-camera mode
See here for more details on camera parameters.
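For illustration, a camera-only run along a trajectory might be invoked as sketched below; the input video and trajectory file are placeholder paths, not assets shipped with the repo:

```bash
# Illustrative trajectory-based re-rendering; all asset paths are placeholders.
python inference.py \
  --video_path assets/example.mp4 \
  --stride 1 \
  --out_dir outputs/recam \
  --camera traj \
  --mode gradual \
  --traj_txt assets/trajs/example_traj.txt \
  --recam_vd
```

Switching --mode (e.g., to dolly-zoom) keeps the same command shape and only changes the style of camera motion.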
💡 Relighting
- --relit_txt: Path to a relighting parameter text file
- --relit_vd: Enable video relighting
- --relit_cond_type: Choose the lighting condition source:
  - ic: IC-Light (text-based / background-based lighting)
  - ref: Reference image lighting
  - hdr: HDR environment map lighting
  - bg: Background image lighting
- --relit_cond_img: Path to the conditioning image (required for ref / hdr modes)
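As an example, HDR-conditioned relighting at a fixed target view could be run as sketched below (per the model list above, HDR / reference-image lighting uses the tqliu/Light-X-Uni weights). All asset paths and the pose values are placeholders for your own inputs:

```bash
# Illustrative HDR relighting from a fixed target view; paths and the
# target pose (theta phi radius x y) are placeholders.
python inference.py \
  --video_path assets/example.mp4 \
  --stride 1 \
  --out_dir outputs/relit \
  --camera target \
  --target_pose 0 30 0.6 0 0 \
  --relit_txt assets/relit_prompt.txt \
  --relit_cond_type hdr \
  --relit_cond_img assets/envmaps/example.hdr \
  --relit_vd
```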
Download the dataset.
Generate the metadata JSON file describing the training samples:
python tools/gen_json.py -r <DATA_PATH>
Then update the DATASET_META_NAME in train.sh to the path of the newly generated JSON file.
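Concretely, the two steps might look like the sketch below; the data path and the generated file name are placeholders (check the actual output of tools/gen_json.py in your setup):

```bash
# Illustrative paths only.
python tools/gen_json.py -r /data/light_syn

# In train.sh, point the metadata variable at the generated file, e.g.:
# DATASET_META_NAME=/data/light_syn/metadata.json
```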
bash train.sh
After training, convert the DeepSpeed ZeRO sharded checkpoint to a single fp32 file for inference.
Example (for step 16000):
python tools/zero_to_fp32.py train_outputs/checkpoint-16000 train_outputs/checkpoint-16000-out --safe_serialization
train_outputs/checkpoint-16000-out is the resulting fp32 checkpoint directory.
You can then pass this directory directly to the inference script:
python inference.py --transformer_path train_outputs/checkpoint-16000-out
If you find our work useful for your research, please consider citing our paper:
@article{liu2025light,
title={Light-X: Generative 4D Video Rendering with Camera and Illumination Control},
author={Liu, Tianqi and Chen, Zhaoxi and Huang, Zihao and Xu, Shaocong and Zhang, Saining and Ye, Chongjie and Li, Bohan and Cao, Zhiguo and Li, Wei and Zhao, Hao and others},
journal={arXiv preprint arXiv:2512.05115},
year={2025}
}
This work is built on many amazing open-source projects shared by TrajectoryCrafter, IC-Light, and VideoX-Fun. Thanks to all the authors for their excellent contributions!
If you have any questions, please feel free to contact Tianqi Liu (tq_liu at hust.edu.cn).