
[ArXiv 2025] MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices


hustvl/MobileI2V



MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices

Shuai Zhang*, Bao Tang*, Siyuan Yu*, Yueting Zhu, Jingfeng Yao,
Ya Zou, Shanglin Yuan, Li Yu, Wenyu Liu, Xinggang WangπŸ“§

Huazhong University of Science and Technology (HUST)

(* equal contribution, πŸ“§ corresponding author)

Project Page | arXiv Paper | Checkpoints

πŸ“° News

  • [2025.11.27] We have released our paper on arXiv.

πŸ“„ Introduction

Compared with SVD-XT (1.5B), our 5.55Γ— smaller MobileI2V (0.27B) achieves similar generation quality while generating a video in only 2.24 s on a mobile device and running 199Γ— faster on an A100 GPU.

🎯 Demo

(1) 1280Γ—720Γ—17 Image to Video

(2) 960Γ—960Γ—17 Image to Video

🎯 How to Use

Installation

You can install the required environment using the provided requirements.txt file.

pip install -r requirements.txt

Data Processing

There are many open-source video datasets, such as OpenVid, VFHQ, and CelebV-Text. Cut each video into a fixed number of frames (such as 17 or 25), then filter the clips by aesthetic score (using DOVER) and optical-flow score (refer to the OpenSora data processing pipeline).
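The fixed-length clipping step above amounts to a simple index computation. Below is a minimal sketch; `clip_starts` is a hypothetical helper for illustration, not part of this repository:

```python
def clip_starts(total_frames: int, clip_len: int = 17) -> list[int]:
    """Return start indices of non-overlapping fixed-length clips.

    Trailing frames that cannot fill a complete clip are dropped,
    so every clip has exactly `clip_len` frames.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return list(range(0, total_frames - clip_len + 1, clip_len))

# A 190-frame video yields 11 complete 17-frame clips; the last 3 frames are dropped.
print(clip_starts(190))
```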

You should organize your processed training data into a CSV file, as shown below:

video_path,text,num_frames,height,width,flow
./_JnC_Zj_P7s_22_0to190_extracted.mp4,scenery,17,720,1080,3.529723644
./_JnC_Zj_P7s_22_0to190_extracted.mp4,scenery,17,720,1080,4.014187813
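Filtering rows of this CSV by optical-flow score can be sketched with the standard library; the threshold `2.0` is a hypothetical value (matching the `--flow_score` used later for inference), not a recommendation from the paper:

```python
import csv
import io

def filter_by_flow(csv_text: str, min_flow: float = 2.0) -> list[dict]:
    """Keep only rows whose optical-flow score meets the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["flow"]) >= min_flow]

sample = """video_path,text,num_frames,height,width,flow
./a.mp4,scenery,17,720,1080,3.52
./b.mp4,scenery,17,720,1080,1.10
"""
# Only the first row passes the flow threshold.
print(len(filter_by_flow(sample)))
```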

Train

You can use the provided ./train_scripts/train_i2v.sh script for training. The configuration files are located at ./configs/mobilei2v_config/. Before training, download the weights for the video VAE and Qwen2-0.5B and update the model paths in the configuration file accordingly.

bash ./train_scripts/train_i2v.sh

Inference

You can use the provided ./test.sh script for inference. List a reference image, or a video whose first frame will be extracted, in the asset/test.txt file and pass that file via the --txt_file parameter.
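The asset/test.txt file can be prepared with a small helper. This is a hypothetical sketch: it assumes one reference path per line, which is not confirmed by this README, and `write_test_list` is not part of the repo:

```python
from pathlib import Path

def write_test_list(paths: list[str], out_file: str = "asset/test.txt") -> str:
    """Write one reference image/video path per line (assumed format)."""
    content = "\n".join(paths) + "\n"
    out = Path(out_file)
    out.parent.mkdir(parents=True, exist_ok=True)  # create asset/ if missing
    out.write_text(content)
    return content

write_test_list(["./demo/face.png"])
```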

CUDA_VISIBLE_DEVICES=0 python scripts/inference_i2v.py \
      --config=./configs/mobilei2v_config/MobileI2V_300M_img512.yaml \
      --save_path=humface_1126 \
      --model_path=./model/hybrid_371.pth \
      --txt_file=asset/test.txt \
      --flow_score=2.0

To speed up VAE decoding, we replaced the LTX-Video decoder with the Turbo-VAED decoder.

Metrics

Refer to the FVD evaluation script in vidm.

python scripts/evaluate_FVD.py -dir1 path/gts -dir2 path/videos -b 1 -r 32 -n 128 -ns 16 -i3d ./i3d_torchscript.pt

🎯 Mobile Demo

We designed the mobile UI and deployed the model, as shown in the video below:

❀️ Acknowledgements

Our MobileI2V codebase is mainly built on SANA and LTX-Video. The data processing workflow is based on OpenSora. Thanks for all these great works.

πŸ“ Citation

If you find MobileI2V useful, please consider giving us a star 🌟 and citing it as follows:

@misc{MobileI2V,
      title={MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices}, 
      author={Shuai Zhang and Bao Tang and Siyuan Yu and Yueting Zhu and Jingfeng Yao and Ya Zou and Shanglin Yuan and Li Yu and Wenyu Liu and Xinggang Wang},
      year={2025},
      eprint={2511.21475},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.21475}, 
}
