[ICCV'25] Official repo of "RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control".
- 25/07/05: Release inference code and checkpoints of RealCam-I2V on CogVideoX 1.5 for exploration. The results we report in the paper are based on DynamiCrafter, for full reproduction and evaluation, please refer to our previous repo CamI2V.
- 25/06/26: RealCam-I2V is accepted by ICCV 2025! 🎉🎉
- 25/05/18: Release training code of RealCam-I2V on CogVideoX 1.5.
- 25/03/26: Release our dataset RealCam-Vid v1 for metric-scale camera-controlled video generation!
- 25/02/18: Initial commit of the project, we plan to release our DiT-based real-camera I2V models (e.g., CogVideoX) in this repo.
apt install libgl1-mesa-glx libgl1-mesa-dri xvfb # for ubuntu
yum install -y mesa-libGL mesa-dri-drivers Xvfb # for centos
conda create -n realcami2v python=3.12
conda activate realcami2v
conda install ffmpeg=7 -c conda-forge
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126
Download and put under pretrained
folder the pretrained weights of CogVideoX1.5-5B-I2V, Metric3D and Qwen2.5-VL.
Download our weights of RealCam-I2V and put under checkpoints
folder.
Please edit demo/models.json
if you have a custom model path.
python gradio_app.py
Please access RealCam-Vid and download our dataset for training RealCam-I2V-CogVideoX-1.5. Please unzip all contents in data
folder.
Edit example training script accelerate_train.sh
if necessary and launch training by:
bash accelerate_train.sh
For CogVideoX 1.5, we precompute latents before training.
- Our dataset, the first open-sourced, combining diverse scene dynamics with metric-scale camera trajectories, is available at RealCam-Vid.
- Our previous work at CamI2V.
- We have borrowed a lot of code from the original CogVideoX repository.
@article{li2025realcam,
title={RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control},
author={Li, Teng and Zheng, Guangcong and Jiang, Rui and Zhan, Shuigen and Wu, Tao and Lu, Yehao and Lin, Yining and Li, Xi},
journal={arXiv preprint arXiv:2502.10059},
year={2025},
}
@article{zheng2025realcam,
title={RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements},
author={Zheng, Guangcong and Li, Teng and Zhou, Xianpan and Li, Xi},
journal={arXiv preprint arXiv:2504.08212},
year={2025},
}
@article{zheng2024cami2v,
title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
author={Zheng, Guangcong and Li, Teng and Jiang, Rui and Lu, Yehao and Wu, Tao and Li, Xi},
journal={arXiv preprint arXiv:2410.15957},
year={2024}
}