RealCam-I2V

[ICCV'25] Official repo of "RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control".

🌟 News

  • 25/07/05: Release inference code and checkpoints of RealCam-I2V on CogVideoX 1.5 for exploration. The results reported in the paper are based on DynamiCrafter; for full reproduction and evaluation, please refer to our previous repo, CamI2V.
  • 25/06/26: RealCam-I2V is accepted by ICCV 2025! 🎉🎉
  • 25/05/18: Release training code of RealCam-I2V on CogVideoX 1.5.
  • 25/03/26: Release our dataset RealCam-Vid v1 for metric-scale camera-controlled video generation!
  • 25/02/18: Initial commit of the project. We plan to release our DiT-based real-camera I2V models (e.g., CogVideoX) in this repo.

⚙️ Environment

Quick Start

apt install libgl1-mesa-glx libgl1-mesa-dri xvfb # for ubuntu
yum install -y mesa-libGL mesa-dri-drivers Xvfb  # for centos

conda create -n realcami2v python=3.12
conda activate realcami2v

conda install ffmpeg=7 -c conda-forge
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126
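
After installation, a quick sanity check can confirm that the main pieces are visible from the new environment. The sketch below is not part of the repo. It only looks up packages and tools without importing or running them, and the package names torch and gradio are assumptions about what requirements.txt installs.

```python
import importlib.util
import shutil


def check_environment(packages, executables):
    """Report which Python packages and command-line tools are available.

    Nothing is imported or executed; this only looks things up.
    """
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[f"package:{name}"] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which returns None when the tool is not on PATH.
        report[f"executable:{name}"] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    # Assumed names: torch and gradio are guesses at what requirements.txt installs.
    for item, ok in check_environment(["torch", "gradio"], ["ffmpeg"]).items():
        print(f"{item}: {'found' if ok else 'MISSING'}")
```

Run it inside the activated realcami2v environment. Any item reported as MISSING points to the install step that needs to be repeated.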

💫 Inference

Download Pretrained Models

Download the pretrained weights of CogVideoX1.5-5B-I2V, Metric3D and Qwen2.5-VL and put them under the pretrained folder.

Download Model Checkpoints

Download our RealCam-I2V weights and put them under the checkpoints folder. If you use a custom model path, edit demo/models.json accordingly.
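
Before launching the demo, it may help to verify that both weight folders exist and are not empty. The following is a minimal sketch, not a script shipped with the repo. It assumes it is run from the repository root, and it does not validate the contents of the folders or of demo/models.json, whose layout is not described here.

```python
from pathlib import Path


def missing_or_empty(root, folders):
    """Return the folder names under root that are absent or contain no entries."""
    problems = []
    for name in folders:
        path = Path(root) / name
        # any() stops at the first entry, so large folders are not fully listed.
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Folder names come from the instructions above; "." assumes the repo root.
    problems = missing_or_empty(".", ["pretrained", "checkpoints"])
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Both weight folders are populated.")
```

The same helper can be pointed at other folders, for example the data folder used for training.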

Run Gradio Demo

python gradio_app.py

🚀 Training

Prepare Dataset

Download our dataset from RealCam-Vid to train RealCam-I2V-CogVideoX-1.5, and unzip all contents into the data folder.

Launch

Edit the example training script accelerate_train.sh if necessary, then launch training with:

bash accelerate_train.sh

For CogVideoX 1.5, we precompute latents before training.

🤗 Related Repo

  • Our dataset, the first open-source dataset to combine diverse scene dynamics with metric-scale camera trajectories, is available at RealCam-Vid.
  • Our previous work is available at CamI2V.
  • We have borrowed a lot of code from the original CogVideoX repository.

🗒️ Citation

@article{li2025realcam,
    title={RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control}, 
    author={Li, Teng and Zheng, Guangcong and Jiang, Rui and Zhan, Shuigen and Wu, Tao and Lu, Yehao and Lin, Yining and Li, Xi},
    journal={arXiv preprint arXiv:2502.10059},
    year={2025},
}

@article{zheng2025realcam,
    title={RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements}, 
    author={Zheng, Guangcong and Li, Teng and Zhou, Xianpan and Li, Xi},
    journal={arXiv preprint arXiv:2504.08212},
    year={2025},
}

@article{zheng2024cami2v,
    title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
    author={Zheng, Guangcong and Li, Teng and Jiang, Rui and Lu, Yehao and Wu, Tao and Li, Xi},
    journal={arXiv preprint arXiv:2410.15957},
    year={2024}
}
