People are asking me to further reduce VRAM usage. Currently the 1K model uses a minimum of 8.7 GB with VAE offloading. If we could run inference in FP8, that would reduce VRAM usage significantly. I am using the official SANA pipeline shared here
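
To give a rough idea of what FP8 could save, here is a back-of-the-envelope sketch of weight memory at different precisions. The parameter count below is a placeholder I picked for illustration, not a measured figure for the SANA 1K model, and the helper names are my own. It only counts the transformer weights, so activations, the text encoder and the VAE are not included, and the 8.7 GB total would not simply be cut in half.

```python
# Rough estimate of weight memory at different precisions.
# NOTE: ASSUMED_PARAMS is a hypothetical placeholder, not a measured value.

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Return the bytes needed to hold num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


if __name__ == "__main__":
    ASSUMED_PARAMS = 1_600_000_000  # hypothetical parameter count, for illustration only
    for dtype in ("fp16", "fp8"):
        gib = to_gib(weight_bytes(ASSUMED_PARAMS, dtype))
        print(f"{dtype}: {gib:.2f} GiB for weights only")
```

Whatever the real parameter count is, going from FP16/BF16 to FP8 halves the weight storage, so the actual saving depends on how much of the 8.7 GB is weights versus everything else.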