Can inference be done in FP8 for both the 1K and 2K models?

People are asking me to reduce VRAM usage further.

Currently, the 1K model needs at least 8.7 GB of VRAM, even with VAE offloading.

If we could run inference in FP8, that would reduce VRAM usage significantly.
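A rough back-of-the-envelope sketch of why FP8 helps: storing weights in FP8 uses 1 byte per parameter instead of 2 for BF16/FP16, so the memory taken by the weights alone is halved. The helper name and the 1.6B parameter count below are hypothetical, for illustration only. The estimate ignores activations, the text encoder, the VAE, and CUDA overhead, so the real saving on the 8.7 GB total would be smaller than half.

```python
# Hypothetical helper: estimate the memory needed just to hold model weights.
# Activations, the text encoder, the VAE, and CUDA overhead are NOT included.
BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "fp16": 2,
    "fp8": 1,
}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory in GiB needed to store num_params weights in dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: a 1.6B-parameter transformer (illustrative figure only).
params = 1_600_000_000
for dtype in ("bf16", "fp8"):
    print(f"{dtype}: {weight_memory_gib(params, dtype):.2f} GiB")
# bf16: 2.98 GiB
# fp8: 1.49 GiB
```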

I am using the official SANA pipeline shared here.