This paper was submitted to ICASSP 2025.
This project is currently under active development. We are continuously updating and improving it, with more usage details and features to be released in the future.
- Download the LJSpeech dataset and place it in `data/dataset` so that the directory structure looks like the one below (a shell sketch of these download steps follows this list):
data/dataset/LJSpeech-1.1
┣ metadata.csv
┣ wavs
┃ ┣ LJ001-0001.wav
┃ ┣ LJ001-0002.wav
┃ ┣ ...
┣ README
- Download the alignments of the LJSpeech dataset (`LJSpeech.zip`) and unzip the files into `data/dataset/LJSpeech-1.1`.
- Download checkpoints (Coming Soon)
- Uncompress the checkpoint tar file and place its contents in `data/checkpoints/`.
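A minimal shell sketch of these preparation steps, assuming the standard LJSpeech distribution URL; the alignment and checkpoint downloads are linked above, so their filenames below are placeholders where noted:

```shell
mkdir -p data/dataset data/checkpoints

# 1. LJSpeech corpus (standard distribution, ~2.6 GB)
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjf LJSpeech-1.1.tar.bz2 -C data/dataset

# 2. Alignments: download LJSpeech.zip from the link above, then unzip it
#    into the dataset directory
unzip LJSpeech.zip -d data/dataset/LJSpeech-1.1

# 3. Checkpoints: download the checkpoint tar file (Coming Soon), then extract it
#    (the tar filename here is a placeholder)
tar -xf checkpoints.tar -C data/checkpoints
```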
Run the preprocessing script:
python preprocessing.py
Train the VAE (Optional)
CUDA_VISIBLE_DEVICES=0 python drawspeech/train/autoencoder.py -c drawspeech/config/vae_ljspeech_22k.yaml
If you don't want to train the VAE, you can just use the VAE checkpoint that we provide.
- Set the variable `reload_from_ckpt` in `drawspeech_ljspeech_22k.yaml` to `data/checkpoints/vae.ckpt`.
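For reference, the relevant line in the config would look like the following (a minimal sketch; the key name and path come from this README, but where the key sits inside the YAML may differ, so check the provided config file):

```yaml
# drawspeech/config/drawspeech_ljspeech_22k.yaml (excerpt)
reload_from_ckpt: data/checkpoints/vae.ckpt  # path to the provided VAE checkpoint
```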
Train DrawSpeech
CUDA_VISIBLE_DEVICES=0 python drawspeech/train/latent_diffusion.py -c drawspeech/config/drawspeech_ljspeech_22k.yaml
If you have trained the model using `drawspeech_ljspeech_22k.yaml`, use the following command:
CUDA_VISIBLE_DEVICES=0 python drawspeech/infer.py --config_yaml drawspeech/config/drawspeech_ljspeech_22k.yaml --list_inference tests/inference.json
If not, please specify the DrawSpeech checkpoint:
CUDA_VISIBLE_DEVICES=0 python drawspeech/infer.py --config_yaml drawspeech/config/drawspeech_ljspeech_22k.yaml --list_inference tests/inference.json --reload_from_ckpt data/checkpoints/drawspeech.ckpt
This repository borrows code from the following repositories. Many thanks to the authors for their great work.
- AudioLDM: https://github.com/haoheliu/AudioLDM-training-finetuning?tab=readme-ov-file#prepare-python-running-environment
- FastSpeech 2: https://github.com/ming024/FastSpeech2
- HiFi-GAN: https://github.com/jik876/hifi-gan