This paper was submitted to ICASSP 2025.
This project is currently under active development. We are continuously updating and improving it, with more usage details and features to be released in the future.
- Download the LJSpeech dataset and place it in `data/dataset` so that the directory structure looks like the one below (a shell sketch of these download steps follows this list):
data/dataset/LJSpeech-1.1
┣ metadata.csv
┣ wavs
┃ ┣ LJ001-0001.wav
┃ ┣ LJ001-0002.wav
┃ ┣ ...
┣ README
- Download the alignments of the LJSpeech dataset (`LJSpeech.zip`) and unzip the files into `data/dataset/LJSpeech-1.1`.
- Download checkpoints (Coming Soon)
- Uncompress the checkpoint tar file and place its contents in `data/checkpoints/`.
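A minimal shell sketch of these preparation steps, assuming the standard LJSpeech distribution URL; the alignment and checkpoint downloads are linked above, so their filenames below are placeholders where noted:

```shell
mkdir -p data/dataset data/checkpoints

# 1. LJSpeech corpus (standard distribution, ~2.6 GB)
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjf LJSpeech-1.1.tar.bz2 -C data/dataset

# 2. Alignments: download LJSpeech.zip from the link above, then unzip it
#    into the dataset directory
unzip LJSpeech.zip -d data/dataset/LJSpeech-1.1

# 3. Checkpoints: download the checkpoint tar file (Coming Soon), then extract it
#    (the tar filename here is a placeholder)
tar -xf checkpoints.tar -C data/checkpoints
```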
Run the preprocessing script:
python preprocessing.py
Train the VAE (Optional)
CUDA_VISIBLE_DEVICES=0 python drawspeech/train/autoencoder.py -c drawspeech/config/vae_ljspeech_22k.yaml
If you don't want to train the VAE, you can just use the VAE checkpoint that we provide.
- Set the variable `reload_from_ckpt` in `drawspeech_ljspeech_22k.yaml` to `data/checkpoints/vae.ckpt`.
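For reference, the relevant line in the config would look like the following (a minimal sketch; the key name and path come from this README, but where the key sits inside the YAML may differ, so check the provided config file):

```yaml
# drawspeech/config/drawspeech_ljspeech_22k.yaml (excerpt)
reload_from_ckpt: data/checkpoints/vae.ckpt  # path to the provided VAE checkpoint
```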
Train DrawSpeech
CUDA_VISIBLE_DEVICES=0 python drawspeech/train/latent_diffusion.py -c drawspeech/config/drawspeech_ljspeech_22k.yaml
If you have trained the model using `drawspeech_ljspeech_22k.yaml`, use the following command:
CUDA_VISIBLE_DEVICES=0 python drawspeech/infer.py --config_yaml drawspeech/config/drawspeech_ljspeech_22k.yaml --list_inference tests/inference.json
If not, please specify the DrawSpeech checkpoint:
CUDA_VISIBLE_DEVICES=0 python drawspeech/infer.py --config_yaml drawspeech/config/drawspeech_ljspeech_22k.yaml --list_inference tests/inference.json --reload_from_ckpt data/checkpoints/drawspeech.ckpt
This repository borrows code from the following repositories. Many thanks to the authors for their great work.
- AudioLDM: https://github.com/haoheliu/AudioLDM-training-finetuning?tab=readme-ov-file#prepare-python-running-environment
- FastSpeech 2: https://github.com/ming024/FastSpeech2
- HiFi-GAN: https://github.com/jik876/hifi-gan