🎉 Discrete Neural Codec With 24 Tokens Per Second (24 kHz) for Spoken Language Modeling!
To install HHCodec, follow these steps:
conda create -n hhcodec python=3.10 # Python 3.10 or newer is required because BigVGAN is used
conda activate hhcodec
git clone https://github.com/rongkunxue/HH-Codec.git
cd HH-Codec
pip install -e .
# Optional: install these only if you want to evaluate with UTMOS
pip install pip==24.0
pip install fairseq
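After installation, a quick sanity check can confirm that the interpreter meets the Python 3.10 requirement and that the optional `fairseq` package is visible. The snippet below is a minimal sketch, not part of HH-Codec; the function name `environment_report` is hypothetical, and it only checks that a package can be found, not that it works.

```python
import importlib.util
import sys


def environment_report(min_version=(3, 10), optional=("fairseq",)):
    """Summarize whether the interpreter and optional packages look usable."""
    # Compare only (major, minor) against the required minimum version.
    report = {"python_ok": sys.version_info[:2] >= tuple(min_version)}
    for name in optional:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(environment_report())
```

Run it inside the activated `hhcodec` environment; a `False` entry for `fairseq` only matters if you plan to evaluate with UTMOS.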
Ensure your dataset is preprocessed by following the instructions in dataset.
Before starting training, update the configuration settings:
# Open and modify the following file "configs/train.yaml"
# Adjust parameters such as:
# - log settings
# - train_path
# - save_dir
# - device (e.g., CPU/GPU)
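Before launching a run, it can help to check that the edited settings are present and that the training path exists. The following is a minimal sketch under stated assumptions: the key names `train_path`, `save_dir`, and `device` are taken from the list above, but the actual names and nesting in `configs/train.yaml` may differ, and the helpers `check_train_config` and `load_yaml` are hypothetical, not part of HH-Codec.

```python
from pathlib import Path

# Hypothetical key names, taken from the parameter list above; the real
# configs/train.yaml may name or nest them differently.
REQUIRED_KEYS = ("train_path", "save_dir", "device")


def check_train_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in a flat config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if not cfg.get(key):
            problems.append(f"missing or empty setting: {key}")
    train_path = cfg.get("train_path")
    if train_path and not Path(train_path).exists():
        problems.append(f"train_path does not exist: {train_path}")
    return problems


def load_yaml(path: str) -> dict:
    """Load a YAML file; requires PyYAML (pip install pyyaml)."""
    import yaml  # imported lazily so the checker itself has no dependency

    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}
```

For example, `check_train_config(load_yaml("configs/train.yaml"))` returns an empty list when nothing looks wrong, assuming the file uses flat keys with these names.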
Once the dataset is prepared and the configuration is set, launch the training process:
# We expect to finalize and open-source the training code within two weeks.
The HHCodec codebase is adapted from the following repositories:
A huge thanks to the authors of these projects for their outstanding contributions! 🎉