
Accelerate is a dependency for voice cloning script #1468

gustrd opened this issue Apr 6, 2025 · 6 comments

@gustrd

gustrd commented Apr 6, 2025

Describe the Issue
Running the voice cloning example fails with `NameError: name 'init_empty_weights' is not defined` at the line `self.model = HFModel(config.model_path, self.device, config.dtype, config.additional_model_config)`.

Additional Information:
Installing the `accelerate` package via pip (`pip install accelerate`) resolves the error.
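A plausible reading of the error is that `init_empty_weights`, a context manager provided by `accelerate`, is only imported by the model-loading code when that package is installed, so without it the name is simply undefined. The sketch below is a minimal pre-flight check that reports the missing package up front instead; `has_module` and `check_voice_cloning_deps` are hypothetical helpers, not part of the voice cloning script, and the required-package list is an assumption based only on this issue.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def check_voice_cloning_deps(required=("accelerate",)) -> None:
    """Raise a clear ImportError if any required package is missing.

    The default list is an assumption taken from this issue; the real
    voice cloning script may need other packages as well.
    """
    missing = [name for name in required if not has_module(name)]
    if missing:
        raise ImportError(
            "Missing package(s) for voice cloning: "
            + ", ".join(missing)
            + ". Install with: pip install "
            + " ".join(missing)
        )
```

Calling `check_voice_cloning_deps()` before constructing `HFModel` would turn the late `NameError` into an actionable install hint.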

@LostRuins
Owner

Ah yup, alright, I'll update the wiki.

How's the voice cloning so far? It should be compatible with all existing speakers. Hoping that @edwko brings us new OuteTTS models in the future.

@edwko

edwko commented Apr 6, 2025

@LostRuins New model on its way! 🚚📦

@gustrd
Author

gustrd commented Apr 7, 2025

I tested with some voices, but I find it kind of slow, and it often generates audio for only part of the message.

I'm not sure why, but it seems to generate faster when called through the API than in the chat UI. Very strange, I know.

@LostRuins
Owner

LostRuins commented Apr 7, 2025

@gustrd did you test with the sample speaker JSONs included here, or did you make your own? Are they working? The voice cloning process requires an accurate transcription of the source audio to work.

The new model looks interesting, but it's blocked until the DAC.speech.v1.0 arch it depends on is implemented. Kinda wish they had stuck with WavTokenizer.

Unfortunately this might be rough, as the number of people who can implement a new arch in ggml is quite limited: we did not get xcodec for YuE, nor snac_24khz for Orpheus, and now those are kind of DoA. Hopefully someone will pick it up; that's the only major blocker, and the rest is quite trivial to add.

@gustrd
Author

gustrd commented Apr 7, 2025

The provided ones work OK.

I had issues with the ones I cloned myself. Maybe the audio I used was too short or too long. What is the recommended length?

@LostRuins
Owner

About 10 seconds or so works best. Choose clear, articulate speech for cloning so that Whisper can transcribe it properly.
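To check a clip against the roughly 10-second guideline before cloning, its duration is just the frame count divided by the sample rate (for example, 160,000 frames at 16 kHz is 10 s). The sketch below uses Python's standard `wave` module and assumes the source audio is an uncompressed PCM WAV file; `check_clone_clip` and its 5-second `tolerance` are hypothetical choices, not a limit enforced by the voice cloning script.

```python
import wave


def wav_duration_seconds(path) -> float:
    """Return the duration of a PCM WAV file (path or file object) in seconds."""
    with wave.open(path, "rb") as wf:
        # Duration = number of frames / frames per second.
        return wf.getnframes() / wf.getframerate()


def check_clone_clip(path, target=10.0, tolerance=5.0) -> float:
    """Return the clip duration, raising ValueError if it is far from `target`.

    `target` follows the advice in this thread; `tolerance` is an arbitrary
    assumption, not a documented limit.
    """
    duration = wav_duration_seconds(path)
    if abs(duration - target) > tolerance:
        raise ValueError(
            f"Clip is {duration:.1f} s; about {target:.0f} s is recommended."
        )
    return duration
```

A clean duration says nothing about transcription quality, so clear, articulate speech still matters most.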
