Well, I can see it becoming sexy soon

turnsout · on April 20, 2023

Soon? Is the model filtered/censored?

jamilton · on April 20, 2023

From the readme:

>Bark has the capability to fully clone voices - including tone, pitch, emotion and prosody. The model also attempts to preserve music, ambient noise, etc. from input audio. However, to mitigate misuse of this technology, we limit the audio history prompts to a limited set of Suno-provided, fully synthetic options to choose from for each language.

It's not immediately clear how the audio history prompts are created.

joshjob42 · on April 20, 2023

I don't know how they're made exactly, but one can just edit the code a bit and remove the restriction that limits you to the given audio history prompts. It's literally just enforced, in effect, with a simple "assert" statement.
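
For illustration, a restriction of this kind can be as small as an allow-list check. The sketch below is not Bark's actual code: `ALLOWED_PROMPTS`, `load_history_prompt`, the `enforce` flag, and the preset names are all hypothetical stand-ins for whatever the repo really uses.

```python
# Hypothetical allow-list; the names are illustrative, not taken from Bark's source.
ALLOWED_PROMPTS = {f"en_speaker_{i}" for i in range(10)}


def load_history_prompt(name, enforce=True):
    """Return the prompt name if it is permitted.

    A real loader would read the stored prompt data here; this stub only
    shows where the check sits.
    """
    if enforce:
        # This single assert is the entire restriction. Deleting the line,
        # or running Python with -O (which strips asserts), lets any name through.
        assert name in ALLOWED_PROMPTS, f"unknown history prompt: {name}"
    return name
```

An `assert` like this is a guard rail rather than a security boundary, which is why removing it is a one-line edit.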

gkucsko · on April 20, 2023

history prompts are just unconditionally generated TTS from the same model. any of those can be used as history, but for convenience 10 are provided for each language (to generate things with consistent voices)

turnsout · on April 20, 2023

So the history prompts are collections of text/audio pairs?

gkucsko · on April 20, 2023

history is semantic, coarse and fine. so essentially it's the same thing that's getting generated, just used as an input before the generation
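
A rough sketch of the data flow being described: one generation yields three streams (semantic, coarse, fine), and that same bundle can be fed back in as history for the next call. Everything here is a stand-in. `HistoryPrompt`, `generate`, and the fake values are hypothetical, the flat integer lists are a simplification, and no real model is involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryPrompt:
    # Placeholder containers for the three levels named in the comment above.
    semantic: List[int]
    coarse: List[int]
    fine: List[int]


def generate(text: str, history: Optional[HistoryPrompt] = None) -> HistoryPrompt:
    """Stub generator: the output has the same form as the history input.

    The values are arbitrary placeholders, not real model tokens.
    """
    # Let the history influence the output so the dependency is visible.
    offset = len(history.semantic) if history else 0
    tokens = [(ord(c) + offset) % 100 for c in text]
    return HistoryPrompt(semantic=tokens, coarse=tokens[::2], fine=tokens[::3])


# An unconditional generation (no history) can be kept as a reusable preset.
preset = generate("hello")

# A later generation takes that earlier output as its history.
out = generate("world", history=preset)
```

The point is only the shape of the loop: whatever comes out of one call is the same kind of object that goes in as `history` on the next.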

CamperBob2 · on April 22, 2023

So how do you clone an existing speaker's voice? That's the part I don't get.