1- Don't host on Anchor. Podcasting is an open standard. Don't let companies (like Spotify or Apple) take it over. Check https://podcastindex.org/
2- The voice is too mechanical to be reasonable to listen to at night. It could potentially be listenable with AWS Polly Neural voices, which are pretty good.
I didn't much love hosting on Anchor/Spotify, but I made this in half an hour and I didn't want to have to get into RSS/site generation. Do you know of an easy way to dump audio files and some metadata somewhere and get a Podcast with RSS? I can upload there as well.
I'll try Polly, thanks! The current voice annoys me too.
The main problem is giving out the anchor.fm domain for your RSS feed, as it marries you to Anchor forever. In theory, you can get Anchor to 301 redirect your subscribers somewhere else, but I've found that podcast clients tend to keep the old URL.
You can use Anchor to generate your RSS feed and host your content while still sharing the RSS URL on a domain you own. So you'd give out a URL like feeds.deepdreams.com/rss, and it would proxy the response from Anchor's RSS feed.
I wrote a simple Go cloud function that can proxy your Anchor RSS URL for you:
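For anyone who wants to see the shape of such a proxy, here is a minimal Python sketch (not the Go function mentioned above). It fetches the upstream feed on each request and serves the bytes unchanged, so subscribers only ever see your own domain. `UPSTREAM_FEED_URL`, `fetch_feed`, `make_handler`, and `serve` are names made up for this illustration, and the URL is a placeholder.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical upstream feed URL; replace it with the feed your host gives you.
UPSTREAM_FEED_URL = "https://example.com/upstream/rss"


def fetch_feed(url, opener=urlopen):
    """Download the upstream feed and return its raw bytes."""
    with opener(url) as response:
        return response.read()


def make_handler(upstream_url, opener=urlopen):
    """Build a request handler that serves the upstream feed on every GET."""

    class FeedProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                body = fetch_feed(upstream_url, opener)
            except OSError:
                # urllib's network errors are OSError subclasses.
                self.send_error(502, "Upstream feed unavailable")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/rss+xml; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return FeedProxyHandler


def serve(port=8080, upstream_url=UPSTREAM_FEED_URL):
    """Run the proxy until interrupted."""
    HTTPServer(("", port), make_handler(upstream_url)).serve_forever()
```

Calling `serve()` behind your own domain is the whole idea. A real deployment would probably want some caching so that every client request doesn't hit the upstream host.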
I ended up setting GitLab pages to just curl the XML feed every time I publish, so now it's at https://deepdreams.stavros.io/feed.xml. Thanks for the help!
It's one of those things that's hard to do after you've got a bunch of subscribers, so I'm always glad if I can warn people early in their podcast against getting stuck with their host.
Oh definitely agreed, I aim to always own my stuff, but this was so quick and dirty that I figured it doesn't matter enough. Still, since it was this easy to do, better safe than sorry!
Just listened to one of the episodes. Sounds decent. I personally think it would sound a lot better if you slowed down the speed a little bit. 0.75x sounded much better to me.
Hm, yeah, I should at least whip up something like that. Getting the domain is easy, I just don't want to have to set up another static site or service just to proxy a file... Maybe I should bite the bullet and set up GitLab pages plus a simple script to output an RSS feed.
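A "simple script to output an RSS feed" can be quite small. The sketch below is one possible version using only the Python standard library. The function name `build_feed` and the episode dictionary keys (`title`, `url`, `size_bytes`, `published`) are assumptions for illustration, and it only emits the bare RSS 2.0 elements, not the extra iTunes tags that directories often ask for.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, link, description, episodes):
    """Return an RSS 2.0 document (as a string) with one <item> per episode."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["url"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])
        # The enclosure is what lets a podcast client download the audio.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["url"],
            length=str(ep["size_bytes"]),
            type="audio/mpeg",
        )

    body = ET.tostring(rss, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


example_episode = {
    "title": "Episode 1",
    "url": "https://example.com/audio/episode-1.mp3",  # placeholder URL
    "size_bytes": 1234567,
    "published": datetime(2021, 1, 1, tzinfo=timezone.utc),
}
feed_xml = build_feed("My Show", "https://example.com", "A demo feed", [example_episode])
```

Writing `feed_xml` to a file as UTF-8 on every publish gives a static feed that any static host can serve.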
Polly supports SSML tags for nuanced vocal inflection and emphasis. GPT-3 could probably output high quality tags if you run your content back through with an SSML prompt.
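To give a rough idea of what SSML looks like, here is a small Python sketch that wraps plain paragraphs in standard `<speak>`, `<p>`, `<prosody>`, and `<break>` tags to slow the voice down and add pauses. The function name `to_ssml` and the default values (85% rate, 700 ms pause) are just guesses for a sleepy narration, and which tags a given voice or engine accepts should be checked against the TTS provider's documentation.

```python
from xml.sax.saxutils import escape


def to_ssml(paragraphs, rate="85%", pause_ms=700):
    """Wrap plain-text paragraphs in SSML with a slower rate and pauses between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    parts = [
        # escape() protects characters like & and < that would break the XML.
        f'<p><prosody rate="{rate}">{escape(text)}</prosody></p>'
        for text in paragraphs
    ]
    return "<speak>" + pause.join(parts) + "</speak>"


ssml = to_ssml(["Once upon a time...", "And they lived happily ever after."])
```

The resulting string is what you would send to the TTS service in place of plain text, with the request marked as SSML.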
Agreed. I recently built an internal application allowing our customer reps to play around with ideas using text-to-speech before sending the "copy" to a studio for a professional human recording, and included both Google WaveNet and Amazon Polly in the available voice synthesis choices. Polly on its own is plainly mediocre for the most part, and in comparison to WaveNet it's just awful.
I've tried both of them, as well as Microsoft's neural speech and IBM's voices; in the end, Microsoft's sounded the clearest and most natural to me of these four services.
100% agree. Azure voices are the best. I wish Polly would catch up since most of our stacks are there but we keep going back to Azure for this one specific thing.
AWS Polly looks interesting! I wish it supported some more languages, for personal reasons. Maybe I'll try to set something up that reads ebooks, tweets, or news articles to me with this.
Do you know if there are any similar quality TTS tools for less technical applications? I mean, where you can just type in the text you want and get an audio file with a high quality voice?
I actually think the voice is pretty good for sleeping; it feels very droney. But the nonsensicalness of the stories made it harder to sleep, because my brain was trying to figure out what was going on.
Cool idea but that voice is like sandpaper to my ears.
Maybe a female voice, a bit quieter (the soundscapes are almost completely silent for me) and maybe add some high-room-size, long decay (5-10, maybe even 20 seconds), wide panned (like 100%) and moderately diffused (maybe 10-20%) reverb to the voice with like 30% mix or so, which would add a very airy tone and help the voice blend in a bit. If the TTS engine has a whisper setting (many do), add just a bit. It'll help thicken the reverb.
That, paired with bass-heavy soundscapes, will create a very nice balance between the low registers and the voice's high registers.
"AAAAAAAh WOOAH Jeez guys, I sure wanted to tell you this AWESOME-SPOOKY story that I had, but I can't read the next word on my sheet because my flashlight broke. ffffffffff! I hate it when it breaks."
How did that end up there? Are the AI overlords fucking with people trying to fall asleep?
No problem! Also if you're into the music production end of it, check out adaptiverb. I use it extensively when I make ambient stuff and it is unparalleled for quality.
Hey, would you by any chance be able to generate another background track for me? The one I have is 10' long so it won't be enough if the story is longer, and I don't know how to make these.
Just found this comment. I'm a bit into generative / algorithmic music; here are two demos I made a year ago: https://fligenstein.bandcamp.com/
One is just piano, the other is keys and drums. On Bandcamp they are about 10' each, but they can be made of arbitrary length, without ever repeating themselves exactly (in principle... in practice it's likely there are exact repeats but they should be few and far between).
If you have an idea of the type of background music you need, I can make other tracks too. I'd be happy to work with you on this.
(This example is just one minute but it can be made of any length.)
Maybe the mix is a little too heavy on the high end, which may conflict with the voice. It should probably be tested.
Anyway, tell me what you think; I tried to sound a bit like the current background but with a more melodic feel and less industrial noise; but we could go in any direction -- if we know what to try.
I think that's great! Maybe the changes are a bit too rapid, but it's definitely in the right direction. Also my email is in my profile if you want to take this offline!
I don't have tons of extra time these days for music, unfortunately. I give you permission to rip/download anything from here that you'd like and include it in the project, however: https://soundcloud.com/0-aces
There are also a few TTS systems that sound pretty natural. Maybe they could use one of those if they wanted to offer a subscription for this; that way they could offset the price of the TTS service.
>> So he chained her up in her room and he chained up hundreds of angry wolves in the other side of the room. [...] But he made the window and the doors big enough so that the fierce beasts could move in and out and chase her away. And they lived happily ever after.
These fairy tales are quite hypnotic for me due to the weirdness of the AI-generated grammar and plot. They reminded me of a beautiful fairy tale Richard Bandler wrote. It is a fable written intentionally using hypnotic language techniques (part of the Neuro-Linguistic Programming set of patterns), and a nice read.
I think several people have commented on how GPT produces narratives with a dream-like quality, locally sensical but less and less so the more you zoom out. I've since found that catching myself thinking nonsensical thoughts is a sure sign I'll soon be asleep. Seems like without high level attention, we do almost exactly what GPT does.
> Seems like without high level attention, we do almost exactly what GPT does.
Absolutely so. Without a higher order of organizing patterns in its activity, the brain is just a bunch of neurons firing; this will correlate with random phenomena in consciousness.
Years ago I read that the visions of going through tunnels and toward a distant light in near-death experiences are just what the visual pathways in the brain produce when activity in them fades away: the 'signal lost' experience of a (still conscious) brain. (It would be interesting to dig into this. I have no reference.)
Supposed to be nonsense stories, but after two minutes of listening, I don't find it nonsensical at all. Sounds like something perfectly reasonable written by a seven to ten year old child.
Good analogy. It feels as if AI is in its child-like stage, trying to make sense of the world on its own, mixing together concepts it has just discovered but not fully understood in an amusing way.
... then suddenly there is a Prime Minister of Everything, which had me burst out laughing.
Also at the beginning there seemed to be one little girl (Amelia) and one witch (Sarah), and then there were two little girls (plus the witch), and one of the little girls stood between the two little girls, and later on there appeared to be three little girls. The girl duplicating over and over sure got me hooked, kind of like watching a strange surreal painting, or reading some PKD short story.
My dog fell asleep while I had episode 4 running (the end caught me by surprise, haha).
I mean, she would have fallen asleep anyway; I probably could not. The voice is a little unpleasant and I concentrate too much on the nonsensical stories. But I also can't really fall asleep when the TV is running, so YMMV.
It is kind of hard to sleep to, I agree. This is a deepfake voice, i.e. it's generated by Google's WaveNet, which afaik is a deep learning thing. Unfortunately they didn't have a more whispered/softer voice, but I like the insanity of the generated stories anyway.
GPT-3 does tend to get a bit repetitive, though, with the default temperature (0.7).
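For context on why temperature matters here: in the usual formulation, the model's raw scores (logits) are divided by the temperature before being turned into probabilities, so a lower temperature concentrates probability on the top choices and a higher one spreads it out. The sketch below shows that effect on made-up logits; `softmax_with_temperature` and the numbers are purely illustrative, not anything taken from the actual API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words.
toy_logits = [2.0, 1.0, 0.5]
cooler = softmax_with_temperature(toy_logits, 0.7)
warmer = softmax_with_temperature(toy_logits, 1.5)
```

With the cooler setting the top candidate gets a larger share of the probability than with the warmer one, which is one reason lower temperatures tend to produce safer, more repetitive text.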
ASMR is not about whispering, but about pleasant sounds that make you feel nice and tingly. Most people may react well to whispering in particular, making it very popular in ASMR videos, but that doesn't mean all of them do - others may need a different trigger.
The GPT-3 generated conversations were coherent most of the time, and even interesting! However the generated speech via Google Cloud's API was monotonous and could do with a bit more intonation and excitement.
Let's say one had a child and instead of music, you played these GPT3 generated stories, would the child then have been trained to speak by an AI, and if so, what else can we do? Hypothetical and unethical, but that never stopped Skinner, or any of them really. It's something we're going to have to consider.
Brilliant :) My main concern is that my brain would just wander off if the story is complete nonsense. I will give it a try. I've been listening to the same podcast episode for months now to help me fall asleep.
I find the stories toe the line of "just enough sense" to keep it interesting. Episode 2 is the one I liked the most so far, I was reading the text with lots of interest!
Is there anything different about a podcast like this from a copyright perspective? Are machine generated products just as copyrightable as if OP had written and produced these using traditional methods?
This is really cool from an engineering perspective, but there was a sort of uncanny valley vibe that makes me fear for the psychological health of someone listening to this all night.
I'm worried that if I listen to this while falling asleep, one day I may wake up and not be able to distinguish between things that make sense and those that don't.
Is there any tech where one can record themselves speaking all the phonetic elements and then have a program string the samples together to "speak" texts in your own voice?
A podcast is, at the very least, an RSS feed where items have a media enclosure element pointing to something for a pod-catcher to download [1].
If something else allows you to listen to episodic content via a player and does not satisfy this condition then it is factually wrong to call it a podcast. Do you think a TV show released weekly on Netflix is also a podcast?
You can't listen to what you have made without having a Spotify account (or clicking directly on the webpage), ergo it's not a podcast.
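That definition is easy to check mechanically. As a rough sketch in Python, the function below parses a feed and collects the `url` attribute of each item's `<enclosure>` element; the names `enclosure_urls` and `looks_like_podcast` are invented for this example, and it ignores everything else a real validator would look at.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(feed_xml):
    """Return the enclosure URL of every <item> that has one."""
    root = ET.fromstring(feed_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


def looks_like_podcast(feed_xml):
    """True if at least one item carries a media enclosure."""
    return len(enclosure_urls(feed_xml)) > 0
```

A feed whose items only link to a web player, with no enclosure, would fail this check.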