(This is not meant to be an anti-AI-generated-art rant. It's coming whether we like it or not. But some of the motives in this thread confuse me.)
Music producer here with an honest question to those saying "this will provide me with a simple soundtrack/background music for $PROJECT"
Have any of you checked out / made offers on music production subreddits? Or other music subreddits? Various music production Discords? Elsewhere on the internet?
If so, could you say what your experience has been?
I ask because the music production scene is like...ridiculously saturated, and it's almost a meme in the producer community how hard it is to make even a buck producing. I suspect that there are a significant number of producers who would be happy to take your "prompt" for a small fee. Yes, I understand that 1) free and 2) immediate is convenient, but isn't hiring someone 1) relatively inexpensive, and isn't 2) whatever advantage human intent in construction gives worth something too?
I'm willing to admit that I'm missing something here, but I'd love it if someone could enlighten me.
While I'm asking follow-ups, to all the folks who love digging for new music so much that they're considering turning to prompting AIs: I'd be seriously surprised if you've really checked out all the stuff that is coming out from new producers (again, reddit, soundcloud, etc). Another meme in the producer community is how one spends hundreds/thousands of hours perfecting one's craft, and dozens of hours working on a track, only for that track to get like 5 plays on soundcloud and negligible engagement elsewhere. Are music consumers really that desperate for new tunes? Frankly a lot of us just aren't seeing it....
It's not that there isn't enough electronic music being made, it's that every new track that lands on soundcloud is a drop in the ocean of mediocrity. There is _too_ much, and 99.9% is just boring to listen to, because it sounds like everything else. I listen to a LOT of electronic music (and have, since the mid 90's) and just don't have the patience anymore to sit through hours of average material to find one or two truly inspired artists.
I doubt I would turn to AI much for anything other than background noise while focusing on work. In fact, that sounds like a perfect use case for me. "Dear GPT, please compose a four-on-the-floor downtempo progressive track with soft pads, no vocals, and zero goddamned fake vinyl noise that runs for two hours straight..."
> It's not that there isn't enough electronic music being made, it's that every new track that lands on soundcloud is a drop in the ocean of mediocrity. There is _too_ much, and 99.9% is just boring to listen to, because it sounds like everything else.
Yep. This is why I don't feel like AI used in this manner moves the needle for music: people only actively listen to the best 0.1% of music anyway. The ability to create music that is firmly in the other 99.9%, as this stuff very clearly is, just means that the ocean of mediocrity has more water dumped into it.
> It's not that there isn't enough electronic music being made, it's that every new track that lands on soundcloud is a drop in the ocean of mediocrity. There is _too_ much, and 99.9% is just boring to listen to, because it sounds like everything else.
To the extent that sounding like everything else is a problem, how is ML generated music not going to have it?
And in general this isn't going to be a qualitative improvement in experience. ML algorithms for recommendation are searching the preference space in much the same way ML generation would; they're just doing it over existing material. If you really find 99.9% of existing material boring, you're probably going to find a similar proportion of generated material boring.
Though I suspect 99.9% is hyperbole. My rate of "this is listenable and interesting and I'd like to come back" on Soundcloud is better than 1 in 25 on the worst day and better than 1 in a dozen on most, and the rate is often north of 1 in 6 for curated platforms like Pandora. It's never been easier to discover good new music with not much in the way of effort.
"To the extent that sounding like everything else is a problem, how is ML generated music not going to have it?"
AI generated art has explored all sorts of weird spaces that few humans have touched.
It's not difficult to make computers create unusual, original, bizarre work. The difficulty comes in making it both original and enjoyable/interesting.
Also consider that AI-generated music is often going to actually be a collaboration between a human and an AI. The human will be acting at least as a curator, because not everything created by AI is going to be pleasing, so some selection and catering to human taste will be required.
> The human will be acting at least as a curator, because not everything created by AI is going to be pleasing, so some selection and catering to human taste will be required.
Yes, and keep in mind humans are already doing this! It's very common to do tweaking of knobs on a synth/VST while recording and create a 10-20 minute audio file, commonly called a bass jam or mud pie, then select the best bits to use in a song. And of course, people use randomization tools to tweak the knobs for them. IMO use of AI to support this type of workflow is far more promising than going directly to the finished product.
I generated a 1:20 sample using your prompt "four-on-the-floor downtempo progressive track with soft pads, no vocals" using the audiocraft-webui fork, which allows for longer generation by overlapping generations.
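If you want to do the same thing without the webui, the trick is roughly this. A minimal sketch, assuming the released audiocraft Python API; the checkpoint name and the 10s overlap are my assumptions, not the fork's exact settings:

    # Extend past the per-call cap by feeding the tail of the previous
    # generation back in as a continuation prompt (what the fork automates).
    import torch
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained('medium')  # assumed checkpoint name
    model.set_generation_params(duration=30)   # max length per call

    prompt = 'four-on-the-floor downtempo progressive track with soft pads, no vocals'
    wav = model.generate([prompt])             # [batch, channels, samples] at 32 kHz

    sr = model.sample_rate
    overlap = 10 * sr  # last 10s becomes the next call's audio prompt
    for _ in range(3):  # each pass appends ~20s (the 30s output includes the prompt)
        tail = wav[..., -overlap:]
        cont = model.generate_continuation(tail, prompt_sample_rate=sr,
                                           descriptions=[prompt])
        wav = torch.cat([wav[..., :-overlap], cont], dim=-1)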
It was not super-discerning listeners (like you sound to be) that I meant to address in that second question; sorry if that was not clear. Rather, it sounded from some of the comments that people were desperate for original tunes (and maybe not necessarily the most highly produced). But I didn't point to a specific comment, so maybe that's my fault.
We may also disagree on how much good stuff there is coming out, but I agree there is a lot of noise.
There is an absolutely massive gulf between "free (or fixed list price) and immediate, just use these apps" and "locate a musician, bargain with them, pay, collaborate with them, wait for revisions, eventually get something usable (but not be 100% sure if I own the rights or not)." I wouldn't even know where to start; I'm too far out of my league.
Tackling the latter would likely exceed the entire effort I spent writing my little hobby game in the first place. I don't think it's even close; it was never a serious consideration. Some of these games I write in a single sitting. I do my best to piece background music together using a chord progression app, descriptions of keys and the notes they contain from Google, and premade drum loops and instrument samples. It comes out worse than if a real musician had made it, but getting a real musician was never really an option.
It's the same for the art. I don't have the time or money to pay an artist. They deserve to be paid fairly for their work just like musicians, but I don't have it and it's just a stupid hobby game. But even stupid games need art and music. So, homemade programmer art and music it is. The availability of better tools to help non-musicians hack something together is greatly appreciated. I haven't tried any AI stuff yet but I will next time.
It's not that hard, you find an email and you basically do a cold call. I was doing it in high school off Newgrounds (which is full of royalty free stuff too!)
If your project is small and free, you're not going to land The Eurythmics. But all those people posting their music online hoping to get noticed? Emailing them, even a cold call, immediately tells them you've listened to their stuff and you like it. Honesty is the best approach.
I think OP is onto something.
Edit-
> but not be 100% sure if I own the rights or not).
That's also really easy: stipulate it in writing. Preferably a proper contract but an email agreement is defensible too (IANAL).
When it's just me and some apps, I'm writing the background track in an evening after coding the game earlier in the day. If I bring someone else in, now I'm writing contracts (something I'm completely unprepared to do correctly myself, as a non-lawyer). It's too big of a jump for a one-day zero-budget hobby game that isn't very good and only my friends will play. Not when I can quickly cook up something myself using readily available tools.
For a more serious project with a budget, absolutely you find a professional producer, just like you get professional coders and artists. But this isn't that.
Dude I've literally offered to design entire websites for up and coming music artists for FREE because I love their work and am rebuilding a portfolio to have more music sector work.
I've offered this to like 15 people. Some respond in utter confusion and blow me off. Most don't even respond.
This isn't 2005. The vast majority of people, especially "music artists" are not corresponding over email and are not exactly professionals either.
> It's not that hard, you find an email and you basically do a cold call. I was doing it in high school off Newgrounds (which is full of royalty free stuff too!)
You are missing the point: most programmers are introverts and absolutely detest doing cold calls and cold emails.
Instead of turning to pre-made corporate software, you could just hire me or one of my developer friends? We’ll crank out something fine for you in no time flat. All you have to do is just:
- Find me, which is easy, just become an amateur developer yourself and scour the various places I frequent
- Think of and propose in great detail what you want. We will go back and forth over this, over the course of several days/weeks. Bonus points if we don’t speak the same language.
- Sign some form of agreement, really easy, just read this 5-page document and maybe hire a lawyer if you are unsure. All very easy.
- Fork over the cash
- Get deliverables in a few weeks, hopefully.
Now when you compare that with just firing up some website/app and getting on with your work, is that really better? I'm not seeing why you would just go to gmail.com when I could have made you a very nice, very special email reader.
I have perfected my craft over thousands of hours you know. You should pay us the respect we deserve.
—
Seriously: cranking out tunes through some prompting vs hiring people through shady channels like reddit? Are you serious?
You give an analogy to software. I suppose I feel that art differs from it in some ways, e.g. originality and creativity.
Reading many of these responses, I'm gathering that perhaps I have too strong a notion of what quality of music might be in demand. While the loops in the original article are impressive given their generative nature, I suppose I felt that there may be demand for something more (better sound design, more long-term structure), but maybe I'm naive.
I came off like an annoying neckbeard. I guess that’s expected of me and my type, but sorry for that.
You hit a nerve because software development is also art and highly creative. I have given a significant part of my life - basically my youth - to it and I feel “creatives” think they are somehow special and that their work is fundamentally different and I don’t think it is.
It’s just that we have been cornered earlier than you guys. My skills are now only profitable as boring building blocks in corporate settings because nobody else will pay for proper work. Everybody expects easy access for free instantly to whatever digital service they can get their grubby hands on. If I talk about “craftsmanship” I get laughed out the room. Nobody gives a shit.
Now I’m like, yeah guys, that’s how it feels to have your skills commoditized. Deal with it. That’s kind of childish though.
Personally, I think you underestimate access. On several occasions while developing small games, I've wanted to collaborate with someone with a musical bent to put something together.
The problem, I feel, is the expectation that I front the cost of engaging someone to work on a project with me.
Navigating a working relationship on a smaller project seems fraught with issues.
I'm rarely inclined to spend dozens of hours listening to soundcloud when I have other things to work on.
I mean, yes, people create interesting music; perhaps it's a search problem? Knowing that someone creates the kinds of music I'm interested in would help. But as someone making things, I'm trying to find someone to collaborate with who has an overlapping interest in what I make. Solving for that is not straightforward.
I've had much more luck with graphical art than music.
So yes, even though these systems are fundamentally worse, I can at least "collaborate" with them on producing something. Going from zero to one can be enough.
Music is abstract. When we talk about visual art we can almost always be on the same page. If I say I need garden gnomes parading around a Bavarian village, the amount of variation between my internal idea and what a visual artist returns will mainly come from the lack of terms I use regarding aesthetic sensibility. Will they return something abstract or neoclassical? I would then be more specific etc...
For music we could present such an image, but I'd argue it would then suggest many more possibilities. You would suppose we could narrow down by genre, but even then there are too many possibilities: genres, I would also claim, are not as strong categories as the stylized "eras" of visual art. Moreover, we can "port" a fundamental structure like a melody across all sorts of strains of music, whereas in visual art any motif is bound to change depending on the era and style we put it in. That is, with regard to a description we could give in English, some elements are stronger in visual art and weaker in music, and vice versa. It's probably more natural, and more feasible, to describe what a visual representation should be than what a piece of sound should be.
It's interesting how we can generate images with, I'd argue, stunning faithfulness to some prompts, but we don't seem to be very close to the same standard at generating music.
My first reaction to this wasn't "cool I can make the novel music I desperately crave", more along the lines of "this thing is making some wacky sounds that I'd love to see a producer craft into something more". Because I definitely agree with you that there's an abundance of fantastic music to check out, and realistically I'll never be able to check out even half of it throughout my lifetime.
The guys in Infected Mushroom will have a field day with this stuff. Their whole thing is finding weird ways to create new sounds you never heard before.
Honestly what I'm most excited about is how this technology can be used, not to arrange parts or even loops but rather in new plugins (VSTs) that implement novel approaches to digital synthesis. Think of all the awesome sounds.
If anyone knows anyone working on that, ping me. :)
> Have any of you checked out / made offers on music production subreddits? Or other music subreddits? various music production discords? Elsewhere on the internet?
When it's 3 am on a Saturday and I'm in the zone on a passion project, I'm not about to spend the rest of the weekend going back and forth with a music guy on Reddit.
I want a music robot that cranks out music on demand and responds to my every whim, and a real human being isn't going to want to fill that role no matter how impoverished they are.
If I need a background track for something, and I commission someone else, then I believe the standard contract for the commissioned work would still leave copyright with the producer (though not always), and changing it so that I have exclusive rights to the work would potentially make it more expensive.
Add to that, if I don't like something, want it tweaked, want something completely redone, or just flat out change my mind about some direction I provided later, I have to go back to them and negotiate a new contract, or find someone else to do the work. The costs add up over time, and there's an additional benefit to immediate feedback (or cost of delayed feedback, as anyone who has worked on a software project that takes forever to compile/check can attest).
I haven't used the music AI tools yet, but having played around with Dall-E a bit I can say that it's pretty enjoyable to be able to give direction, and bound it, then roll the dice and see how things turn out. I definitely feel some ownership of, and pride in, the resulting creation.
the simple answer is that your motivation for being an artist needs to change to exclusively personal fulfillment, because that was true for the pre-AI world, as you essentially described, and it's truer for this current AI world.
the real meme is about how artists have always been grasping for financial respect in every market condition ever, and yet nothing has changed. People were never going to commission you; they were never going to book you, even while they do appreciate the content. And the few that would ever actually try to commission something encountered friction after friction after friction, frictions that artists collectively have been uninterested in solving, because they're starving and preoccupied with fighting for scraps and modicums of respect at all.
The world has now solved many of these frictions.
The frictions were:
1) hoping they found the right artist to begin with
2) hoping that artist is reliable and has any work ethic or structure in their life
3) not bruising that artist's ego, in whatever communication style is preferred
4) dealing with how completely segregated many artists are from contract negotiations and any aspect of the business world, while needing to secure rights properly
5) ego in securing rights properly without the artist overplaying their hand
6) waiting for the commission
7) revisions
8) circling back to 1
9) if you ever get past 8, the issue of whether your new license can be used in an unforeseen way and medium in the future
Getting burned on altruistic commissions from living artists is simply over now. All these frictions are solved by the free and immediate way.
> the simple answer is that your motivation for being an artist needs to change to exclusively personal fulfillment, because that was true for the pre-AI world, as you essentially described, and it's truer for this current AI world.
> the real meme is about how artists have always been grasping for financial respect in every market condition ever, and yet nothing has changed. People were never going to commission you; they were never going to book you, even while they do appreciate the content. And the few that would ever actually try to commission something encountered friction after friction after friction, frictions that artists collectively have been uninterested in solving, because they're starving and preoccupied with fighting for scraps and modicums of respect at all.
For anyone trying to make money off of music, they should have already been aware that most of the effort in making a living is the non-music work. Once your music reaches an acceptable level of quality it's more about finding and managing your fanbase, industry connections, getting booked at the right shows, promotion and marketing, maintaining professionalism, etc. than anything else. Which this particular AI doesn't help with.
An extreme example is Fred Again, who came out of nowhere and is now one of the biggest names in electronic music. His music isn't bad, but it's nothing revolutionary. As it turns out, though, he grew up in one of the richest neighborhoods in England, with Brian Eno as a neighbor, and went to the most expensive private school in London.
So no, AI music generation doesn't change anything here. It's similar to the startup mistake technical people make of focusing on picking the right tech stack instead of focusing on sales and finding product-market fit. The software/music is only about 10% of the challenge of making a successful business/career.
I did want to clarify that I was posting from the angle of those of us who need music produced for our products but were never going to commission it.
I think it's important to understand that user story, because a lot of artists don't seem able to empathize with it. People are excited because they were never going to commission artists, and were also turned off by stock music licensing websites.
>the simple answer is that your motivations for being an artist need to change to be exclusively personal fulfillment.
Mine are, and the same goes for most of the artists I'm pointing to. The point wasn't that they were in it for the money, although many dream of being able to at least one day pay the rent with it (or maybe just groceries).
The rest of your response makes sense (although I think much of it could be said for all of hiring someone to do work). Anyway, thank you for providing your perspective.
Humans are just naturally hard to deal with, especially humans who you never met face-to-face.
As anti-social as it sounds, it's the conclusion I've reached after years of working with freelancers/contractors. I've been in contact with >50 artists (>300 if "they sent me a proposal on Upwork" counts) and worked with ~10 of them.
Don't get me wrong, I still choose human artists over Stable Diffusion. For now...
> I'm willing to admit that I'm missing something here, but I'd love it if someone could enlighten me.
It's basically the same as with Midjourney. Before Midjourney I'd have to spend quite some time coordinating with some human, explaining what I want, licensing terms, etc., only to have to wait a significant amount of time for an image that I might not like.
With Midjourney, for just a very small amount of money, I can instantly get images that are exactly what I want, iterating extremely quickly. Just the fact that I don't have to deal with another human saves a massive amount of time.
TL;DR
1) Faster
2) Cheaper
3) Often closer to what you want, because you can iterate quickly and get hundreds of variations
“Melody conditioning” as shown in the article seems both immediately useful and something that’s harder to find a human to do for you at the same level of quality.
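For what it's worth, the released audiocraft library exposes melody conditioning as a chroma-conditioned generate call. A minimal sketch, assuming the 'melody' checkpoint name from the release; 'my_melody.wav' is a placeholder input:

    # Condition generation on the melody of an existing recording.
    import torchaudio
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained('melody')  # the chroma-conditioned checkpoint
    model.set_generation_params(duration=15)

    melody, sr = torchaudio.load('my_melody.wav')  # placeholder input file
    wav = model.generate_with_chroma(
        descriptions=['lofi hip hop with warm pads'],
        melody_wavs=melody[None],  # add a batch dimension
        melody_sample_rate=sr,
    )
    audio_write('out', wav[0].cpu(), model.sample_rate, strategy='loudness')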
I think it's about the ability of someone who has a great idea (imagination) but lacks the time, resources, or skills (execution) to make it happen. Of course a talented producer will create a more compelling song (at least for now), but if the tool is an amplifier it should also boost talented composers: their prompts, or inputs to prompts, might be much more detailed, more interesting, more creative than those of a neophyte. This makes some assumptions, but I think the draw for most people is that they can "make" something that sounds cool with almost no effort. My belief is that someone who puts more effort into a GPT and has more expertise can get a lot more out of it as well. I could of course be wrong and GPTs might be the big equalizers, but I doubt it.
I want to do this but I'm scared of the backlash of "you're being exploitative!!!!".
I know the people who say that mean well, but it totally overlooks both how much the culture does (as you say) want to provide their art for projects to use it and create value together, and... reality. Shouting at everyone isn't the way to get them onboard, but shout they do, and it's one country in particular that seems to scream the most.
I'm in the UK and I can't walk down the street without tripping over producers, so maybe the way around the angsty people is finding them in person? Or... we just use AI. The robots solving our social issues is probably a thing.
Hmm, I was there just yesterday. I needed a track to go with a very particular side-project, and I was looking for somebody to arrange/design/sing the vocals. I'm 100% sure that a musician will always do a more musical job than me. The problem is that coaxing the exact work that I need from that musician is going to be a pain (that is to say, expensive and time-consuming), because the piece does not (cannot) fall into any set genre (and set genres tend to be cheaper to produce).
I'm not going the AI route. It's frankly easier, more fun, and it sounds better to compose the music myself, and if the project takes off and makes a dime, I'll hire a pro to improve things later.
This is an insurmountable benefit. Literally the two most important things when it comes to me buying music at scale.
I made a mobile game a while back, and composing and licensing music cost me $30,000 for a free-to-play game. That was the same as 6 months of dev salary (devs in Belarus).
If I can save $30,000 and have zero delay, I’m just going to do that 100% of the time.
The only factor a real musician can beat on is quality. But let me tell you, with zero marginal cost of production, quality will inevitably be better with generative models.
The problem is minimum viable expectations and how fast these are met. In 90% of Reddit you will get flamed for offering money for anything. It wouldn't even cross my mind to go there.
I think the promise, already demonstrated with language, is that you can iterate really quickly. "Make it a bit more upbeat; ok, try more synthwave; ok, scratch that, try darker electro; ok, this is better, make the bassline more pronounced. Great, that's what I was imagining."
I don’t think it’s going to displace a dedicated composer that gets the medium they are scoring for any time soon. But then that’s not what your comp was initially.
TLDR there are cases where “good enough” is going to be provided by generative music in the medium term. Unlikely for this to be anywhere adjacent to music connoisseurs.
I just don’t have a good source of people that are guaranteed to produce something sensible, I have a shortlist of artists I’ve encountered over the years, but it’s indeed a very short list. If I ask someone to create a track and it turns out it’s garbage, I still need to pay them, and I’ve wasted a week of my time.
Another framing of this is not based on demand. Presumably most creativity and art creation isn’t to fulfill a need or demand from anyone other than the producer. This could allow the creator and even users to feel some sense of originality and creativity.
(and now for the rare take that isn't your typical cynical/jaded internet comment)
Wow, this is more than good enough to use for background music in video games, stores, commercials, etc.
You really could have super dynamic music in a video game for instance that changes based on the time of day, environment, situation, mood, etc.. all combined.
Combine it with a LLM DJ and you could get some fun radio stations.
> You really could have super dynamic music in a video game for instance that changes based on the time of day, environment, situation, mood, etc.. all combined.
Games can and do already do this; dynamic sequencing of music from a pool of stems has been common practice for a while. Maybe this could let you do it cheaper, and AI could go more granular by creating new stems on the fly, but the onus is still on the AI developers to show something which hits as hard as someone like Mick Gordon's dynamic compositions.
Infinite variety is of little value if the infinite space is full of infinitely boring, uninspired content.
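For anyone who hasn't seen it from the inside, "dynamic sequencing from a pool of stems" is conceptually just weighted mixing of pre-composed layers driven by game state. A toy sketch with synthesized stand-in stems (real games do this with recorded layers inside the audio middleware):

    # Crossfade pre-composed stems based on a game-state intensity value.
    import numpy as np

    SR = 44100
    t = np.linspace(0, 4, 4 * SR, endpoint=False)
    stems = {  # sine/square stand-ins for recorded layers
        'pads':  0.3 * np.sin(2 * np.pi * 220 * t),
        'drums': 0.3 * np.sign(np.sin(2 * np.pi * 2 * t)),
        'lead':  0.3 * np.sin(2 * np.pi * 440 * t),
    }

    def mix(intensity):
        """intensity in [0, 1] -> pads always on, drums then lead fade in."""
        weights = {
            'pads':  1.0,
            'drums': np.clip(intensity * 2, 0, 1),
            'lead':  np.clip(intensity * 2 - 1, 0, 1),
        }
        return sum(w * stems[name] for name, w in weights.items())

    calm, combat = mix(0.1), mix(0.9)  # exploration mix vs. fight mix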
Like I said, cynical/jaded internet commenter - I don't need Mick Gordon's dynamic compositions, I just need some background music for my game that's good enough.
And no you can't do this already:
const musicSpeed = inFightSequence ? 'intense' : 'chill';
const musicPrompt = `drum and bass beat with ${musicSpeed} percussion`;
playMusic(musicPrompt);
Are you deflecting by attacking my simple example instead of defending your ‘this has already been done before’ point? If you’re wrong just say you’re wrong.
When Animal Crossing blends between a different composition for every hour of the in-game clock, with variants for different weather conditions, is that not changing based on "time of day, environment, situation, mood, etc"? When Doom dynamically ramps the intensity of the music along with the intensity of combat, and inserts perfectly synchronized stings in time with the player's actions, is that not reacting to the situation? That's what I mean by this already having been done, just not with AI.
AI has the potential to consider more variables than is feasible with the current process, but my question is "at what cost". Would Doom be better if the music were slightly different depending on which weapon you were holding, if the trade-off is that instead of Mick Gordon's work it was a computer generating what may as well be royalty-free elevator music? Probably not.
Making more content for less money is only a net positive if the content is actually good.
Why do you think AI music will sound like elevator music forever, when it’s already generating English text and code at such a high level? It’s quite possible that 10 years from now, Mick Gordon will sound passé when compared to the dynamic AI generated music. Maybe not, but definitely possible. There’s a lot of money to be made with better generation of music, and it’s going to be an area of exploration for sure.
Well, I would say that AI's ability to generate objectively correct text or correct code doesn't have much bearing on its ability to create worthwhile art; those are almost polar opposite goals. There is no objective metric for what constitutes good art that you can train an AI towards; the closest thing we've come up with is teaching it that the samples of art in the training set are "objectively correct" so that it will try to make something similar. Better models achieve higher fidelity but are stuck forever imitating rather than exploring new or less common ideas.
Image generation is the most mature form of artistic generative AI, and the trend there has been towards introducing more human influence into the process to help guide the AI into creating something actually worthwhile. If the goal is to embed an unsupervised AI into a game engine and have it create consistently high quality and interesting music based on the current game state, with no human operator in the middle to curate and guide the process, we've got a hell of a long way to go.
>Wow this is more than good enough to use for background music in video games, stores, commercials, etc..
Hard disagree, and lack of copyright due to not being produced by a human becomes an issue for many video games, commercials, etc.
>You really could have super dynamic music in a video game for instance that changes based on the time of day, environment, situation, mood, etc.. all combined.
You don't need this AI for that at all.
>Combine it with a LLM DJ and you could get some fun radio stations.
You could also not, and you wouldn't know until it failed to produce anything interesting. A whole radio station filled with grocery store background music? oh wow I can't wait for the fun.
And you're not likely to make any money doing it, so what's the point, aside from showing that the human portion of music is missing from everything you suggested?
I used MusicGen yesterday to create 50 songs or so. Three of them sound pretty good [1][2][3]. MusicGen is definitely the best of the four models in the presentation. I used the prompts differently than the article did, and I think I got better results.
Suppose there were a way to measure heart rate or electrical spikes in the brain, and we configured the machine to generate music to increase or decrease heart rate, or similarly to increase or decrease the electrical activity of the brain. Then psychology might be deprecated; mood will be reduced to just a music channel.
Yes, of course they are the starting point; a good musician may take some samples and transform a generated piece into a better song, for sure. Some artists state that a painting is never complete, or a song is never complete. There is always room for innovation.
The prompts I used referenced real songwriters, and the model seems to know their songs. The article does not prompt it that way. So I guess there may be a little bit of IP infringement, but we only need that for the first batch. The next models will be trained on the best generations of previous models.
People have tried this, they're called binaural beats, and they don't seem to work for the most part. I mean, not in the sense that you could engineer sound to invoke very specific effects in the brain consistently.
I personally have more than 10 years' experience of sitting in the cold all day long with only summer clothes on. Like 0 to 5 Celsius, with only shorts on, not even socks. I am a winter swimmer as well. I do that because I can think a lot more clearly in a cold environment; it is good for the brain. Granted, in Greece there is not that much cold, maybe 1 or 2 months of 0-5 Celsius.
That can be achieved by putting music on, which speeds up the heart rate. Usually hard rock, metal, thrash metal, etc. In that case, the body starts sweating a lot, no matter the temperature. I combine that with 5 simple exercises I do all day long, which are important as well.
My point is that using music, someone can be in charge of their heart rate. But my biggest complaint has always been that these metal guys are masters of the guitar, while other kinds of music have better taste in rhythm, in melody, etc. Using programs like this we can evolve it a little bit, to be more pleasurable to listen to.
I know about binaural beats; I have tried listening to different frequencies for hours on end, and they don't work in my opinion. At least in my case.
There's a real effect, just nothing even remotely close to the actual fantastical claims being made about it. It's highly doubtful there's some sort of profound way to induce arbitrary brain states through audio input alone.
I remember vividly that this was very hyped in some circles around 2005 or thereabout, with wild claims that listening to some strange white noise for twenty minutes could induce full-blown psychedelic trips even in people with no psychedelic experience. I even tried a bunch of em, and the only clear effect was a mild headache. And I was naïve enough to think it might work back then, and yet there wasn't even really a placebo effect.
I was thinking of a scenario of mapping our brain activity during, say, reading the functions of some module, or a birthday party, or a business meeting. From then on, we have the machine generate songs that activate roughly the same brain region as the actual life experience. We do that once and generate 10 songs.
The next time that life experience takes place, we listen to the relevant songs for five or ten minutes before it happens. We do that to put ourselves in the mood, as a mental preparation tool.
That's all. Not creating worldwide Britney Spears hits or altering our consciousness. Just a mental tool.
Oh I see, you're essentially describing what I think of as aural contextual clues/associations. Sure, that's very real and I've experienced it first hand.
Though I'm sceptical how directed it can really be. There are some songs that have a bizarre effect on me for sure. Though most of the time it's because I had some strange experience involving the combination of said music with psychedelic drugs. And now the music can induce echoes of that experience. But it's just sort of an association that happened by accident.
I guess I could see people using this phenomenon in a more deliberate manner. And you certainly seem to be doing so. Though it could be that you're just somehow more able to than most people.
That happens in general; many people associate music with relevant activities in their life. They listen to songs that are more suitable for driving a car, or lounge beats for reading books.
One scenario is to record some sounds of the event once, like the laughter of a child at their birthday, put it into songs, and listen to it before the next time it happens.
Another scenario is to record the brainwaves of some difficult task, like programming, and try to activate the same region of the brain by listening to songs. When there is an automatic way to create one song which activates an area, but not exactly, and another song which activates one more area, but not exactly, the machine can try to figure out how to combine the two songs to hit the spot. It is essentially a problem of combining information, which AI statistical engines are very good at.
So, CC-BY-NC licensed model weights, and they've made sure to license the training data. And some jurisdictions are saying that copyright cannot be claimed on the output of such models.
Oh, to be a fly on the wall in RIAA corporate offices…
Sans schadenfreude, I think this (depending on inference speed) could be perfect for dynamic content in games (including IRL games: LARP, escape rooms, table top games, etc.)
>Oh, to be a fly on the wall in RIAA corporate offices…
In all likelihood, they're ok with events. Games were never anywhere near their main revenue stream. Now the labour costs on what they're actually selling are dropping to zero. RIAA's future:
1) Use AI to fake a band.
2) Use AI to write music (maybe even lyrics). Don't really care if the AI is any good.
3) Distribute output widely, note that copyright still applies to the output.
4) Use media to generate hype (the critical step). This depends only on platform control/relations, and they have that.
5) Yea, other people could technically generate the same quality of dreck with AI, but it won't be (and legally can't be) exactly like the hyped dreck. Others can replicate nearly everything except the hype.
6) Since the costs are near zero just about every sale is pure profit.
Basically, since music can be replicated, they'll sell hype and belonging to a fan group instead.
Then they'll quickly be replaced, because nothing about that is special. The RIAA exists because they were positioned to guard culturally significant intellectual property, which gave them a monopoly on it.
Making and marketing an AI band isn't even interesting. Someone will be doing it on Twitch and YouTube with an anime VTuber ensemble before the RIAA even figures out any portion of it. The media hype is because of celebrity, and AI-generated stuff can't be celebrity.
>The media hype is because of celebrity, and AI-generated stuff can't be celebrity.
With sufficient social network campaigns, media brib^W relations, and paid influencers we can get anything to be a celebrity, whether it's a paid actor or an AI avatar. That's the special step that not quite anyone can do. Plenty of K-Pop is already not that different...
Let's adopt a legal realist point of view. RIAA et al. have money, lots of lawyers with large briefcases, and lobbyists, so they basically write the law. The ordinary YouTuber doesn't have the resources to defend, and there aren't any commercial interests on the other side here.
So even if the output is technically made entirely by LLM, they'd find a way to slightly tweak the process or even the law so it applies. Someone will do a trivial low-pass filter and then claim copyright. At worst they'd find a flunky to say they 'wrote' the music.
They might; but equally, Meta or Apple might counter-lobby to end copyright as a concept entirely, (or just for music, depending on how good the LLMs are at code and script writing for commissioned TV shows and stuff).
Those two can still rely on patents and trade secrets in a way that RIAA can't. (At least, the RIAA can't rely on those as far as I can see, but what do I know…)
That's what actors + pre-recordings are for (we could try using AI to generate the music live, but that would make the actors' work more complicated and add more failure modes, so why bother?). Note that using recordings already happened in the pre-AI era.
Well, a lot of artists have sued other artists for plagiarizing. Now MusicGen will be called to testify in court and show its composing method. And if it can't prove innocence, it will be put in jail.
I installed it and everything went surprisingly smoothly. It used about 8GB of VRAM max on an Nvidia A30 and takes about 30s to generate 10s of audio. The max duration seems to be 30s in the frontend, but the quality is a lot lower at that length.
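Something like this is enough to check the timing and VRAM for yourself; a rough sketch assuming the audiocraft package on a CUDA box (the checkpoint name is an assumption and may differ in your install):

    # Time a 10s generation and report peak VRAM.
    import time
    import torch
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained('medium', device='cuda')  # assumed name
    model.set_generation_params(duration=10)

    torch.cuda.reset_peak_memory_stats()
    start = time.time()
    wav = model.generate(['minimalist techno with a driving bassline'])
    print(f'{time.time() - start:.0f}s elapsed, '
          f'{torch.cuda.max_memory_allocated() / 2**30:.1f} GiB peak VRAM')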
Mixing genres does not really work, and the model doesn't seem to be trained on band names. However, it does perform well creating music in existing styles.
I generated some Eurovision crap and minimalist techno that were very much believable. But mixing death metal with lofi ambient wasn't the best, nor was the epic progressive rock guitar solo I asked for.
I think the examples on the website are cherry-picked, but with some experience in prompt engineering and many attempts, it should be possible to generate great samples.
It's also excellent at generating Boards of Canada-like music. The audio artifacts, the low fidelity, the weird sounds, the detuned synths: this model does all of that very well, and it does sound great to me.
(Not the original commenter) I don't know of any death metal examples, but chill-lofi-beats+djent music already exists and it's pretty good. This isn't the only example but it's my personal favorite one:
Depends on the human. If it's myself, not really. But if it's well done, yes, I would enjoy death metal sonorities in lofi ambient once in a while. But I admit the prompt is perhaps a bit challenging.
For the most part the new samples still sound like melodic nonsense — in all but one of the examples the melody doesn't fit properly with the chords underneath. It really does feel like the output of a music blender.
The style transfer is the most interesting bit IMO, as you get a sense of how it hears the source examples.
For example, when transferring the opening to the Bach Toccata all the new samples miss out the same passing note (the fifth note in the sequence). To a human ear that note is important, and could easily have been incorporated into the new samples, but it seemingly doesn't activate enough neurons for MusicGen to care.
I'm just weirded out by the fact that conversation about something AS EPIC AS THIS is so boring and rudderless here on Hacker News of all places.
I mean like YESTERDAY I did not have this superpower to summon something as majestic as, say, https://fb.watch/l4ssOD40M4/ with a simple 'A quirky and skronky Aphex Twin sample that just hits you'
Edits:
I woke up to this news delivered by Yann LeCun himself this morning on facebook[1], and my gaping mouth can still be found for onlookers to witness, I suppose!
LIKE THIS IS IT FOLKS!
Edit 2
All those back-in-my-day muzak folks lamenting the quality of contemporary music can fuck right off, because you clearly haven't explored enough of the modern music landscape.
Don't you dare blaspheme saying modern music has stagnated or some drivel like that. It is outright offensive to folks who are pushing the boundaries, like for example The Ex from the Netherlands: https://www.facebook.com/theexband https://www.theex.nl/news.html
Just because you and the other soulless people you fraternize with are ignorant of all the innovative stuff thats going on, we have to suffer through your opinion on the state of pop culture?
Because we like music for the fun of making it, and the shared emotional connection felt with the artist (whether it's real or not, knowing a human wrote and performed a piece allows you to imagine this connection).
I don't know what the point of machine generated music is. Just destroying one of the few remaining ways for people to make a living doing something creative, I guess.
The promise of automation was to have machines do the things we don't want to do, so humans could have more time to do things we enjoy.
Instead, we are automating the things humans enjoy, and still leaving humans to figure out how to feed, house and clothe ourselves through the sweat of our brow.
> Because we like music for the fun of making it, and the shared emotional connection felt with the artist... I don't know what the point of machine generated music is.
Bingo.
This is a fun toy, but in terms of what it means, you may as well ask an AI to pray. It's completely hollow in terms of the actual experience.
This could make suitable filler for idle games, ads, aquariums, and elevators. Not much else. Perhaps at best, a producer could use this to fill in the instrumentation behind a singer, but I have a feeling it's not there yet.
> The promise of automation was to have machines do the things we don't want to do, so humans could have more time to do things we enjoy... Instead, we are automating the things humans enjoy.
Damn. Never looked at it that way. It's still enjoyable to do these things, but perhaps less lucrative. I don't know, do professional musicians like arranging elevator music? I'm strictly an amateur who has never made a dime performing, so I really don't know if that would be joyful, soul-crushing, or somewhere in between. I just know what it means to me, and like I said, you may as well ask the machine to pray for all I think this amounts to.
> but in terms of what it means, you may as well ask an AI to pray
The generative process is based on a combination of learning and randomness. The random part doesn't mean anything, but it's clear that it is far from just random notes. Do you think human music always starts from a meaning? It's just lucky accidents that sound good. We even retrofit explanations post facto to our actions, we can certainly compose music first and assign a meaning later.
Around 150 years ago classical music had a big dilemma: should music be related to concrete things, or abstract? Should we put a story to music? So everyone wanted to know "what was the program?" (program == the original author's meaning). Sometimes composers would just hide it in order to push people to use their imaginations. It didn't matter what meaning the author originally assigned to it; better to try to hear it with beginner's ears.
You've misunderstood. I'm not talking about the meaning of the inputs and outputs of a creative process. I'm talking about the very experience of doing the thing. Hence the prayer comparison.
> I don't know what the point of machine generated music is.
One point is that music fans can now make their own music. I think it's great that people can express themselves and it's not limited to those who put in 10k+ hours to master a single instrument. More people creating is a good thing.
The idea that creating music still has a huge technical barrier is laughable; that barrier hasn't existed for ~20 years. Artists like Tinashe have learned to produce music themselves with programs like Ableton, without a lick of mastering instruments or graduating from this or that art school. Just a general sense of what sounds good to you. Unlike visual art, there's no mechanical barrier either, no mastering of techniques. You can genuinely fiddle around with knobs and buttons and create something that sounds great to you; soundcloud is filled with these.
So there isn't going to be an increased level of profound self-expression because of this. Quite the opposite: more pure noise for the purpose of farming ad revenue.
What's worse, and an aspect many proponents of AI generation ignore, is that by ushering people into this specific channel of caring more about prompts than anything else, we are doing a real disservice to people who could have become serious masters of their realm. After all, "why learn how that music program works when I can just generate it?"
>> After all, "why learn how that music program works when I can just generate it?"
That's how many of my friends in the music biz are thinking right now.
Also the same applies to Code and anything that could be generated by AI. I honestly lost the joy of learning a programming language with the advent of GPT.
ChatGPT still can't do visual programming, so if you use something like Unreal Engine you still have to figure out everything yourself. Sure, GPT can generate algorithms, but it can't playtest the game.
I can't ask ChatGPT to implement something like wall climbing or object throwing in Unreal. It will probably generate something, but it has no way to playtest it and check if it actually feels good to play.
That probably is a good thing, but the road to mastery is a great thing. I can't describe to you the feeling of being in the zone while making music, but I'll try.
Things will erode and decay, things will come into being, things will change. This flux is so constant that in truth there hardly are any things, just the changes; for as soon as you step in the river a second time, neither you nor the river are the same as you were. Heraclitus, maybe? One of those guys.
Likewise, music is inherently fleeting, yet it still makes sense. You can't hold music, yet there's still a sense of it being a thing that exists. Yet when it stops, it still somehow hasn't ceased to exist. The act of musical performance, even at a basic level, especially with others, brings us one step closer to something fundamental about the universe than other forms of expression.
Like I said elsewhere, if you could ask the machine to pray or meditate, it wouldn't be fulfilling for anyone. It would be hollow.
Is it really an act of personal expression if you've narrowed the "vocabulary of creation" to the Stable Diffusion equivalent of "hyper realistic, unreal engine, 8K, masterpiece, intricate details"?
At that point would not the act of creation feel rather hollow?
This feels like a straw man to me. We are continuing to automate feeding, housing and clothing ourselves as well. These two things are not mutually exclusive.
I would like to make music, video games and movies, too, and AI lets me do that. I don’t need millions of dollars or years of training to make something creative anymore.
You can go a long way with LMMS or Ardour and free sample packs. Most big sample production companies provide royalty-free samplers. The free stuff from Sonniss (GDC freebies) and Black Octopus Sound could last an entire career. Throw in the free Komplete Start (or Helm and Surge if you prefer open source) and you have all your synthesis needs covered: https://www.native-instruments.com/en/products/komplete/bund...
> Because we like music for the fun of making it, and the shared emotional connection felt with the artist (whether it's real or not, knowing a human wrote and performed a piece allows you to imagine this connection).
Speak for yourself. I like music if it sounds good, regardless of who made it.
> we are automating the things humans enjoy, and still leaving humans to figure out how to feed, house and clothe ourselves through the sweat of our brow.
Have you been to a farm before? Have you seen a textile factory? Have you seen a construction site? How could you, with a straight face, suggest we are not automating those things? There are vastly more people working on automation in those fields than are working on AI-generated music. Automation in agriculture, construction, and textiles are massive industries. There are a lot of people in the world working on a lot of things.
I mean that I still need a job to get access to those things. Housing prices go up and up, whatever automation is happening in that market is not helping ordinary people who need a place to live.
Food is a different problem. We have access to very cheap calories, but the overall quality of nutrition is way down in advanced economies, leading to an epidemic of obesity.
Textiles is pretty much a solved problem. We have so many clothes, we give them away en masse in developing countries. I think there are very few people in the world without access to adequate clothing, and if there are I suspect it's a distribution problem.
Yep. The way commissioners react when I deliver the files and they hear what I made for them for the first time tells me AI has a long way to go. I'm not sure it can replace that human connection. There's plenty of solid, cheap, and sometimes even free library music out there if you just want music of some sort for a project, and no generative music I've heard comes close to it.
> The promise of automation was to have machines do the things we don't want to do, so humans could have more time to do things we enjoy.
But why exactly should that happen? By which mechanism? Every single company automates in order to increase their monopolies and profit, to generate more shareholder value. There exists no other mechanism, so obviously we will never do anything other than that.
But at the risk of aligning mirrors to other mirrors and hollowing out the essence of it: computers have been essential to the evolution of modern music, but AI won't evolve it anywhere, because it needs to mirror human work, and without people to do that it's a sad dead end. But I doubt people will stop learning instruments and stop making music the old way, because it is too fun and meaningful. There's a possibility it will shift in magnitude in either direction, though. I hope it goes the way chess did, rather than pressing a button and a few faders and calling it music.
It's not rudderless, there's just a large amount of angst surrounding AI/ML ranging from "more ways to feed the copyright trolls" to "what should I raise my kids to do for a starter career?" and a lot of interpolated points in between.
You're totally okay not feeling this angst. But so are the folks who do.
I think it's really neat, but I also kind of go "meh". I've been into generative music stuff for a long time, but whenever I get to the end of the project I go "meh", and I don't really feel any different about this.
As I've watched the evolution of music generation with LLMs I feel like I just keep hearing drivel at greater fidelity. If you like it then by all means listen to it, but this is average or below. In some ways I think I prefer the more chaotic less coherent predecessors. They're a bit more interesting to my ear.
And as other posters have said: that doesn't really sound like Aphex Twin to me at all.
If I'm being honest, I have to agree. This is, like, the least interesting sound I've heard today. It's just a beat; I bet some things could sound cool eventually, but it's just ridiculously generic and kind of derivative. As well, I can tell it's AI-generated; it's got the same kind of stilted, just-holding-on-to-tempo quality that most voice generation has. Like it's mere moments away from entirely falling apart into machine screeching and creepy whisper sounds. Maybe there are better examples, but being introduced with this clip has really put me off the whole idea.
I get the same feeling every time I buy into the AI hype and try it for myself.
On stuff like art it's hard to judge objectively, but in things like code it's much simpler. Don't get me wrong there are cases where I find generative AI useful - but the hype machine and the unedited whole solutions are just straight garbage.
Because it doesn't sound like Aphex Twin, isn't particularly quirky, isn't skronky, and doesn't just hit me.
It sounds like output not resembling what you requested, and you're celebrating because for some random reason this particular prompt didn't sound totally horrible today. But it isn't intentionally making music, and it isn't particularly interesting music either. It's basically baby's first drum machine sort of stuff.
When it comes to music, the meme at this point is that "discovery is the problem." There is already so much music being made by so many people that the difficult part is connecting listeners to music they enjoy. It's tiring to see endless takes of "finally! we don't need artists anymore! we can just stand on their shoulders by training models on their work and then generate our own art instead!"
There's already so much art being made. Why have so much joy in ignoring all of it and focusing on generative AI instead?
Furthermore, there's another work posted on that FB account that has the caption:
"While I'm concerned about the possible impact on society - especially on the jobs front, I cant help but grin as the edifice of human exceptionalism is shred apart with every passing day."
Is that you? Why the misanthropy? "edifice of human exceptionalism"? As it applies to... making music?
Possibly. FAIR has always been doing great work and making it public though (PyTorch is so big that we forget about it sometimes). Sadly 'we sell ads' is going to remain the case unless product people ask users to pony up some cash to use this tech. To be fair, I would totally chuck some cash to play with something like this and I can easily imagine a world in which this technology is used to power some bizarre social experiences like an online drum circle or some such.
Infinite music is interesting from the angle that the music that we value is connected to our cultural and social experience. How can we cherish a song that has never been heard before and will never be heard again, which means it is deprived of social context that would give it meaning?
One answer would be to create music that shares its roots with music that the listener already knows. This music could be enjoyable, but you can't exactly sing along to a melody you're hearing for the first and last time, so it has more limited engagement potential. This is an approach to composition that you learn when you study chord progressions and other elements in music theory, and it's what I'm sensing when I listen to the MusicGen outputs.
To draw from greater cultural context, you can incorporate folk and popular melodies that are widely known. Musicians love this trick. "Immature artists copy, great artists steal." MusicGen seems capable of doing this, too.
To promote a novel melody as something that listeners deeply cherish, or to innovate at the level of the theory, the social context has to be built up around the content after it's generated. E.g., when introducing a new song on the radio, a common trick is to play it between songs that are already popular; building up co-occurrences with songs that already have cultural significance. My challenge to Meta would be: can you use your platform to transform some of the model's novel outputs into familiar popular music? It would be an important cultural milestone if an AI-generated melody became a familiar tune that would be played in the café, recognized, and enjoyed.
I've thought about this same idea before, but especially as it relates to film/television, and writing.
Think of your favourite TV show.
If, when you first watched that show, you had been told that no one else had ever watched it, or ever would, would your engagement with it be the same?
Part of our enjoyment of art is the shared cultural context. Maybe saying "art" here isn't the salient thing. Maybe it's our engagement with ideas.
I personally haven't even considered this concept as much in relation to music, because while I do love to deeply engage with music and the shared narrative behind it, both real and imagined, I also just like to put on music that sits in the background as a tool to drown out noise while I'm working, walking, etc.
> It would be an important cultural milestone if an AI-generated melody became a familiar tune that would be played in the café, recognized, and enjoyed
I can see why some might think this is silly; it seems unscientific. But what exactly is human-level performance in music, and how can we detect it?
If music is an artifact appreciated by listeners, then any metric apart from whether the music is listened to would be a proxy — though I can appreciate the perspective of creating for the artist’s own sake, without a need to share the creations.
Popularizing some of the model’s outputs would reveal their merit against human-produced music, by allowing them to succeed or fail in attaining the same quality of cultural significance.
Then again, if the artist here is the AI research team and the audience is AI enthusiasts, then the music has succeeded in being heard and attaining cultural significance. It has been remarked that Schoenberg’s music was more often defended than listened to — music written by a theorist for an audience of theorists. I am a true fan of Schoenberg’s, though I can hear that the example outputs of this work are music that is meant to be accessible to the everyday listener.
I was playing with it yesterday and it's not bad. I'd much rather use it for e.g. YouTube videos than risk getting copyright claimed for using something that already exists.
This is incredible!! For all the “AI is stealing our music” naysayers here: consider that all art is derivative (without that borrowed context it would be nonsensical), and artists learn from other artists too.
I don't know why you're mentioning art being derivative, but that's not the thing that worries me.
What worries me, is that a good enough model will take away the incentive to write music for many, and as a consequence it will also remove performers. This will reduce demand on music teaching and instruments, which will then both become nearly inaccessible. Since learning music isn't a question of following a few youtube videos, this will leave the world with just AI music.
Jazz and classical music are probably exempt, since they rely on subsidies, their audiences care about the actual performance, and AI compositions will not draw enough of a crowd to make them financially interesting.
But popular music will suffer, and that's what makes development of these models straight evil.
Except for the 2 min lo-fi demo, I found the examples to be pretty bad. Sounds like the music is being played in a cardboard box in your garage’s corner.
This is still pretty deep in the uncanny valley. None of the rock examples even sounds very much like guitar. One sounds more like trance, the others more like metal than rock (though interestingly, trance is a lot more similar to metal than you might think; it's just hard to notice at the surface level due to the very different instruments).
Then again, in this case I don't mind. I'm sure someone like Simon Posford could do some really wacky sampling based off of this.
Don't see myself using it to make music just for my own listening, though (not much of a composer). That's still a long way off.
The question that I think reveals the kind of potential ML has for music is: how good is written language at describing music? People here compare this to the development of Stable Diffusion, but the mediums are quite different. Describe your favorite shot in a movie and most likely you’ll enumerate its elements and get pretty far in conveying it to someone who hasn’t seen it, at least its physical aspect. But try describing your favorite song and the interpretations will be wildly different (outside of hard-defined genre music).
> The question that I think reveals the kind of potential ML has for music is: how good is written language at describing music?
If you took a DAW project file of a song.wav that was completely written and produced digitally using virtual instruments, and compiled into a .csv file all of the parameters the user had to set to achieve their output.wav, you might be surprised to see (1) how few parameters were used, (2) how often those parameters are unchanged from their defaults, and (3) how many of those parameters would be expected in any other project file.
When you break it down, you really only have 6 layers to parse, all of which are dynamic but within a relatively small and consistent sandbox, at least relative to image generation.
1. composition layer - the MIDI or notes of the song.
2. arrangement layer - the selection of instruments used in the song and the division of the song's MIDI among those instruments.
3. instrument layer - the parameters of each instrument, such as a synth patch or a virtual piano's room setting.
4. post-processing layer - the effects placed on the output of each instrument, such as reverb, compression, delay, etc.
5. mixing layer - the volume of each instrument + post-processing channel.
6. mastering layer - processing on the master track.
All of these things are more or less standardized. Developers always add their own flair (read: custom parameters) for their plugins, but those can be decomposed into combinations of each layer's fundamental parameters. All these parameters plus the MIDI of a song would be a few KB; a sketch of what that representation might look like is below.
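To make that concrete, here's a minimal sketch of such a representation (every class and field name here is made up for illustration; a real DAW project file is messier, but reduces to roughly this shape):

```python
from dataclasses import dataclass, field

# Hypothetical minimal encoding of the six layers above; the field names
# are illustrative, not any real DAW's schema.

@dataclass
class Note:                          # 1. composition layer
    pitch: int                       # MIDI note number, 0-127
    start_beats: float
    duration_beats: float
    velocity: int = 100

@dataclass
class Track:                         # 2-5. arrangement/instrument/post/mixing
    instrument: str                                          # e.g. "analog_synth"
    instrument_params: dict = field(default_factory=dict)    # patch settings
    effects: list = field(default_factory=list)              # e.g. [("reverb", {"wet": 0.3})]
    notes: list = field(default_factory=list)                # this track's slice of the MIDI
    volume_db: float = 0.0                                   # mixing-layer fader

@dataclass
class Project:
    tempo_bpm: float = 120.0
    tracks: list = field(default_factory=list)
    master_effects: list = field(default_factory=list)       # 6. mastering layer
```

Serialized, a structure like this really does come out to a few KB, which is why the search space looks so much smaller than raw audio.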
I feel like an LLM trained on these parameter sets, which interacts with the software used to manipulate these layers, could produce amazing tools and open the door to writing high-quality songs for everyone, just as other AI products have opened so many similar doors.
The DALL-E app for music, in my mind, probably won't be a text-description-to-.wav pipeline. Instead, it would generate the elements of each layer, with options that can be auditioned in real time using whatever VSTs were used in training. When you ask ChatGPT to write a complex Python script, it starts with an outline of all the methods in the script as placeholders and then takes you step by step until you're done, then you troubleshoot it or flesh it out. The best part of generative music like this is that it leaves the user with only having to decide whether something sounds good or not; roughly the loop sketched below.
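A rough sketch of what that audition-and-pick workflow could look like; `generate_options`, `render`, and `audition` are all hypothetical stand-ins for a model, a VST host, and an audio player. The point is the shape of the loop, not the model:

```python
# Hypothetical interactive loop: the model proposes options per layer,
# and the user just listens and picks.

LAYERS = ["composition", "arrangement", "instrument",
          "post_processing", "mixing", "mastering"]

def produce_song(prompt, generate_options, render, audition):
    project = {}
    for layer in LAYERS:
        # Model proposes a handful of candidates for this layer,
        # conditioned on everything already chosen.
        options = generate_options(prompt, layer, project, n=4)
        for i, option in enumerate(options):
            print(f"{layer} option {i}:")
            audition(render({**project, layer: option}))  # listen in real time
        choice = int(input(f"{layer}: pick an option (0-3): "))
        project[layer] = options[choice]
    return project
```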
As a mostly musically illiterate producer myself, I've produced hundreds of songs and a few albums without ever really learning how to do anything other than manipulate the parameters. When I started learning to produce music I was 15 years old and knew nothing about music production. But what I was really good at was using computers and software, so I learned to play the DAW, the plugins, and the sample packs. The only layer I couldn't learn through learning software was the composition, the writing of the MIDI. Fortunately, the MIDI of a song becomes very easy to brute-force over time, so I learned to brute-force MIDI. Once I became efficient with my workflow, producing music became a task of "make this idea sound good." And without ever really feeling like I was a musician or composer, this became an enormous passion and outlet for me that I pursued every day for a decade.
I was able to do this because, at its core, all the mechanical parts of a song are simple machines, and a song's quality is the way those machines are used together. As an outsider, this feels like a workflow that would be very machine-learning friendly. But I could be wrong!
Google has actually released Magenta [0], a plugin for Ableton that can do just that. It's pretty cool; it generates interesting melodic content at the click of a button.
I’m a musician and have spent over 20 years making music. It’s my biggest passion. I still love this. Music isn’t a competition to me so it doesn’t bother me that someone with little experience can create great sounding music. I can still make music that’s an expression of myself, or I can generate some AI music to see if something catches my ear as a starting point. It’s just another tool, and I’m excited to see how it evolves.
Pretty good at transcribing the text, but the output music feels, for lack of a better word, “safe”. For example, the kick beat is way too generic and soft.
Style transfer is a pretty cool tool for someone who likes to play around with sound design. It will be a lot of fun to drop this on parallel channels and blend it together into choruses and new instruments.
Right now the generation still sounds a lot like loop packs smashed together, which anyone could technically make. But it is practical for anyone who only cares for that style of sound and doesn't have the familiarity to make it themselves. Now they can just say what they want and hit regenerate, skipping the high-latency feedback cycle of iterating with humans or sifting through song snippets.
My opinion on this style of content is that AI generation is simply accelerating us toward the inevitable end state of generic digital content, not really changing it. It just happens to also be the optimal interface for discovery, not just generation.
These are really impressive. Provided the ‘prompter-composer’ can generate high-quality stereo mixes (these all appear to be low-bitrate mono files), you could probably use them right now for various ‘wallpaper music’ purposes, where originality is not required or is even a detriment: ‘on hold’ music, low-budget video productions, symphonic film scores, etc.
Producing this kind of anti-music is pretty soul-destroying for a musician, anyway, so the machines might as well do it. We can then spend our time working on stuff that means something, and if we’re lucky and it connects with enough people, make a living from it.
Before using it, just listen to the samples. It doesn't have any structure, even within 30 seconds. It's repetitive and boring. It could be used as background music in a low-budget game, but that's it. It's not a threat to human talent, not yet. It's like the first text generators with tiny attention windows; it took some time for those to evolve into GPT-4. Then it will have an impact. Bans and unions may delay that for some time.
Music is hard to describe well without using artist names or references to specific songs. There isn't an alternative way to really describe things - "Airy EDM with tropical feel" doesn't cut it.
This space will belong to scrappy shadowy decentralised organisations who let you type "give me a filtered french disco song using mizell brothers era johnny hammond jazz funk samples, lil uzi rapping, with a thundercat bassline and crooning"
As if the current music scene hasn't already plumbed the depths of banality. At this rate the stars of the 70s and 80s will be in business until they expire.
Music is already one of the most extremely devalued art forms given how oversupplied it is. Boggles the mind to think of the consequences of technology like this reaching quality levels where the differences between it and professionally produced music are imperceptible.
One consequence is bespoke music that changes dynamically, e.g. in games.
There's also a relative dearth of royalty free music for independent content creators to use. AI would enable them to produce better content on a limited budget.
People who enjoy creating music from scratch will be unaffected - recognition and financial rewards are tiny already for most.
Do any of these generative music ML frameworks export MIDI instead of .wav or .mp3? That would be 1000x more useful, as the quality we're reaching is good enough.
I can imagine a VSTi that just takes a prompt and generates MIDI tracks. Something like this is surely coming in the next couple of years; the plumbing could look like the sketch below.
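Even before a full VSTi exists, the plumbing is simple. Here's a minimal sketch using mido (a real MIDI library); the prompt-to-notes model call is a hypothetical stub:

```python
import mido

def notes_to_midi(notes, path, tempo_bpm=120, ticks_per_beat=480):
    """notes: list of (pitch, start_beats, duration_beats) tuples."""
    mid = mido.MidiFile(ticks_per_beat=ticks_per_beat)
    track = mido.MidiTrack()
    mid.tracks.append(track)
    track.append(mido.MetaMessage("set_tempo", tempo=mido.bpm2tempo(tempo_bpm)))

    # Expand notes into absolute-tick on/off events, then convert to deltas.
    events = []
    for pitch, start, dur in notes:
        events.append((int(start * ticks_per_beat), "note_on", pitch))
        events.append((int((start + dur) * ticks_per_beat), "note_off", pitch))
    events.sort()  # note_off sorts before note_on at equal ticks, avoiding stuck notes
    now = 0
    for tick, kind, pitch in events:
        track.append(mido.Message(kind, note=pitch, velocity=64, time=tick - now))
        now = tick
    mid.save(path)

# notes = prompt_to_notes("moody minor-key arpeggio")   # hypothetical model call
notes = [(57, 0, 1), (60, 1, 1), (64, 2, 1), (60, 3, 1)]  # placeholder A-minor arpeggio
notes_to_midi(notes, "sketch.mid")
```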
That was the way almost all ML research on music was done until recently: train for MIDI generation with input of other MIDI or, at an even more fundamental level, just notes. If you go back, tons of ML music papers were written on generating believable sequences of Bach, Mozart, etc., because it was just note prediction.
Since the advent of transformers, with text models mapping the natural-language space to tagged music samples and the music tokenizer acting directly on the sampled audio stream (the bits of a .wav file, essentially), all the cutting-edge work is going that route, because it produces high-quality, finished audio streams directly. And I think part of it is that there is way, way more training data for actual audio than for MIDI alone (there are tons of free MIDI sites out there, but a lot of it is garbage, and it pales in comparison to what is already sampled and tagged in real audio libraries).
I imagine what will happen is... within two to three years these LM-transformer music models will get so good that the audio will be damn near spotless, and additional methods will be developed to synthesize with more control directly from the models, to the point where wanting MIDI so you can use your own HQ synth isn't needed: if you want "the lead synth to sound less digital and more like a classic Minimoog Model D", you just add that to another "Music2Music" pass and out pops your sound.
For those who still want MIDI there is still work being done on traditional audio-to-MIDI modeling and I think you'd wind up just using that in the chain.
I just got excited at a mention of Summits On The Air and found a boring AI article.
It's an amateur radio thing, btw... You go up a mountain and make use of the good propagation to call other hams. You accrue points that are as valuable as HN points are!
Amazing work by FAIR! With tools like these, AI-enhanced creativity is going to give us hyper specialized output and thankfully further us even more from mainstream, one-size-fits-all art. We are living in the future!
I'm curious how this handles bass music where the lower frequencies need to be tightly mixed to sound good. Is that really something that a model can learn?
As a composer I may be biased, but AI "generating" music is just sad. The hypocrisy is that musicians have been suing each other for intellectual property reasons, while this thing is being trained on everyone's music. The law should catch up on this.
I get that it's going to improve but for now it's also just elevator/supermarket music.
You make it sound like some force of nature is causing this to occur. These things exist because people are making them, despite there no longer being a healthy reason for doing so.
(I’m not saying there’s reason for AI development in general to stop, but these generative things that are designed to slot neatly into the role of human artists specifically have no reason to be developed further beyond proving it was possible, and that happened a while ago.)
The force of nature is that our society will utilize any technology before even considering its ramifications. "Touchscreens in cars? Let's do it!"
I think you might enjoy Neil Postman's book "Technopoly", which discusses the subject of weighing the pros and cons of a subject instead of just diving in headfirst every time some new technology is developed. His YouTube talks are also great.
I think that’s more of an emergent behavior, not a force of nature. But I agree that enough people might do stupid stuff to create this kind of emergent behavior, which seems to be happening now. Like maybe 90% or more of people think these art AIs are distasteful, but enough people can’t stop themselves from filling in the blank square where something is possible to create but hasn’t been created yet, so people keep trying to make it. Even though there doesn’t seem to be any upside or goal, since I’ve never gotten an explanation of one.
And I think if you’re going to create something that has a little bit of potential to harm at least a few people, you should at least have a decent goal or reason for creating it.
Life is a force of nature. Evolution is a force of nature. The development of society (across species, not just human) is a force of nature. The development of technology is an aspect of that. In other words, macro-economic trends such as the development of automation ARE in fact an aspect of evolution, aka, a force of nature.
Doesn't matter if that's not the sense in which the phrase is used, these things are arising out of the collective unconscious, not as the result of mere individual will.
GDPR has a severely muted effect because Americans are still doing the same things they were before. It'll be even more ineffectual for AI. Nearly all AI research you hear about is being done in the United States. European regulations will only stop Europeans from using it, but you won't be able to escape it anyway because of how much American culture is continuously imported into Europe. Meanwhile, the reverse is almost completely not true; very little European culture makes it into American culture. This will just kneecap European creators and companies. I wish Europe the best of luck with this.
Since we are talking about music, maybe I'm living in a European bubble but last time I was in the U.S. people were listening to classical music (basically OG Euro music), Beatles, ABBA, Elton John or newer stuff like anything involving David Guetta whatever. Plenty of European music being listened to in my niche (metal) as well. Music knows no borders.
That's implementation details. It could be the case that no art produced in the EU can be used as training data (or similar), not necessarily that EU AI models are forbidden from being trained on art. I find the former case the most probable.
But they have tastes and preferences in a way that the models lack — unless, possibly, if you have them retrained sufficiently long by a single individual or small group, I guess.
This isn’t about specialness, it’s about the foundations of civil society. People are meat and guts humans.
We’re autonomous entities capable of higher reasoning, limited in time, attention and talent and eventually die allowing new people time to flourish.
We also pay taxes and make silly arguments about software and humans being no different from each other because it justifies our ability to play with cool toys without considering the impact on other people. Corporations aren’t people though because they aren’t cool like AI.
I don’t understand what’s motivating certain types of people to continue working on these types of AI practical implementation projects. There’s nothing good for the world this (type of AI in particular) will offer.
Maybe someone will say this will let people who aren’t musicians express themselves by creating music. That’s not true. It’s as true as hiring a musician to make a song for you, given a description. And nobody would say that the person who hired the musician was expressing themself.
Instead they seem to be all in on washing out any hope in creativity and pointing people to put all their hope in minting and munging “code”.
It’s so myopic and short sighted it hurts my soul. I don’t understand at all. All that money, all that knowledge and talent… and this and stupid headsets strapped to peoples faces is the game? God dammit.
Musicianship already isn’t a profitable enterprise for most, and yet kids continue to learn piano and get good enough to go to Juilliard. I doubt that will change.
There’s already an endless supply of free and royalty free music in every style. While an AI can now also generate that for you as well, it was not necessary to create the AI to meet your goal and requirement.
By definition, having an AI generate something for you instead of making it yourself means you can’t create exactly what you want. You can probably get it in the ballpark, though. But that’s exactly the same scenario we have today with musicians.
The same way we traded away the profession of painters for photography, it seems like we might be trading away musicians for generative music AI. Except photography is super useful for lots of things and truly benefited humankind, and generative music AI… only replaces musicians? I have no idea why we would make this trade, as a society.
> The same way we traded away the profession of painters for photography
People still paint? Even portraits, although I'll acknowledge it's far less popular than it was in the 1700s. A town full of reproduction artists in China seems a bigger threat to capitalistic Western artists than an AI that will mostly provide generic output.
Let's say that ebooks now include metadata for soundtrack generation as you read them. Something like this model generates it in real time based on the user's reading speed, etc.
That does sound cool, but you don’t need a purely generative AI to do this. Dealing with a reader who jumps around, re-reads paragraphs, flips back a few pages for a moment, etc. in a coherent way seems like the more difficult and interesting problem; see the sketch below.
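A sketch of just that non-generative part, with all names hypothetical: debounce the reader's position so the soundtrack only retargets once they actually settle on a passage, instead of thrashing on every page flip.

```python
import time

# Hypothetical controller: only retarget the soundtrack once the reader
# has settled somewhere, so page-flipping doesn't thrash the music.

SETTLE_SECONDS = 8.0   # how long a position must hold before we react

class SoundtrackController:
    def __init__(self, generate_cue, crossfade):
        self.generate_cue = generate_cue   # stub: scene_metadata -> audio
        self.crossfade = crossfade         # stub: audio -> None
        self.candidate = None              # scene the reader is currently on
        self.since = 0.0                   # when they landed on it
        self.current_scene = None          # scene the music is playing for

    def on_position(self, scene_metadata, now=None):
        now = time.monotonic() if now is None else now
        if scene_metadata != self.candidate:
            self.candidate, self.since = scene_metadata, now   # reset the timer
        settled = (now - self.since) >= SETTLE_SECONDS
        if settled and scene_metadata != self.current_scene:
            self.current_scene = scene_metadata
            self.crossfade(self.generate_cue(scene_metadata))
```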
This kind of automated "filler" music has been around for decades, and is usually used for exactly that - filler. It's pretty much the stock photos of music.
And that could be a good thing - suddenly content-creators don't have to spend money or energy on purchasing that kind of stuff.
If you've ever seen YouTube automation videos, typically those "TOP N" list vids, they always contain some kind of muzak-style soundtrack.
Sounds like the "is DJing an art form" debate. :o)
Unlike classic "hiring a musician", here it's practical to "hire" the (robot) musician 10,000 times with a feedback loop between the model and the prompt writer, iterating and picking the best output(s), which looks like a similar process to other exercises considered art forms. Something like the loop below.
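As a sketch, where `generate` is the hypothetical model and `rate` is a human listening (which is the whole point):

```python
# Hypothetical curation loop: the craft lives in the prompt edits and the picking.
def curate(prompt, generate, rate, rounds=10, n=8):
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        for clip in generate(prompt, n=n):   # n candidate outputs per round
            score = rate(clip)               # a human listening, not a metric
            if score > best_score:
                best, best_score = clip, score
        tweak = input(f"best so far scored {best_score}; tweak prompt (or Enter): ")
        prompt = tweak or prompt
    return best
```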
> I don’t understand what’s motivating certain types of people to continue working on these types of AI practical implementation projects. There’s nothing good for the world this (type of AI in particular) will offer.
Man wants to dominate nature and always has. I don't think this is particularly difficult motivation to understand, as it seems omnipresent.
Maybe I'm just getting older but I feel like the quality of both music and film has seriously declined over the last 5-10 years. Maybe the good stuff is still out there but lost in a sea of average garbage that has surfaced to the top.
Something tells me AI isn't going to rescue us either. I just sampled a bunch of these generated tracks and they immediately remind me of the average, mediocre, soul-lacking content that most music and film is today.
When I was 20 I was a music snob into Aphex Twin and weird IDM. I thought all pop at the time was crap, like you seem to. But then I heard, I mean like really heard, "Bye Bye Bye" by *NSYNC and seriously that is a good song!
I'm 40 now and I think it got way better even since then. Pop is so varied now! I really don't think music as quirky and weird as, say, Billie Eilish would've made it to the top of the charts in the 90s. I'd say that music like hers (and many charting artists of her generation) is a testament to how broad and compelling pop music has become.
My generation thought their parents' music was shit, my parents' generation thought their parents' music was shit, and so on, all the way until at least the invention of Jazz. But the average Gen-Z'er thinks all the music is great! They invent new genres for every song, they wear Metallica t-shirts in 2023, and they mix 80s disco with 00's Brit rock like it's just what people do.
And don't forget there's an endless long tail of music out there. There are so many good musicians and plenty of them have a sufficiently fancy label deal to be on Spotify and the likes. And otherwise they're still on Soundcloud, Bandcamp and YouTube. It's worth a deep dive!
> They invent new genres for every song, they wear Metallica t-shirts in 2023, and they mix 80s disco with 00's Brit rock like it's just what people do.
If this appeals to you, it's worth checking out Japanese music from the Showa era to the present. They've long mixed styles in a way other music markets have not. You can hear city pop songs from the 80s with metal guitar solos, jazz progressions, a samba beat, and synths, all in the same song.
A flood of garbage film/music has always existed; we just don't remember it because it's uninteresting.
However, I think modern rec algorithms (like the Netflix home page) are recommending more mediocre stuff than the old system, and the streaming boom did produce an abnormal glut of junk.
Anyway, I think AI is going to spawn a music remixing/game modding/TV-extending renaissance. These models perform much better when pointed at a good source (as you can see with the melody-conditioning samples, and other stuff like SD img2img and fine-tuned LLMs).
I won’t speak to music, as I listen to a lot of stuff, enough to know there is good stuff out there being made.
But for movies and TV? Where do I find the good stuff? It seems Hollywood is creatively bankrupt and just milking boring franchises and cheap nostalgia through crappy remakes and sequels. My eyes rolled to the back of my head when I saw an ad for a show called “How I Met Your Father” on Hulu.
> But for movies and TV? Where do I find the good stuff?
There’s a trove of incredible foreign movies and TV shows out there. Scandinavian and Asian (Korean in particular) content has a really good hit to miss ratio for me.
For examples, check out international film festival nominations and winners.
Reelgood is a good one; sort by IMDB score (which is somehow still kinda working as a metric) or the Reelgood score, which is a popularity-among-enthusiasts kind of ranking. You will find TV gems that streaming services criminally and inexplicably never recommend.
But "old school" recommendations from TV /movie buffs (like the tvtropes community or various forums) are still a good source.
Those “boring franchises” are what bankroll the passion projects, artsy festival bound movies, and experimental content.
As far as content goes, there has been a ton of excellent stuff just this year across movies, TV, and anime. One “organic” way to start is to look for recent recommendation threads on Reddit for a movie or show you really like.
See, I've seen this stated many places but never explained. How exactly does the money made by derivative bullshit go into valuable, passionate art projects and not either directly into pockets or into the next billion-dollar derivative bullshit thing?
Generally, the bullshit costs way more to produce, market, and advertise. And at least on paper, a significant amount of the money made is only recouping the costs of the 3 hours of incredible CGI it took to make a 3rd 'Ant-Man' or a 4th 'Jurassic Park'. The majority of actual indie art films cost ridiculously less than that, because they're filming a movie, not a commercial.
Anyway, my opinions aside, are there any articles with cited money trails that prove that billion-dollar blockbusters actually fund valuable art and not just executives' yachts?
Having a good time with Trakt for discovery and rating. It has a very active app/plugin/webhook ecosystem, and I've gotten some great recommendations from it by scrobbling via Plex and following a few people with similar preferences on there.
I don't know what you like, but "prestige TV" seems to be where writers, directors, and actors who want to do something other than another retread of some studio's IP backlog end up.
It was a smaller flood though, when it required lots of money to record an album / make a movie. Gatekeepers kept most of it out. Now anybody can do it, so there is both a lot more chaff to sort through, and an outpouring of creativity.
I disagree that key changes in popular music are a great measure of complexity. For many years a key change near the end of the song was an easy way to give the sense of a climax. The article your link is based on gives a good summary of it:
> The act of shifting a song’s key up either a half step or a whole step (i.e. one or two notes on the keyboard) near the end of the song, was the most popular key change for decades. In fact, 52 percent of key changes found in number one hits between 1958 and 1990 employ this change. You can hear it on “My Girl,” “I Wanna Dance With Somebody,” and “Livin’ on a Prayer,” among many others.
To me, this just reflects one set of songwriters' cliches being replaced by another. Not necessarily better or worse.
While I do agree generally about key changes, I think the point is that a key change is just an example of something that sounds _interesting_. It's not just key changes, but all the little chances an actual artist takes during creation; the things that sound good to some and bad to others are exactly what makes art, art. The change being witnessed isn't the loss of key changes, but the loss of everything that sounds different or interesting, in favor of a sound that is generally palatable to everyone precisely because it does not contain anything interesting.
How about time signature changes, then? Not too many popular songs experiment much anymore. What was the last popular hit with a really odd meter (or various meters)? I know, not everyone can be Rush, but it’s pretty vanilla today.
That's honestly what it feels like. It feels like all music and film has regressed toward some boring mean. There's not enough range, emotion, and difference to find tracks that really stand out from the crowd.
Music especially just feels flat. Maybe that's just the style now, and I'm old and can't appreciate it.
Honestly, gaming is in a similar rut although not quite as bad thanks to VR.
There's plenty of really good creative music. If you only watch Marvel and Top 20 hits you won't know it, but there's plenty of good stuff out there. I've really enjoyed the last couple of Bon Iver releases, and my favorite artist, The Tallest Man On Earth, just released his new album Henry St., containing some super personal tracks.
You are getting older. Every generation thinks the same, that media is getting worse, discounting the survivorship bias that occurs when they look back on their favorite music and discarding all of the bad music that was present back then.
This is no doubt true, but there are a number of studies which suggest that, at least in the case of music, things really have gotten rather worse over the last two decades, thanks to corporatization and consolidation of the production model.
I don't understand this view. Heavily commercialised music has almost never been all that great anyway. Except very occasionally. Most of it is LCD garbage. Maybe the garbage has become even more garbage, I don't know. But why judge an art form by the boring average.
There's so much great new music being made every year, new genres and ideas, etc. Film music seems better than ever recently. Especially for TV series. Lots of new styles emerging there too, see Mac Quayle for instance.
The really good, modern music was almost always on the fringes, and there's more of it now than ever before.
There might also be more garbage, but there's no need to listen to it.
This is only if you listen to the most mainstream, general audience top 40 pop content-sludge.
There is an overwhelming amount of good music out there. Pick an album top 50 list from 2022, for example fantano's, or pitchfork, check out bandcamp's staff picks, listen to other musicians that are on the same label as your favourite band, keep an eye on things like NPR Tiny Desk, KEXP, la blogothèque on YouTube.
Just start listening. You are almost guaranteed to stumble upon something you like. It won't come to you algorithmically but the effort required is really low.
My favourite new album I discovered last year was Immanuel Wilkins' The 7th Hand [1]; I stumbled upon it by going through a top-20 jazz albums of 2022 list to see if I had missed anything, and it immediately jumped out at me as being exactly the shit I'm into.
The research on this topic that I'm aware of fails to account for the fact that the top 40/100 lists are less representative of what people are actually listening to than they used to be. If Drake can drop an album and have every song on it chart on the Hot 100 for a week or two, that's going to influence the analysis. That simply wasn't possible before music downloads/streaming. You can see the impact on the chart records -- artists from the past decade dominate.
ETA: And "worse" in these studies tends to be defined in terms of measurable qualities where contemporary pop music most differs from "classical" music.
There are more bedroom indie music producers than ever. EDM and "rap" are better than ever with many many good artists to choose from. One of the biggest breakout rap artist right now was just a random 20 year old working with other random bedroom producers just a couple years ago.
In the 1950s there was no metal, chiptunes, EDM and other stuff like that. We had all that by the 80s and 90s. But what can we make now that we could not have made 20-30 years ago? Seems like that we've reached "the end of history" in music. I haven't seen anything new appearing.
Computers can already generate every possible waveform that can be heard by human ears (a sample rate of 44.1 kHz already covers the full audible band, since it captures everything up to half the sample rate). There's nowhere else to go. Anything above 16 kHz is useless and can't be heard by most adults anyway.
I'm not saying that music is worse. I'm saying that we have reached the "end of history" and there will be no more new radical changes like when genres like rock, metal and edm appeared.
That seems like a different point than the one I made that you replied to, so sorry for the confusion.
I'm not too sure about end of history, there might well be other genres and styles popping up. Just because we always had the ability to paint on a canvas doesn't mean pointillism or cubism didn't emerge. Just because we can put any letters on a page doesn't mean new genres of fiction didn't emerge. The fact that we can generate any waveform is analogous to these (ie, having complete control over a medium) and doesn't indicate anything about what might emerge in the future.
We've had that capability for at least 3 decades and nothing completely new has appeared since then. Something completely new like rock, metal or edm. You could not have made those during the 30s - the necessary technology hadn't been invented yet.
Can you provide some sources? Last time I looked into statistics on these topics I found the opposite to be true.
The Rosling brothers have a nice talk about how all the big stats are improving globally (gender equality, education, health, extreme poverty, life expectancy).
Sources on these? Society has been far better than 100 years ago. And broken families? Or people marrying early due to societal pressure and not being able to divorce back then (whether legally or societally), who are now finally able to do so. The divorce rate is actually going down simply because people are marrying when they want to, not when society pressures them to.
I agree with you. When gwern investigated AI folk music in 2019, I realized it could generate a wonderful variety of music, full of soul. Be sure to listen to several tracks before making up your mind. My favorite is “crossing the channel”, since I think GPT made a mistake at the beginning, and then generated the most reasonable sounding not-mistake, which turned out to sound so cool.
My goal was strong, memorable melodies. Star Wars, not Marvel. GPT can come surprisingly close, if the input data format is right. Unfortunately I don’t think anyone except gwern has noticed that the input format is crucial: https://gwern.net/gpt-2-music
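For context, the gwern experiments (if I remember the linked page right) worked on ABC notation, which is about as LM-friendly as a music format gets. A hand-written illustration, not a model output:

```python
# A hand-written ABC-notation snippet, for illustration only (not model output).
# A few text header fields plus a bar-delimited melody make an entire tune a
# short, flat character sequence: exactly the kind of input GPT-2 tokenizes well.
abc_tune = """X:1
T:Illustrative Reel
M:4/4
L:1/8
K:D
|:d2fd Adfd|d2fd g2fe|d2fd Adfd|B2eB e2dB:|"""
print(abc_tune)
```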
Sadly no, and nowhere. And with my primary focus being gamedev for the foreseeable future, the only way I see it being resurrected is if I need some generated music. That’s fairly low on the priority list for now, but it might preempt other things. https://github.com/shawwn/noh
To be immodest for a moment, my work serves as an example that it’s possible to do it, and better than anyone else, long before they figure out how. Many examples of this pop up throughout history, and I am gratified to be a small but real one.
I don’t know if gwern realized how powerful his model was. His examples are underwhelming, because you have to prompt it in a certain way to get it to generate chords. He was showing me samples and they were neat, but boring.
One day he posted something that sounded pretty amazing, and I was blown away. “More like that, please.” It had chords in it.
He didn’t pursue it past that. I did. So it’s possible that no one is aware of how crucial the input format actually is to the success of the music that I was able to produce.
(And “produce” is a fair description here; choosing the instruments was really important, and the model didn’t do it. It wasn’t as easy as pressing a button. It felt like I was suddenly a 15x music producer, since I made all those tracks in one night. Such is the power of ML.)
Do you have a write up anywhere with samples? I'd love to hear some of the better examples you have. I agree that most of what's out there is underwhelming
Unfortunately I didn’t do a writeup (which gwern has given me a hard time about over the years, and he’s quite right!), so I have nothing to offer beyond those songs as a finished product. Maybe one day I’ll try to resurrect it for devs.
You really have to dig for good music these days. The record industry is a zombie at this point and no longer does the job of discovering good music. It just churns out utterly formulaic pop that might as well be the output of a music generator like this.
You're getting older. I'm 62 and moving air always inspires me. The shit my children listen to annoys me as much as the shit I listened to that annoyed my parents.
> remind me of the average, mediocre, soul-lacking content
This model is just the equivalent of GPT-2 for music. It's not GPT-4 yet. Music is trailing language by a few years. It used to be that language was about 5 years behind vision; now language is on top.
This is why I paid extra for services like Tidal and Roon. Their music recommendations are just better than any AI-driven stuff. You need actual human experts to curate playlists and such. I feel like the algorithm-based stuff is just a race to the middle.
This is correct. I have about 2TB of (royalty-free) samples that I pick and chop and place and resample when making music. I actually get more original the more material I have to work with.