Always appreciate new attempts at music gen. As with every attempt thus far, even the hand-picked selections sound like random nonsense locked to the diatonic notes of a particular key, with no real harmony or counterpoint to speak of (and that's the "good" output; they never let you hear just any old output, it's always 'listen to this handful I selected, the rest may be total garbage').
What if we trained computers to compose the same way we teach composition students: Renaissance counterpoint, the fugues of Bach, and the harmonic structures of the Classical era (sonata form)?
Unfortunately, as most people in computational creativity will tell you, teaching/learning "rules" is only a tiny sliver of the problem. And even getting a computer to "understand" those things in a way that would let it apply them to the act of composing is vastly beyond our current understanding.
By far the best output I've heard is still David Cope's stuff, which dates back to at least the 1990s. No one really seems to have improved on it significantly.
Which, even worse, was not only hand-selected but also heavily influenced by Cope himself, since he picked the snippets he enjoyed from the output. So it's not really an apples-to-apples comparison.
I found the part about notewise vs chordwise encodings very interesting!
Ages ago I was a sequencer geek (Impulse Tracker!) while also noodling around with guitar, and I noticed something strange: I made music I liked a lot more when I composed on guitar and transposed onto the sequencer afterwards. After a lot of experimentation, I realized that the constraints on what my hands could do on guitar were (of course) having a huge impact on what I tried to do when composing -- and struggling with the constraint was helping me make music I liked more.
I like a vision for practical machine learning where we spend less time on plumbing and more time thinking about the kinds of constraints (e.g. through input encoding) that enable "creativity" on the part of the machine.
That's so interesting - you're totally right that setting constraints often leads to really creative ideas. It reminds me of the "crab canons" by Mozart and Bach: https://en.wikipedia.org/wiki/Crab_canon .
I also think there's room for other creative encodings for music - possibly expanding these notewise/chordwise ideas, or possibly going in a totally new direction. It's fascinating to me how much the generations are affected by the encoding.
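To make the distinction concrete, here is a toy sketch of the two encodings (my own simplified version; the vocabulary and time resolution in the actual project may differ): chordwise turns each time step's set of sounding pitches into a single token, while notewise emits separate note-on/note-off events plus a time-advance token.

```python
# Toy illustration of chordwise vs. notewise encodings of the same
# piano-roll slice. Details (vocabulary, time resolution) are my own
# assumptions, not the exact scheme from the interview.

# Each inner list is one time step; values are MIDI pitch numbers.
timesteps = [
    [60, 64, 67],   # C major triad
    [60, 64, 67],   # held
    [62, 65, 69],   # D minor triad
    [],             # rest
]

def chordwise(steps):
    """One token per time step: the whole set of sounding pitches."""
    return ["chord_" + "_".join(map(str, s)) if s else "rest" for s in steps]

def notewise(steps):
    """Event-style tokens: note-on / note-off per pitch, plus a time-advance token."""
    tokens, prev = [], set()
    for s in steps:
        cur = set(s)
        tokens += [f"off_{p}" for p in sorted(prev - cur)]
        tokens += [f"on_{p}" for p in sorted(cur - prev)]
        tokens.append("wait")   # advance one time step
        prev = cur
    return tokens

print(chordwise(timesteps))
# ['chord_60_64_67', 'chord_60_64_67', 'chord_62_65_69', 'rest']
print(notewise(timesteps))
# ['on_60', 'on_64', 'on_67', 'wait', 'wait', 'off_60', 'off_64', 'off_67',
#  'on_62', 'on_65', 'on_69', 'wait', 'off_62', 'off_65', 'off_69', 'wait']
```

The trade-off, presumably, is that the chordwise vocabulary grows combinatorially with simultaneous notes while notewise keeps a small vocabulary but much longer sequences, which would help explain why the two give such different generations.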
Another fun direction is to generalize the kinds of constraints we put on our own instruments! I had a chance to play with that in a graduate class by implementing an API for MIDI generation where you set chord fingerings and strum patterns independently for a guitar of [N] strings.
Of course, I had to "play" the guitar myself by writing song sequences in those terms... it would be terrific to see what an AI could do with a notation scheme representing, say, a 20-string guitar or a 30-foot-long flute.
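For illustration, the shape of such an API might look something like this (a hypothetical sketch of my own, not the actual class project):

```python
# Hypothetical sketch of a fingering + strum API for an N-string guitar.
# Names, tunings, and stroke semantics are my own assumptions.

STANDARD_TUNING = [40, 45, 50, 55, 59, 64]  # E2 A2 D3 G3 B3 E4 as MIDI pitches

def fret_to_pitches(tuning, frets):
    """A chord fingering is one fret number (or None for a muted string) per string."""
    return [open_pitch + fret
            for open_pitch, fret in zip(tuning, frets)
            if fret is not None]

def strum(pitches, pattern, step=0.25):
    """Apply a strum pattern: 'D' = all strings, 'U' = top three strings, '.' = silence.
    Returns (time_in_beats, pitches_sounding) events."""
    events = []
    for i, stroke in enumerate(pattern):
        if stroke == "D":
            events.append((i * step, pitches))
        elif stroke == "U":
            events.append((i * step, pitches[-3:]))
    return events

# Usage: an open G chord strummed D . D U
g_major = fret_to_pitches(STANDARD_TUNING, [3, 2, 0, 0, 0, 3])
print(strum(g_major, "D.DU"))
```

The point of splitting fingerings from strum patterns is exactly the kind of constraint discussed above: the generator can only ask for things a hand on a fretboard could actually do.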
Baudelaire said something like that (about the sonnet): "Parce que la forme est contraignante, l'idée jaillit plus intense" (rough translation: "because the form is constraining, the idea springs forth more intensely").
"The more constraints one imposes, the more one frees one's self. And the arbitrariness of the constraint serves only to obtain precision of execution."
IMO this is eventually going to replace a lot of tasks. This, for example, could dynamically generate elevator music (or music in an office). The system we built can generate synthetic data for testing and for sharing samples of datasets. Eventually, we'll have entirely synthetically generated videos, advertisements, and more.
My English teacher in high school said that some guy from Apple came to talk to them and said that soon AI would be able to write stories. That was 15 years ago, and as far as I can tell, they can't use bots to write anything like an original story that anyone would actually want to read. Good luck making "entire movies."
Yes. Mood Media is a company that bought Muzak, Inc. - the original elevator music company (and the reason we sometimes talk about disposable music like this as "muzak"). They are a substantial business now owned by private equity. They acquired Muzak for something like $300m a few years back.
Background music is actually quite difficult, commercially. Someone needs to write and arrange it, and they need to be paid - either royalties each time it is played (which is why a lot of companies don't use "known" music for telephone hold and so on; it's too expensive), or, if it's not on a royalty basis, the writer needs to be bought out, which can also be expensive.
So having algorithmically generated music is actually really interesting because there is potentially no author to be paid. This is actually an emerging area of music copyright law. If an algorithm writes music who owns the copyright to that music? The computer? Probably not, not a legal person. The people who wrote the algorithms? Possibly - but did they actually create the music? Or does no one own it - meaning anyone can use it without payment? If a label commissions an algorithm to write hits who owns the music publishing?
One such example would be for people making videos on sites like YouTube, where you want some sort of background music to keep the video alive but don't want to license something, use the same music as everyone else, or spend a lot of time digging through the internet to find something that ticks all the boxes.
Elevator music was, in retrospect, pretty much what I was making a decade ago in a failed effort to be a Mac shareware developer. Mostly games, and their background music was procedurally generated, with no real beginning or ending.
Drunk walk around a key, with randomised reset locations whenever the walk went out of bounds. Very good for fake oriental music, acceptable for action/scifi, terrible for theme development or classical style.
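The core of that fits in a few lines; here's a from-memory sketch in Python (the original was Java, and the details here are approximate):

```python
import random

C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # one octave of scale degrees as MIDI pitches

def drunk_walk(length=32, low=0, high=7):
    """Random walk over scale degrees; reset to a random in-bounds
    position whenever the walk wanders out of range."""
    pos, melody = random.randint(low, high), []
    for _ in range(length):
        pos += random.choice([-2, -1, 1, 2])       # stagger up or down
        if pos < low or pos > high:
            pos = random.randint(low, high)        # randomised reset location
        melody.append(C_MAJOR[pos])
    return melody

print(drunk_walk())
```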
Nothing special, except that I totally failed to know anything about any of the previous efforts until years later, so it was all wheel-reinvention.
And then Apple deprecated Java, so it became obsolete.
The issue that 'rests are so common, we need to remove them or the algorithm would just predict rests all the time' shows the flaw with this approach.
If there is some pattern in your data, and your algorithm, rather than replicating something similar to that pattern, just outputs the most likely value at each point in time, then it is never going to work as you hope. Rests are a symptom of this, and removing them doesn't fix the underlying issue.
There are a bunch of solutions to this, but adversarial models do a good job of approximating a probability distribution like this.
> There are a bunch of solutions to this, but adversarial models do a good job of approximating a probability distribution like this.
The problem is GANs on sequence data still stink compared to max-likelihood: they train far more slowly and less stably, and they still don't generate decent sequences compared to a char-rnn with a bit of temperature tuning & beam search. They should be better for precisely the reason you say, but they aren't.
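For anyone unfamiliar, the "temperature tuning" part is just rescaling the model's output distribution before sampling; a minimal sketch (beam search omitted):

```python
import numpy as np

def sample_with_temperature(logits, temperature=0.8):
    """Sample one token id from model logits with temperature scaling.
    T < 1 sharpens the distribution (less 'most likely value everywhere'),
    T > 1 flattens it toward random noise."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                     # softmax
    return np.random.choice(len(probs), p=probs)

# Toy example: a model that strongly prefers token 0 (say, 'rest')
print(sample_with_temperature([3.0, 1.0, 0.5], temperature=0.5))
```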
I am struck by the quality of music neural nets can generate today. Just a few years ago it was much worse - the notes would make sense for 2-3 seconds and then just drift off in another direction. And using the Transformer for music is an intriguing idea.
In the answer to "Wait, what's a rest?", I'm intrigued by the definition of "...any time step where you don't play any *new* notes." (emphasis mine)
Why not have each time step contain all pitches that should sound during that time step (so starting a new quarter note and continuing a half note would both appear in the same time step)?
Then at the end of generating the music, perform some post-processing to get the note lengths.
Would the approach in the interview have any significant advantages over this approach? (I suppose you do lose the ability to rearticulate a pitch with my idea.)
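Roughly what I have in mind as the post-processing step, assuming a binary pitch-per-time-step grid (toy code, and it illustrates the rearticulation limitation I mentioned):

```python
# Toy post-processing: turn a pitch-per-time-step grid into (pitch, start, length)
# notes by merging consecutive time steps containing the same pitch.
# Limitation noted above: a re-struck note is indistinguishable from a held one.

def grid_to_notes(grid):
    """grid: list of sets of MIDI pitches sounding at each time step.
    Returns (pitch, start_step, length_in_steps) tuples."""
    notes, active = [], {}                          # active: pitch -> start step
    for t, sounding in enumerate(grid + [set()]):   # sentinel step flushes held notes
        for pitch in [p for p in active if p not in sounding]:
            start = active.pop(pitch)
            notes.append((pitch, start, t - start))
        for pitch in sounding:
            active.setdefault(pitch, t)
    return notes

# A quarter-note C starting while a half-note G is already sounding, then an E
grid = [{67}, {60, 67}, {64}, set()]
print(grid_to_notes(grid))
# [(67, 0, 2), (60, 1, 1), (64, 2, 1)]
```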
I got 3 out of 4 correct. In the first two questions, the AI seemed easy to identify because of alien rhythmic patterns, not really because of melodic content. In the third, the AI was identifiable because the piece, while pleasant, seemed to lack a plausible development of its idea (though that could easily be ascribed to a second-rate human composer). In the one I got wrong, the AI composition was pretty good, and the human one had exactly the alien rhythmic patterns that to me were a giveaway for an AI composition. Weird composer, or bad performance?
Do you have any examples of jazz compositions by your software? Would be very interested in hearing that.
I'm really curious how much effort went into building up the dataset and training the model before you got to "music".
Reading the steps, it feels like 9 months to a year before you got to credible music.
What kept you going in the belief this would work? I can think of 20 reasons why it shouldn't - hence it's "surprising" that it does. It's quite easily something you could have worked on for 5 years with no results.
Reading your background, it also sounds like your time would be tightly constrained, hence figuring out where to deploy it - you'd need some conviction that you'll have success.
Awesome work Christine! I've only ever heard you play classical music in concert. Any plans to perform bits of your AI-generated music live? Perhaps with Ensemble SF?
Also, I noticed your data format has a flag for instrument type. Have you considered generating for voice? Obviously a very different beast, but it seems the same principles could apply. It would be important to restrict the music to a model of what a human is capable of, to make it singable. Adding physical constraints to the piano-generated music might also be interesting - fingers are only so long, and there are usually only ten.
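As a crude illustration of the "ten fingers" idea, a playability check over generated time steps might look like this (thresholds and the hand-splitting logic are arbitrary guesses, purely to show the shape of such a constraint):

```python
def is_playable(pitches, max_fingers=10, max_hand_span=12, hands=2):
    """Very rough playability check for one generated piano time step:
    at most `max_fingers` simultaneous notes, and the notes must split into
    `hands` contiguous groups each spanning no more than `max_hand_span`
    semitones (roughly an octave). All thresholds are guesses."""
    if not pitches:
        return True
    if len(pitches) > max_fingers:
        return False
    pitches = sorted(pitches)
    groups, start = 1, pitches[0]
    for p in pitches:                      # greedily split into hand groups
        if p - start > max_hand_span:
            groups += 1
            start = p
    return groups <= hands

print(is_playable([48, 52, 55, 72, 76, 79]))   # two triads, one per hand -> True
print(is_playable([36, 60, 84]))               # three widely spaced notes -> False
```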
Has anyone done work on automated evaluation of the quality of a musical composition? Possibly by training a neural network, or maybe even just by designing some heuristic rules which try to capture what elements make music pleasing to humans?
Then, could you train a neural network (or a genetic algorithm, or whatever) to compose music that is assigned a high quality score by such a composition quality evaluator?
I actually just recently took a shot at something very similar to this for my undergrad thesis! [0]
I used genetic algorithms to generate 4 measure melodies, using a long short-term memory (LSTM) neural network to determine the fitness of melodies. I trained the LSTM on snippets of music by J.S. Bach. It was able to distinguish between random noise notes and actual music quite well, and to a somewhat lesser degree between Bach and other composers.
The melodies it produced were... mixed in quality. I really liked some of them, but quite often it would get stuck at some local maximum of the fitness and couldn't mutate its way to something better.
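The overall loop was along these lines - a heavily simplified sketch where a stand-in scoring function replaces the trained LSTM critic:

```python
import random

SCALE = list(range(60, 73))        # candidate pitches: one octave of MIDI notes
MELODY_LEN = 16                    # ~4 measures of quarter notes

def random_melody():
    return [random.choice(SCALE) for _ in range(MELODY_LEN)]

def fitness(melody):
    """Stand-in for the trained LSTM critic: here we just reward small,
    stepwise motion. In the thesis this score came from the network."""
    return -sum(abs(a - b) for a, b in zip(melody, melody[1:]))

def mutate(melody, rate=0.1):
    return [random.choice(SCALE) if random.random() < rate else p for p in melody]

def crossover(a, b):
    cut = random.randint(1, MELODY_LEN - 1)
    return a[:cut] + b[cut:]

def evolve(pop_size=50, generations=100):
    pop = [random_melody() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]                 # keep the fittest quarter
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())
```

The local-maximum problem shows up exactly here: once the elite parents all score similarly, mutation alone rarely finds a better basin.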
>"More recently, there is a shift towards using a Transformer architecture, and right now I’m experimenting with that as well."
I'm really curious - any early results to share on that? Attention really does make a big difference on a lot of things (including work I've done, so I know firsthand). It should improve the coherence of the entire piece, in theory at least, right?
The Transformer is working really well - I'm very excited. I'll probably be sharing results soon. Yes, the attention makes a huge difference, and the pieces are both more creative and more coherent.
Have you considered using 'learning from human preferences' as the loss function in addition to the Transformers? That was another OpenAI project, and it seems tailor-made for music generation: what is more 'I know it when I hear it' than music quality?
textgenrnn (https://github.com/minimaxir/textgenrnn) uses a simple Attention Weighted Average at the end of the model for text generation, which in my testing allows the model to learn much better.
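Conceptually, an attention-weighted average just scores each time step of the RNN output with a learned vector, softmaxes the scores, and takes the weighted sum; a numpy sketch of the idea (not the actual textgenrnn layer):

```python
import numpy as np

def attention_weighted_average(hidden_states, w):
    """Score each time step with a learned vector w, softmax the scores,
    and return the weighted sum of hidden states as a fixed-size summary.

    hidden_states: (timesteps, hidden_dim) array of RNN outputs
    w:             (hidden_dim,) learned scoring vector
    """
    scores = hidden_states @ w                 # (timesteps,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over time
    return weights @ hidden_states             # (hidden_dim,) context vector

rng = np.random.default_rng(0)
h = rng.normal(size=(40, 128))                 # 40 time steps of a 128-unit RNN
w = rng.normal(size=128)
print(attention_weighted_average(h, w).shape)  # (128,)
```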
Haha, very true. It was Charlie's question in this interview that made me realize the 62 key limit was an old fix that I no longer needed, so now I'm trying out expanding my dataset and also expanding to the full 88 keys!