Generating classical music with LSTM neural networks

TaupeRanger · on Nov 28, 2018

Always appreciate new attempts at music gen. As with every attempt thus far, even the hand picked selections sound like random nonsense locked to diatonics within a particular key, and no real harmony or counterpoint to speak of (and that's the "good" output, they never let you hear just any old output, it's always 'listen to this handful I selected, the rest may be total garbage').

raptorraver · on Nov 29, 2018

What if we would train computers to compose same way as we teach composition students: renaissance counterpoint, fugues of Bach and harmonic structuring of classical era (sonata form)?

TaupeRanger · on Nov 29, 2018

Unfortunately, as most people in computational creativity will tell you, teaching/learning "rules" is only a tiny sliver of the problem. But even getting a computer to "understand" those things in a way that would allow it to apply them to the act of composing is vastly beyond our current understanding.

stephencanon · on Nov 29, 2018

By far the best output I've heard is still David Cope's stuff, which dates back to at least the 1990s. No one really seems to have improved on it significantly.

TaupeRanger · on Nov 29, 2018

Which, even worse, was not only hand selected but also heavily influenced by Cope himself, as he selected snippets he enjoyed from the output. So it's not really an apples to apples comparison.

aswanson · on Nov 29, 2018

He was just an added threshold layer to the network. :)

p1esk · on Nov 29, 2018

Check out https://aiva.ai/creations

TheOtherHobbes · on Nov 28, 2018

Agreed. This is barely structured musical gibberish.

I'd expect Transformer to produce slightly more structured musical gibberish.

EADGBE · on Nov 29, 2018

As a musician; I'm less worried about a computer replacing me than as a software developer.

evrydayhustling · on Nov 28, 2018

I found the part about notewise vs chordwise encodings very interesting!

Ages ago I was a sequencer geek (Impulse Tracker!) while also noodling around with guitar, and I noticed something strange: I made music I liked a lot more when I composed on guitar and transposed onto the sequencer afterwards. After a lot of experimentation, I realized that the constraints on what my hands could do on guitar were (of course) having a huge impact on what I tried to do when composing -- and struggling with the constraint was helping me make music I liked more.

I like a vision for practical machine learning where we spend less time on plumbing and more time thinking about the kinds of constraints (e.g. through input encoding) that enable "creativity" on the part of the machine.

mcleavey · on Nov 28, 2018

That's so interesting - you're totally right that setting constraints often leads to really creative ideas. It reminds me of the "crab canons" by Mozart and Bach: https://en.wikipedia.org/wiki/Crab_canon .

I also think there's room for other creative encodings for music - possibly expanding these notewise/chordwise ideas, or possibly going in a totally new direction. It's fascinating to me how much the generations are affected by the encoding.

evrydayhustling · on Nov 28, 2018

Another fun direction is to generalize the kinds of constraints we put on our own instruments! I had a chance to play with that in a graduate class by implementing an API for midi generation where you set chord fingerings and strum patterns independently for a guitar of [N] strings.

Of course, I had to "play" the guitar myself by writing song sequences in those terms... it would be terrific to see what an AI could do with a notation scheme representing, say, a 20 string guitar or a 30 foot long flute.

williamdclt · on Nov 29, 2018

Baudelaire said something like that (about the sonnet): "Parce que la forme est contraignante, l'idée jaillit plus intense" (poor translation: "because the form is constraining, the idea comes out more intense")

nixpulvis · on Nov 29, 2018

I forget where I read it, but that's the design behind the OP-1 by teenage engineering. https://www.teenageengineering.com/products/op-1

emptybits · on Nov 28, 2018

I reckon Igor Stravinsky would agree:

"The more constraints one imposes, the more one frees one's self. And the arbitrariness of the constraint serves only to obtain precision of execution."

jacquesm · on Nov 29, 2018

The same goes for programming. The more constrained the environment, the more the code looks like a work of art.

citilife · on Nov 28, 2018

We're actually using a similar technique (quite a bit more complex) to generate synthetic data for applications:

https://medium.com/capital-one-tech/why-you-dont-necessarily...

IMO this is eventually going to replace a lot of tasks. This for example, can dynamically generate elevator music (or music in an office). The system we built can generate synthetic data for testing and sharing samples of datasets. Eventually, we'll have entirely synthetically generated videos, advertisements, and more.

In 50 years, entire movies may be generated.

jmh530 · on Nov 29, 2018

My english teacher in high school said that some guy from Apple came to talk to them and said that soon AI would be able to write stories. That was 15 years ago and as far as I can tell, they cant use bots to write anything like an original story that anyone would want to actually read. Good luck making "entire movies."

huhlig · on Nov 29, 2018

You mean like this: https://arstechnica.com/gaming/2016/06/an-ai-wrote-this-movi...

ben_w · on Nov 29, 2018

As that’s an example of it being funny-bad, I’d have suggested AI news writers instead: https://en.m.wikipedia.org/wiki/Automated_journalism

isoprophlex · on Nov 28, 2018

Is anyone currently affected by a lack of elevator music, be it due to financial reasons or any other reason, and does your approach solve this?

I hope you'll agree that you gotta find a better, more sympathetic example if you want to sell your generative algo's...

saaaaaam · on Nov 29, 2018

Yes. Mood Media is a company that bought Muzak, Inc - the original elevator music company (and the reason we sometimes talk about disposable music like this as “muzak”). They are a substantial business now owned by private equity. They acquired Muzak for soemthing like $300m a few years back.

Background music is actually quite difficult, commercially. Someone needs to write and arrange it, and they need to be paid - either royalties each time it is played which is why a lot of companies don’t use “known” music for telephone hold and so on - it’s too expensive. If it’s not on a royalty basis then the writer needs to be bought out - which can be expensive.

So having algorithmically generated music is actually really interesting because there is potentially no author to be paid. This is actually an emerging area of music copyright law. If an algorithm writes music who owns the copyright to that music? The computer? Probably not, not a legal person. The people who wrote the algorithms? Possibly - but did they actually create the music? Or does no one own it - meaning anyone can use it without payment? If a label commissions an algorithm to write hits who owns the music publishing?

isoprophlex · on Nov 29, 2018

Thanks for changing my mind on this, I was looking at it in an overly simplistic way

saaaaaam · on Nov 29, 2018

You are welcome!

hamaluik · on Nov 28, 2018

One such example would be for people making videos on sites like YouTube, where you want some sort of background music to keep the video alive but where you don't want to license something, use the same music as everyone else, or spent a lot of time digging through the internet to find something that ticks all the boxes.

ben_w · on Nov 29, 2018

Elevator music was, in retrospect, pretty much what I a decade ago in a failed effort to be a Mac shareware developer. Mostly games, and their background music was procedurally generated, no real beginning or ending.

Drunk walk around a key, with randomised reset locations whenever the walk went out of bounds. Very good for fake oriental music, acceptable for action/scifi, terrible for theme development or classical style.

Nothing special, except that I totally failed to know anything about any of the previous efforts until years later, so it was all wheel-reinvention.

And then Apple deprecated Java, so it became obsolete.

londons_explore · on Nov 28, 2018

The issue that 'rests are so common, we need to remove them or the algorithm would just predict rests all the time' shows the flaw with this approach.

If there is some pattern in your data, and your algorithm, rather than replicating something similar to the pattern, just outputs the most likley value at any point in time, then it is never going to work as you hope. Rests are a symptom of this, and fixing them doesn't fix the underlying issue.

There are a bunch of solutions to this, but adversarial models do a good job of approximating a probability distribution like this.

gwern · on Nov 28, 2018

> There are a bunch of solutions to this, but adversarial models do a good job of approximating a probability distribution like this.

The problem is GANs on sequence data still stink compared to max-likelihood: they train far more slowly, more unstably, and still don't generate decent sequences compared to a char-rnn with a bit of temperature tuning & beam search. They should be better for precisely the reason you say, but they aren't.

visarga · on Nov 29, 2018

I am struck at the quality of music neural nets can generate today. Just a few years ago it was much worse - the notes would make sense for 2-3 seconds and then they would just drift into another direction. And using the Transformer for music is an intriguing idea.

Edit: apparently someone has already implemented music generation with the Transformer. Samples: https://storage.googleapis.com/music-transformer/index.html

big_t · on Nov 29, 2018

In the answer to "Wait, what's a rest?", I'm intrigued by the definition of "...any time step where you don’t play any new notes." (emphasis mine)

Why not have each time step contain all pitches that should sound during that time step (so starting a new quarter note and continuing a half note would both appear in the same time step)? Then at the end of generating the music, perform some post-processing to get the note lengths. Would the approach in the interview having any significant advantages to this approach? (I suppose you do lose the ability to rearticulate a pitch with my idea)

microtherion · on Nov 28, 2018

I got 3 out of 4 correct. In the first two questions, the AI seemed easy to identify because of alien rhythmic patterns, not really because of melodic content. In the 3rd, the AI was identifiable because the piece, while pleasant, seemed to lack a plausible development of the idea (but this is something that easily could be ascribed to a second rate human composer). The one I got wrong, the AI composition was pretty good, and the human one had exactly the alien rhythmic patterns that to me were a giveaway for an AI composition. Weird composer or bad performance?

Do you have any examples of jazz compositions by your software? Would be very interested in hearing that.

whatrocks · on Nov 28, 2018

There's a short snippet of a jazz composition from Clara near the top of the post.

microtherion · on Nov 28, 2018

Ah, thanks! Not entirely natural, but could marginally be passed off as "Thelonious Monk, having drunk one Espresso too many".

nilanp · on Nov 29, 2018

I'm really curious how much effort there was in building up the data set - before, training the model before you got to "music"

Reading the steps feels like 9 months to a year before you got to credible music.

What kept you going in the belief this would work. I can think of 20 reasons why this shouldn't work - hence its "surprising" that it does. Its quite easily something you could have worked on for 5 years with no results.

reading your background - it also sounds like your time would be tightly constrained hence figuring out where to deploy it - you need to have some conviction you'll have success

jcoffland · on Nov 29, 2018

Awesome work Christine! I've only ever heard you play classical music in concert. Any plans to perform bits of your AI generated music live. Perhaps with Ensemble SF?

Also, I noticed your data format has a flag for instrument type. Have you considered generating for voice? Obviously a very different beast but it seems the same principals could apply. It would be important to restrict the music to a model of what a human is capable of to make it singable. Adding physical constraints to the piano generated music might also be interesting. Fingers are so long and there are usually only ten.

skissane · on Nov 29, 2018

Has anyone done work on automated evaluation of the quality of a musical composition? Possibly by training a neural network, or maybe even just by designing some heuristic rules which try to capture what elements make music pleasing to humans?

Then, could you train a neural network (or a genetic algorithm, or whatever) to compose music that is assigned a high quality score by such a composition quality evaluator?

big_t · on Nov 29, 2018

I actually just recently took a shot at something very similar to this for my undergrad thesis! [0]

I used genetic algorithms to generate 4 measure melodies, using a long short-term memory (LSTM) neural network to determine the fitness of melodies. I trained the LSTM on snippets of music by J.S. Bach. It was able to distinguish between random noise notes and actual music quite well, and to a somewhat lesser degree between Bach and other composers.

The melodies it produced were...mixed in quality. I really liked some of them, but quite often it would get stuck at some local maxima of the fitness and couldn't mutate its way to something better.

[0] https://github.com/ThomasMatlak/is-software/tree/master/gene...

citnaj · on Nov 28, 2018

>"More recently, there is a shift towards using a Transformer architecture, and right now I’m experimenting with that as well."

I'm really curious- any early results to share on that? Attention really does make a big difference on a lot of things (including work I've done so I know first hand). It should improve the coherence of the entire music piece in theory at least, right?

kasrahbar · on Nov 28, 2018

Check out Music Transformer that was recently published https://arxiv.org/abs/1809.04281

Some generated samples: https://storage.googleapis.com/music-transformer/index.html

bcheung · on Nov 29, 2018

The accompaniment examples are cool. That would be a very nice tool to have. Just play a melody and it auto-generates an accompaniment.

mcleavey · on Nov 28, 2018

Transformer is working really well- I'm very excited. I'll probably be sharing results soon. Yes, the attention makes a huge difference & the pieces are both more creative and more coherent.

gwern · on Nov 29, 2018

Have you considered using 'learning from human preferences' as the loss function in addition to the Transformers? That was another OpenAI project, and it seems tailor-made for music generation: what is more 'I know it when I hear it' than music quality?

citnaj · on Nov 28, 2018

That is exciting! I'll be watching on Twitter :)

minimaxir · on Nov 28, 2018

textgenrnn (https://github.com/minimaxir/textgenrnn) uses a simple Attention Weighted Average at the end of the model for text generation, which in my testing allows the model to learn much better.

gleenn · on Nov 28, 2018

Reminds me of this pretty cool music generation talk at StrangeLoop. I can't find his own site, but here's the SL page: https://www.thestrangeloop.com/2018/making-machines-that-mak...

scottlocklin · on Nov 28, 2018

LZW does a creditable job as well.

https://arxiv.org/pdf/1107.0051.pdf

EADGBE · on Nov 29, 2018

Better not have any parallel fifths in there or I swear to god my Theory professor will come out of his grave and berate the AI for it.

p1esk · on Nov 28, 2018

Hi Christine, what are your thoughts on using reinforcement learning for music generation? Has anyone tried that at OpenAI?

robbiemitchell · on Nov 28, 2018

Need some Chopin to help train the full 88 keys!

mcleavey · on Nov 28, 2018

Haha, very true. It was Charlie's question in this interview that made me realize the 62 key limit was an old fix that I no longer needed, so now I'm trying out expanding my dataset and also expanding to the full 88 keys!