I'm interested that you used some Christmas songs as training (which wasn't obvious from what I read of the paper). Were they pop songs, traditional, or a mix?
Further to my comment up there[0] - and I don't wish to sound a grinch because this is a really cool project - but would I be right in thinking you spent more time on the image description than the music?
I saw that you specify a scale for the melody. Would it be possible either to use a mode to generate the accompaniment around, so that the melody can move diatonically without risking too many clashes, or to allow the melody to follow the chord sequence somehow?
Again, sorry if I sound too critical. It's a really awesome thing you've done, and I'm just a guy that listens to the music instead of the lyrics.
Thanks for the comments! Are you asking about the lyrics or the music generation?
For the lyrics, we actually didn't train on Christmas songs. The training data was a large collection of romance novels (see neural-storyteller by Jamie Kiros). The "Christmas trick" was to apply a "style shift" after image captioning and before lyrics generation, where the shift vector was obtained from ~30 Christmas songs.
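As a rough illustration of what a style shift on sentence embeddings could look like (a minimal sketch, not the project's actual code): subtract the mean embedding of the source style and add the mean embedding of the target style. The function names, the two-dimensional toy vectors, and the exact form of the shift are assumptions made for the example.

```python
# Hypothetical sketch of a "style shift" on sentence embeddings.
# Vectors are plain lists of floats; a real system would use encoder outputs.

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def style_shift(caption_vec, source_style_vecs, target_style_vecs):
    """Move caption_vec away from the source-style mean, toward the target-style mean."""
    source_mean = mean_vector(source_style_vecs)
    target_mean = mean_vector(target_style_vecs)
    return [x - s + t for x, s, t in zip(caption_vec, source_mean, target_mean)]

# Toy data (assumed): two "caption style" vectors and two "Christmas song" vectors.
caption = [1.0, 2.0]
caption_style = [[1.0, 1.0], [3.0, 1.0]]
christmas_style = [[0.0, 4.0], [2.0, 6.0]]

shifted = style_shift(caption, caption_style, christmas_style)
```

With only ~30 songs, the target mean is a fairly coarse estimate of the style, which is probably fine for a nudge in embedding space but would not capture much detail.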
For the music generation: although we are aware of some basic rules of musical performance, such as the melody following the chords, we didn't actually add rules of this kind.
As for the blues scale, here's the thing: I didn't really know much about music, so I spent several hours reading things like basicmusictheory.com. It happened to introduce the blues scale, so we just used it. But you're right about the relevance of blues to pop: when we ran the scale-checking code, only a very small percentage of our pop music collection turned out to be blues.
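A scale check of this kind can be sketched as follows. This is an assumption about how such a check might work, not the code that was actually run: it measures what fraction of a melody's MIDI pitches fall in the minor blues scale (semitone offsets 0, 3, 5, 6, 7, 10 above the tonic) and tries all twelve tonics. The function names are hypothetical.

```python
# Hypothetical scale check: how well does a melody fit a blues scale?
# Minor blues scale as semitone offsets from the tonic.
BLUES_INTERVALS = frozenset({0, 3, 5, 6, 7, 10})

def fraction_in_scale(midi_pitches, tonic, intervals=BLUES_INTERVALS):
    """Fraction of notes whose offset from the tonic (mod 12) lies in the scale."""
    if not midi_pitches:
        return 0.0
    hits = sum(1 for pitch in midi_pitches if (pitch - tonic) % 12 in intervals)
    return hits / len(midi_pitches)

def best_blues_fit(midi_pitches):
    """Try all 12 tonic pitch classes; return (tonic, fraction) for the best fit."""
    candidates = ((tonic, fraction_in_scale(midi_pitches, tonic)) for tonic in range(12))
    return max(candidates, key=lambda pair: pair[1])

c_blues_melody = [60, 63, 65, 66, 67, 70]      # C, Eb, F, Gb, G, Bb
c_major_melody = [60, 62, 64, 65, 67, 69, 71]  # C major scale

blues_fit = best_blues_fit(c_blues_melody)
major_in_c_blues = fraction_in_scale(c_major_melody, 60)
```

A song could then be labelled "blues" if its best fit exceeds some threshold; the threshold itself would be a judgment call and is left out here.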
Thanks for the reply! I was concentrating on the music specifically. I thought the lyrics generation was really enjoyable.
I was asking more if you'd used any traditional carols, as they can have a more definitively "christmassy" sound than a pop song with sleighbells laid over the top.
Overall, I meant that I think the music would be more convincing if the melody either followed the chords or stuck to a single mode shared with the accompaniment.
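One simple way to make a melody "follow the chords" is to snap each generated note to the nearest chord tone of the chord sounding at that moment. The sketch below assumes one chord per melody note, with chords given as sets of pitch classes; both the representation and the function names are made up for illustration.

```python
# Hypothetical post-processing step: pull melody notes onto chord tones.

def snap_to_chord(pitch, chord_pitch_classes):
    """Return the MIDI pitch nearest to `pitch` that is a chord tone (ties go downward)."""
    for offset in range(12):
        for candidate in (pitch - offset, pitch + offset):
            if candidate % 12 in chord_pitch_classes:
                return candidate
    raise ValueError("chord has no pitch classes")

def follow_chords(melody, chords):
    """Snap each melody note to the chord sounding under it (one chord per note)."""
    return [snap_to_chord(pitch, chord) for pitch, chord in zip(melody, chords)]

C_MAJOR = {0, 4, 7}    # C, E, G
G_MAJOR = {7, 11, 2}   # G, B, D

melody = [61, 65, 66]
chords = [C_MAJOR, C_MAJOR, G_MAJOR]
snapped = follow_chords(melody, chords)
```

Snapping every note would remove all passing tones, so a gentler variant might only snap notes on strong beats and leave the rest free to move within the mode.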