Nicolas Boulanger-Lewandowski's work focused extensively on this topic; see [1]. He also wrote a Theano deep learning tutorial on it [2], and several people, including Kratarth Goel, have advanced the work to use LSTMs and deep belief networks [3][4].
For a brief while RNN-NADE made an appearance as well, though I do not know of an open-source implementation.
There are also a few of us working on more advanced versions of this model for speech synthesis, rather than operating on MIDI sequences. Stay tuned in the near future!
I can say from experience that some of the samples from the LSTM-DBN are shockingly cool; they drove me to spend about a week experimenting with K-means coded speech (a rough sketch of the idea follows below). It produced robo-voices at least, but our research moved past that pretty fast.
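Roughly, the K-means coding I mean looks like the toy sketch below: frame the waveform, vector-quantize the frames with K-means so each frame becomes a discrete symbol, and resynthesize by swapping each frame for its centroid. This is a minimal illustration on a synthetic signal with made-up parameters (sample rate, frame length, codebook size), not our actual research code; real speech and spectral features would be used in practice.

    # Toy sketch of "K-means coded speech": quantize frames, resynthesize
    # from centroids. Signal and parameters are assumptions for illustration.
    import numpy as np
    from sklearn.cluster import KMeans

    sr = 16000                      # assumed sample rate
    t = np.arange(sr * 2) / sr      # two seconds of a toy "speech-like" signal
    signal = np.sin(2 * np.pi * 220 * t) * np.sin(2 * np.pi * 3 * t)

    frame_len = 256                 # non-overlapping frames for simplicity
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Vector-quantize: each frame becomes one of n_codes discrete symbols,
    # which is the kind of sequence an LSTM-style model can then learn.
    n_codes = 64
    km = KMeans(n_clusters=n_codes, n_init=10, random_state=0).fit(frames)
    codes = km.predict(frames)      # integer symbol per frame

    # Crude resynthesis: replace every frame with its nearest centroid.
    reconstruction = km.cluster_centers_[codes].reshape(-1)
    print("frames:", n_frames, "unique codes used:", len(np.unique(codes)))

Resynthesizing straight from centroids like this is where the robo-voice quality comes from; the integer code sequence is what a sequence model would be trained on.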
[1] http://www-etud.iro.umontreal.ca/~boulanni/
[2] http://deeplearning.net/tutorial/rnnrbm.html
[3] http://arxiv.org/pdf/1412.6093.pdf
[4] https://github.com/kratarth1203/NeuralNet/blob/master/rnndbn...