DTW is quadratic: aligning sequences of length n and m takes O(nm) time. While in grad school I worked for a bit with a team interested in doing large-scale speech recognition with DTW, so they did some work speeding up the algorithm using a technique called Locality Sensitive Hashing (LSH) [1], [2], [3]. It might be worth a look for speeding up your own algorithms.
[1] http://www.academia.edu/2600658/Indexing_Raw_Acoustic_Featur...
[2] http://old-site.clsp.jhu.edu/~ajansen/papers/IS2012a.pdf
[3] http://www.cs.jhu.edu/~vandurme/papers/JansenVanDurmeASRU11....
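As a rough illustration (my own sketch, not the method from the papers above): plain DTW fills an (n+1) x (m+1) cost table, which is where the quadratic cost comes from. One common LSH flavor, random-hyperplane (sign random projection) hashing for cosine similarity, compresses each frame into a short bit signature whose Hamming distance estimates the angle between two frames. All function names and parameters here (`make_hyperplanes`, `n_bits`, 13-dim frames, etc.) are hypothetical choices for the example.

```python
import math
import random
from typing import Any, Callable, List, Sequence


def dtw(a: Sequence, b: Sequence, dist: Callable[[Any, Any], float]) -> float:
    """Classic DTW: fills an (n+1) x (m+1) table, so O(n*m) time and space."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest neighbouring alignment (step in a, in b, or both).
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Exact 1 - cos(angle) between two frames (assumes nonzero vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (nu * nv)


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian hyperplanes for sign-random-projection (cosine) LSH."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(v: Sequence[float], planes: List[List[float]]) -> int:
    """Pack one bit per hyperplane: 1 if v lies on its non-negative side."""
    bits = 0
    for k, p in enumerate(planes):
        if sum(x * y for x, y in zip(v, p)) >= 0:
            bits |= 1 << k
    return bits


def approx_cosine_distance(sa: int, sb: int, n_bits: int) -> float:
    """Fraction of differing bits estimates angle/pi; map it back to 1 - cos."""
    hamming = bin(sa ^ sb).count("1")
    return 1.0 - math.cos(math.pi * hamming / n_bits)


if __name__ == "__main__":
    rng = random.Random(1)
    dim, n_bits = 13, 64  # arbitrary: 13-dim frames, 64-bit signatures
    a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
    b = [frame for frame in a for _ in range(2)]  # a, slowed down 2x
    planes = make_hyperplanes(dim, n_bits)
    sig_a = [signature(v, planes) for v in a]  # hash each frame once
    sig_b = [signature(v, planes) for v in b]
    d = dtw(sig_a, sig_b, lambda x, y: approx_cosine_distance(x, y, n_bits))
    print(d)  # 0.0: identical frames always get identical signatures
```

Note this only makes each cell of the table cheaper (an XOR and a popcount instead of a full dot product); the table is still n x m. Getting below quadratic means using the signatures to avoid comparing most frame pairs at all, e.g. by only looking at pairs whose signatures land close together, which this sketch doesn't attempt.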