Finding Similar Music Using Matrix Factorization (benfrederickson.com)
134 points by sebg on May 10, 2016 | 29 comments



Personally, these methods of finding music don't work for me. I believe it's because most people tend to listen to music in a genre.

Because the above method doesn't work, I have done correlations between different artists I like and the music recommended for each artist. (Example: Artist A: similar B, C, D; Artist E: similar C, D; then it would recommend C = 4 points, D = 4 points, B = 2 points.)
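A minimal sketch of the scoring described above, assuming each "similar artist" mention is worth 2 points (inferred from the example numbers, not stated by the commenter). The `similar` dictionary and the `score_candidates` helper are hypothetical names used only for illustration:

```python
from collections import Counter

# Hypothetical "similar artists" lists, mirroring the example above.
similar = {
    "A": ["B", "C", "D"],
    "E": ["C", "D"],
}

def score_candidates(similar, liked, points_per_mention=2):
    """Score each candidate by how many liked artists list it as similar."""
    scores = Counter()
    for artist in liked:
        for candidate in similar.get(artist, []):
            if candidate not in liked:  # skip artists the listener already likes
                scores[candidate] += points_per_mention
    return scores

scores = score_candidates(similar, liked=["A", "E"])
for artist, points in scores.most_common():
    print(artist, points)
```

Artists that show up in several "similar" lists float to the top, which may be why the results feel 'acceptable' rather than surprising: the score rewards consensus.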

This method sort of works, but it mainly yields music I find 'acceptable' not music I find good.

Music should IMO be recommended using genre/tags(artists, year, ...) as well as:

(rather long list:) tempo, complexity of the music, instruments, number of instruments, how monotone or varied the music is, then there is music that uses notes to keep you in short suspense and others in long suspense, general sound of the music (rock would be 'rough' while violin would generally be 'smooth'), music patterns, music pattern genre, ...

And then you need to train a small neural net per person to figure out which of all these features are important to the person you are recommending music to.

(Edit: added to list of features to look for in music)


The approach you describe is one of many. Traditional collaborative filtering using matrix factorization will "discover" the features you listed, or other non-intuitive features, from patterns in the data. They don't always need to be listed explicitly, and training individual asymmetric models (like your one NN per person) isn't always ideal. Per person models won't make a lot of sense unless each person has rated a large number of songs, which usually isn't the case.
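As a rough illustration of how factorization "discovers" features, here is a minimal sketch in plain Python. It fits two latent factors per user and per item by stochastic gradient descent on a toy dataset. The data, the hyperparameters, and the function names (`factorize`, `predict`, `rmse`) are all assumptions made for this example, and this is not the method used in the linked article:

```python
import math
import random

# Hypothetical scores as (user, item, score) triples; unlisted pairs are unobserved.
# Users 0-1 favour items 0-1, users 2-3 favour items 2-3.
ratings = [
    (0, 0, 5), (0, 2, 1), (0, 3, 1),
    (1, 0, 5), (1, 1, 5), (1, 2, 1),
    (2, 0, 1), (2, 1, 1), (2, 2, 5), (2, 3, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 5),
]

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit k latent factors per user (P) and per item (Q) with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted score is the dot product of the user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error over the observed triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

P, Q = factorize(ratings, n_users=4, n_items=4)
print("training RMSE:", round(rmse(P, Q, ratings), 3))
print("user 0, unseen item 1:", round(predict(P, Q, 0, 1), 2))
```

Nobody labels the two factors here; whatever structure they capture comes only from which users scored which items, which is the sense in which the features are "discovered".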

In general, methods like this require an epic amount of data to work well for consumers. The Netflix Prize made a dataset with 110 million ratings available, and that was barely enough to make impressive predictions. The issue here isn't the statistical model being used, but the paucity of training data. If you trained this or any other model on one billion rated songs, the quality of the recommendations could blow your mind. They would however still skew toward the average musical tastes.


Oh, the NN would be rather small though: something like the 1- to 3-layer kind. The bulk of the identification has to be done beforehand. Although music patterns could be a problem.

And the skew toward the average musical taste is exactly my issue. Finding music is more like playing bingo to me.

Disclaimer: I haven't used any note-wordy NN yet, so this might still require a ton of input. But if the NN could start recognizing things like, for example: likes piano with male singer, but a penalty when it includes a violin in these genres (stupid example), it would IMO be an improvement.


>Traditional collaborative filtering using matrix factorization will "discover" the features you listed, or other non-intuitive features, from patterns in the data.

I don't think that has turned out to be the case. Most people's listening habits follow genre, and just don't have that much latent structure to extract.


I think Spotify's "Discover Weekly" playlists get pretty close to this. I really, really look forward to each week's new playlist. I listen to a wide range of genres, and completely agree with you that most discovery tools try to pigeon hole you into one or two. Spotify seems to pick up on other associations. I wonder how much neural net stuff they're doing with their big data.


I was going to say the same, Discover Weekly frequently really impresses me with some of its selections and I have discovered some excellent artists through it. I'm into electronic music - not what I would consider really obscure stuff, but certainly not mainstream - and many other recommendation systems seem to lump anything electronic together with mainstream "EDM" etc. (which is nothing like what I enjoy) whereas Discover Weekly makes intelligent recommendations of quite obscure stuff.

I assume it uses some combination of social data (e.g. music listened to by people who listen to similar artists as you) and the intelligent classification ability they purchased when acquiring Echo Nest. There was quite a good write up about it - I think it was one of these two: http://www.theverge.com/2015/9/30/9416579/spotify-discover-w... or http://qz.com/571007/the-magic-that-makes-spotifys-discover-...

I just wish it would save my previous weeks' playlists as sometimes I forget to listen to them and then they're gone!


There's an IFTTT recipe for that! https://ifttt.com/recipes/311873-discover-weekly-archive (only discovered this yesterday)


Ah amazing, thank you! I had been doing it with https://rocketgraph.com/reports/21-discover-weekly-archiver but the archived playlists live within their web app rather than Spotify itself, so I never actually looked at them!


EDIT: Oops, somehow missed that The Verge article on how Discover Weekly works had already been posted! It works on a variation of PageRank, where instead of links on pages, it uses songs in saved playlists. (If you have a playlist with lots of songs similar to someone else's playlist, you might like the songs in their playlist too. If a song is on lots of playlists, it's probably noteworthy, etc.)


While it's common to use things like genre, year, musical characteristics, etc., social dimensions may be at least as valuable. Examples are artists performing at the same show, collaborating on a project, maintaining personal friendships, signing to the same record label, moving to another group, etc. Most of the music I've found and enjoyed was through these types of connections.


That sort of describes the Pandora approach. Each song has a set of "genes" that describe its tone, musical instrumentation, key signature, time signature, rhythmic syncopation, etc. These genes are then used to find similar music. I've found it to be quite pleasing to discover new music when it's seeded with a relatively uncommon song that I like.
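To make the "genes" idea concrete, here is a small sketch that ranks songs by cosine similarity between attribute vectors. The attributes, the scores, and the song names are invented for illustration, and cosine similarity is an assumed choice; how Pandora actually compares songs is not described in this thread:

```python
import math

# Hypothetical hand-assigned attribute scores ("genes") per song, each in [0, 1].
# Attribute order: tempo, distortion, syncopation, acoustic instrumentation.
songs = {
    "seed song": [0.7, 0.8, 0.3, 0.1],
    "candidate 1": [0.6, 0.9, 0.2, 0.1],
    "candidate 2": [0.2, 0.1, 0.4, 0.9],
    "candidate 3": [0.8, 0.6, 0.5, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(seed, songs, top_n=2):
    """Return the top_n song names whose attributes are closest to the seed's."""
    seed_vec = songs[seed]
    ranked = sorted(
        ((cosine(seed_vec, vec), name) for name, vec in songs.items() if name != seed),
        reverse=True,
    )
    return [name for _, name in ranked[:top_n]]

print(most_similar("seed song", songs))
```

Because the comparison only looks at attributes, an obscure seed song can surface equally obscure neighbours, which matches the experience described above.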

For instance, I added "Going Through Changes" by "Army of Me", and its recommendations included songs from bands I'd never heard of and actually enjoyed, like Radford, Red (a Christian band, one, as an atheist, I wouldn't ever have browsed on my own, but ended up really enjoying), Maxeen, Black Lab, and Brand New. Each had "genes" that made their algorithm align them with the seed song.

I don't use it much right now - I find myself in a cycle of discovery and then stubborn re-listening of the same four or five albums - but I expect I'll go back to it the next time I'm back in that exploratory mood.


The idea behind Pandora is neat, but I've unfortunately found its actual dimensions insufficient. It always ends up drifting towards mediocre generic pop crap, no matter how far away I start or how carefully I curate. And it doesn't account well for complexity or mood.


Agreed. I think suggestions should be based on mood and emotion.

For example if I like 'dark' music I might as well like a song from a techno artist and a classical orchestra.

But this can be very complex. What some people find aggressive music others find a little too soft.

EDIT: by the way: because of the above I also think a recommendation for a song is better than for a band.


I think that for me the approach in TFA would work, if it was applied to songs rather than artists. There are many artists that have a couple of songs I like and a lot of ones I don't care for. Artist-based recommendations go terribly wrong for me for this reason, I think.


What you're describing is similar to how Allmusic categorises artists (categorised? I haven't used it a lot in the last few years, so I don't know how much it's changed). I used to find most of my new music via it. It was in no way algorithmic - it didn't make suggestions, it was just a case of manually following links (given that part of the pleasure comes from searching and discovering independently, no bad thing). It had genres, mood, influences, influencees, etc. at the artist level at least. It would seem that Rovi's API could be better leveraged to provide good suggestions than it currently is, if all that metadata is still being attached (though there are things like Spotify's Discover Weekly that I've found to be surprisingly good which I assume use it).

[It would have been even better if that had been expanded to album/song level. As the sibling comment suggests, categorising by subject matter would be very useful for a not-insignificant number of people as well.]


I've long wondered about other psychological factors that come into play as well. I wonder if each time I give a song a thumbs up, say, during radio play, they record features like day of the week, month of the year, season, the weather that day, time of day, maybe a geographic classifier like city/suburb/wilderness. These are all things that could be used to make localized, time-dependent and situational recommendations. I like a lot of the 'mood' playlists Spotify offers as well, but I've used players in the past where you have to choose between very similar and highly subjective things like calm vs tranquil to populate a mood-based list, and I wonder if the features I mentioned could make this a more passive process on the user's part.


I like the Songza (rolled into Google Play Music) themed playlists, e.g. "It's Tuesday Morning, play something for focusing/working out/spring cleaning". Usually there is something within one of the suggested categories I'll listen to.


Add vocal to this list:

There is a lot of great music I don't listen to because it comes with the performers' problems, complaints, and political beliefs attached.

Some people don't care but for those of us who do it would be great if there was a filter for spoken words.


I am quite interested in this topic. There is some research in that field. Some links I collected in the past:

http://thesis.flyingpudding.com/

http://dynampd.ubitux.fr/

http://erikbern.com/2015/09/22/presentations-about-spotify-m...

http://erikbern.com/2015/09/24/nearest-neighbor-methods-vect...

http://www.wired.com/underwire/2013/08/qq_netflix-algorithm/

http://www.slideshare.net/erikbern/collaborative-filtering-a...

http://benanne.github.io/2014/08/05/spotify-cnns.html https://news.ycombinator.com/item?id=8137264

http://www.playdar.org/ https://github.com/RJ/playdar-core http://news.ycombinator.com/item?id=3876724

https://github.com/bmcfee/librosa

http://forever.fm/ http://blog.petersobot.com/introducing-forever-fm

http://musicmachinery.com/2011/05/14/how-good-is-googles-ins...

http://musicmachinery.com/2012/11/12/the-infinite-jukebox/

Btw., as I am developing my own music player, my main feature/goal is a kind of automatic DJ which automatically selects music, and I also especially had a discover mode in mind. However, that main feature has not evolved much so far because there were so many other things to implement first. For now, it only supports access to files in your file system, and it looks at tags and artist and adds quite some randomness to it.

https://github.com/albertz/music-player/blob/master/WhatIsAM...


I often find that artists who I like tend to work together (this is especially true in hip-hop and electronica, but also in rock music as well through side projects). I've found some of my favorite albums this way; and it's weird esoteric stuff that other people tend to like (just harder to find unless you know to look for it).

Is there an algorithm that suggests music based on the degrees of connectedness to the individuals who made the music? I discovered Kanye West long, long before he made it big because he did a guest track on a label-produced mixtape. I said "man this shit is dope" and tried to track him down, but was only able to find other guest tracks on like a Mos Def album and a Common album (this was in like 1998, 5 years before he released a solo album).

I really miss that way of finding music, because it introduced me to a lot of artists who were similar enough I would like them, but different enough that genres started to morph into one another. A good example there would be Nine Inch Nails vs. How To Destroy Angels - Trent Reznor is behind both of them, but HTDA has a much more haunting, ambient sound compared to the sharp edges and distorted beats of NIN.


PageRank is analogous to this type of music discovery if you created a graph based on artists who have collaborated together. It'd be about the same as having a teleportation vector that always restarts the random walk back at the artist you're wanting to explore around.
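A minimal sketch of that idea: a random walk with restart (personalized PageRank) computed by power iteration over a collaboration graph. The graph, the node names, and the 0.15 restart probability are assumptions chosen for illustration, and every neighbour is expected to appear as a key in `graph`:

```python
# Hypothetical undirected collaboration graph as adjacency lists.
graph = {
    "seed": ["collab_1", "collab_2"],
    "collab_1": ["seed", "collab_2", "far_1"],
    "collab_2": ["seed", "collab_1"],
    "far_1": ["collab_1", "far_2"],
    "far_2": ["far_1"],
}

def personalized_pagerank(graph, seed, restart=0.15, iterations=50):
    """Power iteration for a random walk that always teleports back to `seed`."""
    rank = {node: 0.0 for node in graph}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            if not neighbours:
                # A node with no collaborators sends its walkers back to the seed.
                nxt[seed] += (1 - restart) * rank[node]
                continue
            share = (1 - restart) * rank[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        # Total rank is 1, so the restarting fraction is exactly `restart`.
        nxt[seed] += restart
        rank = nxt
    return rank

rank = personalized_pagerank(graph, "seed")
suggestions = sorted((n for n in rank if n != "seed"), key=rank.get, reverse=True)
print(suggestions)
```

A higher restart probability keeps suggestions close to direct collaborators; a lower one lets the walk wander further out into the graph.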


A class I took this fall covered recommendation on attributed graphs. One of the research papers on the reading list for the topic was very interesting: it extends the matrix factorization algorithm on a bipartite graph (of users and soundtracks, for example) to more general graph structures (which could include soundtracks and artists) [1]. I think this exploits local connectivity between artists much better than the vanilla matrix factorization algorithm.

[1] Yu, X., Ren, X., Sun, Y., Gu, Q., Sturt, B., Khandelwal, U., Norick, B., Han, J. (2014). Personalized entity recommendation: A heterogeneous information network approach. In Proc. 2014 ACM Int. Conf. on Web Search and Data Mining (WSDM’14).


I did a similar thing for music/movies/directors/actors using word2vec which is also a form of matrix factorization[1]. (Shameless plug) You can try it at http://rerecommender.com

[1] http://www.cs.columbia.edu/~blei/seminar/2016_discrete_data/...


Cool stuff, did you look at using Latent Dirichlet Allocation to perform the categorization?


LDA actually works better for this in my experience. That is if assessed by the median distance to leave-one-out from the ones liked by a user in the test set.


A self-learning, genre agnostic music recommender with a nice interface is http://www.gnoosic.com

It has been learning on its own for years now and works pretty well for me.


Nice, except it is artist-based, not song-based.


Oh, I thought this was talking about looking through the actual notes of the music to find patterns


Rammstein -> Command & Conquer Red Alert 3 :P



