
It doesn't seem that this approach "knows" the actual music. The article doesn't seem to explain how track embedding vectors are produced, but it mentions that user-action signals are of the same length, which makes me doubt track embeddings have any content-derived (rather than metadata-derived) information. Maybe I'm wrong, of course.

I doubt that any recommendation system is capable of providing meaningful results in absence of the "awareness" about the actual content (be it music, books, movies or anything else) of what it's meant to recommend.

It's like a deaf DJ who uses chart data to decide what to play, guessing at and incorporating listeners' profiles/wishes. It's better than a deaf DJ who just picks whatever's popular without any context (or goes by genre only), but it's not exactly what one looks forward to when looking for a recommendation.




I think the entire idea is fatally flawed.

My experience is that the best music is found randomly. I like so much that I don't even know what I really like, and what I like is always changing. I need to listen to a ton of random things I don't like to find a small number of gems. The absolute gold, though, is finding songs I didn't even know I would like.

The algorithmic version of sifting through records at a record store for a music lover is random. Random with an easy way to play the next song.

All these recommendation systems are just Satie's musique d'ameublement generators for non-music lovers. Furniture music generators, music to play during a dinner to create a background atmosphere for that activity.


So much this. Oftentimes I've discovered a new artist making music in a genre I thought I didn't like, which led me to start liking that genre.

Other times a specific song or genre is relevant to me because of a moment in real life or in a movie.


This is a shortcoming of every music recommendation algorithm except Spotify's and Pandora's. Spotify holds a pretty hefty patent portfolio of music classification algorithms, and Pandora employs hundreds of music experts who spend an hour tagging each song.


Spotify's Discover Weekly seems to be a healthy mix of close-enough guesses and random-but-not-too-random suggestions. Song radio is okay. The pop-up recommendation for specific new songs/albums feels so unrelated to my likes that it must be a sponsored recommendation. """Smart""" shuffle exclusively sends me whatever was popular on the radio in the last 5-30 years, despite my listening habits being the opposite.

Pandora was much smarter, but seemed to run out of songs instantly.


Nearly 10 years ago, I was at a Spotify recruiting event and they told us how they did embeddings at the time.

They took all user-generated playlists and projected the songs into vectors, where songs that often appear together on playlists are closer and songs that rarely appear together are farther apart.

It’s likely changed a lot since then, but it seemed like a pretty straightforward clustering system at the time.
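A minimal sketch of the idea described above. It assumes playlists are plain lists of track IDs, the function names are hypothetical, and it is not Spotify's actual system. Each song becomes a sparse binary vector over playlists, and the cosine between two such vectors works out to the number of playlists they share divided by the geometric mean of how many playlists each appears on. Songs that often share playlists come out close, and songs that never do score 0.

```python
import math
from collections import defaultdict


def playlist_vectors(playlists):
    """Map each track to the set of playlist indices it appears on.

    The set is a sparse stand-in for a binary vector over all playlists.
    """
    vectors = defaultdict(set)
    for index, playlist in enumerate(playlists):
        for track in playlist:
            vectors[track].add(index)
    return vectors


def similarity(vectors, a, b):
    """Cosine similarity of two binary playlist vectors:
    shared playlists / sqrt(playlists_with_a * playlists_with_b)."""
    pa = vectors.get(a, set())
    pb = vectors.get(b, set())
    if not pa or not pb:
        return 0.0
    return len(pa & pb) / math.sqrt(len(pa) * len(pb))


playlists = [["a", "b", "c"], ["a", "b"], ["c", "d"], ["d", "e"]]
vectors = playlist_vectors(playlists)
print(similarity(vectors, "a", "b"))  # 1.0: always on the same playlists
print(similarity(vectors, "a", "c"))  # 0.5: share one of two playlists each
print(similarity(vectors, "a", "e"))  # 0.0: never together
```

A real system would presumably compress these very high-dimensional vectors into short dense embeddings (matrix factorization or word2vec-style training are common choices), but the notion of "closer" stays the same.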


Co-occurrence. It's the real backbone of almost all recommender systems.

This is the same way YT/TikTok does it btw. Co-occurrence is king in recommender systems in production. It's extremely cheap to calculate and by far the most effective method.


That's just basic collaborative filtering. Drdaeman is talking about using the actual content of the songs in your vector embeddings.

This is not really important if you have a lot of user behavior data and/or playlists for each song. But if you have a niche song that few people have listened to, collaborative-filtering-based recommendations aren't going to be good.

Real semantic embeddings (which can then be part of the input to the recommendation model) can be trained using self-supervision, e.g. an autoencoder or a separate "next audio token" predicting transformer.
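One common way to handle the niche-song problem, sketched here under assumptions rather than taken from the article: blend a collaborative-filtering score with the cosine similarity of content embeddings (e.g. from an audio autoencoder, assumed to be precomputed), and let the CF weight grow with the amount of interaction data. `hybrid_score` and the `shrinkage` constant are hypothetical names chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_score(cf_score, content_a, content_b, interactions, shrinkage=50.0):
    """Blend a collaborative-filtering score with content similarity.

    cf_weight = interactions / (interactions + shrinkage), so a niche track with
    almost no listening data is scored mostly by its content embedding, while a
    heavily played track is scored mostly by CF. `shrinkage` is a hypothetical
    tuning constant.
    """
    cf_weight = interactions / (interactions + shrinkage)
    return cf_weight * cf_score + (1.0 - cf_weight) * cosine(content_a, content_b)


# A brand-new track: no interactions, so only the content embeddings matter.
print(hybrid_score(0.9, [1.0, 0.0], [1.0, 0.0], interactions=0))  # 1.0
# 50 interactions with shrinkage 50: an even split between the two signals.
print(hybrid_score(0.9, [1.0, 0.0], [0.0, 1.0], interactions=50))
```

The shrinkage form is just one simple choice; a learned model could take both signals as input features instead.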


More and more, I've found that the best aggregators are people. I really wish For You pages could get to that level.


A recommendation from a person you know takes into account not just their knowledge of your preferences, but also how much and in what way they like/care about you, and conversely, your taking of the recommendation is colored by your rapport with the recommender. All that is something a recommender system has no access to.

Or, more bluntly: you aren't going to mate with a For You page, so it doesn't have the same evolutionary cheat code to your preferences as other people have.


Sounds like a complicated way to make everyone listen to the same 10 songs eventually.


Complicated, or worryingly straightforward and effective? It really does seem that, over time, this would compress the space of people's preferences (and, since listening stats also feed into production and promotion, the space of music produced).


> I doubt that any recommendation system is capable of providing meaningful results in absence of the "awareness" about the actual content (be it music, books, movies or anything else) of what it's meant to recommend.

Most of the reasons people like music, or fictional movies and books, are personal, emotional, subjective, and difficult to articulate. You wouldn't know what data to collect. You're better off just asking them to rate songs, movies, or novels out of ten. You can then compare their ratings with other people's, and what you'll find is that there are clusters of people who rate things similarly (and others who rate things differently), and that the ratings they give somehow capture their overall feelings about whatever they listened to, watched, or read. (Source: I developed a movie recommendation system which predicted ratings reasonably accurately.)

Of course, if you just have sequences of user actions, like in the article, your recommendations won't be anywhere near as accurate.
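The comment doesn't say which algorithm that movie system used. As an illustration only, here is a textbook user-based neighborhood method on ratings out of ten: find users whose ratings correlate with yours, then predict your rating as your own average plus their correlation-weighted deviations from their own averages. The data and function names are made up.

```python
import math


def pearson(ra, rb):
    """Pearson correlation between two users over the items both rated."""
    common = [item for item in ra if item in rb]
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict `user`'s rating of `item` from positively correlated neighbors."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(ratings[user], theirs)
        if w <= 0:
            continue  # ignore users whose tastes don't line up with ours
        mean_o = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - mean_o)
        den += w
    return mean_u + num / den if den else mean_u


ratings = {
    "alice": {"m1": 9, "m2": 2, "m3": 8},
    "bob": {"m1": 8, "m2": 3, "m3": 9, "m4": 9},
    "carol": {"m1": 2, "m2": 9, "m3": 3, "m4": 1},
}
# Alice rates like Bob and opposite to Carol, so only Bob's opinion of m4 counts.
print(round(predict(ratings, "alice", "m4"), 2))  # 8.08
```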


> I doubt that any recommendation system is capable of providing meaningful results in absence of the "awareness" about the actual content (be it music, books, movies or anything else) of what it's meant to recommend.

Years of experience have proven that you can get quite far with pure collaborative filtering—no user features, no content features. It's a very hard baseline to beat. A similar principle applies to language modeling: from word2vec to transformers, language models never rely on any additional information about what a token "means," only how the tokens relate to each other.
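To make "pure collaborative filtering" concrete, here is a minimal matrix-factorization sketch that learns one vector per user and per item purely from which ratings were observed, with no user or content features. It's a generic SGD version on toy data (names and hyperparameters are illustrative), not any particular production system.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit user vectors P and item vectors Q so that P[u] . Q[i] ~ rating.

    ratings: list of (user_index, item_index, rating) triples; nothing else is
    known about users or items. Trained with plain SGD and L2 regularization.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict_rating(P, Q, u, i):
    """Predicted rating: dot product of the user and item vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


# Two taste groups on a 1-5 scale; each user has one unobserved item.
observed = [
    (0, 0, 5), (0, 2, 1), (0, 3, 1),
    (1, 0, 5), (1, 1, 5), (1, 2, 1),
    (2, 0, 1), (2, 1, 1), (2, 2, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 5),
]
P, Q = factorize(observed, n_users=4, n_items=4)
print(round(predict_rating(P, Q, 0, 1), 1))  # user 0, item 1: should be high
print(round(predict_rating(P, Q, 3, 0), 1))  # user 3, item 0: should be low
```

User 0 has never rated item 1, but user 1 rates items 0 and 2 exactly like user 0 does and loves item 1, so the factorization fills the gap with a high score. No knowledge of what the items "are" is needed.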


A while ago I created a project that embeds artists on Spotify using word2vec: https://galaxy.spotifytrack.net/

It uses data about overlap in listenership between different artists to determine which artists are related to which others and how. The artists serve the same role as words in sentences.
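For readers unfamiliar with the word2vec analogy, here is a minimal sketch of how artists can stand in for words. The input format is an assumption (the comment doesn't specify it): each "sentence" is one listener's list of artists, and skip-gram training pairs are every (center, context) combination within a window. These pairs are what word2vec's skip-gram model learns from. In practice you would hand the sentences to an existing implementation, such as gensim's `Word2Vec` with `sg=1`, rather than train by hand.

```python
def skipgram_pairs(sentences, window=2):
    """Return (center, context) pairs as word2vec's skip-gram model sees them.

    Each sentence is a list of tokens; here the tokens are artist names
    (a hypothetical input format), so artists play the role of words.
    """
    pairs = []
    for sentence in sentences:
        for i, center in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    pairs.append((center, sentence[j]))
    return pairs


listeners = [["artist_a", "artist_b", "artist_c"]]
print(skipgram_pairs(listeners, window=1))
# [('artist_a', 'artist_b'), ('artist_b', 'artist_a'),
#  ('artist_b', 'artist_c'), ('artist_c', 'artist_b')]
```

Since a listener's artists have no natural order, a window large enough to cover the whole list (or shuffling each list) is one reasonable choice.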


It says the track embedding vectors are inputs; the music representations were probably learned in an earlier model, such as word2vec or a two-tower model.


> It doesn't seem that this approach "knows" the actual music. The article doesn't seem to explain how track embedding vectors are produced

That's the thing with transformers, right? It doesn't actually "know" anything about its inputs.

The embeddings are learned (initialized randomly).
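Here is a minimal sketch of what "learned, initialized randomly" means in practice. It uses a simplified negative-sampling objective (one shared vector per item, whereas word2vec keeps separate input and output vectors), and the pairs and hyperparameters are made up for illustration. The vectors start as small random noise and only acquire structure because training pushes items that co-occur toward a high dot product and sampled non-co-occurring pairs toward a low one.

```python
import math
import random


def train_embeddings(n_items, positives, negatives, dim=8, lr=0.1, epochs=200, seed=0):
    """Learn item embeddings from random initialization with a logistic loss.

    positives: (a, b) pairs that co-occurred; negatives: sampled pairs that did not.
    """
    rng = random.Random(seed)
    # Random init: the vectors carry no information about the items yet.
    emb = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n_items)]
    examples = [(a, b, 1.0) for a, b in positives] + [(a, b, 0.0) for a, b in negatives]
    for _ in range(epochs):
        rng.shuffle(examples)
        for a, b, label in examples:
            score = sum(x * y for x, y in zip(emb[a], emb[b]))
            # Gradient of the log-likelihood w.r.t. the score: label - sigmoid(score).
            grad = label - 1.0 / (1.0 + math.exp(-score))
            va, vb = emb[a][:], emb[b][:]
            for f in range(dim):
                emb[a][f] += lr * grad * vb[f]
                emb[b][f] += lr * grad * va[f]
    return emb


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


emb = train_embeddings(
    n_items=4,
    positives=[(0, 1), (2, 3)],
    negatives=[(0, 2), (0, 3), (1, 2), (1, 3)],
)
print(dot(emb[0], emb[1]) > dot(emb[0], emb[2]))  # True: co-occurring pair is closer
```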



