
One of the challenges here is handling homonyms. If I search in the app for "king", most of the top ten results are "ruler" icons, showing a measuring stick. "Rodent" returns mostly computer mice, etc.

https://www.v0.app/search?q=king

https://www.v0.app/search?q=rodent

This isn't a criticism of the app - I'd rather get a few funny mismatches in exchange for being able to find related icons. But it's an interesting puzzle to think about.




>> If I search in the app for "king", most of the top ten results are "ruler" icons

I believe that's the measure of a man.


Good callout! We think of this as a two-part problem.

1. The intent of the user: is it a description of the look of the icon or the utility of the icon?

2. How best to rank the results, which is a combination of intent, CTR on past search queries, bootstrapping popularity via usage in open source projects, etc. (a toy sketch below).
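
To make that concrete, here's how such a blend could look in code. The weights and field names are made up for illustration, not our production values:

    # Toy blended ranking: semantic similarity plus behavioral signals.
    # Weights and signal names are illustrative only.
    def rank(results, query_intent):
        scored = []
        for icon in results:
            score = (
                0.6 * icon["embedding_similarity"]  # semantic match to the query
                + 0.25 * icon["historical_ctr"]     # CTR from past searches
                + 0.15 * icon["oss_usage"]          # popularity in open source projects
            )
            if query_intent in icon["intent_tags"]:  # "look" vs. "utility"
                score *= 1.1
            scored.append((score, icon))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [icon for _, icon in scored]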

- Charlie of v0.app


This is imo the worst part of embedding search.

Somehow Amazon continues to be the leader in muddy results, which is a sign that it's a huge problem domain and not easily fixable even with massive resources.


I don't seem to have this issue on any other webshop that uses normal keyword search, and I've always wondered what Amazon did to mess it up so badly, and why people still use the site (for other reasons too, but search is definitely one of them: there's no way to search properly). The answer isn't always throwing "massive resources" at being more hi-tech.

But thanks, this explains a lot about Amazon's search results and might help me steer it if I need to use it in the future :)


I was reading this article and thinking about things like transcription: if you heard the spoken word “sign” in isolation, you couldn’t be sure whether it meant a road sign, a spiritual sign, the +/- sign, or even the sine function. This seems like a similar problem, where you pretty much require context to make a good guess; otherwise the best it could do is go off how many times the word appears in the dataset, right? Is there something smarter it could do?
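
To make the “go off how many times the word appears” baseline concrete, a toy sketch (the counts are invented):

    from collections import Counter

    # Toy frequency prior: with no context, just guess the most common
    # sense of "sign". Counts are invented for illustration.
    sense_counts = Counter({
        "road sign": 5200,
        "plus/minus sign": 1400,
        "spiritual sign": 900,
        "sine function": 300,  # homophone, not even the same spelling
    })
    best_guess, _ = sense_counts.most_common(1)[0]
    print(best_guess)  # "road sign": right most often, wrong whenever context disagrees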


Wouldn’t it help to provide affordances guiding the user to submit a question rather than a keyword? Then, “Why are kings selected by primogeniture?” probably wouldn’t be near passages about measuring sticks in the embedding space. (Of course, this idea doesn’t work for icon search.)
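
A quick sketch of the idea with the open source sentence-transformers library (the model choice is arbitrary, purely for illustration):

    from sentence_transformers import SentenceTransformer, util

    # Sketch: a full question carries enough context to pull the query
    # away from the wrong sense of "ruler".
    model = SentenceTransformer("all-MiniLM-L6-v2")

    query = "Why are kings selected by primogeniture?"
    passages = [
        "A monarch inherits the throne from the previous king.",
        "A ruler is a strip of plastic used to measure length.",
    ]
    scores = util.cos_sim(model.encode(query), model.encode(passages))
    print(scores)  # expect the monarchy passage to score well above the measuring stick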


Only if you have an attention mechanism.


I think this is the point of the attention portion in an LLM: to use context to skew the embedding result closer to what you're looking for.
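
You can see that effect directly with contextual token embeddings. A sketch using BERT via the Hugging Face transformers library (the model is arbitrary, and this assumes the word is a single token in its vocabulary):

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Sketch: attention mixes surrounding tokens into each word's vector,
    # so the same word gets different embeddings in different contexts.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def embed_word(sentence, word):
        inputs = tok(sentence, return_tensors="pt")
        with torch.no_grad():
            states = model(**inputs).last_hidden_state[0]
        # assumes `word` survives as a single vocabulary token
        idx = inputs["input_ids"][0].tolist().index(tok.convert_tokens_to_ids(word))
        return states[idx]

    a = embed_word("the ruler governed his kingdom wisely", "ruler")
    b = embed_word("she drew the line with a plastic ruler", "ruler")
    print(torch.cosine_similarity(a, b, dim=0))  # well below 1.0: context shifted the vector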

It does seem a little strange that 'ruler' would be closer to 'king' than something like 'crown'.


Yeah, these can be cute, but they're not ideal. I think the user feedback mechanism could help naturally align this over time, but it would also be gameable. It's all interesting stuff.


As the OP, you can do both semantic search (embedding) and keyword search. Some RAG techniques call out using both for better results. Nice product, by the way!
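
A minimal sketch of the combination, fusing a BM25 keyword ranking with an embedding ranking via reciprocal rank fusion (rank_bm25 and sentence-transformers here are just stand-ins for whatever engines you'd actually use):

    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer, util

    docs = [
        "crown icon for royalty",
        "ruler icon for measuring length",
        "mouse icon for computers",
    ]
    query = "king"

    # Keyword side: BM25 over tokenized docs (uninformative here, since
    # no doc literally contains "king").
    bm25 = BM25Okapi([d.split() for d in docs])
    kw_scores = bm25.get_scores(query.split())
    kw_rank = sorted(range(len(docs)), key=lambda i: -kw_scores[i])

    # Semantic side: cosine similarity of embeddings.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    sem_scores = util.cos_sim(model.encode(query), model.encode(docs))[0].tolist()
    sem_rank = sorted(range(len(docs)), key=lambda i: -sem_scores[i])

    # Reciprocal rank fusion: a doc ranked highly by either side floats up.
    def rrf(rankings, k=60):
        return {d: sum(1.0 / (k + r.index(d) + 1) for r in rankings)
                for d in rankings[0]}

    fused = rrf([kw_rank, sem_rank])
    print(sorted(fused, key=fused.get, reverse=True))  # doc indices, best first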


Hybrid searches are great, though I'm not sure they would help here. Neither 'crown' nor 'ruler' would come back from a text search for 'king,' right?

I bet if we put a better description into the embedding for 'ruler,' we'd avoid this. Something like "a straight strip or cylinder of plastic, wood, metal, or other rigid material, typically marked at regular intervals, used to draw straight lines or measure distances" (stolen from a Google search). We might be able to ask a language model to look at the icon and generate a good description to put into the embedding.
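
A quick sketch of the difference with an off-the-shelf embedding model (sentence-transformers is just an assumption here, not necessarily what v0.app uses):

    from sentence_transformers import SentenceTransformer, util

    # Sketch: embedding a rich description instead of the bare label
    # "ruler" should pull the icon away from "king" in vector space.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    query = model.encode("king")
    label = model.encode("ruler")
    description = model.encode(
        "a straight strip of plastic, wood, or metal, marked at regular "
        "intervals, used to draw straight lines or measure distances"
    )

    print(util.cos_sim(query, label))        # fairly high: 'ruler' also means monarch
    print(util.cos_sim(query, description))  # should drop noticeably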



