Semantle: guess words based on Word2vec similarity scores (semantle.com)
117 points by cpeterso on June 3, 2022 | 60 comments



A friend pointed this out to me a few weeks ago.

I found it nearly impossible to enjoy, because a single numerical distance metric provides almost no help in figuring out which direction to move to improve your guess. Any change in the metric when you're far away from the solution is really just a false signal: e.g. "mother" gives a score of 20, "grandmother" gives a score of 23, but "grandmother" isn't actually closer to the final word ("malfeasance") in any meaningful way than "mother" is.

Another issue with it is that the training set comes largely from newspaper articles, which means the semantics it learns sometimes have significant artifacts resulting from associations that are common in news coverage but not the English language in general.

As a result my own experience with it was "think of random guesses endlessly until you end up somewhat close and then be frustrated at what you learn about the embedding when you eventually converge on the answer."


It would be interesting if it provided a direction.

3.8 as "cat" is to "pig"


> the semantics it learns sometimes have significant artifacts resulting from associations that are common in news coverage

I think this is largely the point. The game is significantly easier (for me at least) when I limit myself to thinking about what words would have high co-occurrence in English print journalism.


> "mother" gives a score of 20, "grandmother" gives a score of 23,

Doesn't this make sense though? A small change like that means you didn't really move, so you didn't get further or nearer the solution. Grandmother and mother are pretty similar.


It feels to me like there is no way to "improve" short of blindly guessing synonyms. The space feels like concentric circles, where two words with similar word2vec scores move you along the circumference rather than inward toward the center.


Try Pimantle[1], which is Semantle with a point-cloud visualization of guesses. The visualization is only two-dimensional, but it's enough to prevent the "orbiting" you describe.

[1]: https://semantle.pimanrul.es/


What dimensionality reduction algorithm is being used? PCA? UMAP? Of course better ones are likely to give better 2D representations.

I've had this same idea to use the point cloud visualization to improve my own take on language model games, but when I experimented with this, I found that even the best 2D representations were still quite bad in the context of trying to help out humans.


PCA destroys local structure at such a low dimensionality. So probably UMAP or t-SNE.


I do like this because you can see if the "path" you are on towards the center is going to get you close or if you are going to hit a dead end. Because it's weird some of the ways you can get relatively close to the word and then go nowhere.

I have also found it frustrating in semantle when I get something that is "close" but is actually close because it is a direct antonym of the word.


Only took me about 70 guesses including a dozen hints. I'm not good at thinking like these algorithms...


Ah, indeed! Not much data yet, but with my first pimantle just now, I was able to get it in 33 tries, instead of often over a hundred with semantle. It really helps to "see" how you are approaching (or not approaching) the word in one more dimension.


My computer is on fire trying to run this. But it's way better than semantle


Wrote a solver for this game.

https://crab.manimino.com/


That is brilliant. Just triangulates based on the distance to each word? Like GPS for the Word2Vec space?


You got it.
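
For anyone curious what "GPS for the Word2Vec space" could look like, here is a minimal sketch, not the linked solver's actual code. It assumes you have the same embedding the game uses and that the game reports exact cosine similarities; the toy 3-d vectors and the names `consistent_candidates` and `VECTORS` are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def consistent_candidates(vectors, observations, tolerance=1e-6):
    """Keep words whose similarity to every guess matches the reported score."""
    remaining = []
    for word, vec in vectors.items():
        if all(abs(cosine(vec, vectors[guess]) - score) <= tolerance
               for guess, score in observations):
            remaining.append(word)
    return remaining

# Toy embedding (hypothetical values, not real word2vec vectors).
VECTORS = {
    "dawn": (0.9, 0.1, 0.2),
    "dusk": (0.8, 0.2, 0.3),
    "night": (0.5, 0.1, 0.8),
    "pig": (0.1, 0.9, 0.1),
    "cat": (0.2, 0.8, 0.3),
}

# Simulate the game: the secret word answers each guess with a similarity.
secret = "dawn"
guesses = ["pig", "night"]
observations = [(g, cosine(VECTORS[secret], VECTORS[g])) for g in guesses]

print(consistent_candidates(VECTORS, observations))
```

Each guess constrains the answer to the set of words at one specific similarity from it, so a handful of unrelated guesses usually leaves very few candidates. A real solver would need a looser tolerance if the displayed scores are rounded.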


This is hilarious. And correct!


Semantle is a very cool concept but the similarity scoring is kind of whack sometimes. It feels like it needs to use a wider dataset? Or perhaps word2vec is not quite good enough for this.

I had an idea for a minor improvement where if you guess one word, behind the scenes it would check that word, the plural version, the past tense version, etc. And then show you the highest score from all of those. I find it very frustrating when one word has a certain similarity score but another version of the same word (pluralized, past-tense, etc) has a very different score, whether higher or lower. Some small tweaks like this might make the game much more enjoyable.
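
A rough sketch of that tweak, under the assumption that the game can look up an embedding for each inflected form. The `variants` helper is a deliberately naive stand-in (a real version would want a proper lemmatizer or inflection library), and the 2-d vectors are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def variants(word):
    """Naive inflections of a word (hypothetical helper, English suffixes only)."""
    return [word, word + "s", word + "ed", word + "ing"]

def best_variant_score(guess, target, vectors):
    """Score every known variant of the guess and return the best (word, score)."""
    scored = [
        (cosine(vectors[w], vectors[target]), w)
        for w in variants(guess)
        if w in vectors
    ]
    if not scored:
        return None  # none of the variants are in the vocabulary
    score, word = max(scored)
    return word, score

# Toy embedding (made-up values).
VECTORS = {
    "open": (1.0, 0.0),
    "opened": (0.6, 0.8),
    "door": (0.5, 0.9),
}

print(best_variant_score("open", "door", VECTORS))
```

Here "open" on its own scores poorly against "door", but the player would be shown the higher score of "opened" instead.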


Yeah, the plural and variations on words are always a bit strange. It also throws me off when something that is the direct opposite is "close" because it is associated. Like you guess "dusk" and it's really close so you try "twilight", "sunset", "night", etc. but they are all further away and the word is actually "dawn".


Today’s word is actually really hard… it’s one of those words which is close to a whole lot of other ones, so it’s hard to understand the similarity scores from a human perspective. But I’ve played this game before, and the scores are usually much easier to make sense of.


One of the underrated features of the original Wordle is that the word to guess is selected from a short list (2,315 words) instead of the full dictionary (10,657 words). The short list is very well selected and it makes the game more fun to play.


Try Semantle Junior. I don't know exactly how it selects the "easier words" though

Edit: So the way he grabs words for the original is out of the 5000 "most popular" words in English. Still not sure about the Junior version, maybe it is an even more restricted list. Or it looks like it might just be restricted to nouns since it does seem to be easier to guess things close to a physical item.


Not as bad! Got today's puzzle in 18 guesses.

https://semantle.com/junior


I think semantle also chooses from a pretty short list of words. All the answers were very common words, while in the "closest word list" you can see all kind of crazy words like "ethnocracy" and "coextensive."


Main semantle. My mum got today's answer in 53 guesses, my sister in 157 and myself in 163 but only after mum ruined it by trying to give me a hint!!

Mum turned 80 yesterday, sister turned 50 2 weeks ago and I'm 55.75 now.

5 June 2022

We play it like the 20 questions car game from the 1970s, by starting with animal, mineral, vegetable, but have added a few more class headers, as we call them: chemical, emotion, transport, colour, human, etc.

Junior is usually played the same way with easier words, so they can be found quicker, e.g. I always play "animal" as my first word and the other day that's what it was, so boom! Found it in 1 guess!

Hooked!!


I was able to piece together the word, though I was surprised when I was right. Without spoiling the answer, I did sort of walk down the path to the answer in an almost logical way


What the heck? Is there actually any relationship between these words? I got to within 900/1000 today in 33 guesses, and was SOOO far off from the answer that after 30 hints I gave up. Not sure I understand the rules.


Algorithm: Words are similar if they're used in similar places.


It's stupid really... my word was "ours" and it gave me a high score on "progress" but a low score in "vouch".


I first found "god" and "heaven" were among the 1000 closest words, but didn't get what was going on until I put "sake" (for god's sake). I realized this is a very common word and possibly also in common idioms and I eventually arrived at "everything", which was probably within the top 10 I think. But it took me another 30+ tries to finally figure out that it was just a pronoun.


It's the same word for everyone once a day. So you just spoiled it for anyone who hadn't gotten it. Just FYI in case you want to edit.


Wow, I thought I had something going, and was even within the top 10 similar words, but in my opinion they are not very closely related at all.

Here's the top list for me: https://imgur.com/AqmQl6A


I could see how this might be fun, but today's word (I won't spoil it here) seems ill-advised for such a puzzle. When I saw the hint it made me laugh out loud. Why would the creator choose such a word (or such a hint)?


No, this game is not fun. I've played it quite a few times and found it addictive, but not in a fun way. I had to force myself to stop playing it because I realized it was just frustrating and un-fun and you never get any better at it because there's no way to improve your reasoning about the similarity scores or ever become truly "good" at it.


That's because the words aren't related in meaning; they're related by how often the words-as-symbols co-occur in a bank of historical documents.


It's based on news articles, so I've found it helpful in the final stages to switch from guessing words with similar meanings to guessing words you might see together in a headline. For example if you've got "rescued" you might go in the direction of "mountain" or "accident" rather than "assisted," "helped" etc.

But also I stopped playing because it wasn't fun anymore.


The hints definitely aren't manually chosen — they're just a word some distance closer to the target than you've gotten so far. And I wouldn't be surprised if the targets are randomly picked from the word2vec dataset, too.
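
To make "a word some distance closer than you've gotten so far" concrete, here is one way such a hint rule could work. The thread doesn't say what rule the game actually uses, so the "halve your best rank" choice below is purely an assumption, as are the toy vectors and the name `pick_hint`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(target, vectors):
    """All words except the target, most similar to the target first."""
    others = [w for w in vectors if w != target]
    return sorted(others, key=lambda w: cosine(vectors[w], vectors[target]),
                  reverse=True)

def pick_hint(target, guesses, vectors):
    """Return a word ranked roughly halfway between the best guess and the target."""
    ranking = rank_by_similarity(target, vectors)
    ranks = [ranking.index(g) for g in guesses if g in ranking]
    best = min(ranks) if ranks else len(ranking)
    if best == 0:
        return None  # the nearest neighbour is already guessed (or nothing to offer)
    # Every word ranked above the best guess is unguessed by construction.
    return ranking[best // 2]

# Toy embedding (made-up values).
VECTORS = {
    "dawn": (1.0, 0.0),
    "sunrise": (0.9, 0.1),
    "morning": (0.8, 0.3),
    "dusk": (0.7, 0.5),
    "night": (0.5, 0.7),
    "pig": (0.0, 1.0),
}

print(pick_hint("dawn", ["pig", "night"], VECTORS))
```

Nothing in a rule like this looks at meaning, which would explain hints that feel semantically empty: the hint is just whichever word happens to sit at that rank.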


Perhaps they should be manually chosen...the hint I received was a word that is semantically empty, but apparently co-occurs with today's word.


I highly recommend Redactle as a similar concept that is more fun: https://www.redactle.com/


Whoa, I really liked this. Even though it was super hard, and once I had a vague idea of the subject, I was entering relevant words from a list I looked up (before I knew that guessing the subject ended the game - I'm bad at reading directions.) But yeah I had fun and lost like an hour to that.


I wrote a set of "language games" using word embeddings. Very similar to what's shown here.

https://github.com/Hellisotherpeople/Language-games

Also, you can use extensions to word embeddings, namely, sense2vec (https://github.com/explosion/sense2vec), to supercharge games like this!


playing this game is akin to starving yourself for a week just to finally have a meal that "tastes better"


Oh cool! I did this in Unity for a game jam. I called it Lexicode. I used ConceptNet Numberbatch.

https://greatfilter.itch.io/lexicode

I am learning C# as I learn Unity, so it's very rough. But I am proud of the hint system I devised.


Mild spoilers...

How can beautiful be semantically close, but gorgeous is cold? Great is close, but excellent is cold?

What the heck do "very" and "great" have to do with the solution, semantically?


Maybe you're not looking for an adjective that describes extremes.


In the end it wasn't an adjective at all. Or remotely related to any of the "closely related" words. Ha!


I think the target word vocabulary needs to be limited to exclude super abstract generic words. Today's seems to be within a few points of anything, we, and yours.


Please add a practice mode. This will just go into the long list of Wordle clones that I will never play because it's only one a day.


I really thought there was no hope for me but I started getting closer and got it in 11. Really enjoyed that game!


@dang something curious happening here. I commented yesterday, but now it looks like I commented just an hour ago?


Confirmed. If you look at your comment history the correct timestamp is there:

https://news.ycombinator.com/threads?id=mwcremer

Also, this post says it’s a few hours old, which makes no sense given you commented yesterday. Though the submission history for the user has the correct timestamp too:

https://news.ycombinator.com/submitted?id=cpeterso


Some links are given a second chance (happened to me once with dang contacting me directly to give a link to repost and give it some front page time). I'm guessing resetting the comment timestamps to be relative to the new repost makes it look more coherent.


I don’t understand the scores. “Opened” gives me -0.65, which I guess is really low. “Closed” gives me -2.11.


As I understand it, "opened" and "closed" are semantically similar, because they're in the same category of word -- they both finish the sentence "The door is ___." "Evangelize" is more different from those two, because it won't show up in the same sentences very often.
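
A tiny illustration of the "same sentences" idea, using plain co-occurrence counts rather than word2vec itself (word2vec learns dense vectors with a neural model, but the intuition is similar). The corpus, the window size, and the function names are all invented for the example.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)

# Made-up corpus: "opened" and "closed" appear in the same slots.
CORPUS = [
    "the door is opened",
    "the door is closed",
    "the window is opened",
    "the window is closed",
    "they evangelize in the town square",
]

vecs = context_vectors(CORPUS)
print(cosine(vecs["opened"], vecs["closed"]))      # shared contexts, high
print(cosine(vecs["opened"], vecs["evangelize"]))  # no shared contexts, low
```

"Opened" and "closed" come out as near-identical here even though they are opposites in meaning, which is the same effect people in this thread are running into with antonyms.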


This game is brutally hard, but I feel like playing it improves my vocabulary. Once, I even beat it!


Way too hard


This is one of those "enjoy the journey as much as the destination" games. I like seeing the green bar go up even if I don't get it all the way.


This is really hard. I love it


If today's word is too easy, Randall Munroe calculated some of the most difficult ones to guess:

https://twitter.com/xkcd/status/1522260318512156672


Clever game lol


For humans, probably better done via word co-occurrence on Wikipedia, or, to make it much easier, give two Wikipedia articles and pick which has more of the word.
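
The easier two-article variant could be scored with something as simple as the sketch below. It assumes the article text has already been fetched as plain strings; the tokenizer regex and the function names are just illustrative choices.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercased word frequencies, using a simple letters-and-apostrophes tokenizer."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def article_with_more(word, article_a, article_b):
    """Return 'a', 'b', or 'tie' depending on which text uses the word more often."""
    count_a = word_counts(article_a)[word.lower()]
    count_b = word_counts(article_b)[word.lower()]
    if count_a == count_b:
        return "tie"
    return "a" if count_a > count_b else "b"

# Stand-in snippets, not real Wikipedia text.
ARTICLE_A = "The dawn chorus begins before dawn. Birds sing at dawn."
ARTICLE_B = "Dusk falls after sunset. Some animals wake at dusk, not dawn."

print(article_with_more("dawn", ARTICLE_A, ARTICLE_B))
```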



