Twitter Bot Finds Anagrams of Twitter Statuses

Igglyboo · on Aug 13, 2014

"Mustache got thicker" vs "git checkout hamster"

This bot is pretty funny.

Anyone know how it works? I'm assuming it just sorts the string and puts it in a hashmap/table and looks for collisions.
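
A minimal sketch of that guess, not the bot's actual code: the names `sorted_key` and `find_collisions` are made up for illustration, and stripping everything except letters is an assumption about how statuses would be normalized.

```python
from collections import defaultdict


def sorted_key(text):
    """Canonical key: keep letters only, lowercase them, and sort them."""
    return "".join(sorted(c for c in text.lower() if c.isalpha()))


def find_collisions(statuses):
    """Bucket statuses by sorted key and return buckets holding 2+ statuses."""
    buckets = defaultdict(list)
    for status in statuses:
        buckets[sorted_key(status)].append(status)
    return [group for group in buckets.values() if len(group) > 1]


pairs = find_collisions([
    "Mustache got thicker",
    "git checkout hamster",
    "This bot is pretty funny.",
])
print(pairs)
```

Two statuses land in the same bucket exactly when they use the same letters the same number of times, so every bucket with more than one entry is an anagram candidate.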

duiker101 · on Aug 13, 2014

I found "another math genius" vs "He ain't smart enough." pretty funny

MiguelVieira · on Aug 13, 2014

Probably just takes tweets, canonicalizes them, and then hashes them based on a 26-length vector of character counts. For every new tweet, it looks for old tweets with the same character count.
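
Roughly what that could look like, as a sketch under assumptions: `letter_counts`, `check`, and the in-memory `seen` dict are hypothetical names, and only the 26 ASCII letters are counted.

```python
import string


def letter_counts(text):
    """Return a hashable 26-entry tuple of counts for the letters a-z."""
    counts = [0] * 26
    for c in text.lower():
        if c in string.ascii_lowercase:
            counts[ord(c) - ord("a")] += 1
    return tuple(counts)


# Maps a letter-count vector to the first tweet seen with that vector.
seen = {}


def check(tweet):
    """Return an earlier tweet with the same letter counts, or None."""
    key = letter_counts(tweet)
    match = seen.get(key)
    if match is None:
        seen[key] = tweet
    return match


first = check("another math genius")
second = check("He ain't smart enough.")
print(first, second)
```

The tuple is used as the dictionary key directly, so the lookup for each new tweet is a single hash probe rather than a scan over old tweets.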

breyten · on Aug 13, 2014

The source is on GitHub: https://github.com/cmyr/anagramatron

solutionyogi · on Aug 13, 2014

I thought this was deep:

Go All out or Die trying.

v/s

R u really going to do it?

subir · on Aug 13, 2014

This particular one had me in splits (:

madcaptenor · on Aug 13, 2014

I'm a bit surprised that there are anagrams to be found. It's easy to find them if they exist, but there's no guarantee at all that there actually should be collisions.

Strilanc · on Aug 13, 2014

Fermi estimate time!

Anagrams are just sentences with the same letter counts. The anagrams they're posting have 25ish letters... how many ways are there to distribute 25 balls into 26 bins? (25+26-1)!/25!/25! is ~125 trillion. The birthday paradox square roots that down to ~10 million, and the fact that we prefer some bins (fewer Zs, more Es) probably cuts it down even further to ~1 million.

So one anagram per million short tweets; hundreds per day. Doesn't seem too unreasonable.
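
For anyone who wants to check the arithmetic, here is a small sketch. It uses the standard stars-and-bars count C(n+k-1, n) for n balls in k bins and assumes every letter-count vector is equally likely, which real text is not; the variable names are just for illustration.

```python
import math

letters, bins = 25, 26

# Stars and bars: number of ways to spread 25 letters over 26 bins.
vectors = math.comb(letters + bins - 1, letters)

# Birthday-paradox scale: collisions become likely around sqrt(vectors) samples.
birthday_scale = math.isqrt(vectors)

print(f"{vectors:.3e} possible letter-count vectors")
print(f"about {birthday_scale:,} tweets before a collision is likely")
```

That gives on the order of 10^14 vectors and a square root a bit above 11 million, in line with the ~10 million figure before adjusting for uneven letter frequencies.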

madcaptenor · on Aug 14, 2014

Quick sanity check: most of the anagrams there are from short tweets, as you'd predict.

DanBC · on Aug 13, 2014

I'm not a statistician.

Is it really that surprising? English has plenty of redundancy; Twitter statuses have limited length.

What's surprising to me is the niceness of the found anagrams. "another math genius" / "he ain't smart enough".

tchalla · on Aug 13, 2014

> What's surprising to me is the niceness of the found anagrams.

That's because they are manually curated [0]

    Q: Is this manually curated?

    A: Mostly for issues of volume (there are a lot of variations
    of 'goooood mooornnniinng!', there are a lot of spam bots
    posting subtly different versions of the same message, etc.)
    the bot doesn't automatically post every anagram it finds.
    Essentially there's an iPhone client that reviews matches,
    which are manually approved or rejected.

[0] https://github.com/cmyr/anagramatron

madcaptenor · on Aug 13, 2014

I am a statistician. Maybe I should sit down and actually do some calculations.

sillysaurus3 · on Aug 13, 2014

Please do. I'd be interested.

sirclueless · on Aug 13, 2014

It's actually extremely likely. The chance that any two statuses are anagrams is minuscule, and even the chance that a particular status has an anagram among all other statuses is probably small, but the chance that there are no collisions at all is tiny.

See a description of the Birthday Paradox[1] for the mathematics behind this. For example, if you put 70 people in a room, there is a 99.9% chance that two people share a birthday.

[1]: http://en.wikipedia.org/wiki/Birthday_problem
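
The 70-person figure is easy to reproduce. This sketch assumes 365 equally likely birthdays and ignores leap years; `p_shared` is just an illustrative name.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct


for n in (23, 70):
    print(n, round(p_shared(n), 4))
```

The same reasoning applies to tweets, with letter-count vectors playing the role of birthdays and a much larger number of "days".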

joopxiv · on Aug 13, 2014

I find it interesting that they're manually approving the hits, because, as they indicate, most hits are (nearly) identical.

It shouldn't be too difficult to solve this automatically though. Identical hits can be discarded very easily. The ones that only have a few words or letters reversed can be detected with some kind of similarity algorithm.

jcampbell1 · on Aug 13, 2014

I had a look at the source code, and it does quite a bit of filtering, particularly around making sure the words are unique, and there is a primitive character comparison algorithm.

The code could be simplified by using Python's set() and improved by doing a copy'n'paste on a Levenshtein function.
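
Something along these lines, as a rough sketch and not what the repository does: `looks_interesting`, the word-set check, and the 0.5 threshold are all assumptions made up for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        prev = cur
    return prev[-1]


def normalize(text):
    """Lowercase and keep letters only."""
    return "".join(c for c in text.lower() if c.isalpha())


def looks_interesting(a, b, min_ratio=0.5):
    """Reject pairs that are the same words reordered or nearly the same text."""
    if set(a.lower().split()) == set(b.lower().split()):
        return False
    na, nb = normalize(a), normalize(b)
    distance = levenshtein(na, nb)
    return distance / max(len(na), len(nb), 1) >= min_ratio


print(looks_interesting("good morning", "morning good"))
print(looks_interesting("listen", "silent"))
```

The set comparison catches pairs that only shuffle whole words, and the edit-distance ratio catches pairs that differ by a few moved letters or spaces.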

cmyr · on Aug 13, 2014

oh hey yea that would've been useful. ^_^

bussiere · on Aug 13, 2014

Nice one :)

I love when people use programming to play with words.

Sad it's English only. I may work on a French version.

But really nice idea.

cmyr · on Aug 13, 2014

Hey, author here: it's English-only mostly because of volume, and because I review the results. Making a French-language version would mostly just be a matter of hosting. If you're interested, let me know; I'd love to help you out.

bussiere · on Aug 13, 2014

Maybe as a side project.

But I have some friends who make rap, and there are some diamonds that I've found with your bot:

=

I want to see this world change.

Let's see what I can do right now

=

And :

=

you have destroyed me

do you deserve my hate?

=

I'll keep it in mind, but it won't be for at least six months.

I'll make a pull request or notify you on GitHub; I also code in Python.

In French we have some software for finding rhymes.

Putting the findings in a database could be a nice addition; it could help with composing text.

You have my admiration for the idea and the execution ...

fjcaetano · on Aug 13, 2014

This is delightfully ironic

bengali3 · on Aug 13, 2014

Oily Shirted Filch Linguist