Simple Haskell webapp: Generate random tweets using Markov chains (jaspervdj.be)
55 points by jaspervdj on Jan 8, 2011 | 30 comments



Using Markov chains is fun until it all goes wrong. My Twitter bot sadly seems to have become infatuated with Justin Bieber. Sigh.

http://twitter.com/markov_chains


Not surprising, considering Justin Bieber uses up 3% of Twitter's infrastructure.[1]

[1] http://mashable.com/2010/09/07/justin-bieber-twitter/


Running @FoxNews into it:

  #UN Security Council, General Assembly evacuated due to hold joint military exercises
"New Tax Bill: DADT" was another good'un.


"TSA releases all 24 hostages after Kim Jong Il introduced son as House member"


Running @Jesus through it, I am informed he is the scatman.


Reminds me of Mark V Shaney.

"I spent an interesting evening recently with a grain of salt." http://en.wikipedia.org/wiki/Mark_V_Shaney


This is a lot like an app I wrote a few years ago for a programming contest. It uses the same technique to generate random invention descriptions using patent application abstracts: http://eurekaapp.com/ (Yes, I realize how unreasonably slow it is)


Hey that's cool! I imagine that you probably caught yourself, a time or two, actually trying to read and understand what the "invention" is. As I just did.


I usually get a headache from this sort of thing, because I almost understand what it says, but not quite.


> Maple says nlognlogn, which could solve NP party?

And you are all invited.


It seems like it needs to sample more tweets. I generated 5 tweets, and 2 of them were exact tweets that I had sent (they were only a few words long, and I guess they contained words that I rarely tweet, so the chain had few or no other places to go once it started reproducing the tweet).

Edit: Then again, since there are only a few words in a tweet, you'd have to go a very long way back to ensure that won't happen. Possibly farther back than Twitter will let you.


It's more fun if you use it on politicians or celebrities. You get strange alternative world announcements.


It generates downright disturbing results when you apply it to http://twitter.com/Othar

Oslaka died 2 kilometers in order to come from seafaring folk, so I do it. I spend twenty minutes left! He is hideously scarred. My host removes his face a lot in college.


Yeah, I use as many tweets as Twitter will give me in one request. Fetching any more would probably mean too long a delay for the end user.


This reminds me of an old Perl script I made with Markov chains for a very similar sort of random nonsense text generation.

I think I fed it some text from a few Usenet kooks/conspiracy theorists and something like Alice in Wonderland, and got quite a few laughs out of it a long time ago. It was made to let you combine arbitrary texts into a single chain.


@jennyholzer:

OFTEN AS OFTEN AS OFTEN AS OFTEN AS POSSIBLE

Very apt, but what n-gram length is being used? n=1 is my guess, since "as often as" is a common English construct. Obvious feature request: tweakable lengths.

Edit: I'd make the fix myself and send a pull request, but I don't know Haskell and am too lazy to figure it out.


n=1 is indeed being used. The problem with a larger n is that you get verbatim original tweets really often, because the dataset is limited (as many tweets as you can get in one request).
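
For anyone curious, the general shape in Haskell (just a sketch of the idea, not the app's actual code) is a table from n-word keys to the words that followed them, walked with a random generator:

  -- Rough sketch of an order-n word Markov chain (not the app's actual code).
  import           Data.List       (tails)
  import qualified Data.Map.Strict as M
  import           System.Random   (randomRIO)

  -- Table from an n-word key to every word that followed it in the input.
  type Chain = M.Map [String] [String]

  buildChain :: Int -> [String] -> Chain
  buildChain n ws = M.fromListWith (++)
      [ (key, [next])
      | chunk <- tails ws
      , let (key, rest) = splitAt n chunk
      , length key == n
      , next : _ <- [rest]   -- skip chunks with no following word
      ]

  -- Repeatedly pick a random successor; stop at a dead end or at the limit.
  step :: Int -> Chain -> [String] -> IO [String]
  step 0     _     _   = return []
  step limit chain key = case M.lookup key chain of
      Nothing    -> return []
      Just nexts -> do
          i <- randomRIO (0, length nexts - 1)
          let next = nexts !! i
          rest <- step (limit - 1) chain (drop 1 key ++ [next])
          return (next : rest)

  main :: IO ()
  main = do
      let ws    = words "the cat sat on the mat and the cat ran off"
          n     = 1   -- n = 1, as in the app
          chain = buildChain n ws
          seed  = take n ws
      out <- step 20 chain seed
      putStrLn (unwords (seed ++ out))

With a larger n you'd just use a longer key, but then a small sample of tweets gives most keys only one continuation, which is exactly the verbatim-tweet problem described above.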


Along those lines, I've recently been playing with using the Google ngram data for Markov chaining. The size of the corpus allows using 5-grams without the problem of seeing text that has actually been written before (mostly... it could just decide to spit out the complete works of Shakespeare any second!), and the results were more interesting to me than most Markov chains I've seen before. http://kitenet.net/~joey/blog/entry/dadagoogoo/ (a rough sampling sketch follows the examples below)

Random example 1: nothing had pleased God to bestow upon you as to participation in physical activity and exercise . Don ' t rain .

Random example 2: sad and terrifying each time . After a quick nap .
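
The sampling step is then just a weighted pick over the corpus counts. A minimal sketch, assuming (hypothetically; this is not how the post's code stores it) that the 5-gram counts have been loaded into a map from 4-word prefixes to (next word, count) pairs:

  import qualified Data.Map.Strict as M
  import           System.Random   (randomRIO)

  -- Assumed layout: 4-word prefix -> weighted continuations from the corpus.
  type Counts = M.Map [String] [(String, Int)]

  -- Pick the next word with probability proportional to its corpus count.
  nextWord :: Counts -> [String] -> IO (Maybe String)
  nextWord counts prefix = case M.lookup prefix counts of
      Just conts@(_ : _) -> do
          let total = sum (map snd conts)
          r <- randomRIO (1, total)
          return (Just (pick r conts))
      _ -> return Nothing
    where
      pick r ((w, c) : rest)
          | r <= c    = w
          | otherwise = pick (r - c) rest
      pick _ []       = error "unreachable: r is at most the total count"

  main :: IO ()
  main = do
      -- Toy data only; real input would come from the ngram files.
      let toy = M.fromList
              [ (["sad", "and", "terrifying", "each"], [("time", 40), ("day", 10)]) ]
      next <- nextWord toy ["sad", "and", "terrifying", "each"]
      print next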


Are you using smoothing for large n? Kneser-Ney smoothing seems to give the best results.

http://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tut...
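
For reference, the interpolated bigram form (in LaTeX-ish notation; d is an absolute discount, and the continuation term counts how many distinct contexts a word appears in) looks roughly like:

  P_{KN}(w_i \mid w_{i-1}) = \frac{\max(c(w_{i-1} w_i) - d, 0)}{c(w_{i-1})} + \lambda(w_{i-1}) \, P_{cont}(w_i)

  \lambda(w_{i-1}) = \frac{d}{c(w_{i-1})} \cdot |\{ w : c(w_{i-1} w) > 0 \}|

  P_{cont}(w_i) = \frac{|\{ w' : c(w' w_i) > 0 \}|}{|\{ (w', w) : c(w' w) > 0 \}|}

The higher-order versions recurse on the lower-order continuation distribution instead of the raw counts.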


I did not read the paper, but what does "more accurate" mean in this case? Likelihood of some unseen data? That seems pretty hard to define or measure if the goal is sheer amusement.


Interesting link, thanks!


I'll leave this here, since it's loosely on topic (Markov chains): http://www.joshmillard.com/garkov/ (Garfield strips with Markov-generated text instead of the original). I've had countless hours of fun with them.


This is hilarious! Thanks for putting it up, I'm laughing my arse off at what it's generating!


Unfortunately, if you do as I did and tweet what it generated, you can't use it again, as it would be eating its own output; and generating nonsense from nonsense is not so entertaining.


> And a new every day, I'll bring The first bill, since that was a beer Me too!. I don't, but some PHP servers aren't set up 64 pixels.

Pretty accurate stuff.


Cool! It's great to see real-world Haskell in action.



How long until the link expires?


Until any of the following events occurs:

- the machine runs out of memory (the tweets are stored in a Redis backend);

- someone (I) accidentally clears the database (this has happened before);

- zombies attack our datacenter.

But I'll try to keep them up as long as possible.


@allah

So, you've a direct connect to the One?



