Hacker News new | past | comments | ask | show | jobs | submit login

You're on the right track with the recursive dictionary search.

For the Sling Blade Runner problem, I would make a trie (http://en.wikipedia.org/wiki/Trie) out of hash tables, like this:

    trie = {
     'dracula': {'': None,
                'dead': {'and': {'loving': {'it': "Dracula: Dead and Loving It!"}}}},
     'the': {'brides': {'of': {'dracula': "The Brides of Dracula"}}},
    # ...
    }
Then, for each word in each movie title, walk the trie from the beginning until you run out of matches (no match), or run out of words (at least one match).

Your anagram server is similar, but since they're not ordered like movie titles, you have to walk your scrambled word(s) and only generate permutations that have a matching trie path.

It's also quite possible to do it for phrases, not just words.

You can cheat a little by representing your anagram as a 26-key multiset containing the number of times each letter appears (in Python, use a dict; in C, use an array of 26 integers).

For each letter that appears in your anagram multiset one or more times, make a copy of the original multiset, and recursively descend the Dictionary trie, removing each matching letter from a new copy of the multiset (they're tiny, so don't worry). For each complete word you find, add it to a list of result candidates, along with the multiset representing the remaining letters in the anagrammatic phrase.

Repeat the process for each result candidate, growing your list into a tree. Stop when you run out of matches, or when you run out of letters in your multisets. (You can set a maximum number of leftover letters to accept, if you're having trouble matching a phrase).

If you test the letters in alphabetical order, your result phrases will already be sorted.




> Then, for each word in each movie title, walk the trie from the beginning until you run out of matches (no match), or run out of words (at least one match).

I think that's the challenge ITA wants you to solve - recursing 300 levels deep into a list of 1000 movies will literally take forever. As you can see with the 2nd linked article, brute-force just doesn't work.

My original algo for the multi-word anagrams was nearly identical to what you described. The problem was when you entered a long phrase. You can form 8000 unique words from the letters in "William Jefferson Clinton." Even when I removed duplicates words with the same characters (art = tar = rat), it was still 7000+ long. Add in another 'r' or 's' to the original phrase and you're over 10,000 unique words. If I'm allowing my algorithm to make 5 word anagrams, that's 10^5 calculations for a single web query - absolutely impossible.

And even if I do magically return the 100k possible results in 0.1s, users can't parse through 100k results to find the ONE anagram that's funny or creative. I do have an idea of what I want to do. Maybe in a week or so, I'll have a demo working.


I think that's the challenge ITA wants you to solve - recursing 300 levels deep into a list of 1000 movies will literally take forever. As you can see with the 2nd linked article, brute-force just doesn't work.

You're not recursing 300 levels deep; you abandon a subpath as soon as it doesn't match.

As for the anagrams, that sounds about right (if you install an anagram generator like wordplay, you can see that it generates a similar amount of matches).

You may want to devise a compact representation for a partly calculated results list, and just show n per page, with "prev" and "next" links that encode the original phrase and the first or last phrase on the current page.

When the user clicks "next", you run the algorithm as usual, but at each step of the trie search, you only search letters that are greater or equal to the current letter in the last known phrase.

You may also train a classifier like CRM114 to recognize "funny" and "boring" phrases, and omit the boring ones (except in your "prev" and "next" links!)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: