Oh lordy, I've got a thing to say about your JavaScript.
You could keep the word pairs server side and retrieve them as the page is accessed... or at the very least if you're going to store all your data in a static file you could store them in an array to save space, instead of the whole links. You could get real fancy and make some kind of soundex map thing.
Question: did you just break each word into syllables, and find soundex matches?
No worries, though, I like the compulsive punning. I just like talking about things on the internet.
Is that a function with a comment more than 560,000 lines long inside of it, that gets turned into a string, then sliced, and split on newlines, then assigned to a global variable that gets used in another global function?
It took a minute just to open the source code in my browser. Chrome used 5.2GB of RAM while viewing the source code, just for that tab.
This is one of those things I would have just assumed was impossible, so never would have tried. Not that I should try something like this, mind, but it is a good reminder not to underestimate these machines we have.
Yeah! Just goes to show you can make a cool thing that is really quick and dirty under the hood.
You could make this page load instantly by making the random selection happen in the backend instead of in the browser, but then I guess it would no longer just be a static file.
You could also just have a compact array of pairs and then generate the sentences in the browser instead of having half a million pre-generated strings.
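To make the compact-array idea concrete, here is a minimal sketch. The `PAIRS` data and the function names are made up for illustration; I have not looked at how the actual page lays out its data.

```javascript
// Hypothetical data: each entry is [shortWord, longWord].
// The real page's word list is not reproduced here.
const PAIRS = [
  ["cologne", "colonialism"],
  ["refine", "refinery"],
  ["ought", "rothmeyer"],
];

// Build the sentence at render time instead of storing the full string.
function makeSentence([x, y]) {
  return `YOU PUT THE ${x.toUpperCase()} IN ${y.toUpperCase()}.`;
}

// Pick a random pair. The rng parameter is injectable so the choice
// can be made deterministic when testing.
function randomSentence(pairs, rng = Math.random) {
  const index = Math.floor(rng() * pairs.length);
  return makeSentence(pairs[index]);
}

console.log(randomSentence(PAIRS));
```

The file then only carries two words per entry, and the boilerplate around them is generated once per page view.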
I feel that to really make this work you would need to filter words with common origins. Like, if one word appears in the dictionary definition of the other word, they likely aren't going to be funny.
Examples:
YOU PUT THE REFINE IN REFINERY.
YOU PUT THE DETECT IN DETECTS.
YOU PUT THE POLYTECHNOLOGY IN POLYTECHNOLOGIES.
But words with no commonality at all and a good rhyme are brilliant.
Hmm, rhymes don't really do it for me. I might even filter out rhymes if I built this. I've heard humans use this format often enough, and I don't think I've ever heard someone use a rhyme. It's just too easy. Putting "the cologne in colonialism" is significantly better.
Yeah, I think it currently allows 1-phoneme differences. I think it would be better if it only allowed differences of 2-3+ phonemes when they fall at the end of the word. -s, -ies, and -y aren't interesting suffixes.
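A rough sketch of that kind of filter, with one big caveat: it compares spellings, not phonemes, so it is only a cheap approximation of the rule described above. The function name and the `maxSuffix` cutoff are my own assumptions.

```javascript
// Returns true when one word looks like the other plus a short suffix.
// Spelling-based stand-in for a phoneme-based check (an assumption).
function looksLikeSameRoot(a, b, maxSuffix = 3) {
  const [shorter, longer] = a.length <= b.length ? [a, b] : [b, a];
  const s = shorter.toLowerCase();
  const l = longer.toLowerCase();
  // Try the whole shorter word, then the shorter word minus its final
  // letter (which covers spelling changes such as -y becoming -ies).
  for (const stem of [s, s.slice(0, -1)]) {
    if (stem.length > 0 && l.startsWith(stem) && l.length - stem.length <= maxSuffix) {
      return true;
    }
  }
  return false;
}

console.log(looksLikeSameRoot("refine", "refinery"));     // true
console.log(looksLikeSameRoot("cologne", "colonialism")); // false
```

Pairs where this returns true would be dropped before display. It will miss related words that differ at the front, and a real version would want to work on the phoneme sequences instead.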
There are also some that are just nonsensical but when you read a dozen of these in a row, they feel actually kind of funny. Just tunes your brain to a different wavelength.
This is great! Just wish it was possible to filter out uncommon words. I have no idea what half of them mean or how they're supposed to be pronounced (non-native).
Maybe they should do a You Can't Spell X without Y snowclone generator next.
Detractors of the popular genre of music often trot out the old adage "you can't spell crap without rap", but what most people fail to realize is that you can't spell fish and chip shop without hip hop.
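For what it's worth, the check behind a "you can't spell X without Y" generator could be as small as the sketch below. I am assuming the joke means the letters of Y appear in X in order, not necessarily next to each other; the function names are mine.

```javascript
// Keep only the letters a-z, lowercased, so spaces and punctuation are ignored.
function lettersOnly(text) {
  return text.toLowerCase().replace(/[^a-z]/g, "");
}

// True when every character of needle appears in haystack in the same order.
function isSubsequence(needle, haystack) {
  let matched = 0;
  for (const ch of haystack) {
    if (matched < needle.length && ch === needle[matched]) {
      matched += 1;
    }
  }
  return matched === needle.length;
}

// "You can't spell X without Y" holds when Y's letters are a subsequence of X's.
function cantSpellWithout(x, y) {
  return isSubsequence(lettersOnly(y), lettersOnly(x));
}

console.log(cantSpellWithout("crap", "rap"));                   // true
console.log(cantSpellWithout("fish and chip shop", "hip hop")); // true
```

A stricter generator might require a contiguous substring, which would keep "crap"/"rap" but reject the fish and chip shop.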
This reminds me of the time I needed to generate some names for testing, and I simply combined the Social Security list of most common first and last names at random. It turns out that the vast majority of possible name combinations are terrible. Not quite "Sleve McDichael" from the notorious Super Famicom game, but close.
Like this, I could read through a whole page of names before finding one that scans. Bennett Takeshita, e.g., is effectively unpronounceable.
I'm not positive what's going on here, but it seems like it's incorrectly detecting syllable divisions? If it thinks that the division is between "o" and "s" in the first and between "t" and "h" in the second, those results would make sense. I don't think anyone actually pronounces "reauthorize" like "warthog", though...
There really needs to be a filter to block these kinds of results.
I wish the word links would let me navigate the set of puns rather than going to google. X links should let me see all the combos that have the same word in the X position. Y links should do the same thing, mutatis mutandis.
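If the data were stored as pairs, that navigation would mostly be a lookup table. A sketch, assuming a hypothetical `[x, y]` pair format and made-up sample data:

```javascript
// Group pairs by the word in the X position and by the word in the Y position.
function indexByPosition(pairs) {
  const byX = new Map();
  const byY = new Map();
  for (const [x, y] of pairs) {
    if (!byX.has(x)) byX.set(x, []);
    if (!byY.has(y)) byY.set(y, []);
    byX.get(x).push(y); // every Y that this X has been put in
    byY.get(y).push(x); // every X that has been put in this Y
  }
  return { byX, byY };
}

// Hypothetical sample data, not taken from the real page.
const samplePairs = [
  ["cologne", "colonialism"],
  ["colon", "colonialism"],
  ["cologne", "colognes"],
];

const { byX, byY } = indexByPosition(samplePairs);
console.log(byX.get("cologne"));     // [ 'colonialism', 'colognes' ]
console.log(byY.get("colonialism")); // [ 'cologne', 'colon' ]
```

Clicking an X link would then render everything in `byX.get(word)`, and a Y link everything in `byY.get(word)`.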
Looks pretty great. If someone is looking to do something similar, NLTK has a library with mappings from words to their pronunciations, or you can use Soundex. I used the NLTK library for answering "How many rhymes are there in English"[1], but the phonemes were in a different format than IPA.
I don't know if there is something similar to the NLTK library, with the mappings, but for IPA.
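For anyone curious about the Soundex option, here is a small sketch of standard American Soundex in JavaScript. I have no idea whether the site uses anything like it. Note that Soundex is a coarse spelling-based code, not a real phoneme transcription, so it will match plenty of words that do not sound alike.

```javascript
// Letter-to-digit table for American Soundex. Vowels, h, w, and y have no code.
const SOUNDEX_CODES = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
};

function soundex(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";

  let result = letters[0].toUpperCase();
  let previous = SOUNDEX_CODES[letters[0]] || "";

  for (const ch of letters.slice(1)) {
    const code = SOUNDEX_CODES[ch] || "";
    // Emit a digit only when it differs from the previous remembered code.
    if (code && code !== previous) result += code;
    // h and w do not separate repeated codes; vowels (and y) do.
    if (ch !== "h" && ch !== "w") previous = code;
    if (result.length === 4) break;
  }
  // Pad short codes with zeros to the usual letter-plus-three-digits form.
  return result.padEnd(4, "0");
}

console.log(soundex("Robert"), soundex("Rupert")); // R163 R163
```

Two words with the same code are candidates for a match, which would then need a stricter check to be worth showing.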
I noticed there's a few cases where a th sound puns with a t sound. For example, you put the OUGHT in ROTHMEYER. Intentional due to loose matching? Uncertainty in pronunciations? Or is this a bug where the θ phoneme is represented as the digraph th?
I've been looking for a good one that 1. includes variant pronunciations, and 2. distinguishes all phonemes, e.g. considers Rosa's and roses distinct. So far I haven't found any that fulfill these criteria.
If the author is reading I think a nice small improvement would be to prefix the Google search with "define" (as in: "define ${word}") to invoke Google's dictionary function.
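Something along these lines, as a sketch. The function name is made up, and I am assuming the links are plain Google search URLs with a `q` parameter.

```javascript
// Build a Google search link that asks for a definition of the word.
function defineSearchUrl(word) {
  const query = `define ${word}`;
  // encodeURIComponent escapes spaces and any other unsafe characters.
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

console.log(defineSearchUrl("refinery"));
// https://www.google.com/search?q=define%20refinery
```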
Welp, I have to say it's a resounding success.