This sentence has forty-five (45) characters.

Lxr · on March 9, 2017

Reminds me of a fun problem I encountered as a younger lad:

This sentence has three as, one b, two cs, two ds, thirty-six es, three fs, three gs, eleven hs, nine is, one j, one k, three ls, one m, eighteen ns, twelve os, one p, one q, eight rs, twenty-six ss, twenty ts, two us, five vs, seven ws, three xs, four ys and one z.

I eventually found the solution by iteratively updating the approximate distribution of each letter and finally sampling [1] - not sure if there is a better way!

[1] solve_2 in tsh.py at https://bitbucket.org/akxlr/tsh/src
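
For anyone who wants to check the sentence above, here is a minimal sketch of a verifier. The helper names (`number_word`, `render`, `f`, `CLAIMED`) are hypothetical and are not taken from tsh.py; the sketch assumes counts stay in 1..99 and that only the letters a-z are counted.

```python
from collections import Counter
import string

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_word(n):
    """Spell out 1..99 the way the sentence does, e.g. 36 -> 'thirty-six'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def render(counts):
    """Build the sentence that claims the given letter counts."""
    parts = [f"{number_word(c)} {letter}{'' if c == 1 else 's'}"
             for letter, c in sorted(counts.items())]
    return "This sentence has " + ", ".join(parts[:-1]) + " and " + parts[-1] + "."

def f(counts):
    """Count the letters that actually occur in the rendered sentence."""
    text = render(counts).lower()
    actual = Counter(ch for ch in text if ch in string.ascii_lowercase)
    return {letter: actual[letter] for letter in counts}

# The counts claimed by the sentence, in order a..z.
CLAIMED = dict(zip(string.ascii_lowercase,
                   [3, 1, 2, 2, 36, 3, 3, 11, 9, 1, 1, 3, 1,
                    18, 12, 1, 1, 8, 26, 20, 2, 5, 7, 3, 4, 1]))

# A solution is a fixed point of f: the claimed counts equal the actual counts.
print(f(CLAIMED) == CLAIMED)  # True
```

Checking a candidate is cheap; the hard part is finding a count vector that `f` maps to itself.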

ouid · on March 9, 2017

You use a gradient descent approach, but there's no way to guarantee anything like convexity, right?

I bet you can construct a language where a solution exists, but all of its neighbors' errors (using the topology implied by your gd function) are local maxima.

Lxr · on March 10, 2017

Right, it is definitely not convex in any way and gradient descent is basically useless (and not very interesting because all it really means here is try perturbing each count by one and look for improvement). I left that there as an initial attempt at solving the problem but I don't use it in my solution (except for the call to `gd(iter(gd(v)))` where I explore the neighbouring solutions to each sample point, which is probably not necessary).

The basic idea instead is to treat each letter count as a random variable, and ask what the distribution of each count is. In particular, each letter count can be expressed as a sum of (a mapping of) the other letter counts, so if you know the distribution of all other letter counts you can improve the distribution of the letter of interest. Initially assume uniform for all counts, then iterate until convergence. The images in that repo show the distribution of some letters at various stages. After doing this for about 10 iterations you stop seeing any improvement (the letter distributions are as 'peaked' as they are going to get), at which point I drew samples until I found a solution.
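
A rough sketch of that idea, not the actual `solve_2` code: it assumes the letter counts are independent (they are not, which is why sampling is still needed afterwards), caps counts at a hypothetical `MAX` of 60, and builds each new distribution by convolving the contributions of every entry's number word. All names here are illustrative.

```python
import random
import string
from collections import Counter

LETTERS = string.ascii_lowercase
MAX = 60  # assumed upper bound on any letter count

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_word(n):
    """Spell out 1..99, e.g. 36 -> 'thirty-six'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

# Letters that are always present: the fixed words plus one label per letter.
BASE = Counter("thissentencehasand" + LETTERS)

def contrib(letter, c):
    """Occurrences of `letter` added by an entry with count c (word + plural s)."""
    plural = 1 if letter == "s" and c != 1 else 0
    return number_word(c).count(letter) + plural

def f(counts):
    """Actual letter counts of the sentence that claims `counts`."""
    return {l: BASE[l] + sum(contrib(l, c) for c in counts.values())
            for l in LETTERS}

def update(dists):
    """One sweep: recompute each letter's distribution from all the others."""
    new = {}
    for letter in LETTERS:
        total = {BASE[letter]: 1.0}
        for other in LETTERS:
            # Distribution of what `other`'s entry contributes to `letter`.
            part = {}
            for c, pc in dists[other].items():
                k = contrib(letter, c)
                part[k] = part.get(k, 0.0) + pc
            # Convolve it into the running sum.
            nxt = {}
            for a, pa in total.items():
                for b, pb in part.items():
                    nxt[a + b] = nxt.get(a + b, 0.0) + pa * pb
            total = nxt
        clipped = {c: p for c, p in total.items() if 1 <= c <= MAX}
        z = sum(clipped.values())
        new[letter] = {c: p / z for c, p in clipped.items()}
    return new

def search(iterations=10, tries=100_000, seed=0):
    """Sharpen the distributions, then sample until a fixed point of f turns up."""
    rng = random.Random(seed)
    dists = {l: {c: 1 / MAX for c in range(1, MAX + 1)} for l in LETTERS}
    for _ in range(iterations):
        dists = update(dists)
    for _ in range(tries):
        guess = {l: rng.choices(list(d), weights=list(d.values()))[0]
                 for l, d in dists.items()}
        if f(guess) == guess:
            return guess
    return None  # no luck within the sampling budget
```

Because a true solution is a fixed point of `f`, it keeps non-zero probability under this update as long as it lies inside the initial support. How many samples `search` needs is not something this sketch guarantees.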

ouid · on March 12, 2017

Alright, I admit that I skimmed, came across a bunch of GD code at the beginning and assumed that you just got lucky with a very greedy approach, sorry :P.

The update function you wrote on the distribution space is continuous, and distribution space is compact (since there's no way to have more than N of any letter), so there is necessarily at least one fixed point.

Fixed points could still be sources though, right? It clearly wasn't, but I'm curious to know if you got lucky, or if you merely didn't get unlucky.

I wonder how effective your code would be on the following self-descriptive sentences (in base j).

For instance, when j=2, "100 1,11 0" has 4 ones and 3 zeros as described (binary 100 is 4 and binary 11 is 3).

Does your code consistently find solutions as you increase j? At what point does the computation become infeasible?
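
To make the base-j variant concrete, here is a small sketch of a checker plus a naive brute-force search. The function names and the exact "count digit" layout are assumptions modelled on the j=2 example; the brute force tries `(limit - 1) ** j` candidates, so it is only usable for very small j.

```python
from itertools import product

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n, j):
    """Write a non-negative integer n in base j (2 <= j <= 36)."""
    if n == 0:
        return "0"
    out = ""
    while n:
        n, r = divmod(n, j)
        out = DIGITS[r] + out
    return out

def describe(counts, j):
    """counts[d] is the claimed number of digit d; render '<count> <digit>' pairs."""
    return ",".join(f"{to_base(c, j)} {DIGITS[d]}" for d, c in enumerate(counts))

def f(counts, j):
    """Count the digits that actually appear in the description."""
    text = describe(counts, j)
    return tuple(text.count(DIGITS[d]) for d in range(j))

def brute_force(j, limit):
    """Return every fixed point of f with all counts in 1..limit-1."""
    return [c for c in product(range(1, limit), repeat=j) if f(c, j) == c]

print(describe((3, 4), 2))  # 11 0,100 1
print(f((3, 4), 2))         # (3, 4)
```

The order of the pairs does not matter for the count, so `11 0,100 1` describes the same solution as the example above.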

Lxr · on March 13, 2017

Nice! I admit I don't have a good understanding of the analysis, but if you mean "does repeatedly applying f to some starting point v, like f(f(f(...(v)))) eventually lead to a solution" the answer is definitely not - you can be right next to a solution and not get there by GD or by applying f. My function `iter` applies f over and over, but this was another failed attempt at a solution.
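
For reference, "applying f over and over" can be sketched as below, using the j=2 example from the parent comment as a toy `step` function. The names `step` and `iterate` are hypothetical and unrelated to the `iter` in tsh.py. The loop stops at a fixed point or gives up when it revisits a state.

```python
def step(counts):
    """Toy f for base 2: counts = (zeros, ones) claimed by the description."""
    zeros, ones = counts
    text = f"{ones:b} 1,{zeros:b} 0"
    return (text.count("0"), text.count("1"))

def iterate(f, v, max_steps=1000):
    """Apply f repeatedly; return a fixed point, or None on a cycle or timeout."""
    seen = set()
    while v not in seen and len(seen) < max_steps:
        seen.add(v)
        nxt = f(v)
        if nxt == v:
            return v  # f(v) == v, so v describes itself
        v = nxt
    return None  # revisited a state (cycle) or ran out of steps

print(iterate(step, (1, 1)))  # (3, 4)
```

In this tiny case the iteration happens to land on the fixed point from `(1, 1)`; that says nothing about the 26-letter sentence, where repeated application fails as described above.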

I will endeavour to try your problem, is it related in some way to a well-known problem?

JadeNB · on March 8, 2017

Or the good old self-referential aptitude test. http://www.maa.org/press/periodicals/math-horizons/self-refe...

calt · on March 8, 2017

Or "(This Song's Just) Six Words Long" ... even though he's clearly not singing it as a contraction.

https://www.youtube.com/watch?v=JWi5jdgTUJs

kristopolous · on March 9, 2017

That's pretty catchy. Weird Al is a really talented musician. I almost think he'd do greater things if he wasn't such a goofball all the time.

noonespecial · on March 9, 2017

Remaining a goofball all the time despite his talent and success is the great thing he's done.

SmellyGeekBoy · on March 9, 2017

The guy's career has spanned decades. Much longer than most of the "stars" he's parodied. He must be doing something right.

bdamm · on March 9, 2017

Not quite, because it is computationally infeasible to brute-force a pre-image on MD5.

glitchdout · on March 10, 2017

Slightly related, I think you'll like this puzzle: https://www.youtube.com/watch?v=x1THOPm0qTw