Řrřola (a famous demoscene coder [1]) found a far better constant than the original (a 2.7-fold accuracy improvement) by using an optimisation algorithm over all three constants present in the original inverse square root hack.
His algorithm and the result (along with a plot of relative error) can be found on his blog [2].
This is really interesting. I'll be sure to include these references in my next blog post on the subject.
The goal of the post was to find the equation from scratch, though. As you can see, the constant optimization was not really the focus of the post.
Not to mention your blog engine/site is broken. Please fix the fact that we can't click on the right scrollbar due to your fancy shmancy javascript side menu.
Put it on the left hand side, or don't overlay it over the scrollbar.
It's Blogspot's "Dynamic Views". Complete dreck. I can only assume it's the default when you create a blog nowadays, as I have no idea why anybody (least of all developers) would opt into that crap.
Someone at blogspot needs a smack on the head. The site takes several seconds to load, which is absolutely ludicrous for displaying plain text, and when it finally does it's unusable as you've noted.
Everything is at the Blogspot defaults. I only added the Gist source code embed and the LaTeX math JavaScript.
If anyone is offering a new design, I would gladly take it... As long as I can concentrate on the content and not the container.
It's not faster, and the accuracy improvement is more of a curiosity than of real value.
In fact, on modern desktop/console hardware the invsqrt hack is really bad. It relies on type punning (aliasing a float as an integer), which is 'expensive' in native software terms. The x86(-64) chips underlying them all now have a dedicated instruction for this, as did the PPCs that came in the generation before... It's still of value on ARM though, which is becoming more and more important.
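For illustration, here is a minimal C sketch of the two approaches the parent is contrasting: the bit hack done through memcpy (which avoids the pointer-cast aliasing problem), and the dedicated x86 SSE instruction (rsqrtss) via its intrinsic. This is my own illustration, not code from the article; names are arbitrary.

    #include <stdint.h>
    #include <string.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Bit-hack version, type-punned through memcpy instead of pointer casts. */
    static float rsqrt_hack(float x)
    {
        uint32_t i;
        float y;
        memcpy(&i, &x, sizeof i);              /* well-defined type punning */
        i = 0x5f3759df - (i >> 1);
        memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y);  /* one Newton step */
    }

    /* Hardware version: rsqrtss gives roughly 12 bits of precision directly. */
    static float rsqrt_sse(float x)
    {
        return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    }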
There is an amazing, easy-to-use piece of software for doing stuff like this called Eureqa (http://www.nutonian.com/products/eureqa/). I've used it to find approximations for a lot of different functions.
Am I correct in thinking of this as a curve/surface-fitting tool that automatically tries to find a low-complexity equation, together with parameters, that fits some provided data?
I had a program for my HP-48GX that used to do something similar. It was especially useful when I knew that my decimal answer involved some numerical constant, such as sqrt(2), and I wanted to convert it to something more reasonable. It would only return a single value, so I had to be careful not to use it every time: just because an answer approximates a multiple of pi, it might only be a coincidence and not a good answer.
Some of that was actually built into the calculator. The "→Q" and "→Qπ" functions took a floating-point number and returned a rational number or a rational expression in terms of pi. The tolerance for the approximation was determined by how many significant digits the calculator was configured to display. It was pretty nice for a calculator that didn't include a computer algebra system.
It was specifically an improved library of those functions. I'll have to dig around on hpcalc.org and see if I can find it again. You could do algebraic calculations if you wrapped the statement in quotes, but once you got used to RPN that made little sense to do.
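The underlying idea behind that kind of →Q function is a continued-fraction expansion: keep taking convergents until one lands within the tolerance. A minimal C sketch of that idea (my own illustration, not the calculator's or that library's actual code):

    #include <math.h>
    #include <stdio.h>

    /* Find a simple fraction p/q approximating x to within tol,
       using the convergents of x's continued-fraction expansion. */
    static void to_fraction(double x, double tol, long *p_out, long *q_out)
    {
        long p0 = 0, q0 = 1;   /* previous convergent */
        long p1 = 1, q1 = 0;   /* current convergent  */
        double r = x;

        for (int i = 0; i < 64; i++) {
            long a = (long)floor(r);
            long p2 = a * p1 + p0, q2 = a * q1 + q0;
            p0 = p1; q0 = q1;
            p1 = p2; q1 = q2;
            if (fabs((double)p1 / q1 - x) <= tol || r == (double)a)
                break;
            r = 1.0 / (r - a);
        }
        *p_out = p1; *q_out = q1;
    }

    int main(void)
    {
        long p, q;
        to_fraction(3.14159265358979, 1e-6, &p, &q);
        printf("%ld/%ld\n", p, q);   /* prints 355/113 */
        return 0;
    }

Presumably the →Qπ variant does something similar after dividing out π, but I haven't checked the calculator's exact behaviour.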
But for it to work for all two's-complement values, we need to ensure that the sign bit is shifted back into the MSB so the value keeps its sign, which can be done with one extra shift:
    ; Arithmetic shift right: divide a signed 16-bit value by 2
    LDA MEM+1   ; load the MSB
    ASL A       ; copy the sign bit into C (the shifted A is discarded)
    ROR MEM+1   ; shift the MSB right; the sign bit re-enters from C
    ROR MEM+0   ; shift the LSB right; the MSB's old bit 0 enters from C
It's a bunch of cycles but nothing compared to a generic division.
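For comparison, a hedged C sketch of what the routine above computes: an arithmetic right shift, which rounds toward negative infinity, unlike a C division by 2, which rounds toward zero for negative values.

    #include <stdint.h>

    /* Arithmetic shift right of a signed 16-bit value.  The C standard leaves
       right-shifting a negative signed value implementation-defined, but every
       mainstream compiler implements it as an arithmetic shift. */
    static int16_t asr16(int16_t x)
    {
        return (int16_t)(x >> 1);   /* e.g. -3 -> -2, matching the 6502 code */
    }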
Hacker's Delight [1][2] is an amazing book dedicated to all kinds of bit hacking, including fast integer square root and cube root calculation.
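As a taste of the kind of thing it covers, here is the well-known shift-and-subtract integer square root (a sketch of the standard algorithm from memory, not a quote from the book):

    #include <stdio.h>

    /* Bit-by-bit integer square root: returns floor(sqrt(x)). */
    static unsigned isqrt(unsigned x)
    {
        unsigned m = 0x40000000u;   /* highest power-of-4 bit */
        unsigned y = 0;
        while (m != 0) {
            unsigned b = y | m;
            y >>= 1;
            if (x >= b) {
                x -= b;
                y |= m;
            }
            m >>= 2;
        }
        return y;
    }

    int main(void)
    {
        printf("%u %u %u\n", isqrt(15), isqrt(16), isqrt(1000000));  /* 3 4 1000 */
        return 0;
    }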
That is the exact goal of the post. I arrived at the algorithm through evolution with random initialization. I enforced absolutely no heuristics to make it converge to this equation.
I know, but can we prove there's no heuristic in that algorithm? Because if the answer is "yes", it should be general enough to find other optimizations. It's fun to think about.
The best answer I can provide is: the code is there, and there are no heuristics in it. Every time you run it you get different results (because of the random initialization seeds and the stochastic nature of EAs). It may find the (a - (x >> 1)) equation on a specific execution, or it may not. Over the runs I made, this equation (or a similar one) was the most popular and nothing came close to it. In fact, it finds a lot of other optimizations, either less accurate or way more complex; I remember getting equations with tens and tens of operations and only so-so accuracies.
Why is that a relevant question now? How would doing this be more helpful without the results of the algorithm you're attempting to cut corners to approximate?
It's JavaScript that loads the Gist into the page. You may have NoScript or similar that prevents the loader from fetching the code.
The first snippet is the one from the Wikipedia page (fast inverse square root) and the second is this one:
https://gist.github.com/soravux/9673839
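For reference, the Wikipedia version being referred to is essentially the Quake III Q_rsqrt routine, which looks roughly like this (reproduced from memory, so treat the details as approximate):

    float Q_rsqrt(float number)
    {
        long i;
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = *(long *)&y;                      /* reinterpret the float's bits as an integer */
        i  = 0x5f3759df - (i >> 1);            /* the magic constant */
        y  = *(float *)&i;
        y  = y * (threehalfs - (x2 * y * y));  /* one Newton-Raphson iteration */

        return y;
    }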
Fair enough. Still, it seems silly to name a constant THREE_HALVES. What information does that add over 1.5? What happens if you change it and it becomes wrong?
With fast native code it's more than practical to search the entire space of floating-point numbers and guarantee that you find the optimal magic number.
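A hedged sketch of what such a search could look like: for each candidate magic constant, measure the worst relative error over sampled inputs and keep the best. A genuinely exhaustive run would sweep a far wider range of constants and verify against all inputs in [1, 4); the bounds and step sizes below are illustrative only, not anyone's actual search code.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Fast inverse square root with a configurable magic constant. */
    static float rsqrt_with_magic(float x, uint32_t magic)
    {
        uint32_t i;
        float y;
        memcpy(&i, &x, sizeof i);
        i = magic - (i >> 1);
        memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y);   /* one Newton step */
    }

    /* Worst relative error over sampled inputs in [1, 4); this interval is
       enough because the trick behaves the same in every pair of binades. */
    static double worst_error(uint32_t magic)
    {
        double worst = 0.0;
        for (float x = 1.0f; x < 4.0f; x += 1e-4f) {
            double exact = 1.0 / sqrt((double)x);
            double err = fabs(rsqrt_with_magic(x, magic) - exact) / exact;
            if (err > worst)
                worst = err;
        }
        return worst;
    }

    int main(void)
    {
        uint32_t best_magic = 0;
        double best_err = INFINITY;
        /* Narrow, coarse window around the classic constant to keep it quick. */
        for (uint32_t m = 0x5f340000u; m <= 0x5f3a0000u; m += 0x100u) {
            double err = worst_error(m);
            if (err < best_err) {
                best_err = err;
                best_magic = m;
            }
        }
        printf("best magic 0x%08x, worst relative error %g\n",
               (unsigned)best_magic, best_err);
        return 0;
    }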
As already mentioned in another comment, it has been taken further by Řrřola, who optimised a version with the Newton step included, tuning the constants involved in that step as well.
For a moment I thought this was a way to work around any patents on this hack, by rediscovering the hack dynamically; but I could be remembering a different patented technique.
You might be thinking of depth-fail shadow volumes, semi-famous for being patented by Creative despite having been discovered independently by a number of people (including Carmack) over the years.
That's pretty ridiculous. The purpose of the law is to keep people from unfairly copying ideas, not rediscovering them.
However, this is different. It's not a person rediscovering it, but an algorithm which happens to do exactly the same thing yet is provably not unique or patented. It's a different algorithm that just happens to converge on the same process.
This is perfectly analogous to patents on real world inventions - there are plenty of cases of different machines that do the same thing. You can't patent the output of an algorithm, just the process by which it gets to that output.
You can be liable for triple damages if the court decides you knew about the patent and infringed anyway. Normal damages if you did not. I believe independent invention is meant to be so rare as to be suspicious (despite many documented cases where this happened).
The (supposed) purpose is to give inventors an incentive to freely broadcast their inventions, by giving them a monopoly so they can recoup their investment. Without that incentive, inventors would keep their inventions as trade secrets.
[1]: See, for example, this 256-byte raycaster https://www.youtube.com/watch?v=R35UuntQQF8
[2]: http://rrrola.wz.cz/inv_sqrt.html