Řrřola (a famous demoscene coder [1]) found a far better constant than the original (a 2.7-fold accuracy improvement) by using an optimisation algorithm over all three constants present in the original inverse square root hack.
His algorithm and the result (along with a plot of relative error) can be found on his blog [2].
This is really interesting. I'll be sure to include these references in my next blog post on the subject.
The goal of the post was to find the equation from scratch, though. As you can see, the constant optimization was not really the focus of the post.
Not to mention your blog engine/site is broken. Please fix the fact that we can't click on the right scrollbar due to your fancy shmancy javascript side menu.
Put it on the left hand side, or don't overlay it over the scrollbar.
It's Blogspot's "Dynamic Views". Complete dreck. I can only assume it's the default when you create a blog nowadays, as I have no idea why anybody (least of all developers) would opt into that crap.
Someone at blogspot needs a smack on the head. The site takes several seconds to load, which is absolutely ludicrous for displaying plain text, and when it finally does it's unusable as you've noted.
Everything is at the Blogspot defaults. I only added the Gist source code embed and the LaTeX math JavaScript.
If anyone is offering a new design, I would gladly take it... As long as I can concentrate on the content and not the container.
It's not faster, and the accuracy improvement is more of a curiosity than of real value.
In fact, on modern desktop/console hardware the invsqrt hack is really bad. It relies on type punning (aliasing a float as an integer), which is 'expensive' in native software terms. The x86(-64) chips underlying them all now have a dedicated instruction for this, as did the PPCs that came in the generation before... It's still of value on ARM though, which is becoming more and more important.
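For illustration, here is a minimal C sketch of the two approaches the parent is contrasting: the bit hack done through memcpy (which avoids the pointer-cast aliasing problem), and the dedicated x86 SSE instruction (rsqrtss) via its intrinsic. This is my own illustration, not code from the article; names are arbitrary.

    #include <stdint.h>
    #include <string.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Bit-hack version, type-punned through memcpy instead of pointer casts. */
    static float rsqrt_hack(float x)
    {
        uint32_t i;
        float y;
        memcpy(&i, &x, sizeof i);              /* well-defined type punning */
        i = 0x5f3759df - (i >> 1);
        memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y);  /* one Newton step */
    }

    /* Hardware version: rsqrtss gives roughly 12 bits of precision directly. */
    static float rsqrt_sse(float x)
    {
        return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    }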
There is an amazing, easy-to-use piece of software for doing stuff like this called Eureqa (http://www.nutonian.com/products/eureqa/). I've used it to find approximations for a lot of different functions.
Am I correct in thinking of this as a curve/surface-fitting tool that automatically tries to find a low-complexity equation, together with parameters, that fits some provided data?
I had a program for my HP-48GX that used to do something similar. It was especially useful when I knew that my decimal answer involved some numerical constant, such as sqrt(2), and I wanted to convert it to something more reasonable. It would only return a single value, so I had to be careful not to use it every time: just because an answer approximates a multiple of pi, it might only be a coincidence and not a good answer.
Some of that was actually built into the calculator. The "→Q" and "→Qπ" functions took a floating-point number and returned a rational number or a rational expression in terms of pi. The tolerance for the approximation was determined by how many significant digits the calculator was configured to display. It was pretty nice for a calculator that didn't include a computer algebra system.
It was specifically an improved library of those functions. I'll have to dig around on hpcalc.org and see if I can find it again. You could do algebraic calculations if you wrapped the statement in quotes, but once you got used to RPN that made little sense to do.
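The underlying idea behind that kind of →Q function is a continued-fraction expansion: keep taking convergents until one lands within the tolerance. A minimal C sketch of that idea (my own illustration, not the calculator's or that library's actual code):

    #include <math.h>
    #include <stdio.h>

    /* Find a simple fraction p/q approximating x to within tol,
       using the convergents of x's continued-fraction expansion. */
    static void to_fraction(double x, double tol, long *p_out, long *q_out)
    {
        long p0 = 0, q0 = 1;   /* previous convergent */
        long p1 = 1, q1 = 0;   /* current convergent  */
        double r = x;

        for (int i = 0; i < 64; i++) {
            long a = (long)floor(r);
            long p2 = a * p1 + p0, q2 = a * q1 + q0;
            p0 = p1; q0 = q1;
            p1 = p2; q1 = q2;
            if (fabs((double)p1 / q1 - x) <= tol || r == (double)a)
                break;
            r = 1.0 / (r - a);
        }
        *p_out = p1; *q_out = q1;
    }

    int main(void)
    {
        long p, q;
        to_fraction(3.14159265358979, 1e-6, &p, &q);
        printf("%ld/%ld\n", p, q);   /* prints 355/113 */
        return 0;
    }

Presumably the →Qπ variant does something similar after dividing out π, but I haven't checked the calculator's exact behaviour.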
But for it to work for all two's-complement values, we need to ensure that the sign bit is shifted back into the MSB so the value keeps its sign, which can be done with one extra shift:
    ; Arithmetic shift right: divide a signed 16-bit value by 2
    LDA MEM+1   ; load the MSB
    ASL A       ; copy the sign bit into C (the shifted A is discarded)
    ROR MEM+1   ; shift the MSB right; the sign bit re-enters from C
    ROR MEM+0   ; shift the LSB right; the MSB's old bit 0 enters from C
It's a bunch of cycles but nothing compared to a generic division.
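For comparison, a hedged C sketch of what the routine above computes: an arithmetic right shift, which rounds toward negative infinity, unlike a C division by 2, which rounds toward zero for negative values.

    #include <stdint.h>

    /* Arithmetic shift right of a signed 16-bit value.  The C standard leaves
       right-shifting a negative signed value implementation-defined, but every
       mainstream compiler implements it as an arithmetic shift. */
    static int16_t asr16(int16_t x)
    {
        return (int16_t)(x >> 1);   /* e.g. -3 -> -2, matching the 6502 code */
    }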
Hacker's Delight [1][2] is an amazing book dedicated to all kinds of bit hacking, including fast integer square root and cube root calculation.
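As a taste of the kind of thing it covers, here is the well-known shift-and-subtract integer square root (a sketch of the standard algorithm from memory, not a quote from the book):

    #include <stdio.h>

    /* Bit-by-bit integer square root: returns floor(sqrt(x)). */
    static unsigned isqrt(unsigned x)
    {
        unsigned m = 0x40000000u;   /* highest power-of-4 bit */
        unsigned y = 0;
        while (m != 0) {
            unsigned b = y | m;
            y >>= 1;
            if (x >= b) {
                x -= b;
                y |= m;
            }
            m >>= 2;
        }
        return y;
    }

    int main(void)
    {
        printf("%u %u %u\n", isqrt(15), isqrt(16), isqrt(1000000));  /* 3 4 1000 */
        return 0;
    }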
That is the exact goal of the post. I arrived at the algorithm through evolution with random initialization. I enforced absolutely no heuristics to make it converge to this equation.
I know, but can we prove there's no heuristic in that algorithm? Because if the answer is "yes", it should be general enough to find other optimizations. It's fun to think about.
The best answer I can provide is: the code is there, and there are no heuristics in it. Every time you run it you get different results (because of the random initialization seeds and the stochastic nature of EAs). It may find the (a - (x >> 1)) equation on a specific execution, or it may not. Over the runs I made, this equation (or a similar one) was the most popular and nothing came close to it. In fact, it finds a lot of other optimizations, either less accurate or way more complex; I remember getting equations with tens and tens of operations and only so-so accuracies.
Why is that a relevant question now? How would doing this be more helpful without the results of the algorithm you're attempting to cut corners to approximate?
It's JavaScript that loads the Gist into the page. You may have NoScript or similar that prevents the loader from fetching the code.
The first snippet is the one from the Wikipedia page (fast inverse square root) and the second is this one:
https://gist.github.com/soravux/9673839
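For reference, the Wikipedia version being referred to is essentially the Quake III Q_rsqrt routine, which looks roughly like this (reproduced from memory, so treat the details as approximate):

    float Q_rsqrt(float number)
    {
        long i;
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = *(long *)&y;                      /* reinterpret the float's bits as an integer */
        i  = 0x5f3759df - (i >> 1);            /* the magic constant */
        y  = *(float *)&i;
        y  = y * (threehalfs - (x2 * y * y));  /* one Newton-Raphson iteration */

        return y;
    }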
Fair enough. Still, it seems silly to name a constant THREE_HALVES. What information does that add over 1.5? What happens if you change it and it becomes wrong?
With fast native code it's more than practical to search the entire space of floating-point numbers and guarantee that you find the optimal magic number.
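A hedged sketch of what such a search could look like: for each candidate magic constant, measure the worst relative error over sampled inputs and keep the best. A genuinely exhaustive run would sweep a far wider range of constants and verify against all inputs in [1, 4); the bounds and step sizes below are illustrative only, not anyone's actual search code.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Fast inverse square root with a configurable magic constant. */
    static float rsqrt_with_magic(float x, uint32_t magic)
    {
        uint32_t i;
        float y;
        memcpy(&i, &x, sizeof i);
        i = magic - (i >> 1);
        memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y);   /* one Newton step */
    }

    /* Worst relative error over sampled inputs in [1, 4); this interval is
       enough because the trick behaves the same in every pair of binades. */
    static double worst_error(uint32_t magic)
    {
        double worst = 0.0;
        for (float x = 1.0f; x < 4.0f; x += 1e-4f) {
            double exact = 1.0 / sqrt((double)x);
            double err = fabs(rsqrt_with_magic(x, magic) - exact) / exact;
            if (err > worst)
                worst = err;
        }
        return worst;
    }

    int main(void)
    {
        uint32_t best_magic = 0;
        double best_err = INFINITY;
        /* Narrow, coarse window around the classic constant to keep it quick. */
        for (uint32_t m = 0x5f340000u; m <= 0x5f3a0000u; m += 0x100u) {
            double err = worst_error(m);
            if (err < best_err) {
                best_err = err;
                best_magic = m;
            }
        }
        printf("best magic 0x%08x, worst relative error %g\n",
               (unsigned)best_magic, best_err);
        return 0;
    }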
As already mentioned in another comment, it has been taken further by Řrřola, who optimised a version with the Newton step included, tuning the constants involved in that step as well.
For a moment I thought this was a way to work around any patents on this hack, by rediscovering the hack dynamically; but I could be remembering a different patented technique.
You might be thinking of depth-fail shadow volumes, semi-famous for being patented by Creative despite having been discovered independently by a number of people (including Carmack) over the years.
That's pretty ridiculous. The purpose of the law is to keep people from unfairly copying ideas, not rediscovering them.
However, this is different. It's not a person rediscovering it, but an algorithm which happens to do exactly the same thing yet is provably not unique or patented. It's a different algorithm that just happens to converge on the same process.
This is perfectly analogous to patents on real world inventions - there are plenty of cases of different machines that do the same thing. You can't patent the output of an algorithm, just the process by which it gets to that output.
You can be liable for triple damages if the court decides you knew about the patent and infringed anyway. Normal damages if you did not. I believe independent invention is meant to be so rare as to be suspicious (despite many documented cases where this happened).
The (supposed) purpose is to give inventors an incentive to freely broadcast their inventions, by giving them a monopoly so they can recoup their investment. Without that incentive, inventors would keep their inventions as trade secrets.
[1]: See, for example, this 256-byte raycaster https://www.youtube.com/watch?v=R35UuntQQF8
[2]: http://rrrola.wz.cz/inv_sqrt.html