You've just described the dual numbers, which provide a way of implementing forward-mode autodiff.
> If the derivative is abnormal in a way such that the normal
> truncated calculative formulas would conceal it, then AD
> would fail to give an accurate result as well.
What I'm seeing directly contradicts that. I've tested the example above (the one I called f(x,a) with a=1e-36, trying to find its derivative at x=0), and it gave the right answer with dual numbers, but not with finite differencing. So what you're saying isn't true. I'll post a minimal implementation here later.
[edit]
I've used Sympy out of laziness, which is inelegant. I don't know how clear this will be.
First of all, I have to change the number 1 in the definition of f(x,a) into a matrix. We need this because we'll be representing dual numbers as matrices, and you can't add the scalar 1 to a matrix:
from sympy import eye, Matrix, exp, im, I

one = eye(2)   # the constant 1, promoted to a 2x2 identity matrix
def f(x, a=1e-36):
    return one + a * sin(x / a)   # uses the matrix sin defined below
Sympy doesn't have a matrix version of `sin`, so we'll need to provide one (via the matrix exponential, since sin(M) = Im(exp(iM)) for real M):
def sin(M):
    # matrix sine via the matrix exponential: sin(M) = Im(exp(i*M)); .n() evaluates it numerically
    return im(exp(I*M).n())
At the dual number ε, we have f(ε) = f(0) + ε f'(0). The dual number ε will be represented as the matrix
⎡0  1⎤
⎢    ⎥
⎣0  0⎦
As code:
epsilon = Matrix([[0,1],[0,0]])
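As a quick sanity check (not in the original post), this matrix really does satisfy the defining dual-number property ε² = 0, which is why every Taylor term of f beyond f(0) + ε f'(0) vanishes:

from sympy import zeros

assert epsilon**2 == zeros(2, 2)   # ε² = 0, so f(0 + ε) = f(0) + ε f'(0) exactly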
Evaluating f on the above matrix gives:
In: f(epsilon)
Out:
⎡1  1.0⎤
⎢      ⎥
⎣0   1 ⎦
Which tells us that f(0) ≈ 1 and f'(0) ≈ 1. Both are correct.
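If we use finite differencing instead, we get the following (a minimal sketch of the comparison; the scalar helper f_float and the step size h are my choices for illustration, not the original code):

import math

def f_float(x, a=1e-36):
    # plain floating-point version of f
    return 1.0 + a * math.sin(x / a)

h = 1e-8                                # a typical forward-difference step
print((f_float(h) - f_float(0.0)) / h)  # prints 0.0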
So finite differencing claims, incorrectly, that the derivative at 0 is 0. No finite differencing scheme can fix this, because floating point makes the computed f constant.
Autodiff using the dual numbers has pulled off the seemingly impossible.
[edit: An early typo caused the code to produce an incorrect result. Now fixed, and my claim holds.]
> Autodiff using the dual numbers has pulled off the seemingly impossible.
It's not seemingly impossible if you understand that the chain rule for f' is simply being executed at the same time as the calculation of f, because the derivatives of the basic operations are already defined.
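For instance, here is a minimal forward-mode sketch of that idea in plain Python (the Dual class and the dsin helper are my own illustration, not any particular library), reproducing the f'(0) result from above:

import math

class Dual:
    """A value a + b*eps, where eps**2 == 0."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b          # value part, derivative part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # the product rule, baked into the multiplication operator
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def dsin(u):
    # the sin primitive with its derivative rule attached: (sin u)' = cos(u) * u'
    return Dual(math.sin(u.a), math.cos(u.a) * u.b)

# f(x) = 1 + a*sin(x/a), evaluated at x = 0 + eps
a = 1e-36
x = Dual(0.0, 1.0)                  # seed the derivative part of the input with 1
y = 1.0 + a * dsin((1.0 / a) * x)   # dividing by a via multiplication by 1/a
print(y.a, y.b)                     # approximately 1.0 and 1.0: f(0) and f'(0)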
However, like I said, if you have a calculation method that hides derivatives in terms that have been truncated, then this will not save you.
Summing (-1)**i * x**(2*i + 1) / factorial(2*i + 1) over i is not one of those methods -- and sin is rather regular in regard to its Taylor series. So of course it works out in this case, with a normal power-series calculation method.
Try a different series or calculation method, and dual numbers will get you a wildly different result. Understand, dual numbers only work well when you use a method of calculation that front-loads the terms that have the most bearing on the derivative. Otherwise the missing terms and the truncation cause severe inaccuracy.
However, stenciling might actually perform better in these scenarios.
> Try a different series or calculation method, and dual numbers will get you a wildly different result
No. The example works because:
While the exact value of f(x,a) isn't 1, in any inexact representation of real numbers (floating point or fixed point), the value of "a" can be chosen so that 1 is the closest representable value of f(x,a) for every x, since |a*sin(x/a)| is at most a.
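A quick check of that rounding claim in double precision (my own snippet; the particular x is arbitrary):

import math, sys

a = 1e-36
print(sys.float_info.epsilon)                  # the gap just above 1.0 is about 2.2e-16
print(1.0 + a == 1.0)                          # True: a is far below that gap
print(1.0 + a * math.sin(12.345 / a) == 1.0)   # True for an arbitrary x as well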
Trying to compute f(x,a) differently isn't going to change that, so stenciling methods are never going to work here. But autodiff will always work. This means I win your challenge.
Your other claims are probably gibberish. You need to provide an example.
What don't you understand -- the method of calculating the function is an approximation, so the AD derivative depends on it: the AD derivative is the derivative of the approximation, not of the actual function. Whereas an approximation of the actual derivative is what we are truly after.
> The key takeaway here is that the map is not the territory. Most nontrivial functions on computers are implemented as some function (the map) that approximates the mathematical ideal (the territory). Automatic differentiation gives back a completely accurate derivative of the function doing the approximation (the map). Furthermore, the accurate derivative of an approximation to the ideal (e.g. d_my_sin) is less accurate than an approximation to the ideal derivative of the ideal (e.g. my_cos). There is no truncation error in the work the AD did; but there is a truncation error in the sense that we are now using a more truncated approximation than we would write ourselves.
AD is great but if you have a calculation method ill-suited for AD, then you'll get shite results. Why is this surprising?
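To make the my_sin / my_cos point concrete, here is a sketch in the same matrix-dual-number style as above (the three-term truncation and the evaluation point x = 2 are my own arbitrary choices, not anything from the thread):

from sympy import Matrix, eye, cos

epsilon = Matrix([[0, 1], [0, 0]])
X = 2*eye(2) + epsilon   # the dual number 2 + eps in the matrix representation

def my_sin(M):
    # a deliberately truncated Taylor series for sin: the map, not the territory
    return M - M**3/6 + M**5/120

S = my_sin(X)
print(S[0, 1])           # -1/3: the exact derivative of the truncated series at x = 2
print(cos(2).evalf())    # about -0.416: the derivative of the ideal sin at x = 2

The dual numbers did their job perfectly here; the inaccuracy comes entirely from choosing the truncated series as the calculation method.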
And yeah, stenciling is mostly for PDEs and other state spaces that we don't have a closed form for. It's generally not used for an analytic function. But you can use it for an analytic function if you tailor the stencil to the function. In fact, you'll just get back a truncated Taylor polynomial if you provide a perfect function-specific stencil.
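For reference, a stencil here is just a fixed pattern of sample points and weights. A sketch of the standard five-point central-difference stencil for a first derivative (the helper name, the step size, and the sin-at-2 test are my own choices):

import math

def stencil_derivative(f, x, h=1e-2):
    # five-point central-difference stencil: weights (1, -8, 0, 8, -1) / (12*h)
    return (f(x - 2*h) - 8*f(x - h) + 8*f(x + h) - f(x + 2*h)) / (12*h)

print(stencil_derivative(math.sin, 2.0))   # about -0.41615
print(math.cos(2.0))                       # about -0.41615 as well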