Hacker Newsnew | past | comments | ask | show | jobs | submit | rounakdatta's favoriteslogin

Like so many addictive habits, while you’re in it day to day, it feels so… important… and when you’re out… it doesn’t even make sense that complete truth could be fully knowable in the moment.

Most consequential decisions are made, or built up to, over a long time. Sure, somebody has to call moment-to-moment balls and strikes, but if that’s not literally you, your weight in the world might be better applied in slower, quieter, subtler places.

Am I morbidly curious about a plane crash? How could I not be? But the NTSB didn’t earn the credibility they have by parachuting in day-of and shooting from the hip. If anything, their processes provide discipline against first impressions blinding them to true causes.

For those who haven’t encountered Admiral Cloudberg, this is a good opportunity: https://admiralcloudberg.medium.com/


I've found algorithmic (I prefer this to "automatic") differentiation strangely slippery to explain, given the concept is essentially rather simple. I think the main reason for this is the way people often think of what the "derivative" of a function is.

In high school you're taught that the derivative of the function f(x) = x^2 is the function f'(x) = 2x. That is, the derivative of the function is another function that computes its gradient. Algorithmic differentation is very confusing if you think in these terms.

When learning multivariable calculus, and when getting your head around AD, it's better to think of the derivative as a linear approximation of the original function around a particular point. In the single-variable case that means we interpret the derivative not as (a function that gives you) the gradient of the tangent line at a point, but as the tangent line itself at a particular point.

In the case of f(x) = x^2, then, the derivative at x=3 is the tangent line to the curve at the point x=3, y=9. It's best to define this in terms of offsets from the point where we're evaluating the derivative, so (y - 9) = 2 * (x - 3).

Key points to note:

1. This derivative is a linear function. Specifically, in this case in defining the tangent line we get y-9 as a linear function of x-3. (Incidentally an alternative and perhaps more obvious representation of this tangent line would give y in terms of x, i.e. y = 2x+3. This is a less useful representation though because y is not a linear function of x, it's merely an affine function - i.e. the line doesn't pass through the origin.)

2. The linear function we get depends on which value of x we evaluate it for. We have a different derivative at each point.

3. When we evaluate this linear function we must give it an offset, a value of (x-3) for which to compute the corresponding value of (y-9).[1]

This concept of derivative generalises naturally to multiple input and output variables. The derivative is still a linear function that approximates the original function around a particular (multidimensional) point. Being a linear function it can be represented as a matrix: the Jacobian matrix. It's worth nothing, though, that the Jacobian matrix is merely a representation of the linear function we are calling the derivative, and it is far from the only one possible. In a computer we can (and usually do in AD) represent the derivative as a (computer) function that takes an offset and returns an offset, and make a call to it to evaluate it rather than ever forming a matrix.

With this way of looking at things, it's relatively easy to understand AD. The key insight is that derivatives of functions compose in the same way as the original functions. If we compose two functions and we know their individual derivatives (also functions), we can evaluate the derivative of the composition by composing the two derivatives in the same way. More generally, if you have a program consisting of a call graph of numeric functions and you can make derivative functions that correspond to those individual functions they will form a call graph with the same structure, allowing you to evaluate the derivative function of the entire program. This is forward mode AD.[2]

This is related to the chain rule but in AD calling it that somewhat obscures the point and hides its intuitive obviousness. We shouldn't be surprised that derivatives compose in the same way as the original functions given they are linear approximations of those original functions!

Anyway, personally I found this to be the insight that let me get my head round AD. I hope it helps someone!

[1] This input exists in AD code when we evaluate the derivative, and its meaning is often very unclear to beginners. In forward mode AD we actually evaluate this linear approximation function. That is, we choose a point around which to linearly approximate and an offset from that point for which to evaluate the derivative function. See [2] for the purpose we use that offset for.

[2] Adjoint/reverse mode uses a different linear function which composes the opposite way round to the original function, making it less convenient to compute but bringing big computational complexity benefits in typical real-world cases where you have a small number of inputs but a large number of outputs. In forward mode a single evaluation of the graph of derivatives can give you the sensitivities of a single output to each of the inputs. If we want the sensitivities of all the outputs (i.e. to compute the entire Jacobian) we need one evaluation for each output. The purpose we use the offset discussed in [1] for is selecting which output we're currently computing the sensitivities for. In adjoint mode this is reversed, and we need to perform one evaluation of the graph for each input, of which there are usually many fewer than outputs.


My secret is f5bot.com ^_^

I’m notified when keywords related to “human wants thing, my app can do thing” appear on HN, Reddit and Lobsters.

If I can then contribute with information to that discussion, I’ll also leave a link to my app.

Don’t just self plug, people (myself included) appreciate more detailed information on how they can solve their own personal problem, instead of being thrown into “here’s an app, figure it out”


Where is the "Don't switch" option? Because as much as I hate that Facebook bought them, they're part of what I would call "critical social infrastructure."

Asking to leave WhatsApp is like cancelling your contract with your ISP and going offline just because you don't like the company.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: