I think you sum up my feelings about him as well. He's a bit much sometimes but it's hard to deny that he's made monumental contributions to the field.
It's also funny that we laugh at him when we also have a joke that in AI we just reinvent what people did in the 80's. He's just the person being more specific as to what and who.
Ironically, I think the problem is we care too much about credit. It ends up getting hoarded rather than shared. We then just oversell our contributions, because if you make the incremental improvements that literally everyone makes, you get your work rejected for being incremental.
I don't know what it is about CS specifically, but we have a culture problem of attribution and hype. It ranges from building on open source (it's libraries all the way down) while acting like we did it all alone, to jumping on bandwagons as if there's a right and immutable truth to how to do certain things, until the bubbles pop and we laugh at how stupid anyone was to do such a thing. Yet we don't contribute back to the projects that are our foundation, we laugh at the "theory" we stand on, and we listen to the same hype-train people who got it wrong last time instead of turning to those who got it right. Why? It goes directly counter to the ideas of a group who love to claim rationalism, "working from first principles", and "I care what works".
This aspect of the industry really annoys me to no end. People in this field are so allergic to theory (which is ironic, because CS, of all fields, is probably one of the ones in which theoretical investigations are most directly applicable) that they'll smugly proclaim their own intelligence and genius while showing you a pet implementation of ideas that have been around since the 70s or earlier. Sure, most of the time they implement it in a new context, but this leads to a fragmented language in which the same core ideas are implemented N times, each with someone's particular, personal, idiosyncratic terminology choices (see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages).
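To make that last point concrete, here's a tiny Python sketch of the two primitives under the names Python happens to use; the naming notes in the comments are from memory and non-exhaustive, so treat them as illustrative rather than definitive:

    from functools import reduce

    # The same two primitives, under the names Python happens to use.
    # "map" also goes by Select (C#/LINQ), collect (Ruby), transform (C++), fmap (Haskell).
    # "fold" also goes by reduce (Python/JS), foldl/foldr (Haskell/ML), inject (Ruby),
    # Aggregate (C#), accumulate (C++).

    xs = [1, 2, 3, 4]

    squares = list(map(lambda x: x * x, xs))       # map: apply a function elementwise
    total = reduce(lambda acc, x: acc + x, xs, 0)  # fold: combine elements via an accumulator

    print(squares)  # [1, 4, 9, 16]
    print(total)    # 10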
> If you find that you're spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you're spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice.
But yeah, in general I hate how people treat theory, acting as if it has no economic value. Certainly both matter; no one is denying that. But there's a strong bias against theory and I'm not sure why. Let's ask ourselves: what is the economic impact of calculus? What about just the work of Leibniz or Newton? I'm pretty confident that it's significantly north of billions of dollars a year. And what, we want to do less of this type of impactful work? It seems a handful of examples more than covers any money wasted on research that has failed (or "failed").
The problem I see with our field, which leads to a lot of hype, is the belief that everything is simple. This just creates "yes men" and people who do not think, which I think ends up with people hearing "no" when someone is just acting as an engineer. The job of an engineer is to problem solve. That means you have to identify problems! Identifying them and presenting solutions is not "no", it is "yes". But for some reason it is interpreted as "no".
> see for example, the wide array of names for basic functional data structure primitives like map, fold, etc. that abound across languages
Don't get me started... but if a PL person goes on a rant here, just know, yes, I upvoted you ;)
[0] You can probably tell I came to CS from "outside". I have a PhD in CS (ML) but undergrad was Physics. I liked experimental physics because I came to the same conclusion as Knuth: Theory and practice drive one another.
I get weird looks sometimes lately when I point out that "agents" are not a new thing, and that they date back at least to the 1980's and - depending on how you interpret certain things[1] - possibly back to the 1970's.
People at work have, I think, gotten tired of my rant about how people who are ignorant of the history of their field have a tendency to either re-invent things that already exist, or to be snowed by other people who are re-inventing things that already exist.
I suppose my own belief in the importance of understanding and acknowledging history is one reason I tend to be somewhat sympathetic to Schmidhuber's stance.
Another interesting thing I see is how people will refuse to learn history, thinking it will harm their creativity[0].
The problem with these types of interpretations is that they're fundamentally authoritarian, whereas research itself is fundamentally anti-authoritarian. To elaborate: trust, but verify. You trust the results of others, but you replicate and verify. You dig deep and get to the depth (progressive knowledge necessitates higher orders of complexity). If you do not challenge or question results, then yes, I'd agree, knowledge harms. But if you're willing to say "okay, it worked in that exact setting, but what about this change?" then there is no problem[1]. In that setting, more reading helps.
I just find these mindsets baffling... Aren't we trying to understand things? Brute-forcing your way to new and better things is only necessary if you are unable to understand them. We can make so much more impact and work so much faster when we let understanding drive as much as outcomes do.
I think you should have continued reading from where you quoted.
>> Aren't we trying to understand things? ***Brute-forcing your way to new and better things is only necessary if you are unable to understand them. We can make so much more impact and work so much faster when we let understanding drive as much as outcomes do.***
I'm arguing that if you want to "deliver business units to increase shareholder value" that this is well aligned with "trying to understand things."
Think about it this way:
If you understand things:
You can directly address shareholder concerns and adapt readily to market demands. You do not have to search, you already understand the solution space.
If you do not understand things:
You cannot directly address shareholder concerns and must search over the solution space to meet market demands.
Which is more efficient? It is hard to argue that search through an unknown solution space is easier than path optimization over a known solution space. Obviously this is the highly idealized case, but this is why I'm arguing that these are aligned. If you're in the latter situation you advantage yourself by trying to get to the former. Otherwise you are just blindly searching. In that case technical debt becomes inevitable and compounds significantly unless you get lucky. It becomes extremely difficult to pivot as the environment naturally changes around you. You are only advantaged by understanding, never harmed. Until we realize this we're going to continue to be extremely wasteful, resulting in significantly lower returns for shareholders or any other measure of value.
I'm in the same boat. At least there's a couple of us that think this way. I'm always amazed when I run into people who think neural nets are a relatively recent thing, and not something that emerged back in the 1940s-50s. People seem to tend to implicitly equate the emergence of modern applications of ideas with the emergence of the ideas themselves.
I wonder at times if it stems back to flaws in the CS pedagogy. I studied philosophy and literature in which tracing the history of thought is basically the entire game. I wonder if STEM fields, since they have far greater operational emphasis, lose out on some of this.
> people who think neural nets are a relatively recent thing, and not something that emerged back in the 1940s-50s
And to bring this full circle... if you really (really) buy into Schmidhuber's argument, then we should consider the genesis of neural networks to date back to around 1800! I think it's fair to say that that might be a little bit of a stretch, but maybe not that much so.
> Around 1800, Carl Friedrich Gauss and Adrien-Marie Legendre introduced what we now call a linear neural network, though they called it the “least squares method.” They had training data consisting of inputs and desired outputs, and minimized training set errors through adjusting weights, to generalize on unseen test data: linear neural nets!
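To make that framing concrete, here's a minimal numpy sketch of least squares viewed as "training" a single linear layer: made-up inputs and desired outputs, weights adjusted to minimize squared training error, then evaluated on unseen test points (the data and dimensions are invented purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up "training data": inputs X and desired outputs y from a noisy linear rule.
    true_w = np.array([2.0, -1.0])
    X_train = rng.normal(size=(100, 2))
    y_train = X_train @ true_w + 0.1 * rng.normal(size=100)

    # Least squares: pick weights w minimizing ||X w - y||^2,
    # i.e. "train" a single linear layer under a squared-error loss.
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    # "Generalize" to unseen test data.
    X_test = rng.normal(size=(10, 2))
    y_test = X_test @ true_w
    print(np.max(np.abs(X_test @ w - y_test)))  # small test error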
... except linear neural nets have a very low ceiling on complexity, no matter how big the network is, until you introduce a nonlinearity, which they didn't. They tried, but it destroys the statistical reasoning, so they threw it out. Also, I don't envy anyone doing that calculation on paper; least squares is already going to suck bad enough.
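That point is easy to check for yourself: without a nonlinearity, stacking linear layers collapses to a single linear layer, because the weight matrices just multiply together. A quick numpy sketch (shapes and values are arbitrary, purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Three "layers" of weights with no nonlinearity between them.
    W1 = rng.normal(size=(5, 8))
    W2 = rng.normal(size=(8, 8))
    W3 = rng.normal(size=(8, 3))

    x = rng.normal(size=5)

    deep = ((x @ W1) @ W2) @ W3      # the "deep" linear network
    collapsed = x @ (W1 @ W2 @ W3)   # a single layer using the product of the weights

    print(np.allclose(deep, collapsed))  # True: depth adds nothing without a nonlinearity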
Until you do that, this method is a version of a Taylor series, and the only real advantage is the statistical connection between the outcome and what you're doing (and if you're evil, you might point out that while that statistical connection gives reassurance that what you're doing is correct, it can, despite being a proof, point you in the wrong direction).
And if you want to go down that path, SVM kernel-based networks do it better than current neural networks. Neural networks throw out the statistical guarantees again.
If you want to go really far back with neural networks, there's backprop! Newton, although I guess Leibniz's chain rule would make him a very good candidate too.
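And in that spirit, here's a toy sketch of backprop as nothing more than the chain rule, checked against a finite-difference estimate (the tiny two-parameter "network" and the numbers are made up for illustration):

    import numpy as np

    def forward(w1, w2, x):
        # A tiny two-layer net with a tanh nonlinearity: y = w2 * tanh(w1 * x)
        return w2 * np.tanh(w1 * x)

    def grad_w1(w1, w2, x):
        # Chain rule: dy/dw1 = dy/dh * dh/dw1 = w2 * (1 - tanh(w1*x)^2) * x
        h = np.tanh(w1 * x)
        return w2 * (1.0 - h ** 2) * x

    w1, w2, x = 0.7, -1.3, 2.0
    analytic = grad_w1(w1, w2, x)

    # Central finite difference as a sanity check on the chain-rule gradient.
    eps = 1e-6
    numeric = (forward(w1 + eps, w2, x) - forward(w1 - eps, w2, x)) / (2 * eps)

    print(analytic, numeric)  # the two agree to several decimal places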