
I stopped at: "causal sequence of 'thoughts'"



Interpretability research is basically a projection of the original function implemented by the neural network onto a sub-space of "explanatory" functions that people consider more understandable. You're right that the words they use to sell the research are completely nonsensical, because the abstract process has nothing to do with anything causal.
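
To make the "projection" framing concrete, here's a minimal sketch (my own toy illustration, not taken from any actual interpretability paper): treat a small fixed network as the original function f, then find the best approximation to it inside a simpler, human-readable class, in this case affine maps, by least squares. The setup, sizes, and names are all assumptions for the example.

    # Hedged sketch: "interpretability as projection" -- approximate a
    # network f(x) with the closest member of a simpler, readable
    # function class (here: affine maps), via least squares.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "trained network": a fixed 2-layer MLP with random weights.
    W1 = rng.normal(size=(8, 3)); b1 = rng.normal(size=8)
    W2 = rng.normal(size=(1, 8)); b2 = rng.normal(size=1)

    def f(X):  # the original, hard-to-read function
        return np.tanh(X @ W1.T + b1) @ W2.T + b2

    # Project f onto the "explanatory" subspace of affine functions
    # by least squares over a sample of inputs.
    X = rng.normal(size=(1000, 3))
    y = f(X).ravel()
    A = np.hstack([X, np.ones((1000, 1))])        # affine features
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # best affine fit

    # The residual measures how much of f the "explanation" misses.
    resid = y - A @ coef
    print("linear weights:", coef[:3], "bias:", coef[3])
    print("unexplained variance:", resid.var() / y.var())

The fitted weights are the "explanation"; whatever lives in the residual is exactly what the simpler description throws away. Note that nothing in this procedure is causal in any mechanistic sense, which is the point of the objection above.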


All code is causal.


Which makes it entirely irrelevant as a descriptive term.


"Servers shall be strict in formulation and flexible in interpretation."



