Skipping some detail: the model applies many high-dimensional functions to the input, and we don't know why those functions solve the problem.
Reducing the high-dimensional weights to human-readable values is non-trivial, and multiple neurons interact in unpredictable ways.
Interpretability research has produced many useful results and pretty visualizations[1][2], and there are many efforts to understand Transformers[3][4], but we're far from being able to completely explain the large models currently in use.
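To make the first point concrete, here is a minimal sketch (a toy NumPy example with made-up dimensions, not any real model or interpretability method): even a tiny two-layer network is a composition of high-dimensional maps, and a naive low-dimensional summary of its weights throws away almost all of the structure.

```python
# Toy sketch only: made-up dimensions, random weights, no real model.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 512, 2048, 512   # illustrative sizes

W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_hidden)

def model(x):
    # Two "high-dimensional functions" applied in sequence: the intermediate
    # activation h lives in a 2048-dimensional space no human can eyeball.
    h = np.maximum(0, W1 @ x)            # ReLU nonlinearity
    return W2 @ h

x = rng.normal(size=d_in)
y = model(x)
print(y.shape)                           # (512,) -- still not human-readable

# Naive "dimension reduction": keep only the top 2 singular directions of W1.
# The variance captured shows how little a 2-number-per-neuron summary
# retains, which is one reason such reductions are non-trivial.
U, S, Vt = np.linalg.svd(W1, full_matrices=False)
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(f"variance captured by 2 components: {explained:.1%}")
```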
[1] - https://distill.pub/2018/building-blocks/
[2] - https://distill.pub/2019/activation-atlas/
[3] - https://transformer-circuits.pub/
[4] - https://arxiv.org/pdf/2407.02646