
Great example. Right, the deep learning approach uncovers all kinds of hidden features and relationships automatically that a team of humans might miss.

I guess I'm thinking about this problem from the perspective of these GPT models requiring more training data than a normal person can acquire. Currently, it seems you need the entire internet's worth of training data (and a lot of money) to get something that can communicate reasonably well. But most people can communicate reasonably well, so it would be cool if that basic communication knowledge could somehow be used to accelerate training and reduce the reliance on training data.




I am still learning transformers, but I believe part of the issue may be that the weights do not necessarily correlate with things like "orangeness".

Instead of a dedicated weight for each color, you have something like 5 to 100 weights that represent some arbitrary combination of colors. The arbitrariness is literally determined by the dataset and the number of weights allocated.

They may even represent more than just color.

So I am not sure a weight is actually a "dial" like you are describing, where you can turn different qualities up or down. I think the relationship between weights and features is relatively chaotic.

Like, you may increase orangeness but decrease "cone-shapedness", or accidentally make it identify deer as trees or something, all by changing one value on one weight.
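
To make that concrete, here is a toy sketch of my own (not anything from a real transformer): a tiny random two-layer network where nudging a single first-layer weight shifts several downstream "feature" scores at once. The feature names like "orangeness" are just hypothetical labels for illustration.

    # Toy illustration: perturb one weight in a small random MLP and
    # watch several output "features" move at once. Everything here
    # (network, data, feature names) is made up for the example.
    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny 2-layer network: 4 inputs -> 8 hidden units -> 3 feature outputs.
    W1 = rng.normal(size=(8, 4))
    W2 = rng.normal(size=(3, 8))
    feature_names = ["orangeness", "cone-shapedness", "deer-vs-tree"]  # hypothetical labels

    def forward(x, W1, W2):
        h = np.tanh(W1 @ x)   # hidden activations
        return W2 @ h         # raw feature scores

    x = rng.normal(size=4)    # one arbitrary input
    baseline = forward(x, W1, W2)

    # Nudge exactly one weight in the first layer.
    W1_perturbed = W1.copy()
    W1_perturbed[2, 1] += 0.5
    perturbed = forward(x, W1_perturbed, W2)

    # Every feature that reads from hidden unit 2 shifts, not just one.
    for name, before, after in zip(feature_names, baseline, perturbed):
        print(f"{name:>16}: {before:+.3f} -> {after:+.3f}")

Every output that reads from the hidden unit touched by that weight moves, which is why tweaking a single weight rarely behaves like a clean dial for one quality.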


It is possible that the parameters (weights) in a machine learning model interact to yield outcomes much as genes interact in biological systems to produce traits. These interactions involve complex interdependencies, so there really aren't one-to-one dials.


> the deep learning approach uncovers all kinds of hidden features and relationships automatically that a team of humans might miss

Sitting in a lecture by a decent deep learning practitioner, I heard two questions from the audience (among others). The first asked, "How can we check the results using other models, so that computers will catch the errors that humans miss?"

The second question was more like, "When a model is built across a non-trivial input space, the features and classes that come out are one set of possibilities, but there are many more. How can we discover more about the model that is built, knowing that there are inherent epistemological conflicts in any model?"

I also thought it was interesting that the two questioners were from large but very different demographic groups, and at different stages of learning and practice (the second question was from a senior coder).




