
Great example. Right, the deep learning approach uncovers all kinds of hidden features and relationships automatically that a team of humans might miss.

I guess I'm thinking about this problem from the perspective of these GPT models requiring more training data than a normal person can acquire. Currently, it seems you need the entire internet's worth of training data (and a lot of money) to get something that can communicate reasonably well. But most people can communicate reasonably well, so it would be cool if that basic communication knowledge could somehow be used to accelerate training and reduce the reliance on training data.




I am still learning transformers, but I believe part of the issue may be that the weights do not necessarily correlate with things like "orangeness".

Instead of a dedicated weight for each color, you have something like 5 to 100 weights that represent some arbitrary combination of colors. The arbitrariness is literally determined by the dataset and the number of weights allocated.

They may even represent more than just color.

So I am not sure a weight is actually a "dial" like you are describing, where you can turn different qualities up or down. I think the relationship between weights and features is relatively chaotic.

Like, you may increase orangeness but decrease "cone-shapedness", or accidentally make it identify deer as trees or something, all by changing one value on one weight.
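
To make that concrete, here is a toy sketch of my own (not anything from a real transformer): a tiny random two-layer network where nudging a single first-layer weight shifts several downstream "feature" scores at once. The feature names like "orangeness" are just hypothetical labels for illustration.

    # Toy illustration: perturb one weight in a small random MLP and
    # watch several output "features" move at once. Everything here
    # (network, data, feature names) is made up for the example.
    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny 2-layer network: 4 inputs -> 8 hidden units -> 3 feature outputs.
    W1 = rng.normal(size=(8, 4))
    W2 = rng.normal(size=(3, 8))
    feature_names = ["orangeness", "cone-shapedness", "deer-vs-tree"]  # hypothetical labels

    def forward(x, W1, W2):
        h = np.tanh(W1 @ x)   # hidden activations
        return W2 @ h         # raw feature scores

    x = rng.normal(size=4)    # one arbitrary input
    baseline = forward(x, W1, W2)

    # Nudge exactly one weight in the first layer.
    W1_perturbed = W1.copy()
    W1_perturbed[2, 1] += 0.5
    perturbed = forward(x, W1_perturbed, W2)

    # Every feature that reads from hidden unit 2 shifts, not just one.
    for name, before, after in zip(feature_names, baseline, perturbed):
        print(f"{name:>16}: {before:+.3f} -> {after:+.3f}")

Every output that reads from the hidden unit touched by that weight moves, which is why tweaking a single weight rarely behaves like a clean dial for one quality.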


It is possible that the parameters (weights) in a machine learning model interact to yield outcomes much as genes interact in biological systems to produce traits. These interactions involve complex interdependencies, so there really aren't one-to-one dials.


> the deep learning approach uncovers all kinds of hidden features and relationships automatically that a team of humans might miss

Sitting in a lecture by a decent deep learning practitioner, I heard two questions from the audience (among others). The first asked, "How can we check the results using other models, so that computers will catch the errors that humans miss?"

The second question was more like, "When a model is built across a non-trivial input space, the features and classes that come out are one set of possibilities, but there are many more. How can we discover more about the model that is built, knowing that there are inherent epistemological conflicts in any model?"

I also thought it was interesting that the two questioners were from large but very different demographic groups, and at different stages of learning and practice (the second question was from a senior coder).




