Neural networks are merely generalized methods used to translate data. These translations happen by combining weights with the input (multiplying, adding, etc.) and passing the output on to the next node(s).
RNNs keep an internal state over prior values, so each step can pass a “memory” of the sequence forward.
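As a concrete picture of those two ingredients, here is a minimal NumPy sketch (names, shapes, and the tanh nonlinearity are mine, not the paper's):

```python
import numpy as np

def dense_layer(x, W, b):
    # "combine weights with the input, pass the output on"
    return np.tanh(W @ x + b)

def rnn_step(x, h, W_x, W_h, b):
    # the hidden state h is the "memory" carried over from prior inputs
    return np.tanh(W_x @ x + W_h @ h + b)

# usage: run a short sequence through the RNN, then read the final state out
x_seq = [np.random.randn(4) for _ in range(3)]
W_x, W_h, b = np.random.randn(8, 4), np.random.randn(8, 8), np.zeros(8)
h = np.zeros(8)
for x in x_seq:
    h = rnn_step(x, h, W_x, W_h, b)
y = dense_layer(h, np.random.randn(2, 8), np.zeros(2))
```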
In the end, what this paper is doing is replacing components of the graph with mini-RNNs and pruning based on another network that oversees the first.
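A toy sketch of how I read the "replace weights with mini-RNNs" idea: every connection (i, j) keeps its own small state vector, but all of them are updated by one shared tiny cell, so the shared cell's parameters are what gets meta-learned. This is only my interpretation, and the names and feedback signal below are made up:

```python
import numpy as np

def shared_cell(state, x_i, err_j, A, B, C):
    # one tiny RNN cell, reused for every connection in the layer;
    # its inputs are the forward signal x_i and a feedback signal err_j
    return np.tanh(A @ state + B * x_i + C * err_j)

def meta_layer_forward(x, states, readout):
    # the message into unit j is a sum over connections (i, j),
    # each contributing a readout of its own little state
    n_in, n_out, _ = states.shape
    return np.array([
        sum(readout @ states[i, j] for i in range(n_in))
        for j in range(n_out)
    ])

# toy dimensions: 3 inputs, 2 outputs, state size 4 per connection
n_in, n_out, k = 3, 2, 4
states = np.zeros((n_in, n_out, k))
A, B, C = 0.1 * np.random.randn(k, k), np.random.randn(k), np.random.randn(k)
readout = np.random.randn(k)

x = np.random.randn(n_in)
y = meta_layer_forward(x, states, readout)
err = np.random.randn(n_out)  # stand-in for whatever feedback the overseeing network sends
for i in range(n_in):
    for j in range(n_out):
        states[i, j] = shared_cell(states[i, j], x[i], err[j], A, B, C)
```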
Having done quite a bit of work in this space, I have a couple of thoughts.
What was the major advance in games? It was networks playing themselves.
Here we may want to do the same thing. Meta-learners need lots of “experience” (data) just like anything living.
This paper doesn’t dive too in-depth into the idea that the meta-learner can be made general, but it can be. There are only so many problem types (classification, regression, generation, etc.) and only so many data formats. Further, networks are fairly well defined, much more so than language.
This is the field of AutoML, if anyone's interested.
Self-play probably refers to different agents playing each other, but that has also been done since forever. It just didn't seem that interesting when the models were small and the games were things like 4x4 tic-tac-toe.
> Neural networks are merely generalized methods used to translate data.
That's true of all computation!
To see this, consider lambda-calculus, which is a calculus of functions (i.e. things that convert inputs into outputs). The computational equivalence of lambda-calculus with Turing machines was one of the early results of computability theory.
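For instance, you can already do arithmetic with nothing but functions that map inputs to outputs; here are Church numerals written as Python lambdas (a toy illustration of the point, nothing more):

```python
# Church numerals: the number n is "apply f to x, n times"
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

to_int = lambda n: n(lambda k: k + 1)(0)  # count the applications

two   = succ(succ(zero))
three = succ(two)
assert to_int(add(two)(three)) == 5
```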
Yeah, that cringey start to the OP's point removed quite a bit of credibility, despite their stating that they 'work in the area'.
Probing towards a meta learner that is useful in many contexts is an intriguing idea though.
I also wonder about the dream for meta-learners: can they approach a limit that is different from 'Adam-and-crew' once you get away from domain-specific parameter landscapes? I imagine much of the work will tend towards context switching, effectively maintaining multiple domain-specific optimizers, but I may be very far off from what is observed here.
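To make 'Adam-and-crew' concrete: a learned optimizer would have to beat fixed, hand-designed update rules like the one below (standard Adam, written out as a sketch; variable names are mine):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # the whole "optimizer" is a fixed formula over the gradient history
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```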
The claim of formal equivalence is restricted to "linear transformers".
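For context, "linear transformer" here means attention without the softmax, which can be rewritten as an outer-product "fast weight" update; that rewrite is the heart of the claimed equivalence. A rough NumPy sketch of it (single head, no feature map, unnormalized, my own names):

```python
import numpy as np

def linear_attention_as_fast_weights(keys, values, queries):
    # the fast-weight matrix W accumulates outer products value_t (x) key_t;
    # reading it out with the query gives sum_i (k_i . q_t) v_i,
    # i.e. causal linear attention without the softmax
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))
    outputs = []
    for k, v, q in zip(keys, values, queries):
        W += np.outer(v, k)      # "program" the fast weights
        outputs.append(W @ q)    # query them
    return np.array(outputs)
```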
The tone of the comment seems to suggest that S.'s claim is slightly ridiculous. Can you point out a concrete technical shortcoming of the paper?
It is valuable to point out connections with existing work, if only to avoid reinventing the wheel and to properly stand on the shoulders of giants: אֵין כָּל חָדָשׁ תַּחַת הַשָּׁמֶשׁ (there is nothing new under the sun).
My guess is that a tiny LSTM is less computationally expensive than a tiny transformer. Since the paper proposes replacing every weight with this tiny shared net, you want it to be as tiny and performant as possible.
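A back-of-the-envelope parameter count illustrates the guess (the formulas are the standard ones; the tiny sizes are made up, and actual runtime cost also depends on how the nets are used):

```python
def lstm_params(d_in, d_hidden):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (d_hidden * d_in + d_hidden * d_hidden + d_hidden)

def transformer_block_params(d_model, d_ff):
    # q/k/v/out projections plus a two-layer feed-forward, biases ignored
    return 4 * d_model * d_model + 2 * d_model * d_ff

print(lstm_params(8, 8))                 # 544
print(transformer_block_params(8, 32))   # 768
```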
Transformers are more complicated than RNNs and require more fine-tuning. I’m guessing RNNs were used to simplify the problem. I’m not even sure transformers would work here, given where they’re being dropped into the process.
1) This is by Schmidhuber, so take it with an extra grain of salt.
2) The experiments were on toy datasets. There is a reason they did not do this on a real dataset.
It being from Schmidhuber's lab makes it dramatically more credible, in my view. They've been practically a decade ahead of everybody for ages now from a theoretical point of view.