Meta Learning Backpropagation and Improving It (2020) (arxiv.org)
138 points by lnyan on Oct 25, 2022 | 29 comments



Neural networks are merely generalized methods used to translate data. These translations happen by applying weights to the input (multiplying, adding, and so on) and passing the output on to the next node(s).

RNNs additionally keep a hidden state of prior values, so a “memory” can be carried forward from step to step.

In the end, what this paper does is replace components of the computation graph with mini-RNNs, with pruning guided by another network that oversees the first.
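For intuition, here's a minimal sketch of the "replace each weight with a tiny shared RNN" idea. To be clear, this is not the paper's actual architecture; the layer sizes, names, and update rule below are all made up for illustration:

    import numpy as np

    # Instead of a scalar weight per connection, each connection carries a
    # tiny hidden state that is updated by ONE shared RNN cell.
    rng = np.random.default_rng(0)
    n_in, n_out, state_dim = 4, 3, 2   # tiny layer, tiny per-connection state

    # Parameters of the single shared cell (used by every connection).
    W_h = rng.normal(scale=0.1, size=(state_dim, state_dim))
    W_x = rng.normal(scale=0.1, size=(state_dim, 1))
    w_out = rng.normal(scale=0.1, size=(state_dim,))

    # One small hidden state per connection, replacing the scalar weight.
    h = np.zeros((n_in, n_out, state_dim))

    def layer_forward(x, h):
        # Each connection turns its state into an effective weight, then the
        # layer behaves like an ordinary linear layer.
        eff_w = h @ w_out                                  # (n_in, n_out)
        y = x @ eff_w                                      # (n_out,)
        # Update every connection's state with the shared cell, fed by its input.
        msg = np.repeat(x[:, None, None], n_out, axis=1)   # (n_in, n_out, 1)
        h_new = np.tanh(h @ W_h.T + msg @ W_x.T)
        return y, h_new

    x = rng.normal(size=(n_in,))
    y, h = layer_forward(x, h)
    print(y.shape)   # (3,)

The point is that the cell parameters (W_h, W_x, w_out) are shared across every connection, so the meta-learned part stays tiny no matter how large the layer is.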

Having done quite a bit of work in this space I have a couple of thoughts.

What was the major advance in games? It was networks playing themselves.

Here we may want to do the same thing. Meta-learners need lots of “experience” (data) just like anything living.

This paper doesn’t dive too deeply into the idea that the meta-learner can be made general, but it can be. There are only so many problem types (classification, regression, generation, etc.) and only so many data formats. Further, networks are fairly well defined, much more so than language.

This is the field of AutoML, if anyone's interested.


> What was the major advance in games? It was networks playing themselves.

I'm not sure. We have been using self-play since forever. That's how Alpha-Beta search works.
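Since alpha-beta came up: here's a toy sketch of minimax with alpha-beta pruning on Nim (take 1-3 stones, whoever takes the last stone wins; the game and values are my own toy choices). One program searches moves for both sides, which is the sense in which classic game-tree search already involves a kind of self-play:

    # Value is from the maximizing player's perspective: +1 win, -1 loss.
    def alphabeta(stones, alpha, beta, maximizing):
        if stones == 0:
            # The previous player took the last stone, so the side to move lost.
            return -1 if maximizing else 1
        if maximizing:
            best = -1
            for take in (1, 2, 3):
                if take <= stones:
                    best = max(best, alphabeta(stones - take, alpha, beta, False))
                    alpha = max(alpha, best)
                    if alpha >= beta:   # prune: the opponent won't allow this line
                        break
            return best
        else:
            best = 1
            for take in (1, 2, 3):
                if take <= stones:
                    best = min(best, alphabeta(stones - take, alpha, beta, True))
                    beta = min(beta, best)
                    if alpha >= beta:
                        break
            return best

    print(alphabeta(12, -1, 1, True))   # -1: multiples of 4 are lost for the mover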


Self-play probably refers to different agents playing each other, but that has also been done since forever. It just didn't seem that interesting when the models were small and the games were things like 4x4 tic-tac-toe.


Tesauro used self-play to train a master-level backgammon player nearly 30 years ago.

https://www.semanticscholar.org/paper/TD-Gammon%3A-A-Self-Te...


> Neural networks are merely generalized methods used to translate data.

That's true of all computation!

To see this, consider lambda-calculus, which is a calculus of functions (i.e. things that convert inputs into outputs). The computational equivalence of lambda-calculus with Turing machines was one of the early results of computability theory.
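As a toy illustration of that "functions converting inputs into outputs" view, here are Church numerals written in Python (my own example, nothing specific to the paper):

    # A number n is encoded as the function that applies f to x exactly n
    # times; arithmetic is then just composition of functions.
    ZERO = lambda f: lambda x: x
    SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
    ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    def to_int(church):
        # Translate a Church numeral back into an ordinary Python int.
        return church(lambda k: k + 1)(0)

    TWO = SUCC(SUCC(ZERO))
    THREE = SUCC(TWO)
    print(to_int(ADD(TWO)(THREE)))   # 5

Here the "data" is nothing but functions translating other functions.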


Yeah, that cringey start to the OP's point removed quite a bit of credibility, despite their stating that they 'work in the area'. Probing towards a meta-learner that is useful in many contexts is an intriguing idea, though. I also wonder about the dream for meta-learners: can they approach a limit different from 'Adam and crew' once you get away from domain-specific parameter landscapes? I imagine much of the work will tend towards context switching to deal with what are effectively multiple domain-specific optimizers, but I may be very far off from what is observed here.


What's the reasoning behind choosing LSTMs here? Won't an attention model be even more effective?


Did you see who the co-author of the paper is? :)


As you should know, Schmidhuber even invented Transformer models; see: https://arxiv.org/abs/2102.11174

;)


The claim of formal equivalence is restricted to "linear transformers". The tone of the comment seems to suggest that S.'s claim is slightly ridiculous. Can you point out a concrete technical shortcoming of the paper?
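For what it's worth, the core equivalence for the simplest (unnormalized, causal) case is easy to check numerically. This is my own minimal sketch, not the paper's full construction (no kernel feature map, no normalization):

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 5, 4
    K = rng.normal(size=(T, d))   # keys
    V = rng.normal(size=(T, d))   # values
    Q = rng.normal(size=(T, d))   # queries

    # View 1: linear attention -- output_t = sum_{i<=t} (q_t . k_i) v_i
    attn_out = np.zeros((T, d))
    for t in range(T):
        scores = Q[t] @ K[: t + 1].T       # no softmax
        attn_out[t] = scores @ V[: t + 1]

    # View 2: fast weight programmer -- W_t = W_{t-1} + v_t k_t^T, y_t = W_t q_t
    W = np.zeros((d, d))
    fw_out = np.zeros((T, d))
    for t in range(T):
        W = W + np.outer(V[t], K[t])       # write: outer-product update
        fw_out[t] = W @ Q[t]               # read: query the fast weight matrix

    print(np.allclose(attn_out, fw_out))   # True

Both loops compute the same sums; the "fast weight" view just accumulates the outer products v_i k_i^T into a matrix first.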


Naaa, I just wanted to point out his attitude towards other research.


It is valuable to point out connections with existing work, if only to avoid reinventing the wheel and to properly stand on the shoulders of giants: אֵין כָּל חָדָשׁ תַּחַת הַשָּׁמֶשׁ‎ (there is nothing new under the sun).


Yep, but you can do this without acting like an ass.


I bet OP was his student


How much are you willing to bet?


My guess is that a tiny LSTM is less computationally expensive than a tiny transformer. Since the paper proposes replacing every weight with this tiny shared net, you want it to be as tiny and performant as possible.
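Back-of-envelope, with a width I picked arbitrarily (not numbers from the paper), the parameter counts already favor the LSTM at equal width, and the LSTM also avoids attending over a growing history at every step:

    def lstm_params(d_in, d_hidden):
        # 4 gates, each with input weights, recurrent weights, and a bias.
        return 4 * (d_hidden * d_in + d_hidden * d_hidden + d_hidden)

    def transformer_block_params(d_model, d_ff=None):
        d_ff = d_ff or 4 * d_model
        attn = 4 * (d_model * d_model + d_model)   # Q, K, V, output projections
        ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model
        return attn + ffn                          # ignoring layer norms

    d = 16  # a "tiny" width, chosen only for illustration
    print(lstm_params(d, d))              # 2112
    print(transformer_block_params(d))    # 3216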


Transformers are more complicated than RNNs and require more fine-tuning. I’m guessing the RNNs were used to simplify the problem. I’m not even sure transformers would work here, given where they’re being dropped into the process.


If you want it tiny and performant and are choosing an LSTM over transformers, why not skip the LSTM and use GRUs?


Anyone else read this headline as "Facebook parent company does a thing"?



That one always cracks me up. Munroe truly is a genius. The 'imaginary friends' one is just as funny.


Can we start calling Zuckerberg's Meta something else? Zmeta?


Facebook works!


Inb4 any learning algorithm released in the next 20 years is claimed to be just a special case that can be learnt via this meta learning framework.

(For the record, I do believe the man has been done dirty a few times. Hate the meme, not the memer.)


1) This is by Schmidhuber, so take it with an extra grain of salt. 2) The experiments were on toy datasets; there is a reason they did not do this on a real dataset.


It being from Schmidhuber's lab makes it dramatically more credible, in my view. They've been practically a decade ahead of everybody for ages now, from a theoretical point of view.


Agreed. I'd go as far as to put Schmidhuber and his lab above the laureates.


> This is by Schmidhuber so take it with an extra grain of salt

What makes you say that?


Yes! Spill the tea!



