Hey, congrats on the paper, I read it a while ago and thought it was really interesting.
I tried implementing it, and the samples generated by the Teacher seem to suffer from mode collapse (as if the generator is ignoring the random vector z but not the label condition). Do you recall having that issue at some point?
I have to say I'm using a simpler generator than the one in the paper, and I'm not changing the learner architecture at each batch, only its weights.
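In case it's useful, here's roughly how I'm checking for it (a minimal sketch of my own diagnostic; `generator`, `z_dim`, and the `generator(z, y)` call signature are just my setup, not anything from the paper):

```python
import torch

# Quick diagnostic (names and shapes are placeholders for my setup, not the paper's code):
# fix the class label, vary z, and check how much the generated samples actually change.
# If the mean pairwise distance is near zero, the generator is effectively ignoring z.
def z_sensitivity(generator, label, z_dim=128, n=64, device="cpu"):
    generator.eval()
    with torch.no_grad():
        z = torch.randn(n, z_dim, device=device)
        y = torch.full((n,), label, dtype=torch.long, device=device)
        x = generator(z, y)              # (n, C, H, W) synthetic samples for one class
        flat = x.view(n, -1)
        dists = torch.cdist(flat, flat)  # pairwise L2 distances between samples
        off_diag = dists[~torch.eye(n, dtype=torch.bool, device=device)]
        return off_diag.mean().item()
```

For every label I get a value close to zero, which is why I suspect the generator is collapsing per class.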
Thanks, I'm glad you liked it!
Mode collapse was actually the one thing I never encountered during my exploration (which was the reason we looked into using GTNs as a mode-collapse solution for GANs). That said, I found meta-learning to be surprisingly hard to implement efficiently and ran into more bugs in both PyTorch and TensorFlow than I can count.
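For what it's worth, the part that bit me the most was keeping the inner-loop updates differentiable. Here's a minimal sketch of the idea (toy one-layer learner, hypothetical `teacher(z, y)` interface, not our actual code): the inner SGD steps have to be written out by hand with `create_graph=True` so the meta-gradient can flow back into the teacher.

```python
import torch
import torch.nn.functional as F

def meta_step(teacher, real_x, real_y, inner_steps=3, inner_lr=0.1, z_dim=64, n_syn=32):
    # Toy learner: one linear layer on flattened 28x28 inputs, weights kept as plain tensors
    w = (0.01 * torch.randn(10, 784)).requires_grad_()
    b = torch.zeros(10, requires_grad=True)

    for _ in range(inner_steps):
        z = torch.randn(n_syn, z_dim)
        y = torch.randint(0, 10, (n_syn,))
        syn_x = teacher(z, y)  # teacher generates a synthetic training batch
        inner_loss = F.cross_entropy(F.linear(syn_x.view(n_syn, -1), w, b), y)
        # create_graph=True keeps the SGD update differentiable w.r.t. the teacher
        gw, gb = torch.autograd.grad(inner_loss, (w, b), create_graph=True)
        w, b = w - inner_lr * gw, b - inner_lr * gb

    # Outer loss on real data; backprop flows to the teacher through the unrolled updates
    logits = F.linear(real_x.view(real_x.size(0), -1), w, b)
    return F.cross_entropy(logits, real_y)
```

Calling `.backward()` on the returned meta-loss and stepping a regular optimizer on the teacher's parameters is the outer update. Doing this efficiently with a real learner network (rather than bare tensors) is where most of the framework bugs showed up for me.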
Changing the learner architecture is actually not that important, so that's probably not your problem.
How can you be sure that the synthetic data you generate doesn't bias the architecture search away from the optimal solution for real data? It seems analogous to how early truncation of learning biases architecture search towards quick learners, and possibly away from peak performers.
Probably just an empirical comparison with other NAS strategies and hand-crafted architectures, right? The whole area of research is still ruthlessly empirical.
Oh! I have a somewhat inconvenient question. It's OK if you don't answer it. But... why not work for Elon Musk or the US government? There are rumors that Uber is owned by Russians and reports directly to Putin, jk ;)
PS: I'm the author of the GTN paper. Feel free to ask any questions.