Hey, congrats on the paper, I read it a while ago and thought it was really interesting.
I tried implementing it, and the samples generated by the Teacher seem to suffer from mode collapse (as if the generator is ignoring the random vector z but not the label condition). Do you recall having that issue at some point?
I have to say I'm using a simpler generator than the one in the paper, and I'm not changing the learner architecture at each batch, only its weights.
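In case it's useful, here's roughly how I'm checking for it (a minimal sketch of my own diagnostic; `generator`, `z_dim`, and the `generator(z, y)` call signature are just my setup, not anything from the paper):

```python
import torch

# Quick diagnostic (names and shapes are placeholders for my setup, not the paper's code):
# fix the class label, vary z, and check how much the generated samples actually change.
# If the mean pairwise distance is near zero, the generator is effectively ignoring z.
def z_sensitivity(generator, label, z_dim=128, n=64, device="cpu"):
    generator.eval()
    with torch.no_grad():
        z = torch.randn(n, z_dim, device=device)
        y = torch.full((n,), label, dtype=torch.long, device=device)
        x = generator(z, y)              # (n, C, H, W) synthetic samples for one class
        flat = x.view(n, -1)
        dists = torch.cdist(flat, flat)  # pairwise L2 distances between samples
        off_diag = dists[~torch.eye(n, dtype=torch.bool, device=device)]
        return off_diag.mean().item()
```

For every label I get a value close to zero, which is why I suspect the generator is collapsing per class.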
Thanks, I'm glad you liked it!
Mode collapse was actually the one thing I never encountered during my exploration (which was the reason we looked into using GTNs as a mode-collapse solution for GANs). That said, I found meta-learning to be surprisingly hard to implement efficiently and ran into more bugs in both PyTorch and TensorFlow than I can count.
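For what it's worth, the part that bit me the most was keeping the inner-loop updates differentiable. Here's a minimal sketch of the idea (toy one-layer learner, hypothetical `teacher(z, y)` interface, not our actual code): the inner SGD steps have to be written out by hand with `create_graph=True` so the meta-gradient can flow back into the teacher.

```python
import torch
import torch.nn.functional as F

def meta_step(teacher, real_x, real_y, inner_steps=3, inner_lr=0.1, z_dim=64, n_syn=32):
    # Toy learner: one linear layer on flattened 28x28 inputs, weights kept as plain tensors
    w = (0.01 * torch.randn(10, 784)).requires_grad_()
    b = torch.zeros(10, requires_grad=True)

    for _ in range(inner_steps):
        z = torch.randn(n_syn, z_dim)
        y = torch.randint(0, 10, (n_syn,))
        syn_x = teacher(z, y)  # teacher generates a synthetic training batch
        inner_loss = F.cross_entropy(F.linear(syn_x.view(n_syn, -1), w, b), y)
        # create_graph=True keeps the SGD update differentiable w.r.t. the teacher
        gw, gb = torch.autograd.grad(inner_loss, (w, b), create_graph=True)
        w, b = w - inner_lr * gw, b - inner_lr * gb

    # Outer loss on real data; backprop flows to the teacher through the unrolled updates
    logits = F.linear(real_x.view(real_x.size(0), -1), w, b)
    return F.cross_entropy(logits, real_y)
```

Calling `.backward()` on the returned meta-loss and stepping a regular optimizer on the teacher's parameters is the outer update. Doing this efficiently with a real learner network (rather than bare tensors) is where most of the framework bugs showed up for me.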
Changing the learner architecture is actually not that important, so that's probably not your problem.
How can you be sure that the synthetic data you generate doesn't bias the architecture search away from the optimal solution for real data? It seems analogous to how early truncation of learning biases architecture search towards quick learners, and possibly away from peak performers.
Probably just an empirical comparison with other NAS strategies and hand-crafted architectures, right? The whole area of research is still ruthlessly empirical.
Oh! I have a somewhat inconvenient question. It's OK if you don't answer it. But... why not work for Elon Musk or the US government? There are rumors that Uber is owned by Russians and reports directly to Putin, jk ;)
PS: I'm the author of the GTN paper. Feel free to ask any questions.