Given that all parameters are trained jointly at inference time and a single sample of z is supposed to encode ALL inputs and outputs for a given puzzle (I think), I don't quite understand the role of the latent z here. Feels like μ and Σ could be absorbed into θ (and an additional variance parameter).

Although if they partition z such that each section corresponds to one input and run f_θ on each section in turn, then I guess it makes sense.
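
Something like this is how I'm picturing the partitioned-z reading (a toy sketch; the names, sizes, and tiny architecture are all made up, not from the paper):

    # Toy sketch of the partitioned-z reading; nothing here is taken from the paper.
    import torch

    n_pairs, z_dim, grid_dim = 4, 64, 900        # e.g. four demo pairs, 30x30 grids flattened

    mu = torch.zeros(n_pairs, z_dim, requires_grad=True)         # one latent mean per pair
    log_sigma = torch.zeros(n_pairs, z_dim, requires_grad=True)  # one latent log-std per pair
    f_theta = torch.nn.Sequential(                               # the shared "algorithm"
        torch.nn.Linear(z_dim, 256), torch.nn.ReLU(), torch.nn.Linear(256, grid_dim)
    )

    z = mu + log_sigma.exp() * torch.randn_like(mu)   # reparameterized sample, one row per pair
    outputs = [f_theta(z_i) for z_i in z]             # same theta applied to each section of z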




I agree, z (and its μ and Σ) could be absorbed into θ, e.g. you always input `[1 0 0 ... 0]`, and the first layer of the neural network would essentially output z. They would have to stop approximating KL(q(θ)||p(θ)) with a quadratic penalty on θ though, so maybe the current setup is more computationally efficient?
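
For concreteness, a toy version of that one-hot trick (sizes invented): with a constant input, the first layer's weight column is effectively z, and it just gets trained along with the rest of θ:

    import torch

    z_dim, grid_dim = 64, 900

    one_hot = torch.zeros(1, z_dim)
    one_hot[0, 0] = 1.0                              # the fixed `[1 0 0 ... 0]` input

    net = torch.nn.Sequential(
        torch.nn.Linear(z_dim, z_dim),               # first layer "outputs z": weight[:, 0] + bias
        torch.nn.Linear(z_dim, 256), torch.nn.ReLU(),
        torch.nn.Linear(256, grid_dim),
    )
    out = net(one_hot)                               # no explicit q(z) anywhere; it's all theta now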


Could be. Also, as you imply, they'd have to loosen the regularization penalty on θ, and it may be difficult to loosen it without making θ too prone to overfitting.

Maybe their current setup of keeping θ "dumb" encourages the neural network to take on the role of the "algorithm", leaving the higher-variance, puzzle-specific input to be encoded by z, though this separation seems fuzzy to me.
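
Rough sketch of how I'd picture that tradeoff (my own guess at the shape of the objective, not the paper's actual loss; all names and sizes made up): z pays an exact KL cost while θ only pays a loose quadratic penalty, so per-puzzle detail is cheaper to store in z and θ gets nudged toward being the reusable "algorithm":

    import torch

    z_dim, out_dim, lam = 64, 900, 1e-4
    mu = torch.zeros(z_dim, requires_grad=True)
    log_sigma = torch.zeros(z_dim, requires_grad=True)
    f_theta = torch.nn.Sequential(torch.nn.Linear(z_dim, 256), torch.nn.ReLU(),
                                  torch.nn.Linear(256, out_dim))
    target = torch.zeros(out_dim)                    # stand-in for a puzzle's output grid

    z = mu + log_sigma.exp() * torch.randn(z_dim)    # reparameterized sample
    recon = ((f_theta(z) - target) ** 2).sum()                              # fit the puzzle
    kl_z = 0.5 * (mu**2 + (2 * log_sigma).exp() - 2 * log_sigma - 1).sum()  # exact KL(q(z) || N(0, I))
    theta_cost = lam * sum((p**2).sum() for p in f_theta.parameters())      # quadratic stand-in for KL(q(theta) || p(theta))
    loss = recon + kl_z + theta_cost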



