Towards universal neural nets: Gibbs machines and ACE

murbard2 · on Sept 2, 2015

False positives do happen, but the style of the article raises some "crank" red flags.

- Doesn't get to the point until about halfway through

- Repeatedly mentions Einstein

- Appeals to quantum mechanics out of nowhere

The style is particularly hard to parse (or I'm particularly dense, but I am generally comfortable reading papers on variational inference, neural networks, etc). At the same time, a lot of it rings true-ish... Does anyone get where they're going with this?

kastnerkyle · on Sept 3, 2015

Just to be abundantly clear - the ladder network [1] outscores this by a moderate margin, and is also actually SOTA without data augmentation. In my mind at least, adding data rotations in the latent space is still different than a fully connected model without data augmentation.

I could do with a few more recent citations on generative modeling. It seems the author isn't 100% aware of some of the most recent generative modeling work.

That said, the ideas presented are interesting and seem complimentary to lots of existing approaches - I will be looking into this paper further.

[1] http://arxiv.org/abs/1507.02672

themann9 · on Sept 7, 2015

Guy, the only thing abundantly clear is that you are full of dung. Your shameless self-promotion may piss-off the police chief here - murbard2, so be more careful. Ten digits better classified, out of 10000? With a structure more complicated than a human DNA vs two lines of code? Congratulations! Or, and how are your "stairways-to-heaven" even remotely universal? Show us something these networks have generated like the VAE or Gibbs/ACE papers? Perhaps you can show some density estimation results as in http://arxiv.org/abs/1502.04623 or http://arxiv.org/abs/1508.06585? Oja and Hyvarinen are great guys and have left their names in the pantheon of neural nets. But it is time for you and the other 12 people who live there, to shake off the legacy of ICA and the obsession with orthogonality: Andrew Ng and company showed years ago that it is not needed and is in fact detrimental http://ai.stanford.edu/~quocle/LeKarpenkoNgiamNg.pdf . Read it! Also, too much dung smells bad in the arctic summer, murbard2 here prefers comics and won't be reading your installments of 25+ page spaghetti any time soon. At least Oja and Hyvarinen know how to write.

themann9 · on Sept 3, 2015

2. he is explaining stochastic nets as statistics non-equilibrium systems, in analogy with theory of fluctuations, which Einstein allegedly originated 3. drawing analogies with quantum mechanics (wave function = conditional density) can open the floodgates for applying a number of quantum techniques to nets

murbard2 · on Sept 3, 2015

Are you the original author? Could you detail a little more clearly the structure of the network and the training procedure?

Xcelerate · on Sept 2, 2015

Nah, it doesn't seem crankish to me. Also, https://scholar.google.com/citations?user=9UJmm_AAAAAJ&hl=en...

GFK_of_xmaspast · on Sept 3, 2015

There's a bunch of phrasing that makes me want to give it the stinkeye, but nothing horrible. This isn't my field, but from part 1 it looks like he's trying to replace a Gaussian distribution by a Laplacian one and exploit some kind of underlying symmetry, but that doesn't really mesh with his "conclusions" in part 5.

murbard2 · on Sept 3, 2015

Replace which Gaussian density from which model exactly?

tlarkworthy · on Sept 2, 2015

MNIST dataset is not a difficult dataset, but it's a good start

frisco · on Sept 3, 2015

> MNIST dataset is not a difficult dataset

oh how far we have come

jostmey · on Sept 3, 2015

I just briefly skimmed through parts of it.

If I understand the paper correctly, the objective function is a combination of a classifier and a generative model. It sounds a lot like pre-training a neural network as a generative model before fine tuning it as a classifier, except this time the two steps are jammed together. I'm not sure what the benefit would be...

themann9 · on Sept 3, 2015

The benefit seems to be shown on the right chart of Fig 8 (bottom line is best descriptor of probability density). Combining classifiers and generative nets is the natural next step, LeCun is working on in in the context of ConvNets - see their SWWAE paper a few weeks ago

deepnet · on Sept 3, 2015

Stacked What-Where Auto-encoders Junbo Zhao, Michael Mathieu, Ross Goroshin, Yann Lecun

http://arxiv.org/abs/1506.02351

themann9 · on Sept 3, 2015

Shooting first and asking question later ain't gonna make u no friends in Compton, murbard2. Wouldn't hold my breath to hear any answers...

murbard2 · on Sept 4, 2015

Like I said, there are false positives, but when I hear "quantum mechanics" in a non quantum mechanical context, I remove the safety from my gun...

themann9 · on Sept 4, 2015

Modern stochastic/time series analysis has borrowed most of its formalism from quantum mechanics. It is the ignorance and the bigotry that has held neural nets back 30 years, until GPUs came to the rescue. Having gazillions of parameters, which still nobody understands, did not help...

murbard2 · on Sept 4, 2015

What insight do quantum mechanics bring into this paper? What exactly do you gain by calling a normal distribution: the equilibrium distribution of the imaginary time Schroedinger equation?

It's quite possible that I'm a fool, unable to see the beauty of the argument and the depth of the parallels being drawn, but to me it sounds like http://www.smbc-comics.com/?id=1957

themann9 · on Sept 4, 2015

reading is not your thing, ah? It is too condensed and have not finished it but sections 1.5 and 2.3 are very explicit about it...

murbard2 · on Sept 4, 2015

Maybe it isn't, but I have found the literature on the topic to be very clear in general. I do not find section 1.5 and 2.3 explicit at all.

Section 2.3 reads like a redefinition of the exponential family and its link to the maximum entropy principle. I think the idea is to optimize the sufficient statistics, but it's not clear.

Section 1.5... well you talk about two dimensional translational symmetries, and then make a link with position and momentum, but where is that coming from? This seems merely like an artifact of looking at two dimensional translational symmetries, and not at the general case. Again, what does the quantum analogy bring to the table? I think - though it's not clear - that what you're suggesting is drawing the latent noise following a distribution which is reflects the expected symmetry in the manifold.

So what is the takeout? Are you attempting to parametrize max-entropy distribution using the symmetries as constraints?

themann9 · on Sept 5, 2015

Not me dude but I like underdogs and outsiders. What u saying sounds right: symmetries, like the ones computed in section 3.8 are used as constraints a.k.a. "quantum numbers" describing the states in the latent layer. That is vintage quantum mechanics. The part is theoretical and not in the theano code by looks of it, so harder to comment on... The quantum analogy also brings along the laplacian form of the conditional density in section 1.5, so there u go. On your comment, which "literature on the topic" is clear? Have u read the original VAE paper http://arxiv.org/pdf/1401.4082.pdf. 95% of paper and most of 21 equations are about gradients, and gaussian too, they even invent a new term: stochastic back-propagation, why, who needs that? All the math is done in the Gibbs/ACE paper in 4 lines, just compute your cost bound and back-propagate? Having seen this earlier would have saved me a lot of pondering. Pythagorean theorem does sound too simple, I give u that...