Jeff Hawkins gets shit on a lot in ML because his theories haven’t produced models with results as good as the mainstream approach, but I’m glad they keep working at it and keep coming up with interesting ideas.
Too much of ML these days is about some NN model that does 0.n% better than SOTA on some specific task. Then you change one tiny parameter and the entire thing breaks, and it turns out we didn’t understand why it was working at all.
I do think his reasoning about what's wrong with ML today is sound. Online learning, sequence learning, and "motor" feedback are all critical.
Unfortunately, he came up with HTM and appears to be trying to cram every advance into that one framework, rather than stepping back and asking how those critical features could be implemented differently.
Also, as far as I know, Hebbian learning (the family HTM belongs to) has achieved ~90% accuracy on MNIST, while SOTA is above 99%. So the gap between HTM and the state of the art is not small.
If HTM were near 98% it would be impressive, since its learning is faster and it transfers better. I think alternatives or hybrids are necessary, but he seems stuck on his pet project.
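For anyone unfamiliar, "Hebbian" here means weight updates computed purely from locally available pre- and post-synaptic activity, with no backpropagated error signal. A minimal sketch using Oja's rule (a classic Hebbian variant, not HTM's actual learning rule):

```python
import numpy as np

def oja_update(W, x, lr=0.01):
    """One Hebbian update (Oja's rule) for a linear layer.

    Each weight changes based only on its own input x_j and output
    y_i -- no global error signal, which is what makes it "local".
    """
    y = W @ x  # post-synaptic activity
    # Hebb term (strengthen co-active pairs) plus a decay term that
    # keeps the weight vectors from growing without bound.
    W += lr * (np.outer(y, x) - (y ** 2)[:, None] * W)
    return W

# Toy usage: 10 units learning features from random 784-dim inputs.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 784))
for _ in range(1000):
    W = oja_update(W, rng.random(784))
```

A rule like this learns features unsupervised, but to get an MNIST accuracy number you still have to bolt a classifier on top, which is roughly where figures like the ~90% above come from.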
That said, on the principles, the "what's important for intelligence" question, I think he is right on target, and he articulates those aspects well.
>Jeff Hawkins gets shit on a lot in ML because his theories haven’t produced models with results as good as the mainstream approach, but I’m glad they keep working at it and keep coming up with interesting ideas.
And also because Numenta's work isn't good, empirically-checkable neuroscience either.
Not sure whether that was a jab at Numenta or not, but I do think the combination of these two comments cuts to the heart of the PR problem for Numenta: they're neither trying to do 0.1% better on the classic perception benchmarks than other algorithms do, nor trying to publish accurate, testable descriptions of the true details of meatspace neuroscience.
To me, exploring alternative network architectures and algorithms seems an extremely worthwhile goal even if it's only loosely tethered to actual biology, but from a PR perspective they really need to be better about priming the conversation if they want people to care.
Bad (neuroscience-focused): "We're doing a lot of research on neuroscience, and finding some really interesting stuff, so we built a model that doesn't exactly match the way the brain works but is still interesting. No, we haven't tried to make it work to classify ImageNet test cases, that's not our goal. But look, it's closer to biology, and we have working code that we're playing with!"
Better (ML-focused): "We're developing a novel neural network architecture that performs online unsupervised learning using only local update rules. It performs competently at classic benchmarks X, Y, and Z when a small WTA layer is thrown on top, and it can also tackle problems A, B, and C that classical deep learning networks can't make any progress on."
To be fair, I'm not even sure if Numenta's networks could perform competently at any classic benchmarks (I'm guessing that if they could, it would take some work to get them to do so), and I have no idea what new problems it could work on. But they really do need to reframe the conversation and emphasize that sort of innovation if they want to be taken more seriously - focusing on neuroscience underpinnings is not a great move if they're not engaged in research that can actually win over neuroscientists, and just pointing out that they're focusing on those things is not a way to win over industry ML folks if they don't have any results to point at.
>To be fair, I'm not even sure if Numenta's networks could perform competently at any classic benchmarks (I'm guessing that if they could, it would take some work to get them to do so), and I have no idea what new problems it could work on. But they really do need to reframe the conversation and emphasize that sort of innovation if they want to be taken more seriously - focusing on neuroscience underpinnings is not a great move if they're not engaged in research that can actually win over neuroscientists, and just pointing out that they're focusing on those things is not a way to win over industry ML folks if they don't have any results to point at.
It was definitely a jab, but I've also got some sympathy for their project. I genuinely agree that, well, theoretical and computational neuroscience need to become more genuinely computational! We're seeing an emerging computational paradigm for neuroscience that isn't just about jamming "network architectures" or "neural circuits" together and hoping something works; it supposedly has strong mathematical principles.
Ok, so where's the code? Sincere question. Some papers do simulations in Matlab, R, or Python whose code is just never shared. This includes even papers that purport to apply these neuroscience-derived principles to robotics problems.
Computational cognitive science does a fair bit better: their custom-built Matlab code gets shared!
If we really believe our theories, we should put them to the computational test. If we put them to the test and they don't work well, we should either revise the theories, or revise the benchmarks. Maybe ImageNet classification scores are a bad idea for how to measure precise, accurate sensorimotor inference! New benchmarks for measuring the performance of "real" cognitive systems are a great idea! Let's do it!
But that requires that we do the slow work of trying to merge theoretical/computational neurosci, cognitive science, and ML/AI back together, at least in some subfields. This is challenging, because nobody's gonna give us our own journal for it until a few prestigious people advocate for one.
What part of Numenta's model is wrong? I certainly see some guesses and simplifications in their model, especially in poorly researched areas, but I figure that's the cost of trying to get things to work.
Numenta's HTM theory seems functionally similar to variants of N-gram models (https://en.wikipedia.org/wiki/N-gram), where the model is able to predict future steps based on learned conditional probabilities. Maybe that's why it hasn't worked exceptionally well?
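To make the analogy concrete, a toy n-gram predictor looks something like this (conditional-probability counts only; HTM's sparse distributed representations are of course a very different mechanism):

```python
from collections import Counter, defaultdict

class NGramPredictor:
    """Predict the next symbol from P(next | previous n-1 symbols)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # context -> next-symbol counts

    def train(self, sequence):
        for i in range(len(sequence) - self.n + 1):
            *context, nxt = sequence[i:i + self.n]
            self.counts[tuple(context)][nxt] += 1

    def predict(self, context):
        """Most likely next symbol given the last n-1 symbols seen."""
        c = self.counts.get(tuple(context[-(self.n - 1):]))
        return c.most_common(1)[0][0] if c else None

model = NGramPredictor(n=3)
model.train("abcabcabd")
print(model.predict("ab"))  # -> 'c' (seen twice, vs. 'd' seen once)
```

Like HTM's sequence memory, it predicts the next step from recent context; unlike HTM, the context window here is fixed at n-1 symbols, so it can't learn variable-order dependencies.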
Cargo-cult programming abounds in a lot of the ML work I read. The material is complex and complicated, and I think this happens because a lot of the research is done in corporate environments under top-down pressure. I have worked in computational modeling of sensorimotor systems, and it is really challenging work. This is way more interesting than banal model-performance bragging!
As a neuroscientist who just started doing ML research, I would call this paper cargo cult programming. If you cobble together a hodgepodge of ideas from neuroscience and build a network to accomplish some trivial task with no baseline to compare it to, I find it really difficult to take anything away from that. Ignoring the cortical column aspect, I'm not particularly convinced that Hawkins's model is a better approximation of biology than a typical deep neural network, just different (and likely far less capable, if you were to apply it to a challenging task). Why not start with a network that we know works and make a biologically-inspired change, and then see if that improves performance on a well-studied problem? If it does, then you have 1) an improvement on the previous network and 2) weak evidence that your idea of how the brain works may be right, if we assume that the brain is a highly optimized information processing device.
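To sketch the kind of experiment I mean (layer sizes and k here are arbitrary placeholders, not taken from any Numenta paper): take a standard classifier, swap one ReLU for a k-winners-take-all activation to enforce sparse activity, one commonly cited property of cortex, and benchmark both:

```python
import torch
import torch.nn as nn

class KWinnersTakeAll(nn.Module):
    """Keep each sample's k largest activations, zero out the rest.

    A crude stand-in for sparse cortical activity; it's piecewise
    linear, so ordinary backprop still trains the network.
    """
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):
        # Per-row threshold at the k-th largest activation.
        thresh = torch.topk(x, self.k, dim=1).values[:, -1:].detach()
        return torch.where(x >= thresh, x, torch.zeros_like(x))

# Identical networks except for the activation, so any difference on
# a benchmark like MNIST is attributable to the sparsity constraint.
baseline = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                         nn.Linear(256, 10))
sparse = nn.Sequential(nn.Flatten(), nn.Linear(784, 256),
                       KWinnersTakeAll(k=32), nn.Linear(256, 10))
```

If the sparse variant matches or beats the baseline, that's exactly the kind of weak-but-real evidence I'm talking about.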
> Jeff Hawkins gets shit on a lot in ML because his theories haven’t produced models with results as good as the mainstream approach, but I’m glad they keep working at it and keep coming up with interesting ideas.
Just last night I ran across a paper [1] from Numenta on time-series anomaly detection using HTM, which provides a benchmark [2] against some existing approaches. (But it seems to me there is no NN-based approach among them.)
Huh, I didn't realise that. That 'patents' clause is pretty powerful - if my (very quick) reading is right, any time you contribute to a GPLv3 project, any patents you own that would apply to that project are basically set free for anyone to use. Cool for FOSS advocates, but I can see why a lot of businesses are wary of anything using GPLv3.
The GPLv3's initial draft copied the patent clause from the Apache license:
"each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work,"
The final GPLv3 text says:
"Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version."
As can be seen, they are very similar. The key difference in the realm of patents between Apache, a license fairly popular with businesses, and GPLv3 is the Novell-Microsoft pact clause. It's not about patents that you own, but rather patents that you sublicense from others.
"If you convey a covered work, knowingly relying on a patent license,... you must ether ... arrange to deprive yourself of the benefit of the patent license for this particular work or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients."
This clause is the main crux of the issue. Large companies with large legal departments have complex patent deals with competitors; depriving yourself of the patent license would mean opening yourself up to being sued, while the other option would mean renegotiating with the competitor to grant the patent license to anyone and everyone.
From a mostly outside point of view, I've wondered a lot about this. It seems a lot of ML/DL research nowadays amounts to educated guessing and tinkering, i.e., "Based on our notion of what seemed right, we tried this other configuration of layers and/or activation functions and, look, it worked."
I imagine at some point that the kind of research discussed in the linked post will begin to pan out more and more and will eventually lead to another period of rapid progress like that which followed Hinton's work on DL.
> Too much of ML these days is about some NN model that does 0.n% better than SOTA on some specific task. Then you change one tiny parameter and the entire thing breaks, and it turns out we didn’t understand why it was working at all.