
This reminds me of a thought I had some years ago. The idea was that we can think of general intelligence not as optimization for a specific given goal, but as optimization toward a special position from which a wider set of goals can be reached most rapidly. Thus it optimizes for future flexibility rather than current results.

At the time, I remember being excited to hear of a physics paper[1] concerning an inverted pendulum, where they solved for the dynamic forces that would keep the pendulum at the position of maximum instability, and claimed that this was, in some sense, a description of dynamic intelligence. The analogy is that this is the unique position from which the pendulum can most efficiently be made to move quickly in any 'required' direction (the 'goal').
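
That paper aside, a toy simulation gives the flavor of the argument (my own sketch, not the paper's analysis; the torque limit, target angles, and crude Euler integration are all assumptions): if you measure the worst-case time to push the pendulum to a target angle on either side, that worst case is smallest when you start balanced at the unstable top, whereas a tilted start is fast toward one side and slow toward the other.

    import math

    G_OVER_L = 9.81   # pendulum with L = 1 m; theta measured from the upright position
    U_MAX    = 12.0   # constant-magnitude control torque (rad/s^2), enough to beat gravity here
    DT       = 1e-4

    def time_to_reach(theta0, target, t_max=5.0):
        # Time for the pendulum to first reach `target`, pushing toward it at full torque.
        theta, omega, t = theta0, 0.0, 0.0
        while t < t_max:
            if (target - theta0) * (theta - target) >= 0:    # crossed the target angle
                return t
            u = U_MAX if target > theta else -U_MAX
            alpha = G_OVER_L * math.sin(theta) + u           # inverted-pendulum dynamics plus control
            omega += alpha * DT
            theta += omega * DT
            t += DT
        return float("inf")

    targets = (+0.8, -0.8)   # roughly 46 degrees to either side
    for theta0 in (0.0, -0.5):
        worst = max(time_to_reach(theta0, tgt) for tgt in targets)
        print(f"start {theta0:+.2f} rad -> worst-case time to any target: {worst:.2f} s")

Starting at 0.0 (the top), the two targets are reached equally quickly; starting at -0.5 rad, the nearby target is quicker but the far one is much slower, so the worst case is worse.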

I still think that idea has some merit, but putting together a coherent formalization of it seems really tricky, requiring some genius far beyond my own meager pondering.

[1] I found the article: https://physics.aps.org/articles/v6/46




As a step in that direction, you can take some inspiration from what the brain does: as you learn things better and better, the knowledge essentially gets pulled down to neuronal layers that are closer to sensory input. This leaves the higher layers more free to do other stuff (potentially reusing the results from the surface layers), which is a step in the direction of optimizing for future flexibility.

It's possible to create rules that operate on an already-trained network and push it in this direction without totally destroying what it's learned, by "fuzzing" the original network to generate a bunch of input/output pairs, and then using that dataset to retrain smaller sub-networks. For instance, if you have a 5-layer network that you've trained on a classification task, you can often use that network as a teacher to train a smaller network to do pretty damn well on the same classification task, even in some cases where training the smaller network directly would have been very difficult. There are several reasons that this trick can work, not the least of which is that in a sense it is a way to expand the training set dramatically.

NB: the above approach is probably not how you'd implement this; there are less crude methods of incentivizing the shallower layers to carry more of the activation than the deeper ones, and those would probably work better.
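
For concreteness, here's a minimal sketch of the fuzz-the-teacher / retrain-a-smaller-student trick described above (assuming PyTorch; the layer sizes, the pure-noise "fuzzing", and the KL objective are my own choices, not anything specific from the comment):

    import torch
    import torch.nn as nn

    # Stand-in for the already-trained 5-layer teacher (sizes are made up).
    teacher = nn.Sequential(
        nn.Linear(32, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )
    teacher.eval()  # pretend it has already been trained on the classification task

    # Much smaller student to be trained purely on the teacher's outputs.
    student = nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss_fn = nn.KLDivLoss(reduction="batchmean")

    for step in range(1000):
        # "Fuzzing": sample inputs (here pure noise; in practice, jittered real examples)
        # to generate an effectively unlimited input/output training set from the teacher.
        x = torch.randn(256, 32)
        with torch.no_grad():
            target = torch.softmax(teacher(x), dim=-1)
        loss = loss_fn(torch.log_softmax(student(x), dim=-1), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

In practice you'd usually fuzz around real training inputs rather than pure noise, so the student is only asked to imitate the teacher where the teacher is actually competent.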

I can easily imagine a phased training strategy that oscillates between a) learning new things by making the deeper layers more malleable and the shallower ones fairly rigid, and b) compressing all the data by opening up the shallow layers to change and replaying input/output into itself. I have no idea if there are any benchmarks around this sort of thing, though; benchmarks typically have fixed goals, so the ability to retrain for additional tasks is not really measured.
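
One crude way to sketch that oscillation (again assuming PyTorch; the split into blocks, the freezing mechanism, and the self-replay objective are my own guesses rather than an established recipe) is to toggle which block receives gradients and, in the consolidation phase, replay a frozen snapshot of the network into itself:

    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical network split into a "shallow" block (near the input) and a "deep" block.
    shallow = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
    deep    = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
    model   = nn.Sequential(shallow, deep)
    opt     = torch.optim.Adam(model.parameters(), lr=1e-3)

    def set_trainable(block, trainable):
        # Freeze or unfreeze a block by toggling requires_grad; frozen params get no gradient.
        for p in block.parameters():
            p.requires_grad = trainable

    def phase_a_step(x, y):
        # (a) Learn new things: deep layers malleable, shallow layers rigid.
        set_trainable(shallow, False); set_trainable(deep, True)
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

    def phase_b_step():
        # (b) Consolidate: snapshot the whole network, open up the shallow layers,
        # and replay the snapshot's input/output behaviour back into the live network.
        snapshot = copy.deepcopy(model).eval()
        set_trainable(shallow, True); set_trainable(deep, False)
        x = torch.randn(256, 32)               # stand-in for replayed or fuzzed inputs
        with torch.no_grad():
            target = torch.softmax(snapshot(x), dim=-1)
        loss = F.kl_div(torch.log_softmax(model(x), dim=-1), target, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()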



