To start on this you would want a system that got rewarded based on how well it ...

To start on this you would want a system that got rewarded based on how well it was able to predict aspects of its environment. This would have to go in hand with preferring stimulation, so you would need something like preference for inputs which maximize relative entropy with respect to its thus far learned model.