It's interesting how entropy maps to art. Some art has almost no surprises, like romance novels and pop songs. Other forms, like experimental music and the theatre of the absurd, are built around surprises. Depending on your tastes, you seek out art with just the right amount of surprise to keep you engaged. The brain works to find meaning in the media it consumes. If meaning is too easy to find, the content is trivial and boring; if it is too hard, the content is frustratingly inaccessible (and often labeled pretentious). For example, prog rock is, to my taste, just the right mix of familiar structures and frequent surprises.
It is curious how the same piece of art becomes more enjoyable after you experience it multiple times. For me, enjoyment of a song peaks around the fourth listen. It's like the brain enjoys repeating the same surprises over and over. I'd guess that can be explained by the reward that comes from the brain building a successful model of the input, one that can perfectly predict the next parts.
One case that stands out from this predictability-of-art model is the modern sitcom format. Humor is generally built around the surprise of a punchline, but many sitcoms just repeat the same joke in a predictable way. Instead of actual humor, they broadcast "this bit right here is supposed to be funny" every couple of seconds. The laugh track creates conditioning that triggers good emotions in people. Just find a sitcom clip with the laugh track removed and see how eerie it becomes.
Anyway, this is more about how the mind experiences entropy, and has nothing to do with information theory itself. For that I would recommend a great book, The Information: A History, a Theory, a Flood. Its historical treatment of storing information (writing) and transmitting it (communication) is especially good.
There's a paper by Schmidhuber [1] in which he argues, IIRC, that what satisfies the observer of e.g. art is not surprise in the sense of high or just-right entropy, but the fact that the work teaches you a new, previously unknown pattern of regularity about the world, which lets you compress information better. I may be misconstruing it though, because it's been a while since I looked into it.
On another note, Brian Eno has cited Morse Peckham's "Man's Rage for Chaos" [2], which seems more in line with what you are saying: it claims that the purpose of art is to offer surprising shifts in perspective that break up the otherwise overly strong drive towards order and predictability.
See also the goldilocks effect in developmental psychology: "Human Infants Allocate Attention to Visual Sequences That Are Neither Too Simple Nor Too Complex".
> Humor is generally built around the surprise of a punchline, but many sitcoms just repeat the same joke in a predictable way. Instead of actual humor, they broadcast "this bit right here is supposed to be funny" every couple of seconds. The laugh track creates conditioning that triggers good emotions in people. Just find a sitcom clip with the laugh track removed and see how eerie it becomes
I've heard this idea a few times, and seen YouTube videos of, for example, segments from Friends with the laughter removed. And it does feel weird without the laughter. But I'm not sure it's correctly diagnosed. Friends (which I'm picking on because that's the example I've seen) was filmed in front of a live audience, and the actors respond to the audience. I think mostly it feels weird without the laughter because you get lots of unnaturally long pauses in the flow of the lines - those pauses are where the actors were leaving space for the audience's laughter. This is completely normal when acting in front of a live audience and doesn't mean that the jokes aren't funny. (Yes, some of the jokes aren't funny - when you have a lot of episodes you'll get some duds)
I'm still trying to make sense of all this myself - you've got Karl Friston's Free Energy Principle essentially saying that the brain is trying to minimise surprise. And yet, if you look at the explore-exploit dilemma, 'optimism in the face of uncertainty' is a possible strategy. I'm sure there is a way to reconcile these - I don't understand the Karl Friston stuff properly yet.
This view of art, where the spectator is an active participant who generalizes from incomplete information, reminds me of the painting that Dan Dennett loves to talk about: https://youtu.be/fjbWr3ODbAo?t=690
I love maths when it takes complicated problems and models them in simple terms. I think "Information Theory" actually does the opposite - it's taken a problem about *data encoding* and articulated that problem as something that can tell us about the properties and nature of *information* itself. The theory implies that pieces of information are comparable with other pieces of information but this is only true inside a strict set of boundaries defining what can possibly be encoded.
Investigating the nature of information and finding ways to measure it is a subject that will never sit comfortably in the sciences. That's okay, let's not force it! Maybe it's time we reconsider labeling this with the very grand title of "Information Theory".
Richard Hamming (known for the Hamming code, among other things) said that "information theory" ought to be called "communication theory". Obviously, communication theory is a field in itself, but I think the gist of his argument holds somewhat true, in that information theory tells us little about what information is intrinsically; instead it tells us what information can be _learned_.
But I think the intuition breaks down there. What is mutual information, then? Classically, it is the information which can be reliably transmitted between a transmitter and receiver (given some model of both). However, the unit is still bits.
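To make "the unit is still bits" concrete, here's a minimal sketch (the joint distribution is just a toy binary symmetric channel of my own choosing, not from any particular textbook):

```python
# Toy sketch: mutual information in bits, I(X;Y) = H(X) + H(Y) - H(X,Y),
# for a hypothetical binary symmetric channel with uniform input and a
# 10% chance of flipping the transmitted bit.
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability array (zero terms ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution p(x, y): rows are the sent bit, columns the received bit.
joint = np.array([[0.45, 0.05],
                  [0.05, 0.45]])

p_x = joint.sum(axis=1)  # marginal of the sent bit
p_y = joint.sum(axis=0)  # marginal of the received bit

mutual_info = entropy_bits(p_x) + entropy_bits(p_y) - entropy_bits(joint)
print(f"I(X;Y) = {mutual_info:.3f} bits")  # ~0.531 bits per channel use
```

The number only means something relative to the model of transmitter, receiver and channel you plugged in, which I think is exactly the point.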
Communication theory is far too narrow a subfield.
Physics has a lot to say about this. Physicists have done a lot of information-theoretic analysis of the laws of physics, and physics-theoretic analysis of information theory. Information theory can no longer be disentangled from the physics itself. Meaning, what information processing tasks (including but not limited to communication tasks) are possible and with what complexity depends on the laws of the universe we live in. And conversely, which universes are even possible (those with "reasonable" laws) are constrained by the theorems of information theory.
Yes, it is from Hamming's "The Art of Doing Science and Engineering".
From page 89:
> Information Theory was created by C.E.Shannon in the late 1940s. The management of Bell Telephone Labs wanted him to call it “Communication Theory” as that is a far more accurate name, but for obvious publicity reasons “Information Theory” has a much greater impact—this Shannon chose and so it is known to this day. The title suggests the theory deals with information—and therefore it must be important since we are entering more and more deeply into the information age.
Then, later on page 90:
> First, we have not defined “information”, we merely gave a formula for measuring the amount. Second, the measure depends on surprise, and while it does match, to a reasonable degree, the situation with machines, say the telephone system, radio, television, computers, and such, it simply does not represent the normal human attitude towards information. Third, it is a relative measure, it depends on the state of your knowledge. If you are looking at a stream of “random numbers” from a random source then you think each number comes as a surprise, but if you know the formula for computing the “random numbers” then the next number contains no surprise at all, hence contains no information! Thus, while the definition Shannon made for information is appropriate in many respects for machines, it does not seem to fit the human use of the word. This is the reason it should have been called “Communication Theory”, and not “Information Theory”. It is too late to undo the definition (which produced so much of its initial popularity, and still makes people think it handles “information”) so we have to live with it, but you should clearly realize how much it distorts the common view of information and deals with something else, which Shannon took to be surprise.
> This is a point which needs to be examined whenever any definition is offered. How far does the proposed definition, for example Shannon’s definition of information, agree with the original concepts you had, and how far does it differ? Almost no definition is exactly congruent with your earlier intuitive concept, but in the long run it is the definition which determines the meaning of the concept—hence the formalization of something via sharp definitions always produces some distortion.
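To see Hamming's third point in code (the generator below is a toy of my own choosing, not anything from the book): a stream that is pure surprise to one receiver carries no information at all for a receiver who knows the formula.

```python
# Toy illustration of "information is a relative measure": the same number
# stream is unpredictable to one receiver and perfectly predictable to another
# who knows the generating formula and the seed.
def lcg(seed, n, a=1103515245, c=12345, m=2**31):
    """A simple linear congruential generator, standing in for the 'formula'."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

stream = lcg(seed=42, n=5)
print(stream)  # looks random if you don't know how it was produced

# A receiver who knows the formula and the seed predicts every value exactly,
# so each incoming "random number" carries no surprise, hence no information:
prediction = lcg(seed=42, n=5)
print(prediction == stream)  # True
```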
Information theory provides a way to measure the "size" of probability distributions, which has applications wherever we are interested in quantifying our uncertainties. For example, it can be used to target observations for reducing forecast uncertainties, such as in [0], or [1].
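For a concrete sense of what "size" means here, a minimal sketch (the forecasts are made-up toy numbers, not anything from those papers):

```python
# Toy sketch: entropy as the "size" of a probability distribution. A sharply
# peaked forecast carries less uncertainty, hence lower entropy, than a flat one.
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

flat_forecast   = [0.25, 0.25, 0.25, 0.25]  # maximal uncertainty over 4 outcomes
peaked_forecast = [0.85, 0.05, 0.05, 0.05]  # most of the mass on one outcome

print(entropy_bits(flat_forecast))    # 2.0 bits, the maximum for 4 outcomes
print(entropy_bits(peaked_forecast))  # ~0.85 bits
```

Targeting observations then roughly amounts to asking which measurement shrinks that number the most.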
>The theory implies that pieces of information are comparable with other pieces of information but this is only true inside a strict set of boundaries defining what can possibly be encoded.
Information theory is concerned exactly with formalizing this "strict set of boundaries".
It's not some grandiose claim or a piece of philosophical musing, if that's what you're implying; it's math.
At least information theory as we know it from Shannon and co.
> The theory implies that pieces of information are comparable with other pieces of information but this is only true inside a strict set of boundaries defining what can possibly be encoded.
How much information is in the story of Little Red Riding Hood? You could measure the bits encoded in the text but is that really measuring the amount of information communicated? I'm sure that story has additional meaning (and less surprise) to folk in Europe who recognize the symbolism in the story and its analogues in other parts of their culture.
Does this story have more or less information in it than Goldilocks and the Three Bears? You could measure the amount of information encoded in the telling of the story to find out but that would give you a very unsatisfactory answer.
> I'm sure that story has additional meaning (and less surprise) to folk in Europe
Indeed, information is a receiver-dependent quantity, as you say, and is formalized as such in any information theory textbook. Really, the information in a message about a variable X is how much the receiver's subjective probability distribution over X is constrained by the message.
In practice, creating a complete model of the system i.e. enumerating all the possible variables and their prior probabilities is impossible. However, this doesn't mean the entire field is useless. In fact, the work of Shannon and others helps us understand the limits of what is possible to model and gain information about, and by how much. What those limits mean requires grappling with philosophical questions far above my paygrade.
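To make the receiver-dependence concrete, here is a toy sketch (the priors and posterior are invented numbers) of measuring the information gained from a message as the KL divergence between a receiver's prior and posterior over X:

```python
# Toy sketch: the same message moves two receivers' beliefs about X by very
# different amounts, measured as the KL divergence D(posterior || prior) in bits.
import numpy as np

def kl_bits(post, prior):
    """Kullback-Leibler divergence D(post || prior) in bits."""
    post, prior = np.asarray(post, dtype=float), np.asarray(prior, dtype=float)
    mask = post > 0
    return np.sum(post[mask] * np.log2(post[mask] / prior[mask]))

posterior = [0.90, 0.05, 0.05]       # what both receivers believe after the message

naive_prior    = [1/3, 1/3, 1/3]     # a receiver with no idea which of 3 outcomes holds
informed_prior = [0.80, 0.10, 0.10]  # a receiver who already mostly expected outcome 0

print(kl_bits(posterior, naive_prior))     # ~1.02 bits gained by the naive receiver
print(kl_bits(posterior, informed_prior))  # ~0.05 bits gained by the informed receiver
```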
I don't see any reason why information theory has to be limited to information encoded in bits. It deals generally with a quantity p(x), with no qualification for what p(x) is. You could, for example, tokenize words and figure out how complex the language used is, given some measure over the distribution of words in the English language. Similarly, you could go higher and work with characters, and figure out the surprise factor of using bears, but again you would need some model or measure of the distribution of character types in English literature. Figuring out the entropy of a coin toss is fairly trivial, but figuring out the entropy of, say, a storybook requires coming up with a model of the distribution over storybooks, which is the hard part. In that regard, information theory is quite powerful and gives us the tools to do all of that; it's just that the models are highly non-trivial, or otherwise missing.
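As a rough sketch of what that looks like in practice (the unigram frequencies below are invented for illustration, nothing like a real model of English):

```python
# Toy sketch: once you supply a model of the distribution over tokens,
# information theory gives you the surprisal of each token and the average
# bits per token of a text; the hard part is the model, not the arithmetic.
import math

# Hypothetical unigram model: assumed probability of each word in English prose.
word_model = {"the": 0.06, "girl": 0.001, "wolf": 0.0002, "grandmother": 0.00005}

def surprisal_bits(word, model):
    """Surprisal -log2 p(word) under the model; rarer words carry more bits."""
    return -math.log2(model[word])

text = ["the", "girl", "the", "wolf", "the", "grandmother"]

for word in word_model:
    print(f"{word!r}: {surprisal_bits(word, word_model):.1f} bits")

avg = sum(surprisal_bits(w, word_model) for w in text) / len(text)
print(f"average: {avg:.1f} bits per word")  # entirely a property of the chosen model
```

Swap in a different model and the same text carries a different number of bits, which is the receiver-dependence point again.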
I have a feeling you may be conflating some concepts in information theory, and possibly redefining information in multiple places.
Goldilocks and Red Riding Hood require a codec to decipher. Otherwise they are literally just strings of bits. A simple codec can take the bitstring and encode it as characters, so that you may use your codec (your eyes) to send an encoding to another codec (your brain) to decipher and finally interpret the meaning. Codecs are kind of a tricky thing, in that they can compress a lot of information into a single bit, but setting up that compression requires a lot of ‘work’ outside the scope of a given transmission window.
To answer your question as to which story contains more information, you need to first ask ‘how sophisticated is my codec?’ I can tell you from watching children grow up, the same story can have vastly different information conveyed depending on when the story is consumed.
>How much information is in the story of Little Red Riding Hood? You could measure the bits encoded in the text but is that really measuring the amount of information communicated?
Yes, in several important ways.
It's not about putting a number on the "subjective experience" of reading/appreciating the story, however.