It’s weird because I’m pretty sure my brain does something similar when I speed read. I don’t usually read the words themselves; instead I recognize the shapes of the most common words, then jump to the subject of each paragraph and break down the meaning of the whole page in a second or so.
In editing we couldn’t find a good place for this, so it was cut from the current version, but at one point we had discussed a parallel with the information density of speech described in one paper. Essentially, the paper found that speakers of languages that are less information-dense per syllable speak faster, achieving roughly the same information rate as languages with higher density per syllable. You could see entropy-based patching as paralleling this, if you consider that low-entropy bytes (in the Shannon sense) are less information-dense.
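To make that concrete, here is a toy sketch of what I mean by entropy-based patching (my own illustration, not the paper's actual algorithm; the bigram byte model and the cut threshold are just stand-ins):

```python
# Toy sketch of entropy-based patching: estimate how "surprising" the next
# byte is with a simple bigram model, and cut a new patch whenever the
# surprise spikes, so predictable stretches get lumped together.
import math
from collections import Counter, defaultdict

def train_bigram(corpus: bytes):
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts

def next_byte_entropy(counts, prev: int) -> float:
    dist = counts.get(prev)
    if not dist:
        return 8.0  # unseen context: assume maximal uncertainty (bits per byte)
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())

def patch_by_entropy(text: bytes, counts, threshold: float = 2.0):
    patches, current = [], bytes([text[0]])
    for prev, nxt in zip(text, text[1:]):
        # High entropy -> the next byte is hard to predict -> start a new patch.
        if next_byte_entropy(counts, prev) > threshold:
            patches.append(current)
            current = b""
        current += bytes([nxt])
    patches.append(current)
    return patches

corpus = b"the quick brown fox jumps over the lazy dog " * 50
model = train_bigram(corpus)
print([p.decode() for p in patch_by_entropy(b"the quick brown fox", model)])
```

Predictable stretches of bytes end up in long patches, while surprising bytes force a cut: the "speak faster where the content is thin" intuition in code form.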
That's generally true, but you also have the ability to stop and look closer if you want to. If someone asks you to count the letters in a word, you will stop to look at the letters individually. If you see an unfamiliar word like SolidGoldMagikarp, you can stop and break it apart. Tokenization prevents LLMs from doing this.
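For what it's worth, you can see exactly what chunks a model gets with the open tiktoken library (the splits below come from the cl100k_base encoding; other models use other encodings and split differently):

```python
# Quick look at what the model actually receives for a word.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["rabbit", "SolidGoldMagikarp"]:
    pieces = [enc.decode([t]) for t in enc.encode(word)]
    print(word, "->", pieces)  # chunks, not letters: counting characters means undoing this
```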
Generally, the current crop of LLMs seem like pretty good analogues of the "scan reading" immediate, instinctual response to a stimulus, but they seem to completely lack the higher level that can then go, "Wait, that doesn't seem right, let's go back over that again." Like hallucinations, or seeing "faces" in dark shadows until you look again, it's doing a pretty good emulation of some level of consciousness.
Is that a fundamental difference in the level of processing? I haven't seen that sort of second-tier logic emerge from increasing scale yet, but will it come with time? I'm not sure.
You can prompt the model to do that kind of "stream of mind" process; it pushes the model to explicitly model its own uncertainty. This is my prompt:
> Write in a raw, real-time stream-of-consciousness style, as if actively solving a problem. Your response should feel like unpolished notes—messy, exploratory, and authentic. Show your full thought process, including missteps, dead ends, and course corrections. Use markers to signal mental states:
> * Insights: "Wait -", "Hold on -", "Oh -", "Suddenly seeing -", "This connects to -"
> * Testing: "Testing with -", "Breaking this down -", "Running an example -", "Checking if -"
> * Problems: "Stuck on -", "This doesn’t work because -", "Need to figure out -", "Not quite adding up -"
> * Progress: "Making headway -", "Starting to see the pattern -", "Explains why -", "Now it makes sense -"
> * Process: "Tracing the logic -", "Following this thread -", "Unpacking this idea -", "Exploring implications -"
> * Uncertainty: "Maybe -", "Could be -", "Not sure yet -", "Might explain -"
> * Transitions: "This leads to -", "Which means -", "Building on that -", "Connecting back to -"
>
> Lean into real-time realizations: "Wait, that won't work because…" or "Ah, I missed this…" Show evolving understanding through short paragraphs, with natural pauses where ideas shift.
> Structure your thought evolution as follows:
> * Begin with an initial take: "This might work because…" or "At first glance…"
> * Identify problems or angles: "Actually, this doesn’t hold up because…"
> * Test examples or counterexamples: "Let me try -", "What happens if -"
> * Seek deeper patterns: "I’m seeing a connection -", "This ties back to -"
> * Link broader implications: "This means -", "If this holds, then -"
> * Admit confusion openly: "I don’t get this yet", "Something’s missing here"
> * Reveal partial understanding: "I see why X, but not Y"
> * Show failures and iterations: "Still not right - trying another approach"
>
> Embrace a debugging mindset, treating ideas like code: break them into steps, test logic, reveal failure modes, and iterate. Skip introductions and conclusions. Stop when you solve the problem or find clear next steps. Use short, direct sentences to mimic real-time thinking. The goal is to capture the messy, evolving nature of problem-solving and thought refinement.
Just try it; you can insert it at any point in an LLM chat session. I built it by reverse-engineering the QwQ-32B model's responses with Claude. QwQ itself is based on the same approach as OpenAI's o1.
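If you'd rather wire it in programmatically than paste it into a chat window, something like this works (a sketch using the OpenAI Python SDK; the model name and the user question are placeholders, and any chat API that accepts a system message would do the same job):

```python
# Sketch: using the prompt above as a system message via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
stream_prompt = "Write in a raw, real-time stream-of-consciousness style, ..."  # paste the full prompt here

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": stream_prompt},
        {"role": "user", "content": "Why does my recursive function blow the stack?"},
    ],
)
print(response.choices[0].message.content)
```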
I've tried prompts like this with Claude, but it can get so nitpicky with itself that it runs out of space for the actual answer. It seems it really helps to train the model to do this, rather than just prompting for it.
I've often wanted to talk with an LLM about its own tokenization (e.g., how many tokens are in "the simplest of phrases"). I wonder, if you fed it information about its tokenization (text like "rabbit is spelled r, a, b, b, i, t"), whether it could talk about it.
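One way to generate that kind of spelled-out text automatically is to walk the phrase token by token (a sketch using tiktoken; swap in whatever encoding actually matches the model you're talking to):

```python
# Produce "token X is spelled ..." lines to feed back to the model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumption: the model uses this encoding
phrase = "the simplest of phrases"
tokens = enc.encode(phrase)

lines = [f'"{phrase}" is {len(tokens)} tokens:']
for t in tokens:
    piece = enc.decode([t])
    # spell out each token character by character, e.g. "rabbit is spelled r, a, b, b, i, t"
    lines.append(f'token "{piece}" is spelled {", ".join(piece.strip())}')
print("\n".join(lines))
```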