
You can think of an LLM as a production line - feed a series of tokens in, and they get embedded and then processed one step at a time through however many transformer layers the model has (undisclosed for most recent models, but GPT-3 has 96).
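
A toy sketch of that production line in plain numpy - not a real transformer (each layer's attention + MLP is collapsed into one matrix multiply), just the shape of the computation, with the depth hard-coded:

    import numpy as np

    N_LAYERS = 96   # GPT-3's depth
    D_MODEL  = 128  # toy embedding width (GPT-3 uses 12288)

    def transformer_layer(x, weights):
        # Stand-in for one layer: a single fixed-cost transformation.
        # A real layer mixes information across positions via attention,
        # then transforms each position with an MLP.
        return np.tanh(x @ weights)

    def forward(token_ids, embedding, layers):
        x = embedding[token_ids]      # embed the input tokens
        for weights in layers:        # exactly N_LAYERS steps, no more
            x = transformer_layer(x, weights)
        return x                      # final states -> next-token logits

    # Random toy weights - the point is only that the number of
    # sequential processing steps is fixed by the architecture.
    rng = np.random.default_rng(0)
    embedding = rng.normal(size=(50000, D_MODEL))
    layers = [rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
              for _ in range(N_LAYERS)]
    out = forward(np.array([11, 42, 7]), embedding, layers)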

Those fixed 96 (or however many) steps of processing limit the complexity of what the model can do, so it will fail if a task is too complicated unless it breaks the task down into simpler sub-steps, each of which can be done well within that fixed depth of processing.

It's not just appearing to do so - with chain-of-thought prompting you are literally telling the model to "think step by step" as part of the prompt, so that is what it outputs. You could also tell it to generate a step-by-step plan, then elaborate on each of those steps.
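
The mechanics can be that simple - just append the instruction to the prompt. A minimal sketch using the OpenAI Python SDK (the model name is a placeholder, not a recommendation):

    from openai import OpenAI  # assumes the openai Python SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    question = ("A train leaves at 3:40pm and the trip takes "
                "95 minutes. When does it arrive?")

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[
            # The only change vs. a plain prompt is the trailing
            # instruction to decompose the problem before answering.
            {"role": "user",
             "content": question + "\nLet's think step by step."},
        ],
    )
    print(resp.choices[0].message.content)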

I don't think we can say exactly how it decides to break a task into steps, any more than we can say in general exactly how these LLMs work, but intuitively it's similar to how we think and talk (which is what the LLM is trained on) - a good speaker or writer will introduce a complex topic as a top-down decomposition.
