Hmm; I would think a better definition would describe the behavior as being greater than the sum of its parts. I can imagine abilities and behaviors that are present in the training data and not exhibited by a smaller LLM, yet do show up in a bigger one. My favorite example is the ability to multiply large numbers consistently and accurately (e.g. 23540932 x 3274, which ChatGPT-4 just failed to do correctly for me when I asked it). A better LLM would learn how to carry the process out step-by-step.
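
For concreteness, the step-by-step process I have in mind is ordinary schoolbook long multiplication: one shifted partial product per digit of the second number, then a sum. Here is a minimal sketch in Python. The `long_multiply` helper is a hypothetical name chosen for illustration, it assumes non-negative integers, and it says nothing about how an LLM would actually carry this out internally.

```python
def long_multiply(a: int, b: int) -> tuple[list[int], int]:
    """Schoolbook multiplication for non-negative integers.

    Returns the shifted partial products (least significant digit of b
    first) together with their sum.
    """
    partials = []
    for place, digit_char in enumerate(reversed(str(b))):
        # Multiply a by a single digit of b, then shift by its place value.
        partials.append(a * int(digit_char) * 10 ** place)
    return partials, sum(partials)


partials, total = long_multiply(23540932, 3274)
for p in partials:
    print(f"{p:>14,}")
print("-" * 14)
print(f"{total:>14,}")
```

For the example above, the four partial products are 94,163,728, 1,647,865,240, 4,708,186,400 and 70,622,796,000, which add up to 77,073,011,368.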