I often think of this as a corollary to the bitter lesson: it is enough to break some particular relationship between two things, and then throw compute at recovering it. In trying to recover that relationship, the model learns the real underlying relationship; sometimes it even learns about each thing itself.
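A toy sketch of the idea, not a claim about any particular system: assume two quantities `x` and `y` that are secretly tied together by a linear rule (the rule, the noise level, and the names `make_pairs` and `train_recovery` are all made up for illustration). We "break" the relationship by hiding `y` from the model, then spend compute on gradient descent to recover it from `x`. The only way to do well at the recovery task is to end up encoding the hidden rule.

```python
import random


def make_pairs(n, seed=0):
    """Generate (x, y) pairs tied together by a hidden rule (an assumption
    for this toy example): y = 2.0 * x + 0.5, plus a little noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 0.5 + rng.gauss(0.0, 0.05)
        pairs.append((x, y))
    return pairs


def train_recovery(pairs, steps=500, lr=0.1):
    """Break the pairing by hiding y, then train a linear model to recover
    y from x alone, using plain full-batch gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in pairs:
            err = (w * x + b) - y  # the model only sees x; y is the target
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


pairs = make_pairs(200)
w, b = train_recovery(pairs)
print(f"recovered slope={w:.3f}, intercept={b:.3f}")
```

Nothing in the training loop mentions the slope or intercept of the hidden rule, yet the learned `w` and `b` should land close to them. Real systems swap the linear model for a large network and the hidden `y` for masked tokens, shuffled pairs, or added noise, but the shape of the trick is the same.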