Yes, only the missing file data needs to be sent. The hard part is working out what's missing: protocols that aren't aware of git's structure (blobs, trees, commits, etc.) can't exploit it to, e.g., walk back from each ref and stop as soon as they hit an ancestor the other side already has. Instead, they're stuck comparing the file contents of two entire copies of a repo (via their Merkle trees). That's probably not a big deal for smaller projects (e.g. I've played with hosting my own git projects on IPFS), but it's a lot of overhead for projects like the Linux kernel, which have massive histories, lots of refs, and many developers frequently pushing and pulling many changes.
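To make the "walk from each ref and stop at existing ancestors" idea concrete, here's a rough sketch in Python. It is not git's actual negotiation code: the `parents` dict, the `missing_commits` function and the toy commit names are all made up for illustration, and it assumes the receiver's `have` set is closed under ancestry (if it has a commit, it has everything behind it).

```python
from collections import deque


def missing_commits(parents, want_tips, have):
    """Return the commits reachable from want_tips that the receiver lacks.

    parents:   dict of commit id -> list of parent ids (hypothetical toy graph)
    want_tips: commit ids the refs point at on the sending side
    have:      set of commit ids the receiver already has; assumed to be
               closed under ancestry
    """
    missing = []
    seen = set()
    queue = deque(want_tips)
    while queue:
        commit = queue.popleft()
        if commit in seen or commit in have:
            # Stop here: everything behind a known commit is already present.
            continue
        seen.add(commit)
        missing.append(commit)
        queue.extend(parents.get(commit, []))
    return missing


# Toy history: a <- b <- c <- d (main), plus c <- e (feature)
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["c"]}
print(missing_commits(parents, ["d", "e"], have={"a", "b", "c"}))
```

The point is that the walk only touches the new commits plus the first known ancestor it meets on each path, so the cost scales with the size of the change rather than the size of the whole history. A structure-unaware protocol has no notion of "parent", so it can't take this shortcut.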