Well, aside from the edited-in bit about OCR. Of course there isn't a separate run to do OCR, because that was literally the first step of image analysis. You know, before the conversion into simple tokens.
You understand that OCR is the process of extracting text from images, right? You know, like what Gemini does, and which they reference repeatedly in their paper. I have absolutely no idea why you keep drawing some bizarre distinction about it being a "separate process".
Okay, it's been fun talking to you, but feel free to have the last word. Good luck.
Image tokens are patches of the image. Each image is divided into roughly 256 patches, and those patches are the tokens.
There's no separate pass through another OCR system.
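For anyone unfamiliar with patch tokens, here's a minimal sketch of how an image gets split into them. It assumes a ViT-style setup with a 224x224 RGB input and 14x14 patches, which gives a 16x16 grid of 256 patches. These numbers are illustrative assumptions, not Gemini's published configuration, and `image_to_patches` is a hypothetical helper written just for this example.

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping square patches.

    Each patch is flattened into one vector, roughly how a ViT-style encoder
    turns pixels into a sequence of patch tokens. The learned linear
    projection that a real encoder applies afterwards is omitted here.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in raster order: top-to-bottom, then left-to-right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # append every channel of this pixel
            patches.append(flat)
    return patches


# Assumed sizes: a 224x224 RGB image of zeros, cut into 14x14 patches.
image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
tokens = image_to_patches(image, patch_size=14)
print(len(tokens))     # 256 patches, i.e. a 16 x 16 grid
print(len(tokens[0]))  # 588 values per patch (14 * 14 * 3)
```

The patches are just raw pixel values at this stage; a real encoder would then project each one into an embedding vector before the model sees it.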