
jxy on Feb 21, 2024 | on: The killer app of Gemini Pro 1.5 is using video as...

So 3-4 minutes at 1 FPS means you are using about 500 to 700 tokens per image, which suggests you are feeding gpt-4-vision-preview something like 1080p frames with `detail: high` (unless you have another private endpoint).

Gemini 1.5 Pro uses about 258 tokens per frame (2.8M tokens for 10,856 frames).
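
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It only uses the figures quoted above; the variable names are mine, and the ratio is just a per-frame token comparison, not a claim about how either model actually tokenizes images.

```python
# Figures quoted in the comment; nothing here is measured independently.
gemini_total_tokens = 2_800_000  # "2.8M tokens"
gemini_frames = 10_856           # frames in the Gemini example

gemini_tokens_per_frame = gemini_total_tokens / gemini_frames
print(round(gemini_tokens_per_frame))  # 258

# 3-4 minutes of video sampled at 1 FPS gives this many frames.
frames_low, frames_high = 3 * 60, 4 * 60
print(frames_low, frames_high)  # 180 240

# Estimated GPT-4V per-image cost (500 to 700 tokens) relative to Gemini's per-frame cost.
ratio_low = 500 / gemini_tokens_per_frame
ratio_high = 700 / gemini_tokens_per_frame
print(f"{ratio_low:.1f}x to {ratio_high:.1f}x")  # 1.9x to 2.7x
```

So on these numbers the GPT-4V frames cost roughly two to three times as many tokens each.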

Are those comparable?