Standard datasets can no longer be used for benchmarking LLMs, since those datasets have already been fed into the models' training data and are thus too well-known to serve as a fair comparison against lesser-known documents.
Oh, you meant just a single benchmarked document. I thought you meant reporting that for every document you process. I wouldn't want to mislead people by giving stats for a particular kind of scan/document, because they likely wouldn't carry over in general.
OCR evaluation has been a thing for decades.
edit: Better than a single document, process a standard OCR dataset: https://paperswithcode.com/task/optical-character-recognitio...
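For what it's worth, the standard metric such an evaluation usually reports is character error rate (CER): total edit distance between ground truth and OCR output, divided by the number of ground-truth characters. A minimal sketch (the helper names and the tiny sample pairs here are illustrative, not from any standard dataset):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(pairs):
    """Character error rate over (ground_truth, ocr_output) pairs."""
    errors = sum(edit_distance(gt, out) for gt, out in pairs)
    chars = sum(len(gt) for gt, _ in pairs)
    return errors / chars

# Hypothetical sample: one substitution ('o' -> '0') across 20 characters.
sample = [("hello world", "hell0 world"), ("benchmark", "benchmark")]
print(cer(sample))  # -> 0.05
```

Running this over a whole standard dataset instead of a single document is exactly what avoids the "stats for one kind of scan" problem above.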