I think this is one of the few practical applications of LLMs that is genuinely, undeniably useful.
OCR has always been “untrustworthy” (meaning you cannot expect it to be 100% correct and must account for that), and we have long used ML algorithms for it.
OCR is not to blame: with garbage in, you should not expect high-quality output, especially with handwriting, tables, and mixed languages. Even humans fail to read some documents (see doctors' prescriptions).
For example, lowercase l and capital I often look identical, which is a classic problem for OCR. The ideal case is a PDF with the data also embedded as XML, but unfortunately that is usually not what you get.
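As a rough illustration of how such confusions can be handled downstream (this is my own minimal sketch, not anything from a specific OCR library; the confusion map and vocabulary are made up), a dictionary-based post-correction pass might look like:

```python
# Minimal sketch of dictionary-based post-correction for OCR
# confusion pairs like l/I/1 and O/0. Assumes a small known vocabulary.
from itertools import product

# Hypothetical confusion map: characters OCR commonly swaps.
CONFUSIONS = {"l": "lI1", "I": "Il1", "1": "1lI", "0": "0O", "O": "O0"}

def candidates(word):
    """Generate every variant of `word` under the confusion map."""
    options = [CONFUSIONS.get(ch, ch) for ch in word]
    return {"".join(combo) for combo in product(*options)}

def correct(word, vocabulary):
    """Return a vocabulary match among the variants, else the word unchanged."""
    for cand in candidates(word):
        if cand in vocabulary:
            return cand
    return word

vocab = {"Illinois", "install"}
print(correct("lllinois", vocab))  # prints "Illinois"
```

Real systems use language models or weighted edit distances rather than brute-force enumeration, which blows up on long words, but the principle is the same: OCR output needs a correction layer, not blind trust.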