Hacker News

I think a new model will soon wipe out whatever startups and services are built around document ingestion: a model that can take in a PDF page as an image and transcribe it to text with near-perfect accuracy.



Extracting plain text isn’t that much of a problem, relatively speaking. It’s interpreting more complex elements like nested lists, tables, side bars, footnotes/endnotes, cross-references, images and diagrams where things get challenging.


OCR isn't 100% accurate either. Reading order is also fragile: it might recognize the word correctly but mess up the line structure.
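To illustrate why reading order is fragile, here is a minimal sketch of the naive heuristic many pipelines start with: group word boxes into lines by their vertical position, then sort each line left to right. The `Word` box format and the `tol` threshold are hypothetical assumptions, not the output format of any particular OCR service.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical OCR word box; real services use their own formats.
    text: str
    x: float  # left edge
    y: float  # top edge
    h: float  # box height


def reading_order(words, tol=0.5):
    """Rebuild text lines from unordered word boxes.

    A word joins the current line if its top edge is within
    tol * height of that line's first word; otherwise it starts a new line.
    """
    lines = []
    for w in sorted(words, key=lambda w: w.y):  # top to bottom
        if lines and abs(w.y - lines[-1][0].y) < tol * max(w.h, lines[-1][0].h):
            lines[-1].append(w)
        else:
            lines.append([w])
    # Within each line, order words left to right.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


words = [
    Word("world", 60, 10, 10),
    Word("Hello", 0, 11, 10),
    Word("Second", 0, 30, 10),
    Word("line", 70, 29, 10),
]
print(reading_order(words))  # ['Hello world', 'Second line']
```

This works on a single clean column, but on a two-column page it merges words from both columns into the same "line", and slightly skewed scans push words onto the wrong line. That is exactly the kind of failure where every word is OCR'd correctly and the text still comes out scrambled.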


I think Azure Document Intelligence, Google Document AI and Amazon Textract are among the best services, if not the best, and they offer these models.


I have not tested Azure Document Intelligence or Google Document AI, but AWS Textract, LlamaParse, Unstructured and Omni made it onto my shortlist. I have not tested Docling, as I could not install it on my Windows laptop.





