
We started by using LLMs for parsing at Tensorlake (https://docs.tensorlake.ai), tried Qwen, Gemini, OpenAI, pretty much everything under the sun. My thought was that we could skip the 5-6 years of development IDP companies have put into specialized models by going straight to LLMs.

On information-dense pages, LLMs hallucinate about half the time; they have trouble understanding empty cells in tables, don't understand checkboxes, etc.

We had to invest heavily in building a state-of-the-art layout understanding model, and finally a table structure understanding model, for reliability. LLMs will get there, but they still have some way to go.

Where they do well is VQA-type use cases: ask a very narrowly scoped question and they work much better than OCR+layout models, because they are far more generalizable and flexible to use.
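
To make that contrast concrete, here's a minimal sketch of such a VQA-style call, assuming the OpenAI Python SDK (>=1.0) and a vision-capable model; the model name, question, and image URL are placeholders, not Tensorlake's actual pipeline:

    # One narrowly scoped question per page image, instead of asking the
    # model to reconstruct the full document or table structure at once.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What is the invoice total on this page?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/page-3.png"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

The narrower the question, the less surface area there is for the hallucination and table-structure problems described above.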



