
Thanks. How does Textract compare to some of the common CLI utilities like pdftotext, tesseract, etc. (if you made a comparison)?



I did. None of the open source parsers worked well with tables. I had the following issues:

- Missing cells.
- Partial identification of numbers (e.g. for £43.54, the parser would pick up only £43).

To compare, I drew lines around the identified text to visualize the accuracy. You can do that with tesseract.
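For anyone wanting to try the same kind of visual check, here is a minimal sketch, not the exact method used above. It assumes you already have tesseract's TSV output (e.g. from `tesseract page.png out tsv`) and turns the word boxes into an SVG overlay. The function names `word_boxes` and `boxes_to_svg` and the `min_conf` parameter are made up for illustration.

```python
import csv
import io
from html import escape


def word_boxes(tsv_text, min_conf=0.0):
    """Return (left, top, width, height, conf, text) for each word row in tesseract TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    boxes = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are words; lower levels are page/block/paragraph/line containers.
        if row["level"] != "5" or not text:
            continue
        conf = float(row["conf"])
        if conf < min_conf:
            continue
        boxes.append((int(row["left"]), int(row["top"]),
                      int(row["width"]), int(row["height"]), conf, text))
    return boxes


def boxes_to_svg(boxes, page_width, page_height, image_href=None):
    """Build an SVG string with one red rectangle per recognized word."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{page_width}" height="{page_height}">']
    if image_href:
        # Optional: show the scanned page underneath the boxes.
        parts.append(f'<image href="{escape(image_href, quote=True)}" '
                     f'width="{page_width}" height="{page_height}"/>')
    for left, top, width, height, conf, text in boxes:
        # The <title> shows the recognized text and confidence on hover.
        parts.append(f'<rect x="{left}" y="{top}" width="{width}" height="{height}" '
                     f'fill="none" stroke="red">'
                     f'<title>{escape(text)} ({conf:.0f})</title></rect>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Write the result to a file and open it in a browser. A missing cell should show up as a region with no box, and a truncated number (£43 instead of £43.54) as a box covering only part of the amount.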


Interesting. Did you try MS's offering (Azure AI Document Intelligence)? Their pricing seems better than Amazon's.


Not yet, but I'm planning to give it a try and compare it with Textract.



