Hacker News
rudolph9 | 4 months ago | on: Ingesting PDFs and why Gemini 2.0 changes everythi...
Under the hood, Tika uses Tesseract for OCR. To be clear, this generally works surprisingly well, it's pretty easy to run yourself, and it's an order of magnitude cheaper than most services out there.
https://tesseract-ocr.github.io/tessdoc/
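For anyone who wants to try the "run it yourself" part, here is a rough sketch that calls the Tesseract CLI directly from Python, skipping Tika. It assumes the `tesseract` binary is installed and on your PATH; the helper names `build_tesseract_command` and `ocr_image` are made up for illustration, and `page.png` is a placeholder file name.

```python
import shutil
import subprocess
import sys


def build_tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build the CLI invocation (hypothetical helper, not part of Tesseract)."""
    # Passing "stdout" as the output base makes tesseract print the
    # recognized text to standard output instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run tesseract on a single image and return the recognized text."""
    # Assumption: the tesseract binary is installed locally.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python ocr.py page.png
    if len(sys.argv) > 1:
        print(ocr_image(sys.argv[1]))
```

This only handles single images. For PDFs you would still need something (Tika, or a rasterizer of your choice) to turn pages into images first.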