
Nice iPhone cluster.

Have you tried something deep-learning-based that uses Transformers: https://github.com/roatienza/deep-text-recognition-benchmark (the available weights are for tasks that seem similar to OCR, so there is a good chance you can use it out of the box). With a good GPU it should process hundreds to thousands of images per second, so you can likely build your index in less than a day. (Maybe you can even port it to your iPhone stack :) )
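
For illustration, here is a rough sketch of the batched GPU inference idea, using TrOCR from Hugging Face as a stand-in for the linked repo's models (which ship their own demo scripts). The checkpoint name, image folder, and batch size are placeholders, and note that TrOCR recognizes cropped text regions rather than whole meme images:

    # Sketch: batched transformer OCR on a GPU to build a text index.
    # TrOCR is used here as a stand-in; checkpoint, folder, and batch size
    # are assumptions, not anything from the linked repo.
    from pathlib import Path
    import torch
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
    model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed").to(device)
    model.eval()

    image_paths = sorted(Path("memes/").glob("*.jpg"))  # hypothetical folder of cropped text regions
    index = {}
    batch_size = 32

    for i in range(0, len(image_paths), batch_size):
        batch = image_paths[i:i + batch_size]
        images = [Image.open(p).convert("RGB") for p in batch]
        pixel_values = processor(images=images, return_tensors="pt").pixel_values.to(device)
        with torch.no_grad():
            generated_ids = model.generate(pixel_values)
        texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
        index.update({str(p): t for p, t in zip(batch, texts)})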

https://github.com/microsoft/GenerativeImage2Text (you'll probably have to train it on the custom dataset you have put together)

There are tons of other freely available solutions you can find by searching for keywords like "image to text ocr", "transformers", "visual transformers"...




You can do better than a general image-to-text model at reading memes, because they all use the same fonts - so you want something trained on synthetic data made with those fonts.
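
As a minimal sketch of that synthetic-data idea with Pillow, assuming the classic white-Impact-with-black-outline meme style (the font path and phrase list are just placeholders):

    # Sketch: render random phrases in a meme font to make synthetic
    # training pairs. Font path, phrases, and output layout are assumptions;
    # real training data would also need varied backgrounds and positions.
    import random
    from pathlib import Path
    from PIL import Image, ImageDraw, ImageFont

    font = ImageFont.truetype("impact.ttf", 48)  # assumed path to the Impact font
    phrases = ["ONE DOES NOT SIMPLY", "Y U NO", "NOT SURE IF"]  # placeholder corpus

    def make_sample(text, size=(512, 512)):
        img = Image.new("RGB", size, (random.randint(0, 255),) * 3)  # flat gray background
        draw = ImageDraw.Draw(img)
        x, y = 20, random.randint(0, size[1] - 60)
        draw.text((x, y), text, font=font, fill="white",
                  stroke_width=3, stroke_fill="black")  # typical meme styling
        return img, text

    Path("synthetic").mkdir(exist_ok=True)
    for i in range(1000):
        text = random.choice(phrases)
        img, label = make_sample(text)
        img.save(f"synthetic/{i:05d}.png")  # labels would go in a manifest file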


Personally, I've been hunting for something that can extract both the text and the associated image. I've never seen anything that can do both.



