Hacker News new | past | comments | ask | show | jobs | submit login

I made a script which does exactly the same thing but locally using koboldcpp for inference. It downloads MiniCPM-V 2.6 with image projector the first time you run it. If you want to use a different model you can, but you will want to edit the instruct template to match.

* https://github.com/jabberjabberjabber/LLMOCR




MiniCPM-v 2.6 is probably the best self-hosted vision model I have used so far. Not just for OCR, but also image analysis. I have it setup, so my NVR (frigate) sends couple of images upon motion alert from a driveway security camera to Ollama with minicpm-v 2.6. I’m able to get a reasonably accurate description of the vehicle that pulled into the driveway. Including describing the person that exits the vehicle and also the license plate. All sent to my phone.


I love this. Can you share the source?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: