Open Source OCR in JavaScript (antimatter15.com)
111 points by aram on July 30, 2014 | 26 comments



I am a bit surprised at how low the accuracy seems to be. Does anyone know if this is typical just of OCR done in JS, or of OCR in general? I am aware that at least one or two implementations are extremely good (e.g. Google's), but are those complete outliers?


That is specific to this implementation. Cursive/handwritten text will continue to be an issue, but machine-printed text is pretty solid, and especially easy if you can narrow down the scope of the expected result (somewhere in this thread someone wonders if it would be possible to prefer 0 to o etc. - sure). Note that

a) it's not possible to consistently reach 100% recognition (and, depending on the source material and circumstances, far less) - you have to educate customers about that..

b) errors are a tradeoff between 'dunno' and 'might be a zero', aka miss vs. false positive. Benchmarks/evaluations usually consider the latter far worse, preferring jf?oster to jf0ster by a long shot. So that's what you'll try to achieve.
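To make b) concrete: a toy sketch (the candidate scores, data shape, and names are hypothetical, not any real engine's API) of rejecting a weak guess rather than emitting it:

  // Given hypothetical per-character candidates with confidence scores,
  // emit a rejection mark instead of a low-confidence guess.
  function pickChar(candidates, threshold) {
    var best = candidates.reduce(function (a, b) {
      return a.score >= b.score ? a : b;
    });
    // Below the threshold, a miss ('?') beats a false positive.
    return best.score >= threshold ? best.ch : '?';
  }

  // 'jf?oster' rather than 'jf0ster' when the 0-vs-o call is too close:
  pickChar([{ch: '0', score: 0.42}, {ch: 'o', score: 0.40}], 0.6); // => '?'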

Source: Approaching ten years (yeah..) at a company that sells OCR solutions and more, integrating ABBYY, Océ and another half a dozen commercial engines.


> machine printed is pretty solid

You mean OCR in general or Ocrad.js? I tried cropping a picture down in the latter and I can't get it to come even close: http://imgur.com/cesg0IE


I work at a publisher with millions of digitized historical books that we OCR, and Abbyy is what we use. Nothing else came close to Abbyy. It is incredibly good, but incredibly expensive.


In my humble experience using OCR programs, there is always a considerable amount of inaccuracy. No matter what font or font size I use, I always either end up proofreading the scanned document or just typing it by hand. The letter "O" is almost always read as a zero, or a zero as an "O". It can be pretty frustrating.


I used the ABBYY OCR engine to digitize printed documents (idk why they couldn't just keep around the file used to print) and it was quite accurate. At worst, one out of a couple hundred would have enough errors that readability suffered.


Similar experience here, from building a mobile app that did OCR + translation. As long as the source image was in decent shape, ABBYY did very well. It's also incredibly expensive.


I wish there was a library that allowed you to input expected data (e.g. we expect to see zeros 20% more often than the letter O); the interpreter could then weigh that against its raw guess and determine the likelihood of each letter. As it stands, for most libraries that I'm aware of, you just have to get the data and run your own tests to see whether it should be an O or a zero.
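Rolling that yourself on top of an engine's raw candidate scores isn't much code, for what it's worth. A minimal sketch - the scores, the data shape, and the 20% figure are all made up for illustration:

  // Hypothetical priors: zeros expected 20% more often than the letter O.
  var priors = {'0': 1.2, 'O': 1.0};

  // Multiply each raw engine score by its prior, then re-rank.
  function rescore(candidates) {
    return candidates.map(function (c) {
      return {ch: c.ch, score: c.score * (priors[c.ch] || 1.0)};
    }).sort(function (a, b) { return b.score - a.score; });
  }

  // An 'O' that barely outscores '0' flips once the prior is applied:
  rescore([{ch: 'O', score: 0.50}, {ch: '0', score: 0.45}]);
  // => '0' now ranks first (0.54 vs. 0.50)

The catch is that many engines only hand you the final text, not per-character alternatives, which is why this is hard to bolt on from the outside.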


Ocrad is not very powerful: it uses hand-written recognisers (one per character) to identify the shapes of the characters. Compare this with more modern libraries such as Tesseract, which uses neural networks, and OCRopus, which adds language modelling.


I liked that the demo shows the program failing. Nice to see the capabilities AND the limitations displayed front and center. Definitely impressive.


I kept giggling at its poor recognition. It's comically bad, but I think it's a step in the right direction. It was very fast at incorrectly identifying letters. If only it were very fast and mostly correct.


It is good at recognizing machine-generated text - hit the blue arrow - and not that good at human-scribbled text with a mouse, I find.

I assume you were testing hand-written text?


Handwriting my name, Chris, was difficult for it to pick up. It kept thinking my "C" was an "L" and putting spaces between letters. It also decided my "S" was an underscore. Still pretty cool. Thanks!


Looks like the underscore character ("_") is used when a letter can't be determined - so in fact it had no idea what your "S" was ^_^


Interesting - I've had the Project Naptha (http://projectnaptha.com/) Chrome extension installed without really looking under the hood. Turns out it has Ocrad.js and Tesseract as two engine options - it uses them to automatically convert images on the page into selectable text.
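For the curious, the Ocrad.js README shows the whole API as a single global function - you hand it a canvas (or ImageData) and get a string back. Roughly:

  // Draw an image to a canvas, then pass the canvas to OCRAD().
  var img = document.querySelector('img'); // whichever image you want read
  var canvas = document.createElement('canvas');
  canvas.width = img.width;
  canvas.height = img.height;
  canvas.getContext('2d').drawImage(img, 0, 0);
  var text = OCRAD(canvas); // recognized text as a plain string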


Yup! And Naptha and Ocrad.js are both authored by antimatter15.


It demos well, but then I tried a simple test - a photograph of some text (http://imgur.com/TCnGlZG) - and Ocrad.js utterly failed at it; almost all letters were incorrect.


Love the idea of it. However, I threw some random Swedish at it, and it didn't fare too well. http://imgur.com/nZLtoj5 Kudos for the effort though!


I looked at this recently to try to pick some values off a high-res PNG of a PDF. That was a little too ambitious for this library. It's probably good for smaller images with a few words.


It'd be nice to be able to invoke this from within PDF.js.
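You can wire the two together by hand today, though. A rough sketch, assuming 2014-era PDF.js globals and API names as in its examples (treat the details as approximate) - render a page to a canvas, then hand the canvas to Ocrad.js:

  // Render page 1 of a PDF to a canvas, then feed the canvas to Ocrad.js.
  PDFJS.getDocument('scan.pdf').then(function (pdf) {
    return pdf.getPage(1);
  }).then(function (page) {
    var viewport = page.getViewport(2.0); // upscale: OCR wants big glyphs
    var canvas = document.createElement('canvas');
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    var ctx = canvas.getContext('2d');
    return page.render({canvasContext: ctx, viewport: viewport}).promise
      .then(function () { return canvas; });
  }).then(function (canvas) {
    console.log(OCRAD(canvas)); // recognized text from the rendered page
  });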


How does Ocrad compare to Abbyy in quality?


http://www.splitbrain.org/blog/2010-06/15-linux_ocr_software... is a very simple comparison (linked from the post).

So, with a small enough test set, Abbyy is infinitely better.


Would this be an easy way to get OCR into an iPhone app with PhoneGap?

Could it operate on a live video feed?


It might be easy, but until iOS 8 is released, non-Safari JS still takes a performance hit. [1] You may want to take a look at the Tesseract library and its Objective-C wrapper. [2]

[1] http://9to5mac.com/2014/06/03/ios-8-webkit-changes-finally-a... [2] https://github.com/ldiqual/tesseract-ios

edit: Looking closer at this lib, impressive. Might give it a go.


Doing anything to a live video feed in JS on a phone probably won't be very feasible except for extremely low resolution video.
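The plumbing itself is straightforward, for what it's worth - a sketch assuming getUserMedia support (modulo vendor prefixes; 2014-era iOS Safari has no getUserMedia at all) and Ocrad.js loaded:

  // Grab frames from a camera stream and OCR them at a heavy throttle.
  var video = document.querySelector('video');
  var canvas = document.createElement('canvas');
  var ctx = canvas.getContext('2d');

  function ocrFrame() {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    console.log(OCRAD(canvas)); // one frame's worth of text
    setTimeout(ocrFrame, 2000); // recognition, not capture, is the bottleneck
  }

  // May be webkitGetUserMedia / mozGetUserMedia depending on the browser.
  navigator.getUserMedia({video: true}, function (stream) {
    video.src = URL.createObjectURL(stream); // 2014-era stream-to-video API
    video.play();
    video.addEventListener('playing', ocrFrame);
  }, function (err) { console.error(err); });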


This is hot. Nice work.



