This is great work.
In the same light, is anyone aware of IP67 cameras with an integrated lens? I see someone posted the opencv camera above, but I'd be interested in more options.
The openmv h7 and arducam boards have tempted me, but getting housing made for them is difficult.
I do agree with you on the power dynamics that may plague some workplaces today.
It feels as though a greater burden of accountability is placed on lower-level employees than on the C-suite.
What do you think if we speak to the value of a code of ethics separately from the unfortunate power dynamics some companies have chosen?
Did you come across any documentation that you are able to share while building yours? I'm interested in building one without relying on DL methods and comparing.
I just made it up, because I was unhappy with some of the seemingly obvious and strangely random errors from the ML that I couldn't sort out (and the size of the model data, etc.), and it worked out way, way better than expected.
I tend to think some problems attacked by ML are possibly also directly solvable if the problem is understood in sufficient detail - sometimes it seems like, "we have a problem we don't fully understand (or can't explain), so we will just train a model by example".
Obviously sometimes it's just depth of indirection and degrees of freedom beyond comprehension, but there's still the problem of explainability if you have a regulator, like in banking or medicine - how do you know, and how can you explain, exactly what it's doing, or when/how it might mislead you on certain edge cases?
Anyhoo, in this case I knew it would work, but accuracy was well up into my best-hoped-for expectations on the first pass.
Basic recipe:
1) Convert PDF (to .tiff or internally to numpy), at least 300 DPI if you can.
1a) Gaussian blur and threshold as you wish (not always needed), then make an inverse (white-on-black) copy for contour and bounding-rectangle finding.
2) Use opencv to find the bounding rects for your characters, then sort them for order, as the contours will not come back in reading order.
3) Do a dummy run and use the bounding rects to excise individual characters and write standardised images to disk. Then select your best example of each character variant and rename them A.png, B.png, etc. (You can use Tesseract here to save time and then hand-fix any errors - it's just a one-off step.) For maximum speed you could stash these on a ramdisk, but if you have enough RAM I'm guessing the system will cache them anyway, so maybe no point; I'm on PCIe 4.0 with 6 GB/s disk reads, so I don't really care.
4) When you want to OCR, just brute force it: get your characters into individual standardised images in turn, resizing each to match the reference bitmap exactly as you cycle through the checks (otherwise the XOR fails). Tip: it seems best to always resize the wider one to the size of the narrower one, for better accuracy. You can also discard obvious non-matches at this stage based on dimensions - e.g. the difference between an I and an M or W is obvious from aspect ratio alone - so you can skip a few XORs if you want.
5) Pixelwise XOR the test character against each reference; if it's a match you should be left with just a thin outline where the two don't exactly agree.
6) Each time, count the number of black pixels (or white, depending on whether you inverted) in the XOR result.
7) Lowest number of pixels is your match, simplistically, but see below for refinements.
8) To sort out B vs 8 and I vs 1 etc, you can go two ways:
8a) Store multiple B's and 8's as references (e.g. B_0, B_1, etc.) and average, and/or look for the lowest residual XORed pixel count across multiple minor variants of the same reference character - sort of a half-arsed decision forest. This can work especially well if you have scanned docs.
8b) For selected problem cases you can also do an extra sub-check - e.g. when you get an 8 or a B, focus on the relevant sub-region of the characters. For 8 vs B, the middle band (the rows within plus/minus 10% of the centre) is the salient difference zone: excise this region and run the XOR again against each likely reference. The difference in XOR pixel count will be magnified (as a ratio) and you will be able to select this way.
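Steps 1a-2 might look something like the sketch below; the `line_height` bucketing for reading order is my own illustrative shortcut (in practice you'd derive it from the median box height), and the synthetic page is just a stand-in for a real scan:

```python
import cv2
import numpy as np

def find_char_boxes(page_gray, line_height=40):
    """Blur, Otsu-threshold to white-on-black, find contours, and return
    bounding rects sorted into rough reading order."""
    blurred = cv2.GaussianBlur(page_gray, (3, 3), 0)
    # THRESH_BINARY_INV turns dark ink into white on black, which is what
    # the contour finder wants.
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    # Contours come back in arbitrary order; bucket y by an assumed line
    # height so boxes sort top-to-bottom, then left-to-right within a line.
    boxes.sort(key=lambda b: (b[1] // line_height, b[0]))
    return boxes

# Synthetic "page": white background, three dark glyph blobs on two lines.
page = np.full((100, 100), 255, dtype=np.uint8)
page[10:20, 10:20] = 0
page[10:20, 60:70] = 0
page[60:70, 10:20] = 0
boxes = find_char_boxes(page)
```

With the toy page above, the three boxes come back left-to-right on the top line first, then the lower line.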
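The matching core (steps 4-7) can be sketched in plain NumPy, assuming the glyphs have already been excised and resized to a common size; the 5x5 glyph arrays here are toy stand-ins for the saved A.png, B.png references:

```python
import numpy as np

def xor_score(test_img, ref_img):
    """Pixelwise XOR of two same-sized binary glyphs, counting mismatching
    pixels. A good match leaves only a thin outline, so lower is better."""
    return int(np.count_nonzero(np.bitwise_xor(test_img, ref_img)))

def classify(test_img, references):
    """Brute force: XOR against every reference and return the label with
    the lowest residual pixel count."""
    return min(references, key=lambda label: xor_score(test_img, references[label]))

# Toy 5x5 binary glyphs (1 = ink) standing in for real reference bitmaps.
I_GLYPH = np.array([[0, 0, 1, 0, 0]] * 5, dtype=np.uint8)
T_GLYPH = np.array([[1, 1, 1, 1, 1]] + [[0, 0, 1, 0, 0]] * 4, dtype=np.uint8)
references = {"I": I_GLYPH, "T": T_GLYPH}

# A slightly noisy "I" still lands closer to I than to T.
noisy_I = I_GLYPH.copy()
noisy_I[4, 3] = 1
print(classify(noisy_I, references))  # -> I
```

The 8-vs-B sub-region check from step 8b is then just the same `xor_score` run on a middle-band slice of each glyph.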
When you do the bounding rects for the characters you will likely pick up residual noise, punctuation, etc., especially with scanned docs - you can easily split these out for discard or special attention by testing against area (simple w * h or contour area), and/or any disproportionate w:h ratios, or even position relative to other characters.
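That split might look like this; the threshold values are purely illustrative and would need tuning per document:

```python
def split_boxes(boxes, min_area=50, max_area=5000, max_aspect=4.0):
    """Separate likely characters from noise/punctuation/long lines by
    area and aspect ratio (thresholds illustrative; tune per document)."""
    chars, rejects = [], []
    for (x, y, w, h) in boxes:
        ok = (min_area <= w * h <= max_area
              and max(w, h) / max(min(w, h), 1) <= max_aspect)
        (chars if ok else rejects).append((x, y, w, h))
    return chars, rejects

# A plausible character, a speck of noise, and a long rule line.
chars, rejects = split_boxes([(10, 10, 20, 30), (5, 5, 2, 2), (0, 0, 300, 40)])
print(chars, rejects)  # -> [(10, 10, 20, 30)] [(5, 5, 2, 2), (0, 0, 300, 40)]
```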
It works surprisingly well on fonts other than the actual reference font, to a point - obviously YMMV, but a bit of dilate and erode all around helps this.
One advantage is that it scales automatically for different font sizes, so on an engineering drawing where it's typically all the same font on the sheet and the entire drawing set I can grab title block info, notes etc all for no more effort.
You can "train" it specifically for special characters or fonts as needed, and it will do multiple fonts in one go, best match wins - it will just slow down a little as there are more possibilities to check for each character.
If you get the right, or near-right, font, you can generate a checking print by writing over the top of your OCR'ed text in, say, red (or yellow on light blue, etc.) as you go, and then you can check for errors on a page at a near glance. Or XOR the found text back over the top of the original, blur, then threshold.
If you are scanning text of a known form you can obviously use some regex; in my case certain things should be a fixed number of chars, or unique, so I can check for missing characters or double-ups etc. to flag. But typically I have found there are just a few specific problem confusions, and they can be addressed as above as needed, and then it's near 100%.
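A minimal version of that known-form check, with a made-up tag pattern (the author's actual format isn't stated):

```python
import re

# Illustrative tag format: 1-3 letters, a dash, 3-4 digits.
TAG_PATTERN = re.compile(r"^[A-Z]{1,3}-\d{3,4}$")

def flag_suspects(tags):
    """Flag OCR results that break the expected fixed form, plus double-ups."""
    seen, suspects = set(), []
    for tag in tags:
        if not TAG_PATTERN.match(tag) or tag in seen:
            suspects.append(tag)
        seen.add(tag)
    return suspects

# The duplicate and the 0-vs-O/8-vs-B confusion both get flagged.
print(flag_suspects(["PV-101", "PV-101", "P8-1O1"]))  # -> ['PV-101', 'P8-1O1']
```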
So that's it - maybe you can come up with some more improvements or automation of the setup.
I was originally going to look at varying the blur of reference chars, or minor variations to aspect ratios or positioning in the character frame, applied over multiple runs as a sort of sensitivity analysis to home in on the best matches or problem areas, but found that even with scanned documents I had no need, so I never bothered.
While dreaming initially I also considered some sort of algebra/trigonometry/stats on the contours for problem characters - both vector angle and enclosed volume - but once again had no need. Things like ratios of types of contour vectors for a single character, e.g. straight vs curved (a C is all curved, a T is almost all straight) by comparing adjacent contour vectors, plus ratios of volume to circumference - but once again, no need in the end.
Finally, you could also use the opencv blob matching; I never got that far for OCR use as I have yet to need it, but I do use it for finding regions of interest, as I am doing engineering diagrams - e.g. P&IDs, which are highly coded symbolically, and I am mostly interested in symbols, then the tags enclosed (alpha + numeric chars), and modifying characters next to symbols.
It sounds like a bit of work when I write it down now, but really, using Python at a semi-experienced level and not really having used opencv much before, it was maybe a lazy weekend and a few nights to have something pretty solid for my own use - but still too shameful in code quality to publish.
Luckily the only code review involved will be a self review between me and the dog...
Once it's set up for a standard font, or generic fonts, it's done, so the ROI goes up with use/time. The only other thing I might do yet is try to integrate it with right-click for a selected window area, into the clipboard, but I'm not sure it's worth it compared to other already-available options.
This is amazing - thank you so much for sharing this, people like you who go to great depths for random internet users make this world a much better place.
I'm going to attempt to use this methodology on a pet-project for components on PCBs.