try to read bad handwriting from records from nuclear weapons plants. It's a ...

superhuzza · on March 27, 2019

I would be very, very hesitant to do this.

First of all, you definitely don't want any chance of incorrect recognition of handwriting coming from a nuclear weapons plant.

Secondly, who knows what policies he might violate by using some unauthorized, untested software (whether OP is the author or not) with potentially sensitive information.

ratsimihah · on March 27, 2019

    First of all, you definitely don't want any chance of incorrect recognition of handwriting coming from a nuclear weapons plant.

Loving this, this sounds like the plot of a bad sci-fi movie.

debatem1 · on March 27, 2019

It reminds me of the start of "Brazil", bugs and all.

billwear · on March 27, 2019

Sounds like a plot for a really decent sci-fi book with an AI as the antagonist.

fabiomaia · on March 27, 2019

He can use it such that the model's predictions are still manually verified by him before actually submitting them or whatever. Seems pretty harmless to me.

throwaway2019Z · on March 27, 2019

In his current situation, if he zones out, he produces nothing. In the situation where he's aided by the potentially faulty system described, if he zones out, he produces erroneously transcribed data.

fabiomaia · on March 27, 2019

What? In his current situation, if he zones out, he can produce erroneously transcribed data too. What's your point?

throwaway2019Z · on March 27, 2019

What? In his current situation, even if he doesn't zone out, he can still produce erroneously transcribed data too. Why even go to work?

Retric · on March 27, 2019

Manual verification is likely just as tedious as doing it by hand in the first place.

fabiomaia · on March 27, 2019

Probably. But maybe he'll get a kick out of modeling and automating it. Challenging his intellect seems to be the key point here.

sergiosgc · on March 27, 2019

I'd first go for a Pareto approach. There's probably some easy automation that can aid the manual work, is easy to implement, and provides visible performance gains. Stuff like preselecting interesting image parts (aim for high false positive rates and zero false negatives), or even just an interface that speeds up dull manual work like pulling documents from the repository, associating relevant documents, and presenting them in a fast UI.
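One way to make "high false positive rates and zero false negatives" concrete: on a hand-labelled sample, set the flagging threshold no higher than the score of the weakest true positive, so nothing known to be relevant is filtered out and the reviewer only pays for extra false alarms. Below is a minimal Python sketch, assuming some detector already assigns each image region a score. The function names and sample numbers are hypothetical. It only guarantees zero misses on the labelled sample, not on unseen pages.

```python
def zero_fn_threshold(scores, labels):
    """Return the highest threshold that still flags every labelled positive.

    scores: detector confidence per image region (higher = more likely relevant)
    labels: True if a human marked that region as relevant
    """
    positive_scores = [s for s, is_pos in zip(scores, labels) if is_pos]
    if not positive_scores:
        raise ValueError("need at least one labelled positive region")
    # Anything scoring at least as high as the weakest true positive gets flagged.
    return min(positive_scores)


def preselect(scores, threshold):
    """Indices of regions to show the human reviewer."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical validation data: scores from some detector, labels from manual review.
scores = [0.91, 0.15, 0.40, 0.72, 0.05, 0.33]
labels = [True, False, False, True, False, True]

t = zero_fn_threshold(scores, labels)
flagged = preselect(scores, t)
print(t, flagged)
```

Here the threshold comes out at 0.33 and regions 0, 2, 3 and 5 are flagged: region 2 is a false positive the reviewer has to dismiss, but none of the three labelled positives is dropped.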

fma · on March 27, 2019

I've done something similar, automating a comparable process. Essentially the screen was split in half. The left side showed the actual image; the right showed the data the program thought it contained after OCR and some magic interpretation. The user could scroll around and both panes would move together. The user clicked accept on each value, or could edit a value, or add a new one (if it was completely missed) and then accept it. Once everything was accepted, they could save the data.
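The accept/edit/add flow described above maps onto a small data model. Below is a minimal Python sketch of one possible shape; `ExtractedValue`, `ReviewSession` and the field names are hypothetical, not the code from that project. Saving is blocked until every value has been accepted, matching the "once everything is accepted" rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtractedValue:
    """One value read from the scanned page, awaiting human review."""
    name: str
    ocr_value: Optional[str]          # None if OCR missed it entirely
    final_value: Optional[str] = None
    accepted: bool = False

    def accept(self, corrected: Optional[str] = None) -> None:
        # The reviewer either confirms the OCR guess or types a correction.
        value = corrected if corrected is not None else self.ocr_value
        if value is None:
            raise ValueError(f"{self.name}: nothing to accept, enter a value first")
        self.final_value = value
        self.accepted = True


@dataclass
class ReviewSession:
    values: List[ExtractedValue] = field(default_factory=list)

    def add_missed(self, name: str, value: str) -> None:
        # A value the OCR pass skipped; the reviewer adds it and accepts it.
        item = ExtractedValue(name=name, ocr_value=None)
        item.accept(value)
        self.values.append(item)

    def can_save(self) -> bool:
        return all(v.accepted for v in self.values)

    def save(self) -> dict:
        if not self.can_save():
            raise RuntimeError("every value must be accepted before saving")
        return {v.name: v.final_value for v in self.values}


# Hypothetical page: OCR misread "0" as "O" in the date and missed one field.
session = ReviewSession([ExtractedValue("name", "J. Smith"),
                         ExtractedValue("date", "1958-O4-12")])
session.values[0].accept()              # OCR guess was right
session.values[1].accept("1958-04-12")  # reviewer corrects it
session.add_missed("witness", "R. Jones")
print(session.save())
```

Keeping the OCR guess and the accepted value as separate fields also leaves a record of how often reviewers had to correct the machine.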

On paper it saved a lot of man-hours, but it was never fully rolled out. The project didn't have a long lifespan, and we had enough people sitting around doing nothing to just do all the work manually. But I got a nice award for it LOL.

digitalsushi · on March 27, 2019

One could make a similar argument that no one should have tried making a car that could drive better than a human.

avar · on March 27, 2019

This is more like some guy in the early 1900s having the job of hand-delivering important information on a horse.

Then deciding on his own on the advice of HN that he should try a car for that task instead, the car breaking down, and the guy getting fired because his employer never approved of this whole "car" thing and he shouldn't have introduced that into his job workflow without having talked to his employer about it.

eightorbit · on March 27, 2019

I will take a look at it, thanks. I have tried some off-the-shelf OCR tools and had little success. But what I do is really a needle-in-a-haystack kind of thing: I am looking for a "U" or an "Sr" in a particular place, and I'm happy to find one in a thousand PDF pages.

I am looking into auto-scrolling PDFs, batch loading, and sequencing PDFs to make things easier. But I will definitely check out fast.ai - thanks!

indy7500 · on March 27, 2019

The only problem is that I don't know how likely it is that the writing is neatly separated by character, which an MNIST-style model needs, and handwriting recognition isn't accurate enough. Maybe some constraints on the inputs would fix that.
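To make the segmentation worry concrete: a classic way to feed a per-character classifier is to split a line of text at columns that contain no ink (vertical projection). Below is a minimal Python sketch, assuming the scan has already been binarized into rows of 0/1 pixels; `segment_columns` and `min_gap` are hypothetical names. Characters that touch, as in cursive, come out as a single span, which is exactly why this tends to break down on handwriting.

```python
from typing import List, Tuple


def segment_columns(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized text line into character spans by vertical projection.

    image: rows of 0/1 pixels, 1 = ink.
    Returns (start, end) column ranges with `end` exclusive. A run of at least
    `min_gap` ink-free columns separates one character from the next.
    """
    width = len(image[0]) if image else 0
    # Ink count per column: the "vertical projection" of the line.
    ink = [sum(row[c] for row in image) for c in range(width)]

    spans = []
    start = None
    blank_run = 0
    for c, count in enumerate(ink):
        if count > 0:
            if start is None:
                start = c
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The span ends at the first blank column of the gap.
                spans.append((start, c - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Close a character that runs to the right edge (ignoring trailing blanks).
        spans.append((start, width - blank_run))
    return spans


# Two "characters" separated by two blank columns.
image = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 0, 1],
]
print(segment_columns(image))
```

On this toy input the result is `[(0, 2), (4, 7)]`; remove the blank columns between the two shapes and you get one span, which is the failure mode for joined-up writing.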

taneq · on March 28, 2019

Regretfully pragmatic viewpoint here: The poster is currently relying on this menial job for income. Automating it before he has something else lined up could be a really bad idea, financially.

sonofaragorn · on March 27, 2019

This is a great idea. If you manage to pull this off you'd gain extremely valuable experience that would allow you to easily transition later on.