Author here. Seeing how difficult it was to get a reliable OCR transcription wit...

jandrese · 2024-05-31T17:42:44 1717177364

There is nothing pristine about images transmitted over fax. It's such a grotty old technology with loads of aliasing issues. A modern cell phone picture of a Word screen full of hex would almost certainly be easier to OCR.

Symbiote · 2024-05-31T16:24:47 1717172687

Did you try a search-and-replace in Word, changing the problem characters to something else?

e.g.

  0123456789ABCDEF
  012345M7XPAVKHEF
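
A minimal sketch of that substitution in Python, assuming the hex dump is available as a plain string. The replacement alphabet is the one from the two lines above; the names `to_safe` and `from_safe` are made up for illustration.

```python
# Map each hex digit to a replacement glyph that is harder to confuse,
# using the two-line table from the comment above.
HEX = "0123456789ABCDEF"
SAFE = "012345M7XPAVKHEF"

ENCODE = str.maketrans(HEX, SAFE)
DECODE = str.maketrans(SAFE, HEX)

def to_safe(hex_text: str) -> str:
    """Rewrite a hex string in the replacement alphabet before faxing."""
    return hex_text.upper().translate(ENCODE)

def from_safe(ocr_text: str) -> str:
    """Map OCR output in the replacement alphabet back to hex digits."""
    return ocr_text.upper().translate(DECODE)
```

Here `to_safe("DEADBEEF")` gives `HEAHVEEF`, and `from_safe` reverses it. Whether these particular glyphs actually survive a fax better is an open question.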

remram · 2024-06-01T03:10:59 1717211459

That's brilliant! You could even expand each input character to multiple characters to build an error-correcting code.
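
The simplest version of this is a repetition code: write every digit three times and let the decoder take the majority within each group. A rough sketch, assuming the OCR only substitutes characters and never drops or inserts one; `REPEAT`, `encode`, and `decode` are illustrative names, not anything from the article.

```python
from collections import Counter

REPEAT = 3  # assumption: each digit is written three times in a row

def encode(hex_text: str) -> str:
    """Expand every character into REPEAT copies of itself."""
    return "".join(ch * REPEAT for ch in hex_text)

def decode(ocr_text: str) -> str:
    """Recover each character as the most common one in its group."""
    out = []
    for i in range(0, len(ocr_text), REPEAT):
        group = ocr_text[i:i + REPEAT]
        out.append(Counter(group).most_common(1)[0][0])
    return "".join(out)
```

With this, `decode("DOD000")` still comes back as `D0`, since one bad character per group is outvoted. The cost is three times as many pages to fax.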

tfvlrue · 2024-05-31T17:38:43 1717177123

Nope! That's a good idea though.

The transcription errors I was getting were not consistent. Like, D would be O or 0 or D, with no apparent rhyme or reason to it. And the turnaround time on each fax attempt was long enough that I focused on doing the image recognition myself instead.

1-more · 2024-05-31T19:57:22 1717185442

This was a phenomenal effort and such a joy to read. Based on how much work this was, these were probably some very important sound files that mean a lot to someone in your family, so thanks for your hard work getting them off the laptop.

My goofy idea was using the font OCR-A, but you'd be very lucky if that Mac came with it.

https://en.wikipedia.org/wiki/OCR-A

adastra22 · 2024-05-31T16:32:38 1717173158

Why not display the info as a series of QR images? There probably wasn’t a dev environment on the laptop though.

For the record, you’d have had no problem mounting an image of the HFS disk on any modern Linux or macOS system.

kelnos · 2024-05-31T20:06:58 1717186018

Well, mounting the disk itself. If it was simple to get an image of the disk, the author could have used the same method to just get the files they wanted.

adastra22 · 2024-05-31T20:20:52 1717186852

There are many different SCSI-to-USB cables out there for exactly this purpose, even for the weird mini-SCSI interface Apple used in the ’90s.

ComplexSystems · 2024-06-01T19:24:15 1717269855

Couldn't you just send a bunch of different faxes, perhaps in different fonts or font sizes, which would lead to differently distributed random errors? Then you could OCR all of them, take the median (in effect, the most common reading at each character position) of the results, and get exponentially less error.
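
Roughly what that combining step could look like in Python, assuming every fax yields a transcript of the same length so the characters line up by position. The `vote` function is a hypothetical helper. The error reduction only holds to the extent that mistakes in different faxes are independent.

```python
from collections import Counter

def vote(transcripts: list[str]) -> str:
    """Per-position majority vote over equal-length OCR transcripts."""
    if len({len(t) for t in transcripts}) != 1:
        raise ValueError("transcripts must be non-empty and equal length")
    # zip(*transcripts) yields one column per character position.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*transcripts)
    )
```

For example, `vote(["DEADBEEF", "OEADBEEF", "DEA0BEEF"])` returns `DEADBEEF`, because each misread character is outvoted by the other two transcripts.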

Maxion · 2024-06-01T07:24:46 1717226686

Did you try seeing how good ChatGPT is at OCRing the images? Since these are hex characters it might not do so well, though. I've found it to be very reliable at OCRing e.g. photos of receipts.