I fought with Tesseract for quite a while. It's good if high accuracy doesn't matter. For transcribing a book from clean, consistent, non-skewed scans it's fine, and an LLM might even be able to clean up the output. But for legal or accounting data from hand-scanned documents, the error rate made it untenable. Even clean scanned documents of the same category have all sorts of density and skew anomalies that get misinterpreted. You'll pull your hair out trying to account for edge cases and never get the results you need, even with numerous adjustments and retraining the model on its errors.
Gemini Flash 2.5 or 3 with thinking gave the best results.
Thanks. I was surprised that Tesseract could recognize poorly scanned magazines, and with a Python library I was able to transcribe a two-column layout with almost no errors.
Tesseract is a cheap solution, as it doesn’t involve any LLM.
For invoices, Gemini Flash is really good, for sure, and you receive “sorted” data as well. So definitely thumbs up. I use it for transcribing difficult magazine layouts.
I think that for such legally sensitive uses, since companies don’t like to share financial data with Google, it is better to use a local model.
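For what it's worth, the two-column part is mostly a reading-order problem once you have word boxes. Here is a rough sketch in plain Python. It assumes you already have per-word boxes (for example from pytesseract's `image_to_data`, which reports `left`, `top`, `width` and `text` per word) and that the page has exactly two columns split at the horizontal midpoint. `Word`, `reading_order` and `line_tolerance` are names made up for illustration, not part of any library:

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    left: int   # x of the box's left edge, in pixels
    top: int    # y of the box's top edge, in pixels
    width: int  # box width, in pixels


def reading_order(words, page_width, line_tolerance=10):
    """Return text with the left column first, then the right column.

    Assumption: exactly two columns, separated at page_width / 2.
    """
    mid = page_width / 2
    columns = ([], [])
    for w in words:
        center = w.left + w.width / 2
        columns[0 if center < mid else 1].append(w)

    lines_out = []
    for column in columns:
        # Walk the column top to bottom, grouping words whose tops are
        # within line_tolerance of the first word of the current line.
        column.sort(key=lambda w: w.top)
        current, current_top = [], None
        for w in column:
            if current and w.top - current_top > line_tolerance:
                current.sort(key=lambda x: x.left)
                lines_out.append(" ".join(x.text for x in current))
                current = []
            if not current:
                current_top = w.top
            current.append(w)
        if current:
            current.sort(key=lambda x: x.left)
            lines_out.append(" ".join(x.text for x in current))
    return "\n".join(lines_out)
```

A fixed midpoint split will fail on skewed scans or on pages with full-width headlines, so treat it as a starting point, not a general layout solution.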
Figma has become absolutely shocking in the past few years. The performance is so bad these days. It doesn’t help that almost no designer bothers to split things into more than one document. I’ve seen Figma documents with hundreds of screens.
> It doesn’t help that almost no designer bothers to split things into more than one document
That’s how these tools encourage you to use them. If the tool crumbles under its own usage modalities, that’s because it’s poorly designed, not the user’s fault.
You don't need to split into multiple files to make large documents manageable; multiple pages work just fine (pages you're not using aren't loaded). But even so, I have absolutely massive pages with ~100 screens on them that work just fine on this base-tier M2 MBA.
Honestly, given the complexity of the screens involved, I feel Figma's performance is pretty reasonable. (Now, library publish and update - that's still unreasonably slow IMO.)
Zed sounds desperate; nobody in the world would use Zed as their office. Zed is niche. Nobody wants its multiplayer features, since coding is usually asynchronous until crunch happens, and nobody wants crunch all the time.
Every office has its own communication tools. You can't expect all developers to just use Zed: it is missing features and the whole ecosystem, which would take another decade to build, and non-development people won't even touch it. So the office cannot happen inside Zed for the rest of the world, only for the Zed team.