Hacker News | v3ss0n's comments

How is this better than Surya/Marker or Kreuzberg (https://github.com/kreuzberg-dev/kreuzberg)?

Sounds like someone needs to run their own test cases and report back on which solution does a better job...

Let me fire up Claude Code.

Let me fire up Tesseract.

https://github.com/tesseract-ocr


I fought with Tesseract for quite a while. It's good if high accuracy doesn't matter. For transcribing a book from clean, consistent, non-skewed scans it's fine, and an LLM might even be able to clean up the output. But for legal or accounting data from hand-scanned documents, the error rate made it untenable. Even clean scanned documents of the same category have all sorts of density and skew anomalies that get misinterpreted. You'll pull your hair out trying to account for edge cases and never get the results you need, even with numerous adjustments and model retraining on errors.

Flash 2.5 or 3 with thinking gave the best results.
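For anyone who wants to put a number on "error rate" before picking a tool, here is a minimal sketch of a character error rate (CER) comparison. It assumes you already have a hand-checked reference transcription and the raw text each engine produced; the engine names and sample strings below are hypothetical, not results from any real run.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum inserts, deletes, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def rank_engines(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (engine name, CER) pairs, best engine first."""
    scored = [(name, character_error_rate(reference, text))
              for name, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical sample: one line of an invoice and two made-up OCR outputs.
    reference = "invoice 1024"
    outputs = {"engine_a": "invoice 1024", "engine_b": "inv0ice l024"}
    for name, cer in rank_engines(reference, outputs):
        print(f"{name}: CER = {cer:.3f}")
```

For accounting documents a single wrong digit matters more than CER suggests, so it may be worth scoring numeric fields separately with an exact-match check.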


Thanks. I was surprised that Tesseract recognized poorly scanned magazines, and with a Python library I was able to transcribe a two-column layout with almost no errors.

Tesseract is a cheap solution as it doesn’t touch any LLM.

For invoices, Gemini Flash is really good, for sure, and you receive “sorted” data as well. So definitely thumbs up. I use it for transcribing difficult magazine layouts.

I think that for legally sensitive uses, since companies don’t like to share financial data with Google, it would be better to use a local model.

Ollama and Hugging Face have a lot of them.
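On the two-column transcription mentioned above: the comment does not name the library used, so the following is only a sketch of one common approach. It assumes the OCR step yields a word plus its left/top pixel position (Tesseract's TSV output has `left` and `top` columns, for example) and that the page splits cleanly at its horizontal midpoint. The `Word` type, the function name, and the tolerance value are all hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    left: int  # x position of the word's left edge, in pixels
    top: int   # y position of the word's top edge, in pixels


def two_column_reading_order(words: list[Word], page_width: int,
                             line_tolerance: int = 5) -> list[str]:
    """Return text lines: the whole left column first, then the right column."""
    midpoint = page_width / 2
    columns: tuple[list[Word], list[Word]] = ([], [])
    for word in words:
        columns[0 if word.left < midpoint else 1].append(word)

    lines: list[str] = []
    for column in columns:
        current: list[Word] = []
        current_top = None
        for word in sorted(column, key=lambda w: (w.top, w.left)):
            # A jump in the top coordinate larger than the tolerance starts a new line.
            if current_top is not None and abs(word.top - current_top) > line_tolerance:
                lines.append(" ".join(w.text for w in sorted(current, key=lambda w: w.left)))
                current = []
            if not current:
                current_top = word.top
            current.append(word)
        if current:
            lines.append(" ".join(w.text for w in sorted(current, key=lambda w: w.left)))
    return lines


if __name__ == "__main__":
    # Made-up word boxes for a 600-pixel-wide page.
    sample = [
        Word("right", 320, 10), Word("Left", 20, 10), Word("column", 70, 12),
        Word("side", 380, 11), Word("next", 20, 40),
    ]
    print(two_column_reading_order(sample, page_width=600))
```

A fixed midpoint split breaks on skewed scans, full-width headlines, and pull quotes, which is where layout-aware tools tend to earn their keep.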


Surya is a lot better at that.

Would be very bad for LLMs: `fn` is bad, braces are bad, and it won't be useful for anything for a long time. So all the LLMs will pass.

What I tested happened 5 months ago. If the issue still existed 1 month ago, it is the same problem.

The problem is that the whole thing is built on XML and SVG, unlike Figma's Canvas/WebAssembly. The whole thing is unable to scale.


They are actually working on a new canvas-based rendering engine in order to get away from using the DOM so that should help performance quite a bit.

https://community.penpot.app/t/its-time-for-penpot-to-almost...


That's going to be a huge effort; looking forward to it.


Unstable, very crash prone with just a few users designing 10 plus pages. And a huge memory hog too.

I run it on a dedicated server with 64GB of RAM. It starts to lag at around 5-6 pages and 20GB of memory, lagging out the whole team, and then crashes.


Figma is a huge memory hog, too...


Figma has become absolutely shocking in the past few years. The performance is so bad these days. It doesn’t help that almost every designer doesn’t care to split things into more than one document. I’ve seen Figma documents with hundreds of screens.


> It doesn’t help that almost every designer doesn’t care to split things into more than one document

That’s how these tools encourage you to use them. If the tool crumbles under its own usage modalities, that’s because it’s poorly designed, not the user’s fault.


You don't need to split into multiple files to make large documents manageable, multiple pages works just fine (pages you're not using aren't loaded). But even still, I have absolutely massive pages with ~100 screens on them that work just fine on this base-tier M2 MBA.

Honestly given the complexity of the screens involved I feel Figma's performance is pretty reasonable. (Now, library publish and update - that's still unreasonably slow IMO)


I'm sure if the original developer bothered to show up again he could fix it in a weekend.


Figma can handle an unlimited number of screens in one huge canvas.


> very crash prone

> And a huge memory hog

On the server side or the frontend side?


We tested both. It starts to choke in the browser and then on the server side.


Zed sounds desperate; nobody in the world would use Zed as an office. Zed is niche. Nobody wants its multiplayer features, since coding is usually asynchronous until crunch happens, and nobody wants crunch all the time. Every office has its own communication tools; you can't expect all developers to just use Zed. It is missing features and the whole ecosystem, which would take another decade to build, and non-development people won't even touch it. So the office cannot happen inside Zed for the rest of the world, only for the Zed team.


Or Fishy?


Bruno is quite good, with Postman compatibility and its own syntax.


Not better at all. We don't need a flashy UI; we need a better focus on content.


Yeah definitely AI


Check out Litestar if you want a full-stack async experience for HTMX: https://litestar.dev/

