
I used this approach extensively over the past couple of months with GPT-4 and GPT-4o while building https://hotseatai.com. Two things that helped me:

1. Prompt with examples. I included an example image with an example transcription as part of the prompt. This got GPT to make fewer mistakes and improved output accuracy (see the first sketch below).

2. Confidence score. I extracted the embedded text from the PDF and compared the frequency of character triples between the source text and GPT’s output. If there was a significant difference (less than 90% overlap), I logged a warning. This helped detect cases where GPT omitted entire paragraphs of text (see the second sketch below).
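Roughly, the example prompt was structured like this (a simplified sketch against the OpenAI chat API; the prompt wording and function names here are illustrative, not the production code):

    from openai import OpenAI

    client = OpenAI()

    def transcribe(example_image_url: str, example_transcription: str,
                   target_image_url: str) -> str:
        messages = [
            # One worked (image, transcription) pair shown before the real page.
            {"role": "user", "content": [
                {"type": "text", "text": "Transcribe this page exactly."},
                {"type": "image_url", "image_url": {"url": example_image_url}},
            ]},
            {"role": "assistant", "content": example_transcription},
            {"role": "user", "content": [
                {"type": "text", "text": "Transcribe this page exactly."},
                {"type": "image_url", "image_url": {"url": target_image_url}},
            ]},
        ]
        resp = client.chat.completions.create(model="gpt-4o", messages=messages)
        return resp.choices[0].message.content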
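And the confidence check, again as a simplified sketch (the exact overlap metric I used differed a bit, but this is the idea):

    import logging
    import re
    from collections import Counter

    def trigrams(text: str) -> Counter:
        # Normalize: lowercase, strip non-alphanumerics, slide a 3-char window.
        s = re.sub(r"[^a-z0-9]", "", text.lower())
        return Counter(s[i:i + 3] for i in range(len(s) - 2))

    def overlap(source_text: str, gpt_output: str) -> float:
        src, out = trigrams(source_text), trigrams(gpt_output)
        if not src:
            return 1.0
        # How many of the source's trigrams (with multiplicity) survive in the output.
        shared = sum(min(count, out[t]) for t, count in src.items())
        return shared / sum(src.values())

    def check_transcription(source_text: str, gpt_output: str) -> None:
        score = overlap(source_text, gpt_output)
        if score < 0.9:
            logging.warning("GPT may have omitted text (trigram overlap %.2f)", score)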


One option we've been testing is the `maintainFormat` mode. This tries to return the markdown in a consistent format by passing the output of a prior page in as additional context for the next page. Especially useful if you've got tables that span pages. The flow is pretty much:

- Request #1 => page_1_image

- Request #2 => page_1_markdown + page_2_image

- Request #3 => page_2_markdown + page_3_image
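In Python terms the flow is roughly this (a sketch against the OpenAI chat API, not the library's actual implementation; prompt wording is made up):

    import base64
    from openai import OpenAI

    client = OpenAI()

    def transcribe_page(image_png: bytes, prior_markdown: str) -> str:
        content = []
        if prior_markdown:
            # The prior page's markdown anchors the formatting of the next page.
            content.append({"type": "text",
                            "text": "Markdown of the previous page:\n" + prior_markdown})
        content.append({"type": "text", "text": "Transcribe this page to markdown."})
        content.append({"type": "image_url", "image_url": {
            "url": "data:image/png;base64," + base64.b64encode(image_png).decode()}})
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": content}],
        )
        return resp.choices[0].message.content

    def transcribe_pdf(page_images: list[bytes]) -> list[str]:
        pages, prior = [], ""
        for image in page_images:
            prior = transcribe_page(image, prior)
            pages.append(prior)
        return pages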


> frequency of character triples

What are character triples? Are they trigrams?


I think so. I'd normalize the text first: lowercase it and remove all non-alphanumeric characters. E.g. for the phrase "What now?" I'd create these trigrams: wha, hat, atn, tno, now.
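In Python, that's roughly:

    import re

    def trigrams(text: str) -> list[str]:
        normalized = re.sub(r"[^a-z0-9]", "", text.lower())  # "whatnow"
        return [normalized[i:i + 3] for i in range(len(normalized) - 2)]

    print(trigrams("What now?"))  # ['wha', 'hat', 'atn', 'tno', 'now']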


> I extracted the embedded text from the PDF

What did you use to extract the embedded text in this step, other than some other OCR tech?


PyMuPDF, a PDF library for Python.
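Something along these lines (minimal sketch; the filename is a placeholder):

    import fitz  # PyMuPDF

    doc = fitz.open("document.pdf")
    embedded_text = "".join(page.get_text() for page in doc)
    doc.close()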


A different approach from vanilla OCR/parsing is ColPali [1], which combines a purpose-built small vision model with ColBERT-style indexing for retrieval. So, if search is the intended use case, it can skip the whole OCR step entirely.

[1] https://huggingface.co/blog/manu/colpali


It detects if a message contains the "Final Answer" substring preceded by a specific emoji. The emoji is there to make the substring relatively unique.
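Something like this (a sketch; the emoji here is just a stand-in for the one we actually match on):

    def is_final_answer(message: str) -> bool:
        # The emoji makes a bare "Final Answer" in prose unlikely to false-positive.
        return "🏁 Final Answer" in message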


You're right that sections reference each other, and sometimes reference other regulations. By creating the "plan for the junior lawyer", the LLM can reference multiple related sections at the same time. In the second step of the example plan in the post, there's a reference to "Articles 8-15", meaning eight articles that should be analyzed together.

The system is indeed limited in that it cannot reference other regulations. We've heard from users that this is a problem too.


One of the applications is ZK-Rollups [1], which allow developers to move heavy computation off a blockchain. The blockchain receives only the results, along with proofs that the computation was done correctly, and verifies those proofs. This is especially useful on Ethereum because its computational throughput is pretty low.

There's also Zcash [2], a cryptocurrency that lets you make untraceable transactions. This is in stark contrast to Bitcoin or Ethereum, where transaction information is publicly visible to everyone. They have a series of blog posts [3] on the math that actually makes it work under the hood.

[1] https://ethereum.org/en/developers/docs/scaling/zk-rollups/

[2] https://z.cash

[3] https://electriccoin.co/blog/snark-explain/


We've been using https://github.com/electric-sql/electric for real-time sync for the past month or so and it's been great. Rather than making you think about CRDTs explicitly, Electric syncs an in-browser SQLite db (WASM-powered) with a central Postgres instance. As a developer, you get local-first performance and real-time sync between users. It's actually faster to ship an application this way, since you skip writing APIs and just use the database directly. The only downside is that Electric is immature and we often run into bugs, but as a startup we're willing to deal with that in exchange for shipping faster.


I've been wondering how well Electric's been working for people ever since I heard about it; good to hear that it's been useful for you.

Couple of questions:

- How big is the WASM blob that you need to ship for in-browser SQLite? Have you had any noticeable issues from shipping a large payload to the browser?

- What are you using to persist the SQLite database on clients? Have you been using the Origin Private File System?


This is the WASM blob; it's 1.1 MB uncompressed: https://github.com/rhashimoto/wa-sqlite/blob/master/dist/wa-.... No issues - it's cached by Cloudflare.

We're using IndexedDB. Here's a writeup on alternatives https://github.com/rhashimoto/wa-sqlite/issues/85 and a benchmark https://rhashimoto.github.io/wa-sqlite/demo/benchmarks.html


Gotcha, interesting. 1.1 MB isn't too bad, especially with Cloudflare providing a local PoP. And if this is for Hocus, I'm guessing your frontend isn't used much on mobile devices with iffy connections.

That writeup on different SQLite VFS's for in-browser use is helpful, thanks for linking that.


How do you handle migrations?


Every Postgres migration is done through an Electric proxy, which converts it into a corresponding SQLite migration that can be applied later on the client. If a migration would somehow be breaking, you can also drop the client-side SQLite database and resync state from Postgres.


The docs say Electric propagates migrations (DDL) on Postgres synced tables to their "satellite" clients.


What kinds of bugs have you run into? Any large-scale corruption of data at rest?


We have run into queries that corrupted the database client-side, but fortunately that doesn't propagate into postgres itself. In that case we had to drop the client-side db and resync from a clean state.

The corruption was also caught by SQLite itself - it threw a "database disk image is malformed" error and stopped responding to any further queries.

There were also bugs around syncing some kinds of data - one that's already been fixed was that floats without decimal points would not get synced. https://github.com/electric-sql/electric/issues/506

In general, Electric's team is very responsive and fixes bugs when we bring them up.


SQLite had 2 bugs[1] where batch atomic writes would corrupt your DB if you used IndexedDB to back your VFS. Both have been patched in SQLite, so rolling a new Electric release that pulls in the latest SQLite build should fix that.

[1] - https://github.com/vlcn-io/js/issues/31#issuecomment-1785296...


Any idea on what the root cause of the sqlite corruption was? There's some discussion on the SQLite forums about corruption with wasm (I've encountered it myself on a personal project), but from what I understand no one has identified a cause yet.


How do you deal with shapes and permissions not being available yet?


There's a workaround - if a table has an "electric_user_id" column, then a user (identified by their JWT) can only read rows whose electric_user_id matches their own id. It's basic but it works for us. https://electric-sql.com/docs/reference/roadmap#shapes


How are conflicts resolved?


With something called Rich-CRDTs, which were invented by Electric's CTO. There's a section in the docs and some blog posts dedicated to them: https://electric-sql.com/docs/reference/consistency#rich-crd...


It looks like HN automatically stripped the reference to the comment I originally linked to: https://github.com/rui314/mold/issues/190#issuecomment-14028.... The title should be clearer in this context.


Oh yes, in this post I was not trying to. Hocus gives you a web interface that lets you spin up a dev env with a single click of a button. We also implemented a git-integrated CI system that prebuilds your dev env on new commits. It’s basically a self-hosted Gitpod or GitHub Codespaces.


Nix solves a different problem than Hocus. Nix lets you define a development environment, Hocus gives you a way to run it on a remote server. Right now we use Dockerfiles to let users define the packages they need in their dev env, but we would like to support Nix in the future too. Interestingly, you can use custom BuildKit syntax https://docs.docker.com/build/dockerfile/frontend/ to build Nix environments with Docker https://github.com/reproducible-containers/buildkit-nix, and that's probably what we will end up supporting.


From what I can tell, devenv [1] and devbox [2] (both built on Nix) can also deploy to remote servers.

[1] https://devenv.sh

[2] https://www.jetpack.io/devbox


There are a lot of other ways to deploy to remote servers with Nix. Many of them are NixOS-based, but some don't require NixOS at all.

Whatever way you're using Nix for developer environments, you can reuse most of that work to define a package suitable for remote deployment.


I think Nix is relevant here, because being able to run software across different machines reproducibly is one of its major selling points. I particularly like that it doesn't rely on virtualization or containerization to do that. It's up to the user to decide how to isolate the runtime environment from the host, or whether they even should. Alternatively, tools building upon Nix can make that decision for them. Either way, it allows for a more flexible approach when you have to weigh the pros and cons of different isolation strategies. Development environments defined by Nix tend to compose well too, as a result of this design.


see this is the problem w/ all these devtools - i need to piece together 5 different things when i just want a reproducible, ephemeral environment

someone needs to bring a heroku-like experience but for cloud-native development


That's the mission we're on at Argonaut. I'd love to know more about how you think about it if you're up for a chat.


What are you trying to offer above and beyond GitHub codespaces?


I didn't want to go into all the technical details, but we have another write-up that covers RAM management: https://github.com/hocus-dev/hocus/blob/main/rfd/0003-worksp...

Other than making sure we release unused memory to the host, we didn't customize QEMU that much. Although we do have a cool layered storage solution - basically a faster, VMM-independent alternative to QCOW2. It's called overlaybd and was created at Alibaba. That will probably be another blog post. https://github.com/containerd/overlaybd


> I didn't want to go into all the technical details

HN is here for the technical details ;)


that should be the HN motto!


Thirded.. Ed.


We do, and we'd love to use it in the future. We've found that it's not ready for prime time and is missing some features. The biggest problem is that it doesn't support discard operations yet. Here's a short writeup we did about the VMMs we considered: https://github.com/hocus-dev/hocus/blob/main/rfd/0002-worksp...


Thanks for the link to the elaboration! FYI footnotes 3 and 4 seem to be swapped.


> footnotes 3 and 4 seem to be swapped

Maybe they are async footnotes and there is a race condition. /s

