Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In my case I had hundreds of invoices in a not-very-consistent PDF format which I had contemporaneously tracked in spreadsheets. After data extraction (pdftotext + OpenAI API), I cross-checked against the spreadsheets, and for any discrepancies I reviewed the original PDFs and old bank statements.

The main issue I had was it was surprisingly hard to get the model to consistently strip commas from dollar values, which broke the csv output I asked for. I gave up on prompt engineering it to perfection, and just looped around it with a regex check.

Otherwise, accuracy was extremely good and it surfaced a few errors in my spreadsheets over the years.



I hope there is a future where csv comma's don't screw up data. I know it will never happen but it's a nightmare.

Everyone has a story of a csv formatting nightmare




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: