thelittleone's comments

Regarding context reduction: this got me wondering. If I use my own API key, there is no way for the IDE or copilot provider to benefit beyond the monthly sub. But if I am using their provided model with tokens from the monthly subscription, they are incentivized to charge me based on the tokens I submit to them, then optimize that and pass a smaller request on to the LLM, keeping the extra margin. Is that what you are referring to?

Yup, but there was also good reason to do this: models work better with smaller context. Which is why I rely on Gemini for this lazy/inefficient workflow of mine.

As a youngster in the 70s (before lead was a known carcinogen) I had a piece of lead in my Lego bag (no idea how it came to be there). I thought that soft metal was neat and bit it more than once. Fortunately I still have all my hair, though I know a woman who would say 'that explains a lot'.

Very cool. Though you might want to increase the contrast on the diagrams, for example here: https://pico.sh/tuns


Have you had a chance to compare results from MinerU vs an LLM such as Gemini 2.0 or Anthropic's native PDF tool?


Yes, I have. The problem with using just an LLM is that while it reads and understands text, it cannot reproduce it accurately. Additionally, the textbooks I've mentioned have many diagrams and illustrations in them (e.g. books on anatomy or biochemistry). I don't really care about extracting text from those; I just need them extracted as images alongside the text, and no LLM does that.
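
Roughly the kind of thing I mean, sketched with PyMuPDF (just an illustration of "images alongside text", not what MinerU actually does internally):

    # pip install pymupdf
    import fitz  # PyMuPDF

    doc = fitz.open("anatomy_textbook.pdf")
    for page_num, page in enumerate(doc):
        # plain text of the page, kept next to its images
        with open(f"page_{page_num:04d}.txt", "w", encoding="utf-8") as f:
            f.write(page.get_text())
        # every embedded image on the page, written out as-is
        for img_index, img in enumerate(page.get_images(full=True)):
            xref = img[0]
            info = doc.extract_image(xref)
            out = f"page_{page_num:04d}_img_{img_index}.{info['ext']}"
            with open(out, "wb") as f:
                f.write(info["image"])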


Interesting, although it would be great to see some comparative results, e.g., with and without the HTML alt tag approach.


How about building a tool which indexes OCR chunks/tokens along with a confidence grade, sets a tolerance level, and defines actions for when a token or chunk falls below that level? Actions could include automated verification using another model or, as a last resort, human review.
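
Very roughly, something like this (names and the threshold are made up, just to sketch the shape of it):

    from dataclasses import dataclass

    @dataclass
    class Chunk:
        text: str
        confidence: float  # 0.0-1.0, as reported by the OCR engine

    TOLERANCE = 0.85  # tolerance level; anything below triggers an action

    def verify_with_other_model(chunk: Chunk) -> bool:
        """Placeholder: re-run OCR or ask a second model to confirm the text."""
        raise NotImplementedError

    def route(chunks: list[Chunk]) -> list[Chunk]:
        needs_human = []
        for chunk in chunks:
            if chunk.confidence >= TOLERANCE:
                continue                       # accept as-is
            if verify_with_other_model(chunk):
                continue                       # second model agreed
            needs_human.append(chunk)          # last resort: human review
        return needs_human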


How would you calculate the confidence? LLMs are notoriously bad at grading their own output.


Text from diagrams can be useful in LLMs. For example, an LLM can understand a flow chart's decision-making shapes etc., but without the text it could misinterpret the information. I process a bunch of PDFs, including procedures. Diagrams are converted to code. The text helps in many cases.
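
A made-up example of one decision shape rendered as Mermaid (the actual target format depends on the pipeline); without the label text, all the model would see is boxes and arrows:

    flowchart TD
        start[Take pressure reading] --> check{Above safe limit?}
        check -- yes --> act[Shut down pump]
        check -- no --> log[Record reading and continue]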


  Diagrams are converted to code
That's cool. May I ask what your pipeline looks like? And what code format do you use for diagrams? Mermaid?


Could that be true while, at the same time, a 'vulnerability' exists that the megacorp is party to?


Anthropic has a beta endpoint for PDFs which has produced impressive results for me on long and complex PDFs (tables, charts, etc.).
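
Roughly how I call it (the beta header and model strings here are from memory and may have changed, so check the docs before relying on them):

    import base64, requests

    pdf_b64 = base64.b64encode(open("report.pdf", "rb").read()).decode()

    resp = requests.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": "sk-ant-...",            # your API key
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "pdfs-2024-09-25",  # beta flag at the time of writing
            "content-type": "application/json",
        },
        json={
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": 1024,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "document",
                     "source": {"type": "base64",
                                "media_type": "application/pdf",
                                "data": pdf_b64}},
                    {"type": "text",
                     "text": "Summarise the tables and charts in this document."},
                ],
            }],
        },
    )
    print(resp.json())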



1. Work for megacorp.
2. Megacorp CEOs gloat about forthcoming mass firings of engineers.
3. Pay taxes as always.
4. Taxes used to fund megacorp (Stargate).
5. Megacorp fires me.

The bitter irony.

