Regarding context reduction: this got me wondering. If I use my own API key, there is no way for the IDE or copilot provider to benefit other than the monthly sub. But if I am using their provided model with tokens from the monthly subscription, they are incentivized to charge me based on the tokens I submit, then optimize that and pass a smaller request on to the LLM, pocketing the margin. Is that what you are referring to?
Yup. But there is also a good reason to do this: models work better with smaller context. Which is why I rely on Gemini for this lazy/inefficient workflow of mine.
As a youngster in the 70s (before lead was a known carcinogen) I had a piece of lead in my Lego bag (no idea how it came to be there). I thought that soft metal was neat and bit it more than once. Fortunately I still have all my hair, though I know a woman who would say "that explains a lot."
Yes, I have. The problem with using just an LLM is that while it reads and understands text, it cannot reproduce it accurately. Additionally, the textbooks I've mentioned have many diagrams and illustrations in them (e.g. books on anatomy or biochemistry). I don't really care about extracting text from them; I just need them extracted as images alongside the text, and no LLM does that.
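For what it's worth, plain extraction like that doesn't need an LLM at all. A rough sketch, assuming PyMuPDF (`fitz`) and a hypothetical input file `anatomy.pdf`, that dumps each page's text and saves every embedded image alongside it:

```python
import pathlib
import fitz  # PyMuPDF

pdf_path = "anatomy.pdf"            # hypothetical textbook file
out_dir = pathlib.Path("extracted")
out_dir.mkdir(exist_ok=True)

doc = fitz.open(pdf_path)
for page_num, page in enumerate(doc, start=1):
    # Write the page text next to its images so the two stay associated.
    (out_dir / f"page_{page_num:03d}.txt").write_text(page.get_text())

    # Save every embedded image on the page as its own file.
    for img_index, img in enumerate(page.get_images(full=True), start=1):
        xref = img[0]
        info = doc.extract_image(xref)
        img_file = out_dir / f"page_{page_num:03d}_img_{img_index}.{info['ext']}"
        img_file.write_bytes(info["image"])
```

Scanned books are a different story (the "images" are whole page scans), but for born-digital PDFs this keeps diagrams and text side by side.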
How about building a tool that indexes OCR chunks/tokens along with a confidence grading, then setting a tolerance level and defining actions for when a token or chunk falls below that level? Actions could include automated verification using another model or, as a last resort, a human.
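A minimal sketch of that routing idea; the names (`OcrChunk`, `route_chunk`) and the thresholds are made up for illustration and would need tuning against your own documents:

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"                 # confidence is good enough as-is
    REVERIFY_WITH_MODEL = "reverify"  # re-run the chunk through another model
    HUMAN_REVIEW = "human"            # last resort: flag for a person


@dataclass
class OcrChunk:
    page: int
    text: str
    confidence: float  # 0.0 - 1.0, as reported by the OCR engine


# Arbitrary tolerance levels for the sketch.
ACCEPT_THRESHOLD = 0.90
MODEL_THRESHOLD = 0.70


def route_chunk(chunk: OcrChunk) -> Action:
    """Grade a chunk by confidence and decide what to do with it."""
    if chunk.confidence >= ACCEPT_THRESHOLD:
        return Action.ACCEPT
    if chunk.confidence >= MODEL_THRESHOLD:
        return Action.REVERIFY_WITH_MODEL
    return Action.HUMAN_REVIEW


def index_chunks(chunks: list[OcrChunk]) -> dict[Action, list[OcrChunk]]:
    """Bucket chunks by the action they need, for batch processing later."""
    buckets: dict[Action, list[OcrChunk]] = {a: [] for a in Action}
    for chunk in chunks:
        buckets[route_chunk(chunk)].append(chunk)
    return buckets


if __name__ == "__main__":
    sample = [
        OcrChunk(page=1, text="The humerus articulates with the scapula.", confidence=0.97),
        OcrChunk(page=2, text="Glyc0lysis yields 2 ATP per gluc0se.", confidence=0.74),
        OcrChunk(page=3, text="~~unreadable figure caption~~", confidence=0.31),
    ]
    for action, items in index_chunks(sample).items():
        print(action.value, [c.page for c in items])
```

The nice part is the queue for each action falls out of the indexing step: you batch the "reverify" bucket to a second model and only surface the bottom bucket to a person.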
Text from diagrams can be useful to LLMs. For example, an LLM can understand a flow chart's decision-making shapes etc., but without the text it could misinterpret the information. I process a bunch of PDFs, including procedures, where diagrams are converted to code. The text helps in many cases.
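As a toy illustration of why the labels matter (the names and thresholds here are invented, not from any real procedure): a decision diamond converted to code only means something if its text comes along with it.

```python
def triage(temperature_c: float) -> str:
    # Decision diamond: "Temperature above 38 C?" -- without that label,
    # a model only sees an anonymous branch and has to guess what it tests.
    if temperature_c > 38.0:
        return "escalate to on-call nurse"   # "Yes" arrow
    return "continue routine monitoring"     # "No" arrow
```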
1. Work for megacorp
2. Megacorp CEOs gloat about forthcoming mass firings of engineers
3. Pay taxes as always
4. Taxes used to fund megacorp (stargate)
5. Megacorp fires me.