
If you're hopping between these products instead of learning and understanding how inference works under the hood, and familiarizing yourself with the leading open source projects (e.g., llama.cpp), you are doing yourself a great disservice.



I know how training and inference work under the hood, I know the activation functions and backprop and matmul, and I know some real applications I really want to build. But there's still plenty of room in the gap between that and a working application, and LM Studio helps fill it. I also already have software built around the OpenAI API, and LM Studio's OpenAI API emulator is hard to beat for convenience. But if you can outline a process I could follow (or link good literature) to shift towards running LLMs locally with FOSS but still interact with them through an API, I'll absolutely give it a try.
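
For the API piece specifically, llama.cpp's bundled llama-server exposes an OpenAI-compatible endpoint, so existing openai-client code mostly just needs a different base URL. A minimal sketch; the model path, port, and model name here are placeholders, not anything specific:

  # Start the server first (path and flags are illustrative):
  #   ./llama-server -m ./models/some-model.gguf --port 8080
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
  resp = client.chat.completions.create(
      model="local-model",  # placeholder; the server serves whatever model you loaded
      messages=[{"role": "user", "content": "Say hello from a local model."}],
  )
  print(resp.choices[0].message.content)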


"hopping between these products instead of learning and understanding" was intended to exclude people who already know how they work, because I think it is totally fine to use them if you know exactly what all the current knobs and levers do.


Have you tried Jan? https://github.com/janhq/jan


Fantastic, thank you.


Why would someone expect interacting with a local LLM to teach anything about inference?

Interacting with a local LLM develops one's intuitions about how LLMs work, what they're good for (appropriately scaled to model size) and how they break, and gives you ideas about how to use them as a tool in bigger applications without getting bogged down in API billing, etc.


Assuming s/would/wouldn't: If you are super smart, then perhaps you can intuit details about how they work under the hood. Otherwise you are working with a mental model that is likely to be much more faulty than the one you would develop by learning through study.


Knowing the specific multiplies and QKV and how attention works doesn't develop your intuition for how LLMs behave. Knowing that the effective output is a list of tokens with associated probabilities is of marginal use. Knowing about rotary position embeddings, temperature, batching, beam search, different techniques for preventing repetition, and so on doesn't really develop intuition about behavior either; those things mostly improve the worst cases (babbling, repeating nonsense in the absolute worst), and you wouldn't know that at all from first principles without playing with the things.
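
Temperature is a good example of how little of this is deep: it's just a rescaling of the logits before sampling. A minimal illustrative sketch, not taken from any particular library:

  import numpy as np

  def sample_next_token(logits, temperature=0.8):
      # Lower temperature sharpens the distribution (more deterministic);
      # higher temperature flattens it (more random); T -> 0 approaches argmax.
      scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)
      probs = np.exp(scaled - scaled.max())  # numerically stable softmax
      probs /= probs.sum()
      return np.random.choice(len(probs), p=probs)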

The truth is that the inference implementation is more like a VM, and the interesting thing is the model, the set of learned weights. It's like a program being executed one token at a time. How that program behaves is the interesting thing. How it degrades. What circumstances it behaves really well in, and its failure modes. That's the thing where you want to be able to switch and swap a dozen models around and get a feel for things, have forking conversations, etc. It's what LM Studio is decent at.
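
The "program executed one token at a time" framing is fairly literal: generation is a loop that appends one token per step. A minimal greedy sketch with Hugging Face transformers, using gpt2 purely as a small stand-in model:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  ids = tok("The model is the program:", return_tensors="pt").input_ids
  with torch.no_grad():
      for _ in range(20):                           # one token per "execution step"
          logits = model(ids).logits[:, -1, :]      # distribution over the next token
          next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick
          ids = torch.cat([ids, next_id], dim=-1)
  print(tok.decode(ids[0]))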


But those things are all so cool though. Like... how could you not want to learn about them.

Seriously though, I guess I'm just kind of uncomfortable with "treating the inference implementation like a VM," as you put it. It seems like a bad idea. We are turning implementation details into user interfaces in a space that is undergoing such rapid and extreme change. Like, people spent a lot of time learning the Stable Diffusion web UI, and then Flux came out and upended the whole space. But maybe foundational knowledge isn't as valuable as I'm thinking and it's fine that people just re-learn whatever UIs emerge, I don't know.


You can also learn how a user will approach prompting.


Why


It's not that high a bar, and we're still very much in a publication-to-implementation phase. Most recently, I was able to use SAM2, SV3D, Mistral NeMo, and Flux.dev on day one, and I'm certainly not some heady software engineer.

There's just a lot of great stuff you're missing out on if you're waiting on products while ignoring the very accessible, freely available tools they're built on top of (and are often reductions of).

I'm not against overlays like Ollama and LM Studio, but I'm confused about why they exist when there's no additional barrier to going on Hugging Face or using kcpp, ooba, etc.

I just assume it's an awareness issue, but I'm probably wrong.


While it is perfectly proper and convenient to use these out-of-the-box products in the scenarios they fit, doing so will at the very least not help us with our interviews. It also restricts our mindset about how one can make use of LLMs, behind the distraction of sleek, heavily abstracted interfaces. That makes it harder, if not impossible, for us to come up with bright new ideas that undermine models in various novel ways, ideas that almost always derive from a deep understanding of how things actually work under the hood.




