LM Studio 0.3 – Discover, download, and run local LLMs (lmstudio.ai)
241 points by fdb 46 days ago | 66 comments



Hello Hacker News, Yagil here, founder and original creator of LM Studio (now built by a team of 6!). I had the initial idea to build LM Studio after seeing the OG LLaMa weights ‘leak’ (https://github.com/meta-llama/llama/pull/73/files) and then later trying to run some TheBloke quants during the heady early days of ggerganov/llama.cpp. In my notes LM Studio was first “Napster for LLMs”, which later evolved into “GarageBand for LLMs”.

What LM Studio is today is an IDE / explorer for local LLMs, with a focus on format universality (e.g. GGUF) and data portability (you can go to the file explorer and edit everything). The main aim is to give you an accessible way to work with LLMs and make them useful for your purposes.

Folks point out that the product is not open source. However I think we facilitate distribution and usage of openly available AI and empower many people to partake in it, while protecting (in my mind) the business viability of the company. LM Studio is free for personal experimentation and we ask businesses to get in touch to buy a business license.

At the end of the day LM Studio is intended to be an easy yet powerful tool for doing things with AI without giving up personal sovereignty over your data. Our computers are super capable machines, and everything that can happen locally w/o the internet, should. The app has no telemetry whatsoever (you’re welcome to monitor network connections yourself) and it can operate offline after you download or sideload some models.

0.3.0 is a huge release for us. We added (naïve) RAG, internationalization, UI themes, and set up foundations for major releases to come. Everything underneath the UI layer is now built using our SDK which is open source (Apache 2.0): https://github.com/lmstudio-ai/lmstudio.js. Check out specifics under packages/.

Cheers!

-Yagil


In some brief testing, I discovered that the same models (Llama 3 8B and one more I can't remember) ran MUCH slower in LM Studio than in Ollama on my MacBook Air M1 2020.

Has anyone found the same thing, or was that a fluke and I should try LM Studio again?


Just chiming in with others to help out:

By default LM Studio doesn't fully use your GPU. I have no idea why. Under the settings pane on the right, turn the slider under "GPU Offload" all the way to 100%.
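
If you're poking at this outside the GUI, the slider maps (as far as I can tell) to how many model layers llama.cpp offloads to the GPU. A rough sketch using the llama-cpp-python bindings, where the model path is just a placeholder and n_gpu_layers=-1 means "offload everything you can":

    from llama_cpp import Llama

    # -1 offloads every layer llama.cpp can to the GPU (Metal on Apple Silicon,
    # CUDA/ROCm elsewhere); 0 keeps inference entirely on the CPU.
    llm = Llama(
        model_path="models/dolphin-2.9.1-llama-3-8b.Q4_0.gguf",  # placeholder path
        n_gpu_layers=-1,
        n_ctx=4096,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "hi, which model are you?"}],
    )
    print(out["choices"][0]["message"]["content"])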


That froze the whole computer, and even made it impossible to click with either the internal or the external trackpad.

The model is Dolphin 2.9.1 Llama 3 8B Q4_0.

I set it to 100% and wrote this: "hi, which model are you?"

The reply was a slow output of these characters, a mouse cursor that barely moved, and I couldn't click on the trackpads: "G06-5(D&?=4>,.))G?7E-5)GAG+2;BEB,%F=#+="6;?";/H/01#2%4F1"!F#E<6C9+#"5E-<!CGE;>;E(74F=')FE2=HC7#B87!#/C?!?,?-%-09."92G+!>E';'GAF?08<F5<:&%<831578',%9>.='"0&=6225A?.8,#8<H?.'%?)-<0&+,+D+<?0>3/;HG%-=D,+G4.C8#FE<%=4))22'*"EG-0&68</"G%(2("

Help?


Maybe so the web browser etc. still has some GPU without swapping from main memory? What % does it default to?


Two replies to parent immediately suggest tuning. Ironically, this release claims to feature auto-config for best performance:

“Some of us are well versed in the nitty gritty of LLM load and inference parameters. But many of us, understandably, can't be bothered. LM Studio 0.3.0 auto-configures everything based on the hardware you are running it on.”

So parent should expect it to work.

I see the same issue: using an MBP with 96GB (M2 Max with 38‑core GPU), it seems to tune by default for a base machine.


Make sure you turn on GPU use with the slider. By default it doesn't run at full speed.


Yeah, me. Even without other applications running in the background and without any models loaded, the new 0.3 UI is stuttering and running like a couch-locked crusty after too many edibles on my Macbook Air 2021, 16GB. When I finally get even a 4B model loaded, inference is glacially slow. The previous versions worked just fine (they're still available for download).


Don’t forget to tune your num_batch


Nice, it’s a solid product! It’s just a shame it’s not open source and its license doesn’t permit work use.


Thanks! We actually totally permit work use. See https://lmstudio.ai/enterprise.html


An "email us" link is a bit of a discouragement from using it for work purposes. I want a clearly defined price list, at least for some entry levels of commercial use.


Or even just a ballpark. Are we talking $500, $5,000, $50,000 or $500,000?


This. When companies don’t list prices, it automatically gives me a “they want to rip you off” vibe. Put in the effort and define enterprise pricing. If you later find that it isn’t right, change it.


Thanks, what license is it under? This means that anyone who wants to try it at work has to fill that out though, right?


Originally started out with LM Studio, which was pretty nice, but ended up switching to Ollama since I only want to use one app to manage all the large model downloads, and there are many more tools and plugins that integrate with Ollama, e.g. in IDEs and text editors.


I never could get anything local working a few years ago, then someone on Reddit told me about LM Studio and I finally managed to "run an AI" on my machine. Really cool, and now I'm tinkering with it using the built-in HTTP server.
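
If it helps anyone else getting started: the built-in server speaks the OpenAI chat-completions format, so the standard openai Python client works if you just point its base URL at it. A minimal sketch, assuming the default local port of 1234 (adjust to whatever your server page shows) and that a model is already loaded:

    from openai import OpenAI

    # LM Studio's local server ignores the API key, but the client insists on one.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; the server answers with whichever model is loaded
        messages=[{"role": "user", "content": "Explain what a GGUF file is in one sentence."}],
        temperature=0.7,
    )
    print(resp.choices[0].message.content)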


LM Studio is great, although I wish recommended prompts were part of the data of each LLM. I probably just don't know enough, but I feel like I get a hunk of magic data and then I'm mostly on my own.

Similarly with images: LLMs and ML in general feel like the DOS days of config.sys, autoexec.bat, and QEMM.


Does anyone know if there's a changelog/release notes available for all historical versions of this? This is one of those programs with the annoying habit of surfacing only the changes in the most recent version, and their release cadence is such that there are some 3 to 5 updates between the times I run it, and then I have no idea what changed.


Same. I found their Discord announcements channel [1], and they may have started using their blog for a full version changelog [2].

[1] https://discord.gg/aPQfnNkxGC [2] https://lmstudio.ai/blog


I LOVE LM Studio; it's super convenient for testing model capabilities, and the OpenAI-compatible server makes it really easy to spin up a server and test. My typical process is to load a model in LM Studio, test it, and when I'm happy with the settings, move to vLLM.


Yesterday I wanted to find a snippet from a ChatGPT conversation I had maybe 1 or 2 weeks ago. Searching for a single keyword would have been enough to find it.

How is it possible that there's still no way to search through your conversations?


Check out https://recurse.chat (I'm the dev). You can import ChatGPT messages. It has almost instant full text search over thousands of chat sessions. Also supports llama.cpp, local embedding / RAG, and most recently bookmarks and nested folders.


For Mac and iOS, you can install the ChatGPT app.

Why they won't enable search for their main web user crowd is beyond me.

Perhaps they are just afraid of scale. Even with all their might, it's still possible that they can't estimate the scale and complexity of the queries they might receive.


A user's personal data really does not have that much scale. Worst case they can cache everything locally. I've imported thousands of chat sessions into a local AI chat app's database, total storage is under 30MB. Full text search (with highlights and all) is almost instant.
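
For a sense of scale: plain SQLite with the FTS5 extension chews through this kind of personal archive instantly. A toy sketch (table and column names made up, and it assumes your Python's SQLite was built with FTS5, which most are):

    import sqlite3

    conn = sqlite3.connect("chats.db")
    # FTS5 virtual table: both columns are full-text indexed.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS chats USING fts5(title, body)")
    conn.execute("INSERT INTO chats VALUES (?, ?)",
                 ("GPU offload question", "How do I make llama.cpp use the GPU?"))
    conn.commit()

    # MATCH runs the full-text query; highlight() wraps each hit in the given markers.
    for title, snippet in conn.execute(
        "SELECT title, highlight(chats, 1, '[', ']') FROM chats WHERE chats MATCH ?",
        ("gpu",),
    ):
        print(title, "->", snippet)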


They did staged rollouts for almost every recent feature.

I think it might be in their interest if you just ask the LLM again? Old answers might not be up to their current standards, and they don't gain feedback from you looking at old answers.


There are lots of ways to search through your conversations, just not through OpenAI's web interface. If you don't want to explore alternatives because you don't want to lose access to your conversations, I would argue you've just demonstrated to yourself why you should proactively avoid vendor lock-in.


Are you complaining about OpenAI's ChatGPT web UI?


Try exporting your data and searching the JSON/HTML.
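
For anyone trying this: my export contained the conversations as one big JSON file, so even a crude recursive string search over it is enough to find a half-remembered snippet. A quick sketch (the conversations.json filename is what my export used; yours may differ):

    import json

    def walk(node, keyword, path=""):
        """Yield any string field in the export that contains the keyword."""
        if isinstance(node, dict):
            for k, v in node.items():
                yield from walk(v, keyword, f"{path}.{k}")
        elif isinstance(node, list):
            for i, v in enumerate(node):
                yield from walk(v, keyword, f"{path}[{i}]")
        elif isinstance(node, str) and keyword.lower() in node.lower():
            yield path, node

    with open("conversations.json", encoding="utf-8") as f:
        data = json.load(f)

    for path, text in walk(data, "the keyword I remember"):
        print(path, "->", text[:120])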


Question for everyone: I am using the MLX version of Flux to generate really good images from text on my M2 Mac, but I don’t have an easy setup for doing text + base image to a new image. I want to be able to use base images of my family and put them on Mount Everest, etc.

Does anyone have a recommendation?

For context: I have almost ten years of experience with deep learning, but I want something easy to set up on my home M2 Mac, or Google Colab would be OK.



Does anyone know what advantages LM Studio has over Ollama, and vice versa?


A better question would be over something like Jan or LibreChat. Ollama is a CLI/API/backend for easily downloading and running models.

https://github.com/janhq/jan

https://github.com/danny-avila/LibreChat

Jan's probably the closest thing to an open-source LLM chat interface that is relatively easy to get started with.

I personally prefer LibreChat (which supports integration with image generation), but it does have to spin up some Docker stuff, and that can make it a bit more complicated.


There is also Msty (https://msty.app), which I find much easier to get started with, and it comes with interesting features such as web search, RAG, Delve mode, etc.


Ollama doesn't have a UI.


Cool. It's a bit weird that the Windows download is 32-bit; it should be 64-bit by default, and there's no need for a 32-bit Windows version at all.


It's probably 64-bit and they just call it x86 on their website. It needs an option to choose where models get downloaded to, as your typical C: drive is an SSD with limited space.


> It needs an option to choose where models get downloaded to, as your typical C: drive is an SSD with limited space.

You can already do this? https://i.imgur.com/BpF3K9t.png


There's an option to choose where models get downloaded: in the Models tab you can pick the target path.


Been using LM Studio for months on Windows. It's so easy to use: simple install, just search for the LLM from Hugging Face and it downloads and just works. I don't need to set up a Python environment in conda; it's way easier for people to play and enjoy. It's what I tell people who want to start enjoying LLMs without the hassle.


I filed a GitHub issue two weeks ago about a bug that was enough for me to put it down for a bit, and there hasn't even been a response. Their development velocity seems incredible, though. I'm not sure what to make of it.


We probably just missed it. Can you please ping me on it? “@yagil” on GitHub


See also: Msty.app

It allows both local and cloud models.

* Not associated with them in any way. Am a happy user.


Also jan.ai for offline and online.


+1 for Jan, and unlike Msty/LM Studio, it's open source.


Running this on Windows on an AMD card. Llama 3.1 Instruct 8B runs really well on this if anyone wants to try.


If you're hopping between these products instead of learning and understanding how inference works under the hood, and familiarizing yourself with the leading open source projects (i.e. llama.cpp), you are doing yourself a great disservice.


I know how training and inference work under the hood, I know the activation functions and backprop and MMUL, and I know some real applications I really want to build. But there's still plenty of room in the gap between the two, and LM Studio helps fill it. I also already have software built around the OpenAI API, and the LM Studio OpenAI API emulator is hard to beat for convenience. But if you can outline a process I could follow (or link to good literature) to shift toward running LLMs locally with FOSS while still interacting with them through an API, I'll absolutely give it a try.
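
For concreteness, the kind of setup I'd be trying to reproduce, based on my (unverified) understanding that llama.cpp's own llama-server exposes an OpenAI-style /v1/chat/completions endpoint, is roughly: start the server with something like llama-server -m ./models/some-model.gguf --port 8080, then keep the client code I already have, just pointed at the new URL:

    import requests

    # Assumes a local llama.cpp llama-server is running on port 8080.
    # The payload mirrors the OpenAI chat-completions API, so code written
    # against LM Studio's emulator should carry over mostly unchanged.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local",  # placeholder; the server uses whichever model it loaded
            "messages": [{"role": "user", "content": "Give me one test sentence."}],
            "temperature": 0.7,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])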


"hopping between these products instead of learning and understanding" was intended to exclude people who already know how they work, because I think it is totally fine to use them if you know exactly what all the current knobs and levers do.


Have you tried Jan? https://github.com/janhq/jan


Fantastic, thank you.


Why would someone expect interacting with a local LLM to teach anything about inference?

Interacting with a local LLM develops one's intuitions about how LLMs work, what they're good for (appropriately scaled to model size) and how they break, and gives you ideas about how to use them as a tool in bigger applications without getting bogged down in API billing, etc.


Assuming s/would/wouldn't: If you are super smart then perhaps you can intuit details about how they work under the hood. Otherwise you are working with a mental model that is likely to be much more faulty than the one you would develop by learning through study.


Knowing the specific multiplies and QKV and how attention works doesn't develop your intuition for how LLMs work. Knowing that the effective output is a list of tokens with associated probabilities is of marginal use. Knowing about rotary position embeddings, temperature, batching, beam search, different techniques for preventing repetition and so on doesn't really develop intuition about behavior so much as improve the worst cases (babbling, repeating nonsense at the absolute worst), and you wouldn't know that from first principles without playing with the things.
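
(For the curious, the temperature/probability part really is about this much machinery; toy numbers, nothing model-specific:)

    import math, random

    # Pretend next-token logits straight out of the model's final layer.
    logits = {"cat": 4.1, "dog": 3.7, "rocket": 0.2}

    def sample(logits, temperature=1.0):
        # Temperature rescales logits before the softmax: low T sharpens the
        # distribution toward the top token, high T flattens it toward uniform.
        scaled = {t: l / temperature for t, l in logits.items()}
        z = max(scaled.values())
        exp = {t: math.exp(v - z) for t, v in scaled.items()}
        total = sum(exp.values())
        probs = {t: e / total for t, e in exp.items()}
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        return token, probs

    token, probs = sample(logits, temperature=0.7)
    print(probs, "->", token)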

The truth is that the inference implementation is more like a VM, and the interesting thing is the model, the set of learned weights. It's like a program being executed one token at a time. How that program behaves is the interesting thing. How it degrades. What circumstances it behaves really well in, and its failure modes. That's the thing where you want to be able to switch and swap a dozen models around and get a feel for things, have forking conversations, etc. It's what LM Studio is decent at.


But those things are all so cool though. Like... how could you not want to learn about them?

Seriously though, I guess I'm just kind of uncomfortable with "treating the inference implementation like a VM", as you put it. It seems like a bad idea. We are turning implementation details into user interfaces in a space that is undergoing such rapid and extreme change. Like, people spent a lot of time learning the Stable Diffusion web UI, and then Flux came out and upended the whole space. But maybe foundational knowledge isn't as valuable as I'm thinking, and it's fine that people just re-learn whatever UIs emerge, I don't know.


You can also learn how a user will approach prompting.


Why


It's not that high a bar, and we're still very much at the publication-to-implementation stage. Most recently, I was able to use SAM2, SV3D, Mistral NeMo, and Flux.dev day one, and I'm certainly not some heady software engineer.

There's just a lot of great stuff you're missing out on if you're waiting on products while ignoring the very accessible, freely available tools they're built on top of and often reductions of.

I'm not against overlays like Ollama and LM Studio, but I'm more confused about why they exist when there's no additional barrier to going on Hugging Face or using kcpp, ooba, etc.

I just assume it's an awareness issue, but I'm probably wrong.


While it is perfectly proper and convenient to use these out-of-the-box products where they fit, doing so will at the very least not help us with our interviews. It also narrows our sense of how one can make use of LLMs, through the distraction of sleek, heavily abstracted interfaces. That makes it harder, if not impossible, for us to come up with bright new ideas for using models in various novel ways, ideas which almost always come from a deep understanding of how things actually work under the hood.


Neat! Can I use it with the Brave browser's local LLM feature?


What are the recommended system settings for this?


It depends on the model you run but generally speaking you want an NVIDIA GPU of some substance. I’d say like a 3060 at minimum.

CPU inference is incredibly slow versus my RTX 3090, but technically it will work.


Can somebody share benchmarks on AMD Ryzen AI, with and without the NPU?


It's using llama.cpp, so it's going to be the same benchmarks as almost all other apps (given almost everything uses llama.cpp under the hood).


Congrats! I'm a big fan of the existing product, and these are some great updates to make the app even more accessible and powerful.


Why did this get downvoted so much? At all?



