> it took me several hours to get llama.cpp working as a server
Mm... running a llama.cpp server is annoying: which model to use? Is it in the right format? What should I set `ngl` to? However, perhaps it would be fairer and more accurate to say that installing llama.cpp and installing ollama have slightly different effort levels (one taking about 3 minutes to clone and run `make`, the other about 20 seconds to download).
Once you have them installed, just typing `ollama run llama3` is quite convenient compared to finding the right arguments for the llama.cpp `server`.
Sensible defaults. Installs llama.cpp. Downloads the model for you. Runs the server for you. Nice.
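To make that concrete, here's a rough sketch of the two paths side by side; the flags and the model filename on the llama.cpp line are placeholders for illustration, not whatever defaults ollama actually applies under the hood:

    # llama.cpp: you pick the GGUF, the GPU offload, the context size, the port...
    # (the binary is llama-server in recent builds, plain `server` in older ones)
    ./llama-server -m ./models/llama-3-8b-instruct.Q4_K_M.gguf \
        -ngl 99 -c 4096 --host 127.0.0.1 --port 8080

    # ollama: one command, defaults chosen for you
    ollama run llama3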
> it took me 2 minutes to get ollama working
So, you know, I think it's broadly speaking a fair sentiment, even if it probably isn't quite true.
...
However, when you look at it from that perspective, some things stand out:
- ollama is basically just a wrapper around llama.cpp
- ollama doesn't let you do all the things llama.cpp does
- ollama offers absolutely zero way, not even the hint of a suggestion of one, to move from using ollama to using llama.cpp if you need anything more.
Here are some interesting questions:
- Why can't I just run llama.cpp's server with the defaults from ollama?
- Why can't I get a simple dump of the 'sensible' defaults that ollama actually uses?
- Why can't I get a simple dump of the GGUF (or whatever) model file ollama uses? (As far as I can tell, the closest you get is digging through the local blob store by hand; see the sketch after this list.)
- Why isn't 'a list of sensible defaults' just a GitHub repository with a download link and a list of params to use?
- Who's paying for the enormous cost of hosting all those ollama model files and converting them into usable formats?
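On that GGUF question: you can at least go spelunking by hand. A rough sketch, assuming the default store location under `~/.ollama/models` and a pulled `llama3`; the directory layout is an implementation detail and may differ between ollama versions:

    # manifests describe the model; blobs are content-addressed files
    ls ~/.ollama/models/manifests/registry.ollama.ai/library/llama3/
    ls -lhS ~/.ollama/models/blobs/ | head
    # the largest blob is, in my experience, a plain GGUF you can point llama.cpp at

...which rather underlines the point: it works, but it's nothing like a simple dump.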
The project is convenient, and if you need an easy way to get started, absolutely use it.
...but, I guess, I recommend you learn how to use llama.cpp itself at some point, because most free things are only free while someone else is paying for them.
Consider this:
If ollama's free hosted models were no longer free and you had to manually find and download your own model files, would you still use it? Could you still use it?
If not... maybe don't base your business (or anything else important) around it.
It's a SaaS with an open source client, and you're using the free plan.
> If ollama's free hosted models were no longer free and you had to manually find and download your own model files, would you still use it? Could you still use it?
I would absolutely still use it; I've already ended up feeding it GGUF files that weren't among its curated options. The process (starting from having foo.gguf) is literally just:
    # write a one-line Modelfile pointing at the local GGUF, then register it
    echo FROM ./foo.gguf > ./foo.gguf.Modelfile
    ollama create foo -f foo.gguf.Modelfile
(Do I wish there was an option like `ollama create --from-gguf` to skip the Modelfile? Oh yes. Do I kinda get why it exists? Also yes (it lets you bake in a prompt and IIRC other settings). Do I really care? Nope, it's low on the list of modestly annoying bits of friction in the world.)
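Once it's created, it behaves like any other pulled model:

    ollama run foo
    ollama run foo "or pass a one-off prompt straight from the shell"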
I don't feel any worse about Ollama funding the hosting and bandwidth of all of these models than I do about their upstream hosting source being Hugging Face, which raises the same concerns.
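And as for the "manually find and download your own model files" scenario: that part is also only a command or two. A sketch, assuming the huggingface_hub CLI is installed; the repo and filename here are placeholders, not a specific recommendation:

    # grab a GGUF directly from Hugging Face, then feed it to ollama or llama.cpp
    huggingface-cli download someone/Some-Model-GGUF some-model.Q4_K_M.gguf --local-dir .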