Linux is fundamentally not monolithic the way Windows is, but maybe some DEs could expose hooks for LLMs to use.
There is also the performance issue. Right now the energy and memory usage of llama implementations is very high, and models take some time to load into RAM and/or VRAM. Microsoft seems to be getting around this with cloud inference, eating the hosting cost (for now).
> little fine tuning on tool use might be all thats needed.
Maybe I am interpreting this wrong, but LoRA finetuning is extremely resource-intensive right now. There are practical alternatives though, like the embedding databases people are just now setting up.
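To illustrate what I mean by the embedding-database alternative: instead of finetuning, you embed your tool/task descriptions once and retrieve the closest match at query time. A minimal sketch, with made-up 3-dimensional toy vectors standing in for real model embeddings (the task names and dimensions here are hypothetical):

```python
import numpy as np

# Toy "embedding database": task name -> embedding vector.
# Real systems would use model-generated embeddings with hundreds of dims.
docs = {
    "open terminal": np.array([0.9, 0.1, 0.0]),
    "change wallpaper": np.array([0.1, 0.9, 0.2]),
    "mount a drive": np.array([0.8, 0.2, 0.1]),
}

def retrieve(query_vec, k=2):
    # Cosine similarity = dot product of unit-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    scores = {
        name: float(q @ (v / np.linalg.norm(v)))
        for name, v in docs.items()
    }
    # Return the k task names with the highest similarity.
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(retrieve(np.array([1.0, 0.0, 0.0])))
# → ['open terminal', 'mount a drive']
```

The point is that adding a new tool is just one embedding call and an insert, versus another finetuning run.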
The compute requirements are reasonable, but the memory requirements are extremely high, even with upcoming breakthroughs like 4-bit bitsandbytes.
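A back-of-envelope calculation shows why 4-bit quantization matters so much for memory (the 7B parameter count is just an illustrative assumption, and this counts weights only, ignoring activations and KV cache):

```python
# Weights-only memory footprint for a hypothetical 7B-parameter model.
params = 7_000_000_000

fp16_gb = params * 2 / 1e9    # fp16: 2 bytes per weight
int4_gb = params * 0.5 / 1e9  # 4-bit: 0.5 bytes per weight

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
# → fp16: 14.0 GB, 4-bit: 3.5 GB
```

So 4-bit gets a 7B model's weights under the VRAM of a common 8 GB consumer GPU, where fp16 does not even fit in 12 GB.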