nice touch with the images, especially since they are even named contextually.
Navigational links don't work, but this is pretty cool. I wonder, theoretically, whether we'll be able to tell this is dynamically generated, given the caching, and assuming LLMs/art generators keep getting faster.
Interesting perspective. I suppose complex minifiers would also be an attack vector, since the obfuscation means you can't as readily eyeball obvious deviations.
Looks great. Had an issue getting it to work, though: iPhone, with microphone and speech recognition enabled. It says it's adding to the list, but the list isn't changing. Any ideas?
If your GPU has ~16GB of VRAM, you can run a 13B model in "Q4_K_M.gguf" format and it'll be fast. Maybe even with ~12GB.
It's also possible to run on CPU from system RAM, to split the workload across GPU and CPU, or even from a memory-mapped file on disk. Some people have posted benchmarks online [1] and naturally, the faster your RAM and CPU the better.
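To make the split concrete, here's roughly what it looks like with the llama-cpp-python bindings (just a sketch; the GGUF filename and layer count are placeholders, and llama.cpp's own CLI exposes the same idea through its -ngl/--n-gpu-layers flag):

    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-13b-chat.Q4_K_M.gguf",  # placeholder - point at your own GGUF
        n_gpu_layers=35,   # how many layers to offload to the GPU; 0 = pure CPU
        use_mmap=True,     # memory-map the weights from disk instead of copying them into RAM
        n_ctx=2048,        # context window
    )

Whatever doesn't fit in VRAM stays on the CPU side, which is where RAM and CPU speed start to matter.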
My personal experience is that running from CPU/system RAM is painfully slow. But that's partly because I only experimented with models that were too big to fit on my GPU, so part of the slowness is due to their large size.
I get 10 tokens/second on a 4-bit 13B model with 8GB of VRAM, offloading as much as possible to the GPU. At that speed I can't read the LLM output as fast as it generates, so I consider it sufficient.
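If you want to check your own tokens/second, a quick way with the llama-cpp-python bindings (a sketch; the model path and layer count are placeholders) is to time one generation and divide by the completion token count reported in the result:

    import time
    from llama_cpp import Llama

    llm = Llama(model_path="./models/13b-model.Q4_K_M.gguf", n_gpu_layers=40)  # placeholder path/layer count

    start = time.time()
    out = llm("Write a short poem about GPUs.", max_tokens=256)
    elapsed = time.time() - start

    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.1f} tokens/second")

The llama.cpp CLI also prints its own timing summary at the end of a run, which gives you the same numbers without any extra code.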
Mine is a laptop with an i7-11800H CPU + RTX 3070 Max-Q with 8GB VRAM + 64GB RAM (though you can probably get away with 16GB RAM). I bought this system for work and casual gaming, and was happy to find out that the GPU also lets me run LLMs locally at good performance. This laptop cost me about $1,600, which was a bargain considering how much value I get out of it. If you are not on a budget, I highly recommend getting one of the high-end laptops with an RTX 4090 and 16GB of VRAM.
With my system, llama.cpp can run Mistral 7B 8-bit quantized by offloading 32 layers (of 35 total) to the GPU at about 25-30 tokens/second, or 6-bit quantized by offloading all layers to the GPU at ~35 tokens/second.
I've also tested a few 13B 4-bit models such as Codellama; offloading 37 layers to the GPU got me about 10-15 tokens/second.
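In llama-cpp-python terms (a sketch; the GGUF filenames are placeholders for whatever quant you downloaded), those two Mistral setups look roughly like this:

    from llama_cpp import Llama

    # 8-bit quant: offload 32 layers to the GPU, the rest stays on the CPU
    llm_q8 = Llama(model_path="./models/mistral-7b-instruct.Q8_0.gguf", n_gpu_layers=32)

    # 6-bit quant is small enough to offload everything (-1 = all layers)
    llm_q6 = Llama(model_path="./models/mistral-7b-instruct.Q6_K.gguf", n_gpu_layers=-1)

    print(llm_q6("The capital of France is", max_tokens=16)["choices"][0]["text"])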
A CPU would work fine for the 7B model, and if you have 32GB of RAM and a CPU with a lot of cores you can run a 13B model as well, though it will be quite slow. If you don't care about speed, it's definitely one of the cheapest ways to run LLMs.
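A minimal CPU-only setup with llama-cpp-python would look something like this (a sketch; the path is a placeholder, and n_threads should roughly match your physical core count):

    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=0,   # no GPU offload - everything runs on the CPU
        n_threads=8,      # set this to your number of physical cores
    )
    print(llm("Hello,", max_tokens=32)["choices"][0]["text"])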