Very excited for access to more RAM in a consumer form factor. By comparison, the most RAM you can get on a DDR5 motherboard is 128 GB (actually, there may be 48 GB modules as well, so maybe 192 GB), but at that capacity you're not getting anywhere near the rated speeds. The listed 1 TB of capacity sounds super roomy by comparison.
I have 128 GB in my computer with a 5850x, which lets me load and run the 180B Falcon and 70B Llama 2 LLMs in llama.cpp, although at different quantization levels.
- you don’t get GPU acceleration just by using unified memory. Llama.cpp still only uses the CPU on Apple Silicon chips.
- the difference in tokens/sec is likely attributable to memory bandwidth. Mac Studios with the base Max chip have 400 GB/s of memory bandwidth, compared to around 50 GB/s for Ryzen 5000 series CPUs.
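A rough back-of-the-envelope check on the bandwidth point: if token generation is memory-bound and every weight has to be read once per token, throughput is capped at roughly bandwidth divided by model size. The sketch below uses the 400 GB/s and 50 GB/s figures from above; the 40 GB model size (about what a 70B model takes at 4-bit quantization) is an assumption for illustration, not a measurement, and real numbers will come in below this ceiling.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/sec if all weights are streamed once per token."""
    return bandwidth_gb_s / model_size_gb


# Assumption: ~40 GB of weights (roughly a 70B model at 4-bit quantization).
MODEL_SIZE_GB = 40.0

mac_studio = max_tokens_per_sec(400.0, MODEL_SIZE_GB)  # base Max chip figure
ryzen_5000 = max_tokens_per_sec(50.0, MODEL_SIZE_GB)   # Ryzen 5000 figure

print(f"Mac Studio ceiling: {mac_studio:.2f} tokens/sec")
print(f"Ryzen 5000 ceiling: {ryzen_5000:.2f} tokens/sec")
print(f"Ratio: {mac_studio / ryzen_5000:.1f}x")
```

Whatever model size you plug in, the ratio between the two machines stays at the bandwidth ratio, which is why the gap shows up even with no GPU involved.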
One underused angle for oodles of memory is the humble ramdisk. If you haven't run into these: you set aside a portion of memory to serve as a disk volume. If you have a temporary work product, some intermediate-stage data you won't be saving, shoving it in a ramdisk provides some really amazing speedups. You can put a SQLite database in it on the fly, just for analysis, run at blazing speeds, and keep the results. Or image an optical disk into your ramdisk, chow away at it, keep the work product, and just clear the memory.
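Here is a minimal sketch of the throwaway-SQLite idea. It assumes a Linux-style tmpfs mount at /dev/shm as the ramdisk and falls back to the ordinary temp directory if that isn't there (in which case you get no speedup, it just still runs). The function names, table name, and sample rows are made up for illustration.

```python
import os
import shutil
import sqlite3
import tempfile


def ramdisk_dir() -> str:
    """Return a memory-backed directory if available, else the normal temp dir.

    Assumption: on Linux, /dev/shm is a tmpfs mount that lives in RAM.
    """
    shm = "/dev/shm"
    if os.path.isdir(shm) and os.access(shm, os.W_OK):
        return shm
    return tempfile.gettempdir()


def analyze_in_ramdisk(rows):
    """Build a scratch SQLite database, query it, keep only the result."""
    workdir = tempfile.mkdtemp(prefix="scratch_", dir=ramdisk_dir())
    try:
        conn = sqlite3.connect(os.path.join(workdir, "scratch.db"))
        try:
            conn.execute("CREATE TABLE samples (name TEXT, value REAL)")
            conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
            conn.commit()
            return conn.execute(
                "SELECT name, SUM(value) FROM samples "
                "GROUP BY name ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
    finally:
        # Clearing the directory is what gives the memory back.
        shutil.rmtree(workdir, ignore_errors=True)


totals = analyze_in_ramdisk([("a", 1.0), ("b", 2.0), ("a", 3.0)])
print(totals)
```

SQLite can also open a pure in-memory database with ":memory:", but a file on a ramdisk has the advantage that other tools and processes can open it too.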
Try searching deep into the game tree in Go. After a few million nodes you can't actually store them all in 16 GB of RAM. That's just a day of searching on a 2060; you can get into the hundreds of millions with a faster GPU and a longer search horizon. But once it spills into swap, it won't be as fast...
It can't be for tabs in Chrome; that browser can't even use all 64 GB I offer it ("putting tabs to sleep" is disabled on purpose, as I need the tabs to stay active).
I do, but they crash for lack of memory while half of the RAM is still free. It's a bug; other people have commented on the issue, with no solution so far. Some suspect hardware acceleration is the culprit, but I ruled that out.
What I meant to convey in the first place: it doesn't matter how much hardware you throw at an issue if the software can't use it.