
Very excited for access to more RAM in a consumer form factor. On a DDR5 motherboard the most RAM you can get is 128 GB (actually, there may be 48 GB modules as well, so maybe 192 GB), and at that capacity you're not getting anywhere near the rated speeds. The listed 1 TB of capacity sounds super roomy by comparison.


What are ye using all that RAM for? -curious stranger


I have 128 GB in my computer with a 5850x; it lets me load and run the 180B Falcon and 70B Llama 2 LLMs in llama.cpp, although at different quantization levels.

Speed is actually not that bad either.
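A back-of-the-envelope check (my own numbers, not the commenter's exact setup) of why both models fit in 128 GB: a quantized model's footprint is roughly parameter count times bits per weight, and the bits-per-weight figures below are assumed quantization levels.

```python
# Rough memory footprint of a quantized LLM: params * bits per weight / 8.
# The bits-per-weight values are illustrative assumptions, not measured.
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

falcon_q4 = model_size_gb(180, 4.5)   # Falcon 180B at ~4.5 bits/weight
llama2_q8 = model_size_gb(70, 8.5)    # Llama 2 70B at ~8.5 bits/weight

print(f"Falcon 180B @ ~4.5 bpw: {falcon_q4:.0f} GB")   # ~101 GB
print(f"Llama 2 70B @ ~8.5 bpw: {llama2_q8:.0f} GB")   # ~74 GB
# Either fits under 128 GB, with headroom left for the OS and KV cache.
```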


Is there some documentation on how to run this setup?

How fast is your setup?


I'm doing this on a Mac Studio with 128 GB too. I'm using llama.cpp.


Since you get GPU acceleration (because of the unified memory), I imagine this is probably much faster than the PC setup?

Edit: Seems some people are getting 1-2.6 tokens/sec on Ryzen (no GPU acceleration), Llama 70B quantized https://www.reddit.com/r/LocalLLaMA/comments/15rqkuw/llama_2...

Whereas Mac Studio gets 13 tokens/sec https://blog.gopenai.com/how-to-deploy-llama-2-as-api-on-mac...


Friendly internet stranger’s input:

- you don’t get GPU acceleration just by using unified memory. Llama.cpp still only uses the CPU on Apple Silicon chips.

- the difference in tokens/sec is likely attributable to memory bandwidth. Mac Studios with the base Max chip have 400 GB/s of memory bandwidth, compared to around 50 GB/s for the Ryzen 5000 series CPUs.
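The bandwidth explanation can be sanity-checked with rough arithmetic (my assumptions: a 70B model at ~4 bits/weight, so ~35 GB of weights streamed per generated token):

```python
# Token generation is roughly memory-bandwidth bound: producing one token
# requires streaming essentially all model weights through the chip once.
weights_gb = 70e9 * 0.5 / 1e9   # 70B params at ~4 bits (0.5 bytes) per weight = 35 GB

for name, bandwidth_gbps in [("Ryzen 5000 (dual-channel DDR4)", 50),
                             ("Mac Studio (M-series Max)", 400)]:
    tokens_per_sec = bandwidth_gbps / weights_gb
    print(f"{name}: ~{tokens_per_sec:.1f} tokens/sec upper bound")
# ~1.4 vs ~11.4 tokens/sec -- the same ballpark as the 1-2.6 and 13
# tokens/sec figures quoted in the thread.
```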


Llama.cpp defaults to using Metal. [0]

[0] https://github.com/ggerganov/llama.cpp#metal-build


What's your generation speed?


One underused angle for oodles of memory is the humble ramdisk. If you haven't run into these: you set aside a portion of memory to serve as a disk volume. If you have a temporary work product, some kind of intermediate stage you won't save, shoving it in a ramdisk provides some really amazing speedups. You can put a SQLite database in it on the fly, just for analysis, run at blazing speeds, and keep the results. Image an optical disc into your ramdrive, chew away at it, keep the work product, and just clear the memory.
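A minimal sketch of the SQLite-on-ramdisk idea, assuming a Linux box where /dev/shm is tmpfs (RAM-backed); the filename is made up for the example:

```python
# Put a throwaway SQLite database on a RAM-backed filesystem (/dev/shm on
# Linux is tmpfs), run the analysis at memory speed, then delete the file.
import os
import sqlite3

db_path = "/dev/shm/scratch_analysis.db"   # hypothetical name; lives in RAM
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
con.executemany("INSERT INTO samples (value) VALUES (?)",
                [(x * 0.5,) for x in range(1000)])
con.commit()

# Query at RAM speed, keep only the result.
(total,) = con.execute("SELECT SUM(value) FROM samples").fetchone()
print(f"sum = {total}")

# Clearing the "disk" is just deleting the file.
con.close()
os.remove(db_path)
```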


What software do you use to create it?


Software?

   mount none /ramdisk -t tmpfs



There is some OSS for creating ramdisks on Windows, but I don't remember the name.

In Linux it's part of the kernel, and you can mount with type tmpfs.

Not sure about macOS or the BSDs.


Try searching deep into the game tree in Go. After a few million nodes you can't actually store them all in 16 GB of RAM. That's just a day of searching on a 2060; you can get into the hundreds of millions with a faster GPU and a longer search horizon. But once it spills into swap, it won't be as fast...
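Rough numbers of my own (not the parent's engine) on why a few million nodes can exhaust 16 GB: a search node for 19x19 Go that stores per-move statistics gets big fast.

```python
# Back-of-the-envelope node size for a Go tree search node (assumed layout,
# not the parent's actual data structure): a float32 policy prior, an int32
# visit count, and a float32 value for each of the 362 moves (361 points + pass).
moves = 19 * 19 + 1
bytes_per_node = moves * (4 + 4 + 4)        # = 4344 bytes per node
ram_gb = 16

nodes_in_ram = ram_gb * 1e9 // bytes_per_node
print(f"~{bytes_per_node} bytes/node -> ~{nodes_in_ram / 1e6:.1f} million nodes in {ram_gb} GB")
# Only a few million nodes fit in 16 GB under these assumptions; hundreds
# of millions of nodes would have to spill to swap.
```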


It can't be for tabs in Chrome; that browser can't even use all 64 GB I offer it ("putting tabs to sleep" is disabled on purpose, as I need the tabs to stay active).


You just don't have enough tabs open.


I do, but they crash for memory insufficiency reasons while half of the RAM is still free. It's a bug; other people have commented on the issue with no solution so far. Some suspect hardware acceleration is the culprit, but I ruled that out.

What I meant to convey in the first place: it doesn't matter how much hardware you throw at a problem if the software can't use it.


Switching to Firefox and using tree style tabs will help increase your total open tab count.


Isn’t it obvious? More browser tabs.


Spillover from my video card's 24GB when doing huge renders in Blender.


64 GB -> OS, light gaming, 6 Oracle DBs, and a few middlewares.


Definitely 192.



