Very excited for access to more RAM in a consumer form factor - by comparison, t...

jadbox · on Oct 19, 2023

What are ye using all that ram for? -curious stranger

tyfon · on Oct 19, 2023

I have 128 GB in my computer with a 5850x, which lets me load and run the Falcon 180B and Llama 2 70B LLMs in llama.cpp, although with different quantization for each.

Speed is actually not that bad either.
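A rough sketch of why both models can fit in 128 GB only at different quantization levels: weight storage is approximately parameter count times bits per weight. The bit widths below are illustrative assumptions, not the settings used above, and the estimate ignores the KV cache and the per-block overhead of real quantization formats.

```python
def model_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores KV cache, activations, and quantization block overhead.
    """
    return n_params_billion * bits_per_weight / 8


# Hypothetical quantization choices, picked only for illustration.
for name, params, bits in [("Falcon 180B", 180, 4), ("Llama 2 70B", 70, 8)]:
    print(f"{name} at {bits}-bit: ~{model_size_gb(params, bits):.0f} GB")
```

Under these assumptions the 180B model needs a much more aggressive quantization than the 70B model to stay under 128 GB.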

mosselman · on Oct 19, 2023

Is there some documentation on how to run this setup?

How fast is your setup?

rnk · on Oct 19, 2023

I'm doing this on a mac studio with 128gb too. I'm using llama.cpp.

acchow · on Oct 19, 2023

Since you get GPU acceleration (because of the unified memory), I imagine this is probably much faster than the PC setup?

Edit: Seems some people are getting 1-2.6 tokens/sec on Ryzen (no GPU acceleration), Llama 70B quantized https://www.reddit.com/r/LocalLLaMA/comments/15rqkuw/llama_2...

Whereas Mac Studio gets 13 tokens/sec https://blog.gopenai.com/how-to-deploy-llama-2-as-api-on-mac...

stoatmagoats · on Oct 20, 2023

Friendly internet stranger’s input:

- you don’t get GPU acceleration just by using unified memory. Llama.cpp still only uses the CPU on Apple Silicon chips.

- the difference in tokens/sec is likely attributable to memory bandwidth. Mac Studios with the base Max chip have 400 GB/s memory bandwidth compared to around 50 GB/s for the Ryzen 5000 series CPUs
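The bandwidth argument can be sketched as back-of-the-envelope arithmetic. It assumes generation is memory-bound and that every token requires reading all model weights from RAM once, so bandwidth divided by model size gives an upper bound on tokens/sec. The 35 GB model size (70B parameters at roughly 4 bits each) is an assumption for illustration; real throughput also depends on compute, caching, and the exact quantization format.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on generation speed if each token reads all weights once."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 35.0  # assumed: 70B parameters at ~4 bits per weight

for label, bandwidth in [("Mac Studio (Max), 400 GB/s", 400.0),
                         ("Ryzen 5000, ~50 GB/s", 50.0)]:
    bound = max_tokens_per_sec(bandwidth, MODEL_GB)
    print(f"{label}: at most ~{bound:.1f} tokens/sec")
```

Whatever the model size, the ratio between the two bounds is simply the ratio of the bandwidths, about 8x here.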

spott · on Oct 20, 2023

Llama.cpp defaults to using metal. [0]

[0] https://github.com/ggerganov/llama.cpp#metal-build

acchow · on Oct 19, 2023

What's your generation speed?

at_a_remove · on Oct 19, 2023

One underused angle for oodles of memory is the humble ramdisk. If you haven't run into these, you set aside a portion of memory to serve as a disk volume. If you have a temporary work product, some kind of intermediate stage bit you will not save, shoving it in a ramdisk provides some really amazing speedups. You can put a SQLite database in it on the fly, just for analysis, run at blazing speeds, keep the results. Image an optical disk into your ramdrive, chow away at it, keep the work product, and just clear the memory.
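A minimal sketch of the throwaway SQLite database idea, assuming a Linux system where /dev/shm is a RAM-backed tmpfs mount (common, but not guaranteed). The function name, table, and data are hypothetical. If /dev/shm is missing or not writable, the code falls back to the default temp directory, which may be on disk.

```python
import os
import sqlite3
import tempfile


def analyze_in_ramdisk() -> float:
    """Build a scratch SQLite database, query it, and keep only the result."""
    # Assumption: /dev/shm is tmpfs. Otherwise use the system temp dir.
    shm = "/dev/shm"
    ram_root = shm if os.path.isdir(shm) and os.access(shm, os.W_OK) else None

    with tempfile.TemporaryDirectory(dir=ram_root) as workdir:
        conn = sqlite3.connect(os.path.join(workdir, "scratch.db"))
        conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
        conn.executemany(
            "INSERT INTO samples (value) VALUES (?)",
            [(i * 0.5,) for i in range(1000)],
        )
        conn.commit()
        (total,) = conn.execute("SELECT SUM(value) FROM samples").fetchone()
        conn.close()
    # The directory and database are deleted here; only the result survives.
    return total


print(analyze_in_ramdisk())
```

SQLite can also hold a database purely in process memory via `sqlite3.connect(":memory:")`; a file on a ramdisk is useful when other tools need to open the same database.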

Havoc · on Oct 19, 2023

What software do you use to create it?

dmitrygr · on Oct 20, 2023

Software?

   mount none /ramdisk -t tmpfs

at_a_remove · on Oct 20, 2023

This is a good overview: https://en.wikipedia.org/wiki/List_of_RAM_drive_software

ghthor · on Oct 20, 2023

There is some OSS for creating ramdisks on windows, but I don’t remember the name.

In Linux it’s part of the kernel, and you can mount with type tmpfs.

Not sure about OS X or the BSDs.

iopq · on Oct 19, 2023

Try searching deep into the game tree in go. After a few million nodes you can't actually store them all in 16GB of RAM. That's just a day of searching on a 2060, you can get into the hundreds of millions with a faster GPU and a longer search horizon. But when you put it into swap, it won't be as fast...

taskforcegemini · on Oct 19, 2023

It can't be for tabs in chrome, that browser can't even use all 64gb I offer it ("putting tabs to sleep" is disabled on purpose as I need the tabs to stay active)

nottorp · on Oct 19, 2023

You just don't have enough tabs open.

taskforcegemini · on Oct 20, 2023

I do, but they crash with out-of-memory errors while half of the RAM is still free. It's a bug; other people have commented on the issue, with no solution so far. Some suspect hardware acceleration is the culprit, but I ruled that out.

What I meant to convey in the first place: it doesn't matter how much hardware you throw at an issue if the software can't use it.

ghthor · on Oct 20, 2023

Switching to Firefox and using tree style tabs will help increase your total open tab count.

itsboring · on Oct 19, 2023

Isn’t it obvious? More browser tabs.

Washuu · on Oct 19, 2023

Spillover from my video card's 24GB when doing huge renders in Blender.

swarnie · on Oct 19, 2023

64gb -> OS, light gaming, 6 Oracle DBs and a few middlewares.

smcleod · on Oct 19, 2023

Definitely 192.