Hacker News | wetwater's comments

In defense of the Byzantines: their rules and amazing diplomatic prowess are what let them remain an empire for so long. The negative connotation of "Byzantine" comes from the negative perception the West had of them. The Byzantines were very practical about whom they allied with.

I learned more here, and I'm not sure I fully agree with your comment.

https://www.reddit.com/r/etymology/comments/8shw5r/what_is_t...


The claims about the extreme complexity of the Late Roman/Byzantine state entered the popular imagination via Enlightenment and other Western thinkers who were deeply biased against the late Romans after a long history of cultural conflict. The OP is completely correct here; the reddit comment tells an extremely incomplete story. It notes that the Roman state was more complex than other medieval states (correct), but to say that it was "too" complex is a culturally based judgment, not a fact. The negative cultural judgment about that complexity wasn't coming from the Romans; it came from the Franks, Venetians, and later Western Europeans, who in large part were repeating old prejudices going back to the schism, but were also justifying their own conquest and abuse of the Roman people.

I'm sure the children who watched their parents get murdered before they themselves were taken into slavery during the fall of Constantinople appreciated those rules and the alliances they supported.

No empire lasts forever. Your sentence could apply to a lot of times and places in the pre-modern era.

It's interesting that you say "abstractions." Could you explain what you mean by abstractions in this context, and what you mean by the underlying fundamentals?


One example would be resonant circuits. OK, great, you can build resonant circuits, but what for? The fundamentals needed to understand frequency responses came later, in signals and systems. The application came much later still, when I learned about electric motors, which basically behave like low-pass filters (resonant circuits). That lets us use PWM to generate sine-shaped current curves by switching the input voltage on and off: the voltage signal is smoothed by the LPF formed by the motor's windings.
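
To make that concrete, here is a minimal numerical sketch (my own, not from the comment above; the R, L, bus-voltage, and PWM values are made-up illustrative numbers): compare a 50 Hz sine reference against a triangle carrier to get the switching pattern, then run the switched voltage through a first-order RL model of a winding and watch the current come out roughly sinusoidal.

    import numpy as np

    # Illustrative values only: a hypothetical motor winding modeled as a first-order RL low-pass.
    R, L = 1.0, 5e-3           # ohms, henries
    f_ref, f_pwm = 50.0, 5e3   # 50 Hz sine reference, 5 kHz PWM carrier
    V_dc = 48.0                # DC bus voltage
    dt = 1e-6
    t = np.arange(0, 0.04, dt)

    ref = np.sin(2 * np.pi * f_ref * t)                # the current shape we want
    carrier = 4 * np.abs((t * f_pwm) % 1.0 - 0.5) - 1  # triangle wave in [-1, 1]
    v = np.where(ref > carrier, V_dc, -V_dc)           # PWM: the voltage is only ever switched fully one way or the other

    # L di/dt = v - R i, integrated with explicit Euler: the winding smooths the switched voltage.
    i = np.zeros_like(t)
    for k in range(1, len(t)):
        i[k] = i[k - 1] + dt * (v[k - 1] - R * i[k - 1]) / L

    # i now tracks a ~50 Hz sine with only small switching ripple, even though v is a square train.
    print(f"peak winding current ~= {i[len(t) // 2:].max():.1f} A")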

I think it would have helped me if we talked about the motor or other examples first, and then did some math to show how the resonant behavior can be useful.


It’s crazy that VFDs work! You have to have a really good ground though, or you get arcing through the bearings.


It's a good effort, but very pedestrian and low-hanging fruit. It's just another academic paper that will be published.

https://doi.org/10.1109/GLOBECOM38437.2019.9014297
https://doi.org/10.1109/CCNC.2018.8319181
https://dl.acm.org/doi/abs/10.1145/3286978.3287003
...and many more.

I'd say this is far more interesting; it does not use ML, and it credits the tech stacks it leverages: https://people.csail.mit.edu/davidam/docs/WiMic_final.pdf


If only they would bring back their "On the Metal" podcast. That podcast scratched an itch I didn't know I had.


Check out Oxide and Friends[0]! We've been doing it for several years now, and it's a much more flexible format that allows the team to be heard in its own voice -- and allows us to weigh in on whatever's on our collective mind.

[0] https://oxide-and-friends.transistor.fm/


One of the best shows I've ever watched.


+1

Halt and Catch Fire, if anyone is curious.



To summarise, photosynthesis increases the amount of molecular oxygen in the atmosphere. It doesn’t create oxygen atoms. Those come, in the recent geological era, from inside the earth in the form of volcanic oxides. The paper suggests that something about the magnetic field influences the rate at which those oxides are belched into the atmosphere.


Are there any good sources I can read up on for estimating what hardware specs would be required for 7B, 13B, 32B, etc. model sizes if I need to run them locally? I am a grad student on a budget, but I want to host one locally and am trying to build a PC that could run one of these models.


"B" just means "billion". A 7B model has 7 billion parameters. Most models are trained in fp16, so each parameter takes two bytes at full precision. Therefore, 7B = 14GB of memory. You can easily quantize models to 8 bits per parameter with very little quality loss, so then 7B = 7GB of memory. With more quality loss (making the model dumber), you can quantize to 4 bits per parameter, so 7B = 3.5GB of memory. There are ways to quantize at other levels too, anywhere from under 2 bits per parameter up to 6 bits per parameter are common.

There is additional memory used for context / KV cache. So, if you use a large context window for a model, you will need to factor in several additional gigabytes for that, but it is much harder to provide a rule of thumb for that overhead. Most of the time, the overhead is significantly less than the size of the model, so not 2x or anything. (The size of the context window is related to the amount of text/images that you can have in a conversation before the LLM begins forgetting the earlier parts of the conversation.)
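
For the KV cache specifically, the usual rough estimate is keys plus values, per layer, per KV head, per token. This is my own sketch; the architecture numbers below are assumptions roughly matching a 7B Llama-style model, not figures from the comment above.

    def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
        """Rough KV-cache size: 2 (keys and values) x layers x KV heads x head_dim x tokens."""
        return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

    # Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128, fp16 cache.
    print(f"{kv_cache_gb(32, 32, 128, 4096):.1f} GB for a 4096-token context")  # ~2.1 GB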

The most important thing for local LLM performance is typically memory bandwidth. This is why GPUs are so much faster for LLM inference than CPUs, since GPU VRAM is many times the speed of CPU RAM. Apple Silicon offers rather decent memory bandwidth, which makes the performance fit somewhere between a typical Intel/AMD CPU and a typical GPU. Apple Silicon is definitely not as fast as a discrete GPU with the same amount of VRAM.

That's about all you need to know to get started. There are obviously nuances and exceptions that apply in certain situations.

A 32B model at 5 bits per parameter will comfortably fit onto a 24GB GPU and provide decent speed, as long as the context window isn't set to a huge value.
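
Worked out with the same arithmetic as above (my rounding): 32 × 5 / 8 = 20 GB of weights, which leaves roughly 4 GB of a 24GB card for the KV cache and other runtime overhead.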


Oh, I have a question, maybe you know.

Assuming the same model sizes in gigabytes, which one to choose: a higher-B lower-bit or a lower-B higher-bit? Is there a silver bullet? Like “yeah always take 4-bit 13B over 8-bit 7B”.

Or are same-sized models basically equal in this regard?


I would say 9 times out of 10, you will get better results from a Q4 model that’s a size class larger than a smaller model at Q8. But it’s best not to go below Q4.


My understanding is that models are currently undertrained and not very "dense", so Q4 doesn't hurt very much now, but it may hurt more with future, denser models.


That may well be true. I know that earlier models like Llama 1 65B could tolerate more aggressive quantization, which supports that idea.


So, in essence, all AMD has to do to launch a successful GPU in the inference space is load it up with RAM?


AMD's limitation is more of a software problem than a hardware problem at this point.


But it’s still surprising they haven’t. People would be motivated as hell if they launched GPUs with twice the amount of VRAM. It’s not as simple as just soldering some more in but still.


AMD “just” has to write something like CUDA overnight. Imagine you’re in 1995 and have to ship Kubuntu 24.04 LTS this summer running on your S3 Virge.


They don't need to do anything software-wise; inference is a solved problem for AMD.


They sort of have. I'm using a 7900 XTX, which has 24GB of VRAM. The next competitor would be a 4090, which would cost more than double today; granted, that would be much faster.

Technically there is also the 3090, which is more comparable price-wise. I don't know about its performance, though.

VRAM is supply-limited enough that going bigger isn't as easy as it sounds. AMD can probably sell as much as they can get their hands on, so they may as well sell more GPUs, too.


Funnily enough, you can buy GPUs where someone has done exactly that: soldered extra VRAM onto a stock model.


Or let go of the traditional definition of a GPU, and go integrated. AMD Ryzen AI Max+ 395 with 128GB RAM is a promising start.


Go to r/LocalLLAMA; they have the most info. There are also lots of good YouTube channels that have done benchmarks on Mac minis for this (another good-value option with the student discount).

Since you're a student, most of the providers/clouds offer student credits, and you can also get loads of credits from hackathons.


A MacBook with 64GB of RAM will probably be the easiest. As a bonus, you can train PyTorch models on the built-in GPU.

It's really frustrating that I can't just write off Apple as evil monopolists when they put out hardware like this.



Generally, unquantized: double the number and that's the amount of VRAM in GB you need, plus some extra, because most models use fp16 weights, so it's 2 bytes per parameter -> 32B parameters = 64GB.

Typical quantization to 4-bit will cut a 32B model down to 16GB of weights, plus some runtime data, which makes it possibly usable (if slow) on a 16GB GPU. You can sometimes viably use smaller quantizations, which will reduce memory use even more.


You always want a bit of headroom for context. It's a problem I keep bumping into with 32B models on a 24GB card: the decent quants fit, but the context you have available on the card isn't quite as much as I'd like.


Yes. You multiply the number of parameters by the number of bytes per parameter and compare that with the amount of GPU memory (or CPU RAM) you have.


I've only recently started looking into running these models locally on my system. I have limited knowledge of LLMs, and even less when it comes to building my own PC.

Are there any good sources I can read up on for estimating what hardware specs would be required for 7B, 13B, 32B, etc. model sizes if I need to run them locally?


VRAM Required = Number of Parameters (in billions) × Number of Bytes per Parameter × Overhead[0].

[0]: https://twm.me/posts/calculate-vram-requirements-local-llms/
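
A tiny sketch of that formula (mine, not code from the linked post; the 1.2 overhead default is an assumed rule-of-thumb value, so check it against the article):

    def vram_required_gb(params_billion, bytes_per_param, overhead=1.2):
        """VRAM ~= parameters (billions) x bytes per parameter x overhead factor."""
        return params_billion * bytes_per_param * overhead

    print(vram_required_gb(7, 2))     # 7B in fp16   ~= 16.8 GB
    print(vram_required_gb(32, 0.5))  # 32B at 4-bit ~= 19.2 GB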


Don’t forget to add a lot of extra space if you want a usable context size.


Wouldn't that be your overhead var?


That's neat! Thanks.


I am in a situation right now where I have to deliver on all three platforms. I chose Flutter because I just couldn't do any more JS.


Since therapy is not an actual science, this whole article and the opinions in it are hot air.


Why are you posting this to every second comment? What do you mean by "actual science"? Like it's not based on chemistry? Boy, have I got a surprise pharmaceutical industry for you.

It seems you don't have a good understanding of either "therapy" or "science".

Calling others' ideas "hot air" is just the cherry on top.


To be clear, I’m not going as far as you here. I’m not saying therapy is bunk or isn’t scientific at all.

Rather, I’m saying that an individual therapist offering anecdotes about their individual clients doesn’t constitute a scientific study.

Believe what you want but don’t mistake what I said for being anti-therapy.

