Why is 1 GB equal to 10^9 bytes instead of 2^30? (tarsnap.com)
23 points by EndXA 6 months ago | 46 comments



Because the International System of Units isn't binary? That's why we use GiB in computing.

https://en.wikipedia.org/wiki/International_System_of_Units


"we" should use GiB, but in practice (almost) no one really does. Hence the confusion...


It used to be that we used k, M, G, etc. as powers of 2 in computing without any confusion, because that was the norm.

But at some point, I believe, hard drive manufacturers started to use "GB" to mean 10^9 bytes, and the confusion started from there.


This is a myth. The very first hard drive held 5,000,000 characters - in the 1950s. The first PC hard drives came in 10 & 20MB. Hard drives have always been base 10. Line speed has always been base 10. Clock speed has always been base 10. It's RAM that's the odd one out.

I believe the disconnect occurred because when home computing started, RAM was pretty much all you had. We didn't have megabit networking, we didn't have hard drives; if we were lucky we had cassettes. So the binary prefixes were absorbed in isolation.


No, 10MB or 100MB, etc. was measured using powers of 2. It's not true that it was always base 10.

It made a big stir when they reached the GB range and changed to powers of 10. I remember it.


To quote Jerry, show me the money?

For example, https://www.ebay.com/itm/225315397038

The label of this drive gives us CHS figures for three different models:

           cyl * h  * s  *  B
  127MB =  919 * 16 * 17 * 512 = 127,983,616B (122MiB)
  170MB = 1011 * 15 * 22 * 512 = 170,818,560B (162MiB)
  340MB = 1011 * 15 * 44 * 512 = 341,637,120B (325MiB)
So we're at 127MB and they're already base 10.
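
A quick way to re-check the label's arithmetic (a minimal Python sketch; the CHS figures are copied from the label above):

  # Recompute capacity = cylinders * heads * sectors/track * 512 B,
  # then compare the marketed decimal MB against binary MiB.
  drives = {
      "127MB": (919, 16, 17),
      "170MB": (1011, 15, 22),
      "340MB": (1011, 15, 44),
  }
  for name, (cyl, heads, secs) in drives.items():
      total = cyl * heads * secs * 512  # bytes
      print(f"{name}: {total:,} B = {total/10**6:.1f} MB = {total/2**20:.1f} MiB")
  # The marketed numbers line up with the decimal MB column, not the MiB one.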

For some reason everyone remembers it, but no one's been able to show me.


Why should "kilo-" in programming be different from "kilo-" in physics? It used to be a wrong approach and now people started to unwrangle the mess. SI was invented for a reason.


In the 1980s and 1990s, no one thought this was a problem. All computer stuff is based on binary numbers, therefore 2^(10*n) makes a lot more sense for us than 10^n.

Do you buy 64 GB of RAM for a new computer? Or is it 68.719476736 GB?

It was only the hard drive manufacturers that wanted to inflate their numbers, and started pushing the 10^n scheme.

This is coming from someone who grew up with the SI system, which makes a ton of sense for physics and mechanics. Not so much for computer memory.


I always assumed it was a sales gimmick: more GB fit in a storage device than GiB.


That it did. I can assure you that in the '80s, when I was first learning about computers, no one thought that a kilobyte was 1,000 vs 1,024... That said, right now, without looking it up I cannot tell you how much a megabyte is, so perhaps it was a bad idea to begin with.

Edit: Looks like it's 1,048,576


> without looking it up I cannot tell you how much a megabyte is

You just need to remember to multiply by 1,024 instead of 1,000.

So 1 kilobyte is 1,024 bytes; 1 megabyte is 1,024 kilobytes, i.e. 1,024 x 1,024 = 1,048,576 bytes; 1 gigabyte is 1,024 megabytes; etc.
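
If it helps, the whole ladder is just repeated multiplication by 1,024 (a throwaway Python sketch):

  # Each binary prefix step is one more factor of 1,024 (= 2^10).
  for i, prefix in enumerate(["kilo", "mega", "giga", "tera"], start=1):
      print(f"1 {prefix}byte = 1,024^{i} = {1024**i:,} bytes")
  # 1 kilobyte = 1,024^1 = 1,024 bytes
  # 1 megabyte = 1,024^2 = 1,048,576 bytes
  # 1 gigabyte = 1,024^3 = 1,073,741,824 bytes
  # 1 terabyte = 1,024^4 = 1,099,511,627,776 bytes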


Depends. You know how floppy discs were advertised as 1.44 MB? That "MB" was calculated as 1000*1024 bytes.


It was double the previous 720 KB ones, so 1,440 KB, which they wrongly advertised as 1.44 MB, indeed. But that's irrelevant to my previous comment.
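
Concretely (a quick sketch of the mixed-base math; nothing assumed beyond the 1,440 KiB capacity):

  capacity = 1440 * 1024             # 1,474,560 bytes on a "1.44 MB" floppy
  print(capacity / (1000 * 1024))    # 1.44    -> the advertised "MB" (mixed base)
  print(capacity / 10**6)            # 1.47456 -> decimal MB
  print(capacity / 2**20)            # 1.40625 -> binary MiB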


Yes, but if you can't recall the number by heart it's not really that useful as a number.


The base unit is actually a single bit, so why aren't we talking about decabits instead of bytes, following the SI argument?


Networks are often measured in bits.

It's more about tradition. And sometimes a sprinkle of marketing lies (it's cheaper to make a 1024 GB SSD than a 1 TiB SSD).

Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
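
Redoing that fdisk conversion by hand (a minimal sketch; the byte count is taken from the output above):

  bytes_total = 1024209543168              # from the fdisk line above
  print(bytes_total // 512)                # 2000409264 sectors, matching fdisk
  print(f"{bytes_total / 10**9:.2f} GB")   # 1024.21 -> the marketed decimal size
  print(f"{bytes_total / 2**30:.2f} GiB")  # 953.87  -> what fdisk reports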


Because deca isn't a good SI prefix, and only gets grandfathered in. Also because some people like weird derived units like moles. FWIW, in many places ISPs advertise speeds in megabits per second, no doubt to sound eight times faster than they are.


> advertise speeds in megabits per seconds

"bits per second" is what it always has been for computer communication.

When I got started, I got a 300 bits/second modem, which later got upgraded to 1200/75 bits/s and then 2400 bits/second.

Later on, we had 57.6 kbit/s modems, 64 kbit/s ISDN lines, 2 Mbit/s ADSL, etc.

All the speedtest websites I've seen also use Mbit/s.

But sure, if I'm downloading the latest Ubuntu distro, I want to know the current speed in Megabytes/s.


But neither bits nor bytes are SI units, or even derived units. They just adopted the prefixes from there...


Gigabytes were renamed to gibibytes (2^30 bytes), and a new "gigabyte" (10^9 bytes) replaced the old meaning of the word.

It is always sad to see an 8TB SSD come back with only 7.2TiB


Do people really ask for a "32 Gibibyte RAM stick" in the store?


Imho it depends on context:

For memory (ROM / RAM etc) the 2^x convention applies. 1GB = 1024MB etc.

For background storage (harddisks, USB sticks, optical, tape etc), that used to be true as well. But long ago manufacturers' marketing corrupted this into the decimal 1G = 1000M meaning.

But the distinction can be fuzzy sometimes, and subject to abuse. So when in doubt: read the fine print.


Definitely.

But my point was that Gibibytes did not replace Gigabytes.

It was just that some hard drive manufacturers successfully introduced marketing-speak into the computer world. It wasn't that the computer world all agreed to switch to SI units.


Because the marketing department hijacked computing's lingo for a quick buck.

There should never have been any confusion, because SI doesn't belong in computing: computing is not on a continuum. It has a natural multiplier, typically 2, which makes 10 completely arbitrary and capricious.


This is the only real answer. You wouldn't buy a 29.8 GiB stick of RAM, and because of the addressing it wouldn't work very well. But marketers can get away with a 29.8 GiB stick of flash, so they do, and sell it as 32 GB.


Normal-world things usually scale by unit. You can have 1 apple, 3 apples, or 10 apples, so shorthands for the number of digits matter.

In the digital world, a lot of things scale by powers of two. You can't have 2 isolated bits. You have 1 byte (that is, 8 bits); the numbering system used for addresses is based on powers of two (1 byte can distinguish between 256 addresses, because that is 2^8; then 2, 4 and more bytes); and a lot of schemes for packing information followed or required that (like storage, with 512 or 2048 bytes per sector). The kilobyte was then the binary number closest to 1,000 (2^8 * 4 = 2^10), so from there it became the standard for scaling: a kilobyte of kilobytes is 1 MB (2^20), and a kilobyte of megabytes is 1 GB (2^30).

With networking it is a bit different, because down at layer 1 you don't transfer whole bytes, but bits. And you have things like stop bits, parity bits, frames, retries and so on that muddy things whether you are talking about individual bits or higher-level bytes. But as it is close enough for big numbers, you can usually assume that transferring 1 byte costs about 10 bits on the wire, and more or less match the power-of-10 network speed with the amount of (byte-packed) information you transferred.

In any case, switching units for things that scale in binary (bytes and up) to units that count individual bits is more a marketing move than a technical one. You get the next bigger number just by taking out a letter from your unit name that most people don't notice.
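
As a sketch, the parent's 10-bits-per-byte rule of thumb (the overhead figure is the commenter's approximation, not a measured constant):

  # Rough payload estimate: with start/stop bits, framing, retries, etc.,
  # assume each byte effectively costs ~10 bits on the wire.
  def payload_bytes_per_sec(link_bits_per_sec, bits_per_byte=10):
      return link_bits_per_sec / bits_per_byte

  print(payload_bytes_per_sec(100e6))  # 100 Mbit/s link -> ~10 MB/s of payload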


It must be mentioned that early PC hard drives actually did have (often even slightly more than) the correct binary size. The widely used "10MB" model actually had a capacity of 10,653,696 bytes.

Don't get me started on the -i prefixes, they sound absolutely stupid.


When I was young, a "gib" was the stuff that appeared when you hit a monster (or yourself) at close range with the rocket launcher in Doom (https://doomwiki.org/wiki/Gibs).


With 1.44 MB floppy discs, the size of an "MB" was 1024*1000 bytes.


See also perhaps "Timeline of binary prefixes":

> This timeline of binary prefixes lists events in the history of the evolution, development, and use of units of measure which are germane to the definition of the binary prefixes by the International Electrotechnical Commission (IEC) in 1998,[1] used primarily with units of information such as the bit and the byte.

* https://en.wikipedia.org/wiki/Timeline_of_binary_prefixes

* https://en.wikipedia.org/wiki/Binary_prefix#History


I refuse to say mebibyte or whatever alternative unit. 1024 bytes is one kilobyte, and 1000 kilobytes is not a useful unit (and so on). As far as being a conspiracy by hard drive manufacturers, Western Digital did settle the case rather than win: https://arstechnica.com/uncategorized/2006/06/7174-2/


> specifically their 80GB WD800VE [...] that had only 79,971,254,272 bytes (74.4GB)

That's even less than the decimal size, so that's absolutely false advertising. I'm not surprised they settled.


Ugh...

I used to have this conversation on a monthly basis with co-workers. Many people seem to incorrectly believe that network speeds are measured in base 2 (binary), when in fact they are measured in base 10, just like a hard drive. And, just like the misunderstanding of hard drive capacity, network bandwidth is misunderstood to be greater than it actually is... because people perceive bandwidth in the same binary prefixes as the files they send through the wire. That mismatch quickly magnifies into a pretty significant error: 1GbE is actually capable of handling about 119 MiB per second, rather than the 125 MiB per second people expect.
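
The arithmetic behind the 119 vs. 125 figure (a quick sketch):

  link = 10**9                  # 1GbE: 10^9 bits/s, base 10 like all line rates
  bytes_per_sec = link / 8      # 125,000,000 B/s
  print(bytes_per_sec / 10**6)  # 125.0  -> decimal MB/s
  print(bytes_per_sec / 2**20)  # ~119.2 -> binary MiB/s, what your OS shows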


The norm used to be that memory/storage used powers of 2 and data throughput used powers of 10. It was clear in the sense that everyone was following that norm. I suppose the practice of using powers of 2 for memory arose because binary address buses can address a power-of-2 number of memory cells, whereas an arbitrary number of bits can be transmitted per second over a channel.

This changed when hard drive manufacturers decided to market products using powers of 10...


What is the difference between a Software Engineer and a Mechanical Engineer?

A Mechanical Engineer thinks that 1 KB = 1,000 bytes, and a Software Engineer thinks that 1 km = 1,024 meters. :-)


The author doesn't even mention the fact that 1 B (byte) = 8 b (bits), which is at least partly responsible for the phenomenon.


That's why GiB was invented: to avoid mixing up SI prefixes with binary math.

1 GiB equals 2^30 bytes.


When was the last time you said "Gibibytes" out loud in a conversation?


I do it every day. Afterwards, I usually exit the room saying: "now I'm gonna install Arch Linux on my laptop"


Why does the hardware addressability reasoning hold for RAM but not for SSDs?


NAND flash is... weird. The old generation of SLC and MLC flash actually had more than the binary size, with the extra used for error correction and wear-leveling information. Newer MLC has a larger "spare area", but is sufficiently unreliable that SSDs tend to round the binary size down to a decimal number and use the difference to replace bad blocks as they wear out and fail.

TLC flash has a capacity that's actually a multiple of 3 times a power of 2, with an additional spare area. I believe a "128GB" TLC SSD might have somewhere around 150GB of writable bits.


Your computer does not have a problem with non-existent LBAs. The SSD controller might not like missing raw flash addresses, but that's abstracted away from the user's perspective anyway, if for no other reason than the FTL.


But spinning hard drives and SSDs don't work at the byte level; they work in power-of-two blocks, usually 512 B to 4 KiB.

You purchase a number of those blocks, and the total number of bytes is a multiple of that power of two.

The filesystem then stores everything in its own blocks, composed of one or more disk/SSD blocks.

Files on disk therefore consume a whole number of blocks, making their storage usage a multiple of a power of two (the real size can be anything, of course).

And you might as well align blocks to memory pages when buffering/reading/writing the filesystem. Since you must align memory accesses to the CPU cache for performance, RAM itself is loaded/stored in blocks of 256 or 512 bits these days (yes, reading one byte from RAM will fetch that much), etc.

That's a lot of powers of two (you could argue 2^12 [4 KiB]) designed in everywhere.
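
A sketch of that block rounding (assuming a 4 KiB filesystem block, a common default):

  import math

  BLOCK = 4096  # bytes per filesystem block (2^12)

  def blocks_used(file_size):
      # A file always consumes a whole number of blocks.
      return math.ceil(file_size / BLOCK)

  for size in (1, 4096, 4097, 10_000):
      n = blocks_used(size)
      print(f"{size:>6} B -> {n} block(s) = {n * BLOCK:,} B on disk")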


Because disk manufacturers can ship less storage per advertised TB if they simply change the measurement. Not to mention they have a whole army of useful idiots who are willing to fight for their profits.


This author apparently isn't aware that a while back, units of storage were changed to include another category specifically for the 2^n way of measuring. His article is wrong as a result. Technically a gibibyte is 2^30 bytes and a gigabyte is 10^9 bytes.


> Finally, even for RAM calling 2^30 bytes "1 GB" isn't really proper; instead, the IEC binary multiplier prefix "Gi-" should be used



