
I didn't mention pointers regarding bits, I mentioned addressability - a bit cannot have an address (in any language I'm aware of), though of course you can have any number of ways of accessing it.



Pointers, as a language concept, don't have to correspond to the addressing schemes of the hardware or ISA. On some architectures instructions may only be able to address aligned whole words. Some microcontrollers (e.g. Intel MCS-51) feature bit-addressable memory. Apparently, there's a special __bit type supported by the Small Device C Compiler for using bit addressable memory on such devices, although I don't know if it has support for taking pointers to these.


They do not have to. But then it wouldn't be C, which by design has a straightforward and obvious mapping to the underlying machine.

For example, there are machines (some DSPs) on which individual octets are not efficiently addressable, and a C byte on these machines is usually 16 or 32 bits.


Pointers are very much a language concept and very much not an architecture concept. I enjoy this particular writeup that touches on some of the distinctions. Of particular interest is the fact that the C standard itself states that two pointers are not equivalent simply by virtue of having the same address value.

https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

I also happen to very much enjoy this piece on how the C abstract machine has very little in common with modern architecture.

https://queue.acm.org/detail.cfm?id=3212479


This exchange was an enjoyable read. C was designed for portability because they had those PDP computers, or whatever they were, and the problem was that each had its own unique architecture, switch arrangement for operation, and maybe even endianness. I don't know. The whole point was to make the language portable to any architecture for which someone was willing to write a compiler. Why people do not like that I cannot comprehend.


They don't have to, but they're commonly understood to refer to memory addresses, which, on most ISAs, are locations of octets.

Even if the ISA only allows word- or dword-aligned loads from memory, the addresses still typically enumerate bytes, not words or dwords.

Based on a quick summary of the MCS-51 that I googled up, it looks like its memory addressing scheme still assigns addresses to bytes, and has special operations that allow you to further specify a bit offset within that memory address.


> it looks like its memory addressing scheme still assigns addresses to bytes, and has special operations that allow you to further specify a bit offset within that memory address.

There are also instructions that take an 8-bit bit address, with 0x00 - 0x7f corresponding to lower memory and 0x80 - 0xff corresponding to 16 specific registers in the Special Function Register set.
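
A rough sketch of that mapping in C, going by the standard 8051 layout as I understand it: bit addresses 0x00 - 0x7f land in the 16 bytes of internal RAM at 0x20 - 0x2f, and bit addresses 0x80 - 0xff land in the SFRs whose byte address is a multiple of 8. The struct and function names are made up for illustration, and variants of the part may differ.

```c
#include <stdint.h>

/* Hypothetical helper type: where an 8051 bit address lands. */
struct bit_location {
    uint8_t byte_addr;  /* direct byte address that holds the bit */
    uint8_t bit_index;  /* 0 = least significant bit */
};

/* Decode an 8-bit bit address into (byte address, bit index).
 * Assumes the standard 8051 layout; not checked against every variant. */
struct bit_location decode_bit_address(uint8_t bit_addr)
{
    struct bit_location loc;
    loc.bit_index = (uint8_t)(bit_addr & 0x07u);
    if (bit_addr < 0x80u) {
        /* 0x00-0x7f: bit-addressable internal RAM at bytes 0x20-0x2f */
        loc.byte_addr = (uint8_t)(0x20u + (bit_addr >> 3));
    } else {
        /* 0x80-0xff: SFRs whose byte address is a multiple of 8 */
        loc.byte_addr = (uint8_t)(bit_addr & 0xF8u);
    }
    return loc;
}
```

So bit address 0xD7 decodes to bit 7 of the byte at 0xD0. Either way the hardware is still naming a byte plus a bit offset; the bit address is just a compact encoding of the pair.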


The 8051 has bit addressable memory.


Isn't a byte supposed to correspond to the smallest addressable unit of memory?


The original usage of the term "byte" was to refer to fields of variable length consecutive bits on a bit-addressable machine: https://en.wikipedia.org/wiki/Byte#History

Nowadays a byte is conventionally eight bits, especially for measures like "megabyte", but the term octet is often used to avoid ambiguity. Pointers commonly hold byte addresses, yet often only words are addressable by machine instructions (e.g. many ARM instructions take a byte address yet raise a hardware exception on use of unaligned addresses).


Interesting, but I think the notion of a byte in C is different. But I'm not able to look it up at the moment.


It is, but it's defined rather weirdly:

"byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment"

Hence why the type that corresponds to it is "char"! Beyond that, the only thing that kinda sorta implies that it's the smallest addressable unit is the definition of CHAR_BIT:

"number of bits for smallest object that is not a bit-field (byte)"


I think what you're saying, in other words, is that the C standard defines sizeof(char) == 1, so a char must be exactly one byte. However, different architectures can have an addressable unit of a size other than 8 bits, so 1 byte is not always 8 bits.

This might be why the code space alphabet is defined by the standard: so that it at least puts an emphasis on 8 bits == 1 byte.
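
For what it's worth, both facts can be checked at compile time. A minimal sketch, assuming a C11 compiler; bits_in is a made-up helper name, not anything from the standard library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Both hold on every conforming implementation, whatever the hardware. */
static_assert(sizeof(char) == 1, "char is exactly one C byte");
static_assert(CHAR_BIT >= 8, "a C byte has at least 8 bits");

/* Hypothetical helper: storage width in bits of an object that
 * occupies n_bytes C bytes on this implementation. */
size_t bits_in(size_t n_bytes)
{
    return n_bytes * CHAR_BIT;
}
```

On a typical desktop target bits_in(sizeof(char)) is 8, but the only portable statement is that it equals CHAR_BIT and is at least 8.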


C definitely doesn't require bytes to have 8 bits - it only requires them to have at least 8 bits. And there are architectures on which C char has as many bits as int (SHARC).

The question, though, was about whether it's the minimum addressable unit of memory. In the C memory model, it is, but by implication - you can't have two pointers that compare non-equal, but differ by less than 1, so a type with sizeof==1 is by definition the smallest you can uniquely address. However, the C memory model doesn't have to reflect the underlying hardware architecture.
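
A small sketch of that implication. The helper names are made up, and both pointers must point into the same object for the subtraction to be defined.

```c
#include <stddef.h>

/* Hypothetical helper: distance in C bytes between two pointers
 * into the same object. Character pointers step in units of one byte. */
ptrdiff_t byte_distance(const void *from, const void *to)
{
    return (const unsigned char *)to - (const unsigned char *)from;
}

/* Adjacent ints are sizeof(int) bytes apart; nothing addressable
 * sits "between" two consecutive character pointers. */
ptrdiff_t int_stride(void)
{
    int pair[2] = {0, 0};
    return byte_distance(&pair[0], &pair[1]);
}
```

Two distinct pointers into the same array can never be closer than byte_distance == 1, which is the sense in which the sizeof == 1 type is the smallest addressable unit of the model.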


SHARC has no such requirement. Having char and int the same size was not universal. The CPU vendor shipped such a compiler, but that was not the only compiler.

The CPU itself used 32-bit addresses to access machine words, the size of which was determined by what was being accessed. External memory was limited to 32-bit. Internal memory had regions that could be 32-bit, 40-bit, or 48-bit. An address increment of 1 would thus move by that many bits.

Mercury Computer Systems shipped a byte-oriented port of gcc. Pointers to char and short were rotated and XORed as needed to reduce incompatibility. Pointers to larger objects were in the hardware format. This allowed a high degree of compatibility with ordinary software while still running efficiently when working with the larger objects. There was also a 64-bit double, unlike the 32-bit one in the other compiler. Data structures were all compatible with PowerPC and i860, allowing heterogeneous shared memory multiprocessor systems.


You can implement byte addressing on any architecture, of course. That's what I meant by "the C memory model doesn't have to reflect the underlying hardware architecture". But as you point out yourself, this requires pointers which are basically not raw hardware addresses, and which are more expensive to work with, because they require the compiler to do the same kind of stuff it has to do for bit fields. So the natural implementation - with no unexpected perf gotchas - tends towards pointers as raw hardware addresses, and thus char as the smallest unit those can address.
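
To make the cost concrete, here is a minimal sketch of byte pointers emulated on word-addressed memory. It assumes 32-bit words and 8-bit emulated bytes, the names are hypothetical, and it is not the rotate-and-XOR scheme described above for the Mercury port, just the simplest encoding I could think of.

```c
#include <stdint.h>

#define BYTES_PER_WORD 4u  /* assumption: 32-bit words, 8-bit emulated bytes */

/* Hypothetical emulated byte pointer: word index in the upper bits,
 * byte lane within the word in the low two bits. */
typedef uint32_t byte_ptr;

/* Load one emulated byte: a word load plus a shift and a mask. */
uint8_t load_byte(const uint32_t *words, byte_ptr p)
{
    uint32_t word = words[p / BYTES_PER_WORD];
    unsigned shift = (unsigned)(p % BYTES_PER_WORD) * 8u;  /* lane 0 = low bits, chosen arbitrarily */
    return (uint8_t)((word >> shift) & 0xFFu);
}

/* Store one emulated byte: read-modify-write of the containing word. */
void store_byte(uint32_t *words, byte_ptr p, uint8_t value)
{
    unsigned shift = (unsigned)(p % BYTES_PER_WORD) * 8u;
    uint32_t mask = (uint32_t)0xFFu << shift;
    uint32_t *word = &words[p / BYTES_PER_WORD];
    *word = (*word & ~mask) | ((uint32_t)value << shift);
}
```

Every char access turns into the same shift-and-mask work a compiler emits for bit fields, and a store becomes a read-modify-write of the whole word.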


It may well vary depending on which C standard you're talking about. ISO C defines both a byte and a char as at least large enough to contain characters "of the basic character set of the execution environment". They must be uniquely addressable. However, it seems their definitions don't preclude them from being different, or sub-bytes from being uniquely addressable by pointers.



