Many, many C/C++ programmers, even many of those employed by Google, have a natural tendency to use the old C types like short/int/long etc. for integral values without thinking through cross-platform issues or API interactions with other code.
Also, size_t is a frustrating beast. Its meaning depends on the platform. The Single Unix spec only calls for size_t to be an unsigned integer type. Now imagine you're writing code to compile on multiple mobile platforms as well as on x86_64 on the server side. Can you tell me what is the largest number you can address with that type -- without getting into a long google/stackoverflow session or hitting the compiler manuals for each of those platforms? If you absolutely want to make sure that your type can handle the values you expect it to handle, you're better off with the well-defined types provided by stdint.h (uint32_t is sooo much better than plain int or unsigned int or even size_t for this purpose).
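To make the ambiguity concrete, here's a minimal sketch (assuming a C99 toolchain with stdint.h and inttypes.h): the first printf answers differently on a 32-bit mobile target than on an x86_64 server, while the second answers the same question at a glance.

    #include <stdio.h>
    #include <stdint.h>    /* SIZE_MAX, uint32_t, UINT32_MAX */
    #include <inttypes.h>  /* PRIu32 */

    int main(void)
    {
        /* Whatever the platform's ABI says: 2^32 - 1 here, 2^64 - 1 there. */
        printf("size_t  : %zu bytes, max %zu\n",
               sizeof(size_t), (size_t)SIZE_MAX);
        /* Exactly 32 bits on every platform that provides the type at all. */
        printf("uint32_t: %zu bytes, max %" PRIu32 "\n",
               sizeof(uint32_t), (uint32_t)UINT32_MAX);
        return 0;
    }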
Now granted, you'll need to interact with external libraries (including libc/libc++) that want to use size_t and friends. Not much you can do here but be very careful when passing data back and forth between your code and the library code. But that's been the lot of C coders since time began.
All you need to care about for cases like these, when you're talking about the size of something, is that both malloc() and new[] handle allocation size using size_t.
That, to me, says pretty clearly that "the proper type to express the size, in bytes, of something you're going to store in memory is size_t".
It can't be too small, since a too-small size_t would break the core allocation interfaces, which really doesn't seem likely.
You don't need to know how many bits are in size_t all that often, and certainly not for the quoted code.
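To illustrate that point with a minimal sketch (the Record type and alloc_records are made-up names for the example): keep the count and the size arithmetic in size_t, and the value you hand to malloc() is in exactly the type malloc() is declared to take, with the overflow check expressed against SIZE_MAX, the only limit that actually applies.

    #include <stdint.h>  /* SIZE_MAX */
    #include <stdlib.h>  /* malloc   */

    typedef struct { double value; long tag; } Record;  /* hypothetical payload */

    Record *alloc_records(size_t count)
    {
        /* Refuse requests whose byte count would not fit in size_t at all. */
        if (count > SIZE_MAX / sizeof(Record))
            return NULL;
        return malloc(count * sizeof(Record));
    }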
For cross-platform interoperability, an API built on exact-size types helps remove any ambiguity. Using size_t might be fine for intra-process usage, but as soon as we are dealing with data across platforms, exact-size type definitions are a must.
I see it the other way around. How many bits you need to address something in memory depends on the platform. Thus `size_t` is the only cross-platform type you can use. A fixed-size integral type is going to work on some platforms, but not all.
That is correct. For file formats, packets, etc. you must use exact sizes.
However, for cross-platform support, using size_t in an API (as in what is exposed via a .dll or .so) is a must. It's exactly the correct way to write cross-platform code.
Sounds like you are mixing up your data's in-memory representation with its storage/transmission representation. This is risky business.
If you have no requirement that says otherwise, you should have explicit marshalling and demarshalling steps that transform your live data objects into opaque BLOBs. It would be highly desirable for your BLOBs to have a header that contains metadata used exclusively for marshalling purposes; at the very least, the size of the payload, an object type id, and a format version id will save you lots of trouble.
Now, what happens if you need high performance and are willing to trade off code complexity for faster execution? You can just copy your native object's bytes into the BLOB payload, as long as you correctly identify the source platform's relevant characteristics in the header. Then, when the target host does the demarshalling step, it can decide whether the native format is compatible with its own platform and just copy the payload into a zeroed buffer of the correct size. If that is not the case, it will have to perform an extra deferred marshalling step to put the payload in "canonical" format prior to demarshalling proper.
You can even make the behavior configurable, so that customers running a heterogeneous environment do not suffer a performance hit for the sake of the customers in homogeneous environments.
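For what such a header might look like, a rough sketch (the field names and widths are illustrative, not any real format): every field is an exact-width type so both ends agree on its layout, and the source platform's characteristics get their own field for the native-copy fast path described above.

    #include <stdint.h>

    typedef struct {
        uint32_t payload_size;    /* size of the marshalled payload, in bytes   */
        uint16_t object_type_id;  /* which kind of object this BLOB contains    */
        uint16_t format_version;  /* lets readers reject or upgrade old layouts */
        uint8_t  source_flags;    /* endianness / word size of the source host,
                                     consulted before taking the fast path      */
    } blob_header_t;

(The header itself would of course still be written field by field in one agreed byte order.)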
Of course the data in storage or over the wire needs to be marshalled and unmarshalled (whether explicitly standardizing on a particular wire format or with header based hacks or whatnot). That's not the point.
The point is that a lot of the time, the two machines on either end of the wire need to agree on the sizes of the various fields you're sending (say, in protocol headers). And then you want to work with that data internally in the code on either side. You'd better be absolutely sure how many bits you have in each type you're allocating for these purposes.
And going beyond that to an even more common use case -- a lot of code reads cleaner and lends itself to debuggability when you know the exact sizes of the types you're using. It's not something reserved just for network programming.
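As a hedged example of what "agreeing on sizes" looks like in code (struct and function names are hypothetical): pin every field to an exact-width type and serialize it byte by byte in a fixed order, so neither peer's local notion of int or size_t ever leaks onto the wire.

    #include <stddef.h>
    #include <stdint.h>

    struct msg_header {
        uint16_t type;    /* 2 bytes on the wire, on every platform */
        uint32_t length;  /* 4 bytes on the wire, on every platform */
    };

    /* Write the header into buf in big-endian order; returns bytes written. */
    static size_t put_header(uint8_t *buf, const struct msg_header *h)
    {
        buf[0] = (uint8_t)(h->type >> 8);
        buf[1] = (uint8_t)(h->type);
        buf[2] = (uint8_t)(h->length >> 24);
        buf[3] = (uint8_t)(h->length >> 16);
        buf[4] = (uint8_t)(h->length >> 8);
        buf[5] = (uint8_t)(h->length);
        return 6;
    }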
Sorry, I fail to see the point of your second paragraph. Of course at the business-logic level you need to allocate variables that can hold every possible value in the valid range, but as long as that is the case, why does it matter whether you use types that have the same byte size on every possible platform?
In your third paragraph, I agree on the debuggability front (if you are actually reading memory dumps; otherwise, why should it matter?). As for the code reading more clearly, I guess this is more a matter of taste.
It matters because of code readability, debuggability, and all sorts of code hygiene reasons. If I'm using size_t for a field in my protocol on a 32-bit platform on one end and a 64-bit platform on the other, which size wins over the wire? Can that question be answered while you're in the debugging flow, trying to track down a memory-stomping error?
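A tiny sketch of that ambiguity (hypothetical struct names): the same declaration describes two different layouts depending on who compiled it, which is exactly the question you don't want to be answering mid-debug.

    #include <stddef.h>
    #include <stdint.h>

    struct fuzzy_header {
        size_t payload_len;    /* 4 bytes on the 32-bit peer, 8 on the x86_64 peer */
    };

    struct unambiguous_header {
        uint32_t payload_len;  /* 4 bytes on both ends, full stop */
    };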
> Can you tell me what is the largest number you can address with that type -- without getting into a long google/stackoverflow session or hitting the compiler manuals for each of those platforms?
Why does this matter? size_t is intended to be used as an index into a dense array, i.e. for every index i you may want to store in a size_t, you also store i elements X of some data type. Since that number is limited both by the software architecture and by the hardware available at runtime, why would you want to know exactly how many X you can store?
size_t is guaranteed to be able to contain the size of any valid object.
You can never, ever have a buffer bigger than what size_t allows. If you do, then you're no longer talking about the C programming language, as that explicitly breaks the C specification.
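One small consequence of that guarantee, as a sketch: since no single object can exceed SIZE_MAX bytes, SIZE_MAX / sizeof(T) already bounds how many elements of any type T one buffer could ever hold, which is usually all you need to know about the "largest number" question.

    #include <stdint.h>  /* SIZE_MAX */
    #include <stdio.h>

    int main(void)
    {
        /* Upper bound on how many doubles could live in a single object. */
        printf("at most %zu doubles per buffer\n", SIZE_MAX / sizeof(double));
        return 0;
    }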
meta: This is one of those times when downvotes here on HN completely baffle (and I must admit, somewhat amuse) me. Who am I offending by sharing an opinion about integer types in C?
Expressing disagreement is expressly an acceptable basis for downvoting here. And given the more general purpose of voting (expressing whether or not a contribution is valuable): not all disagreement means the thing disagreed with isn't valuable, but the perception that something is inaccurate or wrong can certainly be a reason to conclude it is not a valuable contribution to the discussion.
Not the place to discuss, but 9/10 of my comments drop to 0 or -1 before going positive, even the ones that end up very positive. I don't worry about downvotes until the comment has been up for at least 30 minutes.