Many, many C/C++ programmers, even many of those employed by Google, have a natural tendency to use the old C types like short/int/long etc. for integral values without thinking through cross-platform issues or API interactions with other code.
Also, size_t is a frustrating beast. Its meaning depends on the platform. The Single Unix spec only calls for size_t to be an unsigned integer type. Now imagine you're writing code to compile on multiple mobile platforms as well as on x86_64 on the server side. Can you tell me what is the largest number you can address with that type -- without getting into a long google/stackoverflow session or hitting the compiler manuals for each of those platforms? If you absolutely want to make sure that your type can handle the values you expect it to handle, you're better off with the well-defined types provided by stdint.h (uint32_t is sooo much better than plain int or unsigned int or even size_t for this purpose).
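To make the ambiguity concrete, here's a minimal sketch (assuming a C99 toolchain with stdint.h and inttypes.h): the first printf answers differently on a 32-bit mobile target than on an x86_64 server, while the second answers the same question at a glance.

    #include <stdio.h>
    #include <stdint.h>    /* SIZE_MAX, uint32_t, UINT32_MAX */
    #include <inttypes.h>  /* PRIu32 */

    int main(void)
    {
        /* Whatever the platform's ABI says: 2^32 - 1 here, 2^64 - 1 there. */
        printf("size_t  : %zu bytes, max %zu\n",
               sizeof(size_t), (size_t)SIZE_MAX);
        /* Exactly 32 bits on every platform that provides the type at all. */
        printf("uint32_t: %zu bytes, max %" PRIu32 "\n",
               sizeof(uint32_t), (uint32_t)UINT32_MAX);
        return 0;
    }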
Now granted, you'll need to interact with external libraries (including libc/libc++) that want to use size_t and friends. Not much you can do here but be very careful when passing data back and forth between your code and the library code. But that's been the lot of C coders since time began.
All you need to care about for cases like these, when you're talking about the size of something, is that both malloc() and new[] handle allocation size using size_t.
That, to me, says pretty clearly that "the proper type to express the size, in bytes, of something you're going to store in memory is size_t".
It can't be too small, since a too-small size_t would break the core allocation interfaces, which really doesn't seem likely.
You don't need to know how many bits are in size_t all that often, and certainly not for the quoted code.
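To illustrate that point with a minimal sketch (the Record type and alloc_records are made-up names for the example): keep the count and the size arithmetic in size_t, and the value you hand to malloc() is in exactly the type malloc() is declared to take, with the overflow check expressed against SIZE_MAX, the only limit that actually applies.

    #include <stdint.h>  /* SIZE_MAX */
    #include <stdlib.h>  /* malloc   */

    typedef struct { double value; long tag; } Record;  /* hypothetical payload */

    Record *alloc_records(size_t count)
    {
        /* Refuse requests whose byte count would not fit in size_t at all. */
        if (count > SIZE_MAX / sizeof(Record))
            return NULL;
        return malloc(count * sizeof(Record));
    }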
For cross-platform interoperability, an API built on exact-size types helps remove any ambiguity. Using size_t might be fine for intra-process usage, but as soon as we are dealing with data across platforms, exact-size type definitions are a must.
I see it the other way around. How many bits you need to address something in memory depends on the platform. Thus `size_t` is the only cross-platform type you can use. A fixed-size integral type is going to work on some platforms, but not all.
That is correct. For file formats, packets, etc. you must use exact sizes.
However, for cross-platform support, using size_t in an API (as in what is exposed via a .dll or .so) is a must. It's exactly the correct way to write cross-platform code.
Sounds like you are mixing up your data's in-memory representation with its storage/transmission representation. This is risky business.
If you have no requirement that says otherwise, you should have explicit marshalling and demarshalling steps that transform your live data objects into opaque BLOBs. It would be highly desirable for your BLOBs to have a header that contains metadata used exclusively for marshalling purposes; at the very least, the size of the payload, an object type id, and a format version id will save you lots of trouble.
Now, what happens if you need high performance and are willing to trade off code complexity for faster execution? You can just copy your native object's bytes into the BLOB payload, as long as you correctly identify the source platform's relevant characteristics in the header. Then, when the target host does the demarshalling step, it can decide whether the native format is compatible with its own platform and just copy the payload into a zeroed buffer of the correct size. If that is not the case, it will have to perform an extra deferred marshalling step to put the payload in "canonical" format prior to demarshalling proper.
You can even make the behavior configurable, so that customers running a heterogeneous environment do not suffer a performance hit for the sake of the customers in homogeneous environments.
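For what such a header might look like, a rough sketch (the field names and widths are illustrative, not any real format): every field is an exact-width type so both ends agree on its layout, and the source platform's characteristics get their own field for the native-copy fast path described above.

    #include <stdint.h>

    typedef struct {
        uint32_t payload_size;    /* size of the marshalled payload, in bytes   */
        uint16_t object_type_id;  /* which kind of object this BLOB contains    */
        uint16_t format_version;  /* lets readers reject or upgrade old layouts */
        uint8_t  source_flags;    /* endianness / word size of the source host,
                                     consulted before taking the fast path      */
    } blob_header_t;

(The header itself would of course still be written field by field in one agreed byte order.)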
Of course the data in storage or over the wire needs to be marshalled and unmarshalled (whether explicitly standardizing on a particular wire format or with header based hacks or whatnot). That's not the point.
The point is that a lot of the time, the two machines on either end of the wire need to agree on the sizes of the various fields you're sending (say, in protocol headers). And then you want to work with that data internally in the code on either side. You'd better be absolutely sure how many bits you have in each type you're allocating for these purposes.
And going beyond that to an even more common use case -- a lot of code reads cleaner and lends itself to debuggability when you know the exact sizes of the types you're using. It's not something reserved just for network programming.
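As a hedged example of what "agreeing on sizes" looks like in code (struct and function names are hypothetical): pin every field to an exact-width type and serialize it byte by byte in a fixed order, so neither peer's local notion of int or size_t ever leaks onto the wire.

    #include <stddef.h>
    #include <stdint.h>

    struct msg_header {
        uint16_t type;    /* 2 bytes on the wire, on every platform */
        uint32_t length;  /* 4 bytes on the wire, on every platform */
    };

    /* Write the header into buf in big-endian order; returns bytes written. */
    static size_t put_header(uint8_t *buf, const struct msg_header *h)
    {
        buf[0] = (uint8_t)(h->type >> 8);
        buf[1] = (uint8_t)(h->type);
        buf[2] = (uint8_t)(h->length >> 24);
        buf[3] = (uint8_t)(h->length >> 16);
        buf[4] = (uint8_t)(h->length >> 8);
        buf[5] = (uint8_t)(h->length);
        return 6;
    }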
Sorry, I fail to see the point of your second paragraph. Of course at the business-logic level you need to allocate variables that can hold every possible value in the valid range, but as long as that is the case, why does it matter whether you use types that have the same byte size on every possible platform?
In your third paragraph, I agree on the debuggability front (if you are actually reading memory dumps; otherwise, why should it matter?). As for the code reading more clearly, I guess this is more a matter of taste.
It matters because of code readability, debuggability, and all sorts of code hygiene reasons. If I'm using size_t for a field in my protocol on a 32-bit platform on one end and a 64-bit platform on the other, which size wins over the wire? Can that question be answered while you're in the debugging flow, trying to track down a memory-stomping error?
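A tiny sketch of that ambiguity (hypothetical struct names): the same declaration describes two different layouts depending on who compiled it, which is exactly the question you don't want to be answering mid-debug.

    #include <stddef.h>
    #include <stdint.h>

    struct fuzzy_header {
        size_t payload_len;    /* 4 bytes on the 32-bit peer, 8 on the x86_64 peer */
    };

    struct unambiguous_header {
        uint32_t payload_len;  /* 4 bytes on both ends, full stop */
    };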
> Can you tell me what is the largest number you can address with that type -- without getting into a long google/stackoverflow session or hitting the compiler manuals for each of those platforms?
Why does this matter? size_t is intended to be used as an index into a dense array, i.e. for every index i you may want to store in a size_t, you also store i elements X of some data type. Since that number is limited both by the software architecture and by the hardware available at runtime, why would you want to know exactly how many X you can store?
size_t is guaranteed to be able to contain the size of any valid object.
You can never, ever have a buffer bigger than what size_t allows. If you do, then you're no longer talking about the C programming language, as that explicitly breaks the C specification.
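One small consequence of that guarantee, as a sketch: since no single object can exceed SIZE_MAX bytes, SIZE_MAX / sizeof(T) already bounds how many elements of any type T one buffer could ever hold, which is usually all you need to know about the "largest number" question.

    #include <stdint.h>  /* SIZE_MAX */
    #include <stdio.h>

    int main(void)
    {
        /* Upper bound on how many doubles could live in a single object. */
        printf("at most %zu doubles per buffer\n", SIZE_MAX / sizeof(double));
        return 0;
    }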
meta: This is one of those times when downvotes here on HN completely baffle (and I must admit, somewhat amuse) me. Who am I offending by sharing an opinion about integer types in C?
Expressing disagreement is expressly an acceptable basis for downvoting here. And given the more general purpose of voting (expressing whether or not a contribution is valuable): not all disagreement means the thing disagreed with isn't valuable, but the perception that something is inaccurate or wrong can certainly be a reason to conclude it is not a valuable contribution to the discussion.
Not the place to discuss, but 9/10 of my comments drop to 0 or -1 before going positive, even the ones that end up very positive. I don't worry about downvotes until the comment has been up for at least 30 minutes.