Beware of strncpy and strncat

nneonneo · on June 6, 2018

The advice given in this article is bad. Reducing the length of the copy by one will still fail to null-terminate the string if the source exceeds the destination length. You need to add

    dst[sizeof(dst)-1] = 0;

or memset the array to zero beforehand. You are not guaranteed to have a zero at the end of the input buffer otherwise (neither local variable arrays nor malloc’d arrays are guaranteed to be zeroed).

strncpy sucks for a second reason: it writes n bytes no matter what the source is. That means if you write

    char buffer[16384];
    strncpy(buffer, “hello”, sizeof(buffer)-1);
    buffer[sizeof(buffer)-1] = 0;

this will fill 16K of memory with zeroes despite only needing to copy a 6-byte string.

strlcpy/strlcat do the right thing (copy up to n-1 bytes and null-terminate); shame they aren’t standardized. In their absence, I suggest snprintf instead:

    snprintf(buffer, sizeof(buffer), “%s”, src);

Because snprintf returns the number of bytes that would be written, it can be used to detect overlong input strings, reallocate the buffer as necessary, and also to implement efficient concatenation. It’s also surprisingly fast in most implementations.

beeforpork · on June 6, 2018

Same advice everywhere, very good, my words exactly: use snprintf().

But beware of old MS _snprintf:

https://docs.microsoft.com/en-us/previous-versions/visualstu...

blattimwind · on June 6, 2018

> The advice given in this article is bad.

Thumbing through previous submissions of this domain with discussion, this seems to be a pattern.

belorn · on June 6, 2018

Its a shame that many of the solution to those exist as non-standardized calls and seems split between BSD and GNU. Personally I use asprintf quite a lot, and I recall strndup to be quite useful to create a identical copy of a buffer. strndup should have an identical performance to allocating a buffer and then strlcpy, since it too do copy up to n-1 bytes and null-terminate.

loeg · on June 6, 2018

It's a shame they aren't in ISO C, but strdup and strndup are at least standardized by POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/functions/st...

kdelok · on June 6, 2018

Is there any reason not to use snprintf for almost every application of string manipulation (aside from strduping)?

hxchen · on June 6, 2018

Reason: performance.

If you just do string manipulation, memncpy is faster.

If you need to convert data type, like int to string, then use snprintf.

nneonneo · on June 6, 2018

Yes, it seems I lied when I said sprintf is surprisingly fast. For most practical purposes it's fine, but if you need to do string manipulation in a hurry, strcpy/strlcpy are going to be faster.

See https://gist.github.com/nneonneo/f9dc5c1d1d342d25eae47d56dba... for a test harness. On my Mac, with a 16-byte string, 10 million loops:

    memcpy:     77406 us
    strlcpy:   101956 us
    strncpy:   177247 us
    sprintf:   730819 us
    snprintf:  699844 us

memcpy smokes sprintf - around 10x faster. The vast majority of sprintf's time is spent in setup; for large strings, the gap closes substantially.

hxchen · on June 6, 2018

Or use calloc. It does the same thing as malloc, but init memory to zero automatically.

tinus_hn · on June 6, 2018

If you’re going to allocate the memory you might as well just use the library functions that allocate exactly the space you need and copy the string into it. These tricks are for when you use the stack as an ‘optimization’.

laythea · on June 6, 2018

strncpy would have to assume that it was working on strings if that were not the case.....oh wait....

tlb · on June 6, 2018

Beware of the "correct" solutions proposed here.

  // OK: correctly copy src
  char dest[8];
  strncpy(dest, src, sizeof(dest) - 1);

With a long src, it fails to null-terminate dest. dest[7] will be whatever the contents of uninitialized memory were, so reading dest as a string is likely to run past the end.

strlcpy has a better API, though sadly it's not standard on Linux.

majke · on June 6, 2018

should be either

  char dest[8] = {0};
  strncpy(dest, src, sizeof(dest) - 1);

or

  char dest[8];
  strncpy(dest, src, sizeof(dest) - 1);
  dest[sizeof(dest)-1] = '\0';

On the same note I always wondered what "snprintf" does. That is - will it ALWAYS zero terminate (given non-NULL buffer, of size > 0)?

pascal_cuoq · on June 6, 2018

Yes. “snprintf(dest, sizeof dest, "%s", src);” where dest is a char array almost is the programmer-friend version of strncpy() that does exactly what one would expect, neither more nor less. It always does '\0'-terminate its output as long as it is passed a size >0.

The only issue with that idiom is that it is only defined to pass a '\0'-terminated string for src. Although it will only write the specified number of characters, it will read from src until the end of the string, and invoke undefined behavior if src does not point to a well-formed string:

https://taas.trust-in-soft.com/tsnippet/t/e0538551

A mnemonic is that snprintf needs to return the length of the string that would have been output if there had been enough room (not counting the terminating '\0'), so it needs to compute the length of the string pointed by src.

A close variant, below, avoids this issue but instead has the problem that the printf star argument is typed as an int, which is narrower than size_t on a typical 64-bit compilation platform.

    snprintf(dest, "%.*s", (int)(sizeof dest)-1, src);

loeg · on June 6, 2018

> but instead has the problem that the printf star argument is typed as an int, which is narrower than size_t on a typical 64-bit compilation platform.

If you are using >2 GB buffers and anticipate >2 GB strings, it probably makes sense to track lengths explicitly and use memcpy() etc instead of string routines anyway.

pascal_cuoq · on June 6, 2018

I was more thinking of the case where >2GiB strings are not useful for normal use and the programmer does not anticipate them, but a malicious user can cause such strings to happen, for instance by sending them over the network in minutes or hours, causing unforeseen behavior.

loeg · on June 6, 2018

Sure, that's a good point.

However, while it may also be possible, in some code, for a malicious user to control the buffer size, the int precision argument, as used in this construct, derives from the buffer size, and not the input string.

If the user can control the buffer size, then yes, we get the very undesireable buffer overflow via overflow from positive to negative[0]:

    A negative precision is taken as if the precision were
    omitted.

[0]: snprintf(3) man page from linux

wahern · on June 6, 2018

An annoyance with snprintf is that it has a failure mode, and in particular it can fail with ENOMEM under memory pressure.[1] This isn't merely theoretical as glibc implements snprintf by reusing the stdio machinery--effectively instantiating a temporary FILE structure, buffer, etc. glibc tries to stack allocate these objects using magic constants, but the code is incredibly complex. It's been awhile since I dove into the code, but IIRC snprintf could fail if you try to compose strings longer than the magic internal constants and malloc fails.

strlcpy() has the same semantics as snprintf(buf, sizeof buf, "%s", src) but without the failure mode. Because glibc is so stubborn, and rather than trying to wrap snprintf() to abort on OOM, I invariably include this simple implementation in a common header,

  static size_t
  aux_strlcpy(char *dst, const char *src, size_t lim)
  {
    size_t len, n;
    len = strlen(src);
    if (lim > 0) {
      n = MIN(lim - 1, len);
      memcpy(dst, src, n);
      dst[n] = '\0';
    }
    return len;
  }

which is substantially simpler than the canonical version from OpenBSD.

A related problem with snprintf is the mixture of signed and unsigned types for communicating object size. Mixing signed and unsigned types is, IME, error prone, so if I have code that uses snprintf more than a few times I usually wrap it in a function that separates the status from the [logical] size return values. People complain that strlcpy is problematic because it similarly communicates status (i.e. truncation) and object size through the same channel. But 1) it's not as nearly as error prone as mixing signed and unsigned types and 2) idiomatic use of strlcpy is easy to code and mentally parse and I've never felt the urge to wrap strlcpy in a helper routine. If truncation is always a failure than I'll simply use a two-line routine that returns an error code. But more often than not I rely on silent truncation. IMO, C code shouldn't be doing complex string operations using native C strings[2], and where it does make sense to use C strings it's usually things like configuration values where garbage in is garbage out; if an overlong property name is truncated it's no different than if it was misspelled. (People seriously overestimate the utility of supporting dynamic object sizes everywhere, and underestimate the inherent complexity it causes, which is significant even in low-level languages like C++ or Rust that make it safer and more convenient.)

[1] Linux's overcommit doesn't save you from, e.g., process resource limits.

[2] In fact, IMO complex parsing and composition shouldn't even be done with any kind of generic string API. If you get to the point where you're parsing and composing highly structured text, you should be using proper techniques with specialized data types. Writing ad hoc string munging code is error prone and a maintenance nightmare in any language, including scripting languages. But if I must than not only do I avoid C, I avoid any low-level, statically typed language. Scripting languages were literally invented for writing ad hoc string munging code.

wahern · on June 6, 2018

Another important caveat to snprintf: because glibc's snprintf might use dynamic memory allocation it's not async-signal-safe. OpenBSD notably rewrote their snprintf implementation to be async-signal-safe (with the exception of floating-point formatting). They did this so extension functions like dprintf(), commonly used in signal handlers for debugging or logging, would be easier to implement, and also because a lot of software assumes that (or doesn't even consider whether) snprintf is async-signal-safe.

I wonder whether glibc's dprintf() is async-signal-safe....

masklinn · on June 6, 2018

Can't you just strncpy sizeof(dest) if you're going to zero-out the last byte of the buffer unconditionally?

loeg · on June 6, 2018

Yeah. You've "wasted" an extra byte copy, but that is usually so cheap as to be virtually free.

MBCook · on June 6, 2018

But just adding the null byte isn’t necessarily safe if you’re dealing with Unicode, right?

loeg · on June 6, 2018

Correct. You can't just truncate a utf-8 stream in an arbitrary byte position and expect it to remain well-formed.

BadThink6655321 · on June 6, 2018

The first is inefficient. The declaration statement writes eight zeros to dest. strncpy() writes len bytes of zeros to dest, then copies the string, stopping at fewer than len bytes if a NUL is found in the src.

tlb · on June 6, 2018

Yes, snprintf does the right thing.

wyldfire · on June 6, 2018

> strlcpy has a better API, though sadly it's not standard on Linux.

I think it's because drepper hates us [1].

[1] http://www.sourceware.org/ml/libc-alpha/2000-08/msg00061.htm...

huhtenberg · on June 6, 2018

> Dammit, it is not safe. It hides bugs in programs. If a string is too long for an allocated memory block the copying must not simply silently stop.

This is spot on. There are very few cases when silently truncating a copy of a string is an OK thing to do.

wyldfire · on June 6, 2018

I think it's not genuine to describe the behavior of strlcpy() as "silently truncating". strlcpy() is documented as "If the return value is >= dstsize, the output string has been truncated. It is the caller's responsibility to handle this."

thefifthsetpin · on June 6, 2018

Silent doesn't imply undocumented. strlcpy's truncation is silent because it doesn't inform the caller about whether or not it truncated.

stephencanon · on June 6, 2018

The caller is informed: "If the return value is >= dstsize, the output string has been truncated"

akkartik · on June 6, 2018

How is returning a different value "silent"? It's informing the caller just fine.

This is C we're talking about. Always be checking return values. (And errno! And errno! And errno!)

bripeace · on June 6, 2018

Careful! Errno can be set even when there is not an error.it's value is only relevant when the return value of the function indicates an error.

That said there are some functions where the possible return values can't indicate an error. In those cases errno should be set to zero before the function call and then a change during will alert you to an error.

Why it's like this I have no idea but it's pretty annoying.

nothrabannosir · on June 6, 2018

> Why it's like this I have no idea but it's pretty annoying.

The bit about errno possibly being non-zero but not indicating error unless the function does? I guess because the called function can itself call functions / libs which set errno and properly handle those (or just not care). Without the "errno only has meaning if the function returns error" guard, every single function needs to always reset errno to zero before returning, which is annoying and will almost certainly not happen in the majority of cases.

When the function has no way to communicate error, there is no other option but to use errno. But it's good that it's the exception, I imagine.

Sohcahtoa82 · on June 6, 2018

But....it does inform the caller via the return value...what are you expecting it to do?

raverbashing · on June 6, 2018

I'll take truncating errors in place of non-null terminating errors, thanks.

Can it makes things worse? Yes. But C is broken.

loeg · on June 6, 2018

Ok, but, we've already got that kind of foot-shooting enabled by strncpy(), and glibc can't remove it as it is part of ISO C. If glibc must contain a truncating string copy, I'd much rather have strlcpy than strncpy. Do you find that persuasive?

matheusmoreira · on June 6, 2018

I think he's right. Treating strings as if they were normal arrays makes things easier and more uniform. The size of arrays should always be known.

I still keep the NUL terminator around for compatibility reasons. Linux system calls don't need the NUL, but other C code might depend on it.

gowld · on June 6, 2018

> 08 Aug 2000

Any progress since?

wyldfire · on June 6, 2018

I think he still hates us ;)

loeg · on June 6, 2018

On the plus side, Drepper left Redhat in 2010 and has been out as the glibc DFL since 2012.[0]

Amusingly, a request for strlcpy was the very 2nd comment on that LWN article.

Unfortunately, glibc still hasn't added it — Roland rejected it again in 2015.[1] Also unfortunately, RH hired Drepper back in 2017. Because they hate their other employees, I guess. Fortunately, no one is eager to give him back the reins of glibc.

[0]: https://lwn.net/Articles/488847/

[1]: https://sourceware.org/ml/libc-alpha/2015-11/msg00086.html

bitwize · on June 6, 2018

The C11 Annex K functions, strcpy_s et al., have a better API than even strlcpy, and are (an optional) part of the C11 standard. They are what you should be using.

rurban · on June 6, 2018

Nope. I'm the implementor of this variant, and even the safe variant was designed as unsafe.

In my implementation, the safeclib, I made it safe just recently.

dwc · on June 6, 2018

It's worth noting that strlcpy and strlcat are quite small, very stable, and can be brought into any program that needs them. Yes, it's an extra step but not an onerous one. Of course it's nice to autoconf them if practical.

dig1 · on June 6, 2018

Even strncpy() man page warns about this. Correct behavior is:

  strncpy(buf, str, buflen - 1);
  if (buflen > 0)
    buf[buflen - 1] = '\0';

unpopular42 · on June 6, 2018

If you expect that it's possible for buflen to be 0, you should check before the strncpy call, not after.

DSMan195276 · on June 6, 2018

Especially since `strncpy` takes a `size_t`, meaning that `buflen - 1` will turn from `-1` to `SIZE_MAX` due to the signed to unsigned conversion. I think OP just slightly misremembered what is on the man page:

    strncpy(buf, str, n);
    if (n > 0)
        buf[n - 1]= '\0';

Which still checks `n` after calling `strncpy`, like you mentioned, but doesn't have the underflow problem.

veli_joza · on June 6, 2018

If my count is good, this is correction of correction of correction of original article's correction. And average developer is expected to get it right the first time? It's a miracle anything built in C actually works.

orwin · on June 6, 2018

Yes. An average developper with knowledge in C will. Honestly, around my 3rd year i created a lib containing a struct for strings (well, two, and a couple functions working with them) and i don't think i ever used an strX function again.

I was pretty proud of the "string for already allocated buffer" part of this lib tbh.

Anyway, i'll never take C for a four-hour coding challenge, but for bigger, non-web project, C is really awesome, thank to BLAS and to the fact that most languages implement mostly painless C ffi.

DSMan195276 · on June 7, 2018

I would add that while I do normally use regular C strings rather then my own library, I still can't remember the last time I used any of the random `str*` functions. I don't do much string composition anyway, but the little I do is almost always using `snprintf` or variants (Which have a non-braindead API, unlike `strncpy`). Tons easier to read and much more effective then a bunch of `strcpy`, `strcat` and such.

The only thing I wish the standard library would have is an `asprintf` which would return an allocated buffer. I've written my own, but the easy version utilizing regular `sprintf` requires calling `sprintf` twice (Once to get the length, and once to actually put the string in the allocated buffer). A regular supported version would be much more efficient.

jwilk · on June 6, 2018

dig1 remembered it right. This bug was introduced in man-pages 3.62:

https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/...

DSMan195276 · on June 7, 2018

Huh, well that's embarrassing. It looks like it's still in there too. I googled the man page but I guess I saw an older version. I might submit a patch if I find time. It should really be something like this:

    if (n > 0) {
        strncpy(buf, str, n - 1);
        buf[n - 1] = '\0';
    }

Of course, the error isn't super huge considering if you're using a zero-length buffer as a null-terminated string you're going to have other problems. But they still do check for 0, so IMO it's worth correcting.

jwilk · on June 7, 2018

Matthew Kilgore submitted patch to fix this:

https://marc.info/?l=linux-man&m=152834469929748

dig1 · on June 7, 2018

Huh, I blindly pasted example assuming man pages are always right. Double check next time :D

DSMan195276 · on June 7, 2018

Yes, that would be me :)

kevin_thibedeau · on June 6, 2018

It still wastes time zero padding dest when src is shorter.

nebulous1 · on June 6, 2018

He's assuming that dest is zero-initialized, which I believe it will be as written. However, he should mention that because there will be plenty of similar cases where dest will not be initialized.

edit: As asveikau notes below, this is not the case

docker_up · on June 6, 2018

No, only global variables are zero-initialized. Neither stack variables nor malloced memory will be zero-initialized. OP should have memset to 0 or calloced the memory.

asveikau · on June 6, 2018

> He's assuming that dest is zero-initialized, which I believe it will be as written.

Given this snippet, incorrect. Stack allocated buffers aren't zeroed out as in the above declaration.

nebulous1 · on June 6, 2018

Well, then that's pretty bad!

TimJYoung · on June 6, 2018

I just don't understand why C hasn't been blessed with a proper string type yet. Object Pascal has had one (actually, several) for decades now and it doesn't hinder the language's ability to handle low-level memory manipulation (you can still manually copy string memory, convert them back/forth into raw Char pointers, etc.), and generally serves to make string handling much, much safer for most applications. It does however, result in slightly more memory consumption for the length/reference count tags and does incur some overhead in the form of compiler-generated reference count checks at the end of functions. But, IMO, the advantages outweigh the disadvantages for general-purpose C programming and you're always free to fall back to the more manual methods of handling character arrays.

So, am I missing something and there is some concrete reason why this can't be implemented ?

munificent · on June 6, 2018

I think it's mostly a switching cost problem.

Strings-as-char-* are part of the public APIs of damn near every C library, including the standard library, so there's no way to change that idiom without breaking the world or causing a huge migration tax. The kind of people using C are often doing so specifically because they want to avoid that kind of churn and want a maximally-stable platform, even if what it's stabilized to is sub-optimal.

TimJYoung · on June 6, 2018

Yes, but none of that would need to change. I hate to keep harping on Object Pascal, but it really is a nice implementation: in OP, you can pass a string to a C API, such as the Windows APIs, like this:

   Result:=CreateFile(pChar(FileName),
                      AccessMode,ShareMode,nil,
                      OpenFlags,AttrFlags,0);

where FileName is a String and pChar is the more traditional C-style pointer to an array of characters. The compiler will prevent you from passing a Unicode/Wide string to an API call that expects an ANSI string pointer, or vice-versa. So, interop with the system-level APIs of Windows or Linux is seamless and easy.

TimJYoung · on June 6, 2018

BTW, I almost forgot to mention: the String type in Object Pascal implicitly contains a NULL terminator, so the above call is zero-copy.

More information:

http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Internal_D...

under "Long String Types".

CodesInChaos · on June 6, 2018

A built in slice type (i.e. a (pointer, length) tuple) would prevent many of the issues with C strings without departing so far from the language design.

If done from the beginning, slices could have gotten the `T[]` syntax and array-pointer decay could have been replaced by the easier to use and safer array-slice decay. But of course backwards compatibility prevents this now.

wahern · on June 6, 2018

Backwards compatibility doesn't prevent adding a slice type with specialized syntax. The stumbling block is 1) defining sane semantics agreeable to a majority of people and 2) implementing it in a major implementation.

I hold out hope it'll happen, but I'm probably in denial. The fact that VLA function parameters were made optional in C11 doesn't bode well, but perhaps that's because it's incomplete--syntax and semantics stops short of what's needed to spur adoption of safer APIs. With a proper slice construct that was easy to use and integrate into idiomatic C code then there'd be more demand for implementing and using the necessary VLA compiler machinery, if not VLAs themselves.

pjmlp · on June 6, 2018

Easy, see how Go devs are reluctant to adopt features from other modern languages?

Travel back in time, when they were busy implementing C.

Compare how ESPOL, NEWP, PL/I, PL/S, PL/X, PL.8, BLISS, Algol 68 implemented arrays, strings and unsafe code blocks, and how C decided to go their own way.

BCPL was originally designed as means to Bootstrap CPL, not to be used alone.

See a pattern there?

Then AT&T could not sell UNIX, gave it away at a symbolic price of about $100 and the rest is history.

staticassertion · on June 6, 2018

How often does C add things like this in general? I want to say... basically never?

You're free to roll your own String type that has the length. No clue why people don't just do that though (I assume many do, but obviously a ton do not).

jacquesm · on June 6, 2018

> No clue why people don't just do that though

Most somewhat larger C shops will have their own internal utility library or they'll use one of the more popular batteries included packages which typically takes care of memory allocation, string handling, lists and a bunch of other useful stuff.

TimJYoung · on June 6, 2018

:-) You're certainly right here, but given that everyone thought that C was "going to be sent to the farm" a while ago and that shows zero signs of being true any time soon, it might be in the best interest of everyone to add such improvements, as long as they don't affect any existing C code. As I stated in my other reply, the only issue that I can see with rolling one's own String type is the handling of the reference counting for implicit deallocation.

scruple · on June 6, 2018

> given that everyone thought that C was "going to be sent to the farm" a while ago

For certain definitions of "everyone," maybe... I am no longer primarily a C programmer, and haven't been since around the end of 2013, and I'm still firmly in the "C isn't going any where" camp. I just happen to liken this stance to reality.

TimJYoung · on June 6, 2018

Just to clarify, I am definitely also not in the camp that thought that C was going away, rather that my general feeling is that most developers are afraid of it because of issues like this, and more sane semantics regarding one of the more widely-used aspects of any language would allow for greater adoption while not sacrificing reliability or backwards-compatibility. IOW, I think it would be a win-win.

pjmlp · on June 6, 2018

We cannot send C to the farm and keep UNIX like OSes around, due to their symbiotic relationship.

I really would like that something like SafeC or CheckedC would win the hearts of C devs

blub · on June 6, 2018

C doesn't really support properly encapsulated user-defined types, so yes, one can create their own String type but it will still be a pain to use and error-prone.

One example is the stretchy buffer library that I saw posted recently here.

vortico · on June 6, 2018

There are many object oriented string types for C in various independent GitHub repositories, but they're not really interesting for C programmers because they enjoy the simplicity of null-terminated strings and moving data around with `for` loops.

TimJYoung · on June 6, 2018

I understand that (one of my favorite books on my shelf is "C Interfaces and Implementations" that shows some of the cool stuff that you can do with any C implementation, including "proper" strings), but they're not something supported in the compiler, which is, unless I'm mistaken, necessary for the reference count checks.

My point is that implementing a new string type has zero effect upon existing C programs if they don't use the new string type, so I'm confused as to why it hasn't been done. If a C developer doesn't want to use them, then "no harm, no foul".

vortico · on June 6, 2018

Well, then you have two string types. The language will instantly become complicated as we indecisively choose between two different string types each function in our APIs.

I see no harm in adding more functions around the string type we already have, but as I mentioned in https://news.ycombinator.com/item?id=17248446, `snprintf` is the mother-of-all-string-functions that does everything you need, so not much else is needed.

TimJYoung · on June 6, 2018

I would argue that you still have one string type, while the traditional C "string" type is actually an array of characters, or a pointer to an array of characters. :-)

Re: snprintf - yes I saw that and it definitely does do most of the heavy lifting, but it still is something that the developer needs to handle manually (I know, I know, not everyone should be using C...).

vortico · on June 6, 2018

No, char* is definitely a string type. By having another one, that would be two.

Believe me, working with C and C++ code and converting back and forth between std::string and char* is a nightmare. Let's not design that into the language itself.

TimJYoung · on June 6, 2018

In C++, can't you just get a reference to the first character in the std::string and use it like a C-style string ?

jcranmer · on June 6, 2018

std::string isn't null-terminated. You need to use .c_str(), which allocates a copy that is destroyed when the string is changed (or is destructed).

thestoicattack · on June 6, 2018

It seems[1] that since C++11 .data() and .c_str() are the same function. c_str() is also documented as having constant complexity. If it made a copy, wouldn't it have to be linear?

[1] https://en.cppreference.com/w/cpp/string/basic_string/data

gpderetta · on June 6, 2018

Yes. Allocating on demand was a legal implementation before c++11. No standard library made use of the option and the latitude was removed in c++11.

I think it is still allowed to null terminate on demand.

hermitdev · on June 6, 2018

It doesn't have to make a copy. std::string is always null-terminated as of C++11.

vortico · on June 6, 2018

In practice, all modern C++ compilers and standard libraries just set a null character to the c_str()[len] position and reallocate the string if the capacity of the string buffer cannot contain the extra byte. It is never a linear operation unless you're working with old or niche C++ compilers. In C++11 this is required in the standard.

TimJYoung · on June 6, 2018

Ahh, yes, forgot about that - thanks !

vortico · on June 6, 2018

Yes, it is easy to go back and forth between std::string and char* using the std::string constructor and ::c_str(). The "nightmare" part is that when interacting with C from C++, you can never work with the internal std::string data, so you have to manage copies of buffers and copy it back into a std::string every single time you interact with C functions. It's really nasty to look at in large quantities.

TimJYoung · on June 7, 2018

Ahhh, okay, now I see the objection. Object Pascal allows you to use them interchangeably as you see fit with simple typecasts.

cozzyd · on June 6, 2018

There's also asprintf which is even easier to use (if you don't care about the dynamic allocation).

vortico · on June 7, 2018

If only this existed in the C standard...

You can call snprintf twice, the first time with a NULL buffer and zero-sized length, and the second with a newly allocated buffer with the size equal to the return value of the first snprintf call.

loeg · on June 6, 2018

Use strlcpy/strlcat instead.[0] strlcpy takes the full size of the destination buffer, limits the copy to N-1, and nul-terminates the result for you. It's like the "correct" example in TFA, but with less annoying boilerplate.

Some more verbose design/rationale for the really curious.[1]

Another thing to keep in mind is the sometimes surprising behavior of strncpy(large_buffer, short_string, sizeof(large_buffer))[2]:

    If the length of src is less than n, strncpy() writes additional null
    bytes to dest to ensure that a total of n bytes  are  written.

Strlcpy doesn't do that. Just use strlcpy. On Linux, it can be found in the libbsd package.

[0]: https://www.freebsd.org/cgi/man.cgi?query=strlcpy&sektion=3

[1]: https://www.sudo.ws/todd/papers/strlcpy.html

[2]: https://linux.die.net/man/3/strncpy

nine_k · on June 6, 2018

Probably back in 1970s `(characters, \000)` looked "elegant", and `(counter, characters)`, wasteful.

This decision, if not the single most prolific, is likely one of the top 3 sources of security exploits in C code. All for the want of saving a few bytes.

beagle3 · on June 6, 2018

Up until the early '90s, almost all the different Pascals had a (byte counter, character) string, which meant that if your string was longer than 255 bytes you had to do exactly the same kind of acrobatics that C did, and then some (for various reasons related to Pascal's type system).

Then they added 16-bit strings. And eventually 32-bit strings. Which are useless to keep that 5GB file in memory you process these days. I'm not up-to-date - does pascal have a 64-bit string type?

C, on the other hand, still uses the same error-prone and exploit inducing strcpy, strcat and friends - and the work for those 12GB gene texts and .csv files.

C never looked elegant, but it was extremely effective in the limited memory / limited CPU days of yore, and that's why it won against e.g. Pascal.

cornholio · on June 6, 2018

> I'm not up-to-date - does pascal have a 64-bit string type?

Seems like an easy fix is a single type of string with dynamic size depending on the first byte, like UTF-8:

0bbbbbbb : 1 byte header for a string up to 127 bytes long

10bbbbbb + hh : 2 byte header for a string up to 16384 bytes long

110bbbbb + hhhhhh: 4 byte header for a string up to 512 MB long

1110bbbb + hhhhhhhhhhhhhh: 8 byte header for a string up 1.15 EB long

... etc. (so as not to repeat the mistakes of Pascal, who knows how long will 1.15 EB be enough for everybody)

You would optimize the functions for the fast path with a single bit test for short <128 byte strings, and end up with a sane, safe string until the end of time. With minimal memory impact and much faster than testing for nul terminators, strlen() all over the place etc.

samatman · on June 6, 2018

This is the intuitively right answer, and has always been my favorite.

It's so puzzling that this is a data structure I can't recall ever seeing in the wild.

JdeBP · on June 7, 2018

You both might want to review how lengths are encoded by the Distinguished Encoding Rules and the Basic Encoding Rules of ASN.1.

samatman · on June 7, 2018

Sure, and source maps as well.

I meant as the builtin string type of a systems language, thought that would be clear from the context.

pjmlp · on June 6, 2018

C won because it came with an OS free of charge, with source code.

tntn · on June 6, 2018

Unix was neither open source nor free of charge. C won long before Linux came around.

pjmlp · on June 6, 2018

Sure it was, just not with the FOSS licenses of today, apparently you are too young to remember.

AT&T was forbidden to sell UNIX, so they provided the source code to universities at the symbolic price of $100, which was basically free when compared at how much something like VMS or z/OS would cost in licensing.

When Linux came into the scene, C was already on its way out, being slowly replaced by C++ on OS/2, Windows, Mac OS, Symbian, BeOS, NewtonOS as the way to go for writing applications. The later three were even written in C++ (Newton also had NewtonScript).

In 1994 we were already teaching C++ to first year CS students at my university. Proper C++, following Bjarne's ARM book, already talking about RAII and stronger type checking.

It was UNIX FOSS with the assertion that portable software had to be written in C that eventually pushed C everywhere, even to the platforms that were already getting cosy with C++.

tntn · on June 6, 2018

The price may have been symbolic for universities, but AT&T licensed Unix commercially for many tens of thousands of dollars.

pjmlp · on June 6, 2018

Learn the history of UNIX, that was almost a decade later, which was when they decided to sue BSD.

https://www.johndcook.com/blog/2010/11/16/why-att-licensed-u...

https://en.wikipedia.org/wiki/UNIX_System_Laboratories,_Inc.....

scruffyherder · on June 7, 2018

Indeed, it was quite fun to see that in order to bootstrap Research Unix v8, you needed 4.1BSD

https://virtuallyfun.com/wordpress/2017/03/30/research-unix-...

zerr · on June 6, 2018

> that's why it won against e.g. Pascal.

More like AT&T and Unix monopoly?

dpark · on June 6, 2018

Is a 5 billion character string really the way you want to read a 5GB file?

beagle3 · on June 7, 2018

No, but some people do; And some C programs written 30 years ago do support it, but no Pascal program does (not from 30 years ago, not those written 20 years ago, not those written 10 years ago, and not those written yesterday unless Pascal now has 64-bit length strings).

Furthermore, using mmap (which was always the right way to do it), it doesn't matter if it's 512KB or 1GB or 5GB or 20GB, and is supported by C string practices but none of the Pascal strings (or those suggested on this thread).

dpark · on June 8, 2018

On the other hand, the Pascal code that can't handle 5GB strings also doesn't have the same pervasive security risk from buggy string manipulation that C code does. Not a bad tradeoff, especially given that you probably want to work on substrings within that 5GB and slicing in C essentially doesn't work because null termination either doesn't exist for your substrings (making them incorrect and potentially unsafe, depending on what you're doing) or the substring null termination breaks your larger string and prevents it from actually behaving like a 5GB string.

Safe string manipulation in C devolves into always passing a char array and a count anyway, which is not coincidentally exactly what Pascal strings look like.

beagle3 · on June 8, 2018

Indeed safe strings in C are hard to do, but the nul termination is not the end-all-be-all of C strings; indeed, there is often a way to specify maximum string length (e.g. %.#s format in the prints family, strncpy). So mmapped strings are actually useful.

The bigger problems is embedded nuls, which can’t easily be sidestepped in the C strong framework.

Regardless, C won when the world was not yet well connected, and vulnerabilities mattered much less. It might not have won if the match was held today. The wake up calls were the Morris worm (intentional) and the AT&T outage (unintentional, and more of a logic error iirc) but by then it was too late.

asdfman123 · on June 6, 2018

What, you're saying that features that made sense for a programming language designed in the 70s don't make sense in 2018?

nine_k · on June 6, 2018

Note how it's not a feature of a programming language per se, but rather a detail of implementation of a common data structure. Likely strings not being their own type but being directly represented as `char[]` also looked elegant and commendably parsimonious back in the day.

Can't blame people who worked on machines with 128 kilobytes of RAM for that. They likely did not expect that their well-meaning hacks will take over most of the computing world.

pjmlp · on June 6, 2018

The people that worked in machines 10 years older than PDP-11 managed to have safer systems languages.

Still being sold today as Unisys ClearPath mainframes.

berti · on June 6, 2018

Plenty of systems still have less RAM than that (hint: microcontroller). Unsurprisingly they're overwhelmingly C.

pjmlp · on June 7, 2018

Those systems also have Basic and Pascal compilers, when one bothers to look around.

https://www.mikroe.com/compilers

Also many of those micro-controllers are more powerful than the computers where we ran CP/M on, where we had plenty of system languages to choose from.

They are have become overwhelmingly C due to synergy effects.

phanikaran · on June 6, 2018

Can you tell what are the other two though? I'm curious.

nine_k · on June 6, 2018

No default runtime checks for array bounds (allows to alter arbitrary data and often code), and no default runtime checks for bounds of stack allocation ("smashing the stack"). Both easily lead to RCEs.

jhallenworld · on June 6, 2018

There are many worse string formats than C's NUL terminated:

Last byte of string has bit 7 set.

First byte of string has length, so strings are limited to 255 bytes in length.

First two bytes of string has length, so strings are limited to 65535 bytes in length.

Strings are stored in fixed length buffers with space padding to the end.

Length prefixed strings are stored in a fixed length buffer, so you are limited to the buffer length. I think this was the case for PL/I "varying" strings.

Back in the day, C was better than PASCAL because it had strdup, meaning it had a heap and you could put strings in it.

C++ string is mediocre. I solves some problems, but what if:

You have very long strings and you are worried about heap fragmentation. So you are better to have something like a linked list of segments each in their own malloc block. But can you extend std::string? Nope, oh well.

You want strings to be semipredicates. I mean that strings should be able to have a NULL value, as I can do with C. (return NULL for 'char *'). Can std::string do this? Nope. Can it be extended? Nope.

thestoicattack · on June 6, 2018

> You want strings to be semipredicates. I mean that strings should be able to have a NULL value, as I can do with C.

A null value different from "emptiness"? Would you be okay with a std::string* or a std::optional<std::string>?

jhallenworld · on June 6, 2018

"since C++17", I see.. I'm curious if it uses more space than "char *"

Also it's not great because I should be able to pass such a string through functions that expect std::string. A NULL string should act just like an empty string except that you can test it for NULL.

thestoicattack · on June 6, 2018

Yes. std::optional<T> allocates the T in-place. Just checked compiler explorer, and on g++ 8.1, sizeof(std::string) == 32, sizeof(std::optional<std::string>) == 40.

(of course sizeof(char* ) == 8)

codehero · on June 6, 2018

If your strings are > 65535 bytes in length, consider buffering the string data or using another data structure.

spicymaki · on June 7, 2018

What is the ideal low level string implementation in your opinion?

jhallenworld · on June 7, 2018

There isn't one (because what you want for long strings makes short string operations too slow, or because you need strings stored in a certain format or in a certain place for other reasons).

So I like the idea of polymorphism so you can adjust the implementation. C++ has polymorphism, but for some reason not enabled for strings (they needed to use all virtual member functions to allow it).

bio_end_io_t · on June 6, 2018

The author should've been wary of strncpy, because his solution is wrong.

"If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated."

So copying sizeof(dest)-1 will not append a NULL byte as the author implies. You'll have to do that manually.

mturmon · on June 6, 2018

Article's solution is wrong, but at least the headline is right!

GuB-42 · on June 6, 2018

I usually avoid strncpy() and strncat() altogether

  char buf[256];
  size_t sz = strlen(str);
  if (sz < sizeof (buf)) {
  	memcpy(buf, str, sz + 1);
  } else {
  	/* error processing */;
  }

Truncation is not as bad as a buffer overflow. However, it is still not correct. You have to properly handle the case. And if truncating is the correct answer, make that explicit.

In practice, I almost never use fixed size buffers for strings unless I know the size at compile time.

Edit (for strcat):

  char buf[256] = "blah";
  size_t sz1 = strlen(buf);
  size_t sz2 = strlen(str);
  if (sz1 + sz2 < sizeof (buf)) {
  	memcpy(buf + sz1, str, sz2 + 1);
  } else {
  	/* error processing */;
  }

Someone · on June 6, 2018

”C programmers should use the newer strncpy()”

Newer? I thought strncpy dates back to the time Unix filenames were 14 characters, max, adding padding zeroes when needed in some fixed-length kernel structures.

That’s also the reason strncpy always writes len bytes; not keeping garbage content in those 14-byte buffers allows the system to use memcmp to compare file names.

hawski · on June 6, 2018

There is no idiomatic way to use strncpy, unless you're running the 7th edition of Unix [0].

[0] https://stackoverflow.com/a/1454071/6561829#6561829

ebikelaw · on June 6, 2018

C++ basic_string::c_str always returns a null-terminated string. C++ is the solution to numerous C pitfalls.

AnimalMuppet · on June 6, 2018

True. But not everyone can use it. If you can, it's (almost always) the right answer.

MrBingley · on June 6, 2018

Looking at the comments in this post, I'm resigning myself that there simply is no correct solution for copying or concating strings in C. Null-terminated strings are a fundamentally broken concept. I think the long-term solution is simply to move to a different language (Rust, C++, D, Go, whatever) where we have the benefit of hindsight and have (pointer, length) string types, which solve all the problems null-terminated strings introduce.

blub · on June 6, 2018

It's quite sad to behold how hard it is to work with strings in C.

All those functions seem broken by design, forcing the programmer to clean up their mess in edge cases.

mjevans · on June 6, 2018

You could also validate on interface transition and use your own type internally; which is often what happens when using another language and exporting C style library bindings.

Analemma_ · on June 6, 2018

I'm just shouting into the void here, but why does anyone find it acceptable that C is almost fifty years old– a half-century– and we still have new articles published about the correct way to copy memory. And then, immediately following them, comments in responds to those articles saying the article is wrong and that you should actually do it this other way. Nobody has figured this out in 50 years?

kazinator · on June 6, 2018

A few new people have to learn this stuff. That's not just to maintain legacy code. Computers do not have nicely behaved, safe, garbage collected strings. Someone has to understand the code for how that stuff is bootstrapped, and that code is going to have gunky memory copies in it where a one word mistake will bring down the show.

pjmlp · on June 7, 2018

It runs deep than that.

C suffers from a macho culture, where many developers deeply believe that only the others do mistakes.

Even Dennis was quite clear that C was not supposed to be used without help from lint (developed in 1979).

"Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions."

-- https://www.bell-labs.com/usr/dmr/www/chist.html

Guess how many do run lint or any other static analyzers as part of their build.

krapp · on June 6, 2018

The problem is - what even is a "string" in C?

An arbitrary number of arbitrary bytes that you hope ends in a null, but you'll never know unless you check, and even when you check, do you really know? Was that null really where that string was supposed to end?

There's never going to be a single, universally valid and correct way to deal with a "type" that is structurally just a step above random garbage.

legohead · on June 6, 2018

I'm with you. I started with C some ~20 years ago, and you learn very early on about null terminating strings. You are constantly thinking about it for every manipulation you do. This article isn't exactly ground breaking news.

It smells more like an amateur just got bit and decided to write up a blog about it.

vortico · on June 6, 2018

Don't use `strncpy()` and `strncat()` at all, use

    snprintf(buffer, buffer_len, "%s Stuff %s", str1, str2);

It's safe (C99, C++11) and easily extendible. Format strings are fun! Not the fastest, but if the bottleneck of your program is concatinating strings, just do it manually.

Hello71 · on June 6, 2018

This is a safe, if slow, alternative for strncpy, but it does not safely replace strncat. The C standard does not define the behavior if the code "sprintf(buf, "%s some further text", buf);" is used.

vortico · on June 6, 2018

Right, don't use `buf` as an input and output with `snprintf`. A lot of people use `str(n)cat` and `str(n)cpy` together like

    char *a;
    char *b;
    char *c;
    strcpy(c, a);
    strcat(c, b);

and I meant that `snprintf` can replace that. But if you actually only have two strings, one with a larger buffer than what it contains, and one to be concatinated, then you can't use `snprintf` like that.

legulere · on June 6, 2018

Strncpy gives back the length copied which you can use to append more text: char buf[1024]; size_t pos = 0; int ret = snprintf(buf, sizeof(buf), src1); if(ret >= 0) { pos += ret } else { error() } ret = snprintf(buf + pos, sizeof(buf) - pos, src2); ...

dmitrygr · on June 6, 2018

#define strncat(a, b, maxlen) snprintf(a + strlen(a), maxlen - strlen(a), "%s", b)

kazinator · on June 6, 2018

Too many evaluations of a. If you have snprintf, you probably have inline functions.

One day someone will do something like strncat(a += previous_delta, ...).

dmitrygr · on June 6, 2018

OPTION 1 (c):

  char* mystrncat(char *a, const char *b, size_t maxlen){
    snprintf(a + strlen(a), maxlen - strlen(a), "%s", b);
    return a;
  }

OPTION 2 (gccism):

   #define strncat(_a, b, maxlen) ({char* a = _a; snprintf(a + strlen(a), maxlen - strlen(a), "%s", b); _a;})

haldean · on June 6, 2018

It's a good solution for strncat for sure! I'd guess that the majority of strncpy calls look something like:

    char *x = calloc(n, 1);
    strncpy(x, y, n - 1);

Which can be replaced with:

    char *x = strdup(y);

Which is both one line shorter and 100% less error prone!

kazinator · on June 6, 2018

C99 also introduced swprintf for wide char strings, with a different return value convention. Just to add to the pain when you change char to wchar.

Under swprintf, and related functions, %s still takes a char pointer, not wchar_t! So when you make everything wide, you have to edit all the %s to %ls.

GlitchMr · on June 6, 2018

Worth noting that strncpy doesn't stand for secure string copy or anything like that. Using strncpy for copying strings would be a mistake, even if technically you can do that.

Rather, it's a fixed size string copy function. This structure is very rare in regular environments, but they can happen in embedded environments. For instance, if you want to have a string in binary file which is at most 10 bytes, you may want to avoid storing the termination byte when the string is exactly 10 bytes long. For instance, such a structure was used in UNIX to store file names, as they used to be limited to 14 bytes, and storing terminator would be a waste of space.

mlthoughts2018 · on June 6, 2018

I like learning about these caveats, but I have been asked tricky stuff like this in interviews before with gets() and the like.

As a person who interviews other people, I find that it's waaay more valuable that someone is generally aware that they should watch out for this class of pitfalls than that they know any specifics about a given function.

I've met people who basically had memorized the description of this phenomenon for gets(), but then their preferred solution was just to replace it with fgets() but then they don't know about checking for newlines or have any thoughts on what to do when individual lines are too long.

I'd much rather hire someone who says to herself, "Oh, I need to read some characters from an input source using C. RED ALERT! Let me really research the specifics here."

Instead of someone who thinks, "Oh, I need to read some characters from an input source using C. Good thing I memorized that trivia about gets() and can totally solve this in the best way immediately with the highest upvoted Stack Overflow solution of fgets() that I didn't bother to deeply grok."

I find that when interviews are geared towards puzzle solving or esoteric trivia, the people who do well are mostly of the second type (the ones I wouldn't want to hire).

Whereas someone of the first type might flounder around and struggle in a 20-minute programming task to process strings in C, directly because that person cares more about having a bigger picture point of view of what's actually going on rather than esoteric memorization of specific function signatures and usage mechanics.

In other words, if I gave some kind of C string processing question in an interview for 20-30 minutes, one very excellent answer should be, "sorry man, not gonna try to do this in 20 minutes because in reality I know there are string handling landmines I would need to research and slowly process, and I would never believe this is worth committing to memory for a short interview."

dsamarin · on June 6, 2018

Interesting how the "solutions" to the buffer overflow problems don't provide for all of the modern assumptions of programming with strings. I would love to know the history of the development of strncpy.

kazinator · on June 6, 2018

I suspect that strncpy was intended for filling in fields in records to be written to files, where you want all the bytes to be clean, with no random junk (potentially sensitive) after the null byte. For instance struct utmp in Unix or something of its ilk.

slrz · on June 6, 2018

Exactly. Or the fixed-length directory entries in old Unix file systems. If the file name is exactly 14 chars long, you don't care about NUL termination but if it is any less you want to zero out the remaining bytes. Strncpy is made for that.

It's not restricted to writing to disk. When these structures cross the userspace/kernel boundary on a system call, you really don't want to leave uninitialized bytes following the NUL terminator and return them to some user process.

wahern · on June 6, 2018

The modern use case for strncpy is for filling in the .sun_path member of struct sockaddr_un. Most people assume that the path needs to be NUL terminated, but the BSD Sockets API actually relies on the declared sockaddr length parameter. It's not superfluous and the kernel will only read .sun_path up to the end of the declared size of the sockaddr structure; it doesn't expect NUL termination though it will obey internal NUL termination.

Moreover, the statically declared size of .sun_path in the libc headers doesn't limit the maximum length of the path. On most implementations you can create domain socket paths larger than this. Indeed, when you use an API like getsockname() you normally should check for truncation by comparing the returned sockaddr length with the size of the buffer you passed. Just like with snprintf() and strlcpy(), if the returned logical length is greater than your buffer size the path was truncated. IIRC, not all implementations (or any?) include a NUL byte as part of the length so you can very well end up with a .sun_path that isn't NUL terminated if your buffer only barely fit the path. Likewise if you didn't 0-initialize the path buffer and the actual path was shorter, though IIRC kernels handle this second case differently--some might NUL terminate for good measure if there's space.

kazinator · on June 6, 2018

Furthermore, on Linux, there is an extension: the first byte of sun_path can be null. In that case, the rest of the path is still valid up to the given length and specifies an "abstract address": it's a namespace outside of the filesystem. Sockets bound to abstract addresses automatically disappear on the last close.

This drives home the idea that "damn it, this is not a null terminated string; here is the null-byte-based extension to prove it!"

rurban · on June 6, 2018

Interestingly even the Annex K strncpy_s and strncat_s are unsafe by design, that's why I only added them to the safeclib via --enable-unsafe.

But recently I got fed with all this unsafety nonsense with the truncating variants and changed the implementation to always terminate the asciiz string properly. https://github.com/rurban/safeclib/blob/master/src/str/strnc...

docker_up · on June 6, 2018

Not even this works. They forgot to force the last byte as a NULL, which is a classic bug in C. Either that or memset the char array before using it. But what the blog poster did is a pure bug.

kazinator · on June 6, 2018

NULL is a pointer constant; a byte is never described as NULL (in the context of C).

lkjalksdjfasdf · on June 6, 2018

I don't use pure C for string handling anymore. I use C++ and extern C ABI. Rust can also work. C++ can infer the destination size via templates.

pjmlp · on June 7, 2018

Some of us were already doing it in 1992, with our own string classes as passage rite, but adoption takes time.

kazinator · on June 6, 2018

Never memcpy structs, except if you need to ensure that the padding bytes are copied. C has had a = b assignment for structs since long before ANSI C.

chris_va · on June 6, 2018

For those similarly curious:

"""

char * STRNCPY (char s1, const char s2, size_t n) {

  size_t size = __strnlen (s2, n);

  if (size != n)

    memset (s1 + size, '\0', n - size);

  return memcpy (s1, s2, size);

}

libc_hidden_builtin_def (strncpy)

"""

... which is actually different than how I thought it would be implemented (it ends up with an extra loop to figure out the size of the string).

(side note, how does one format code in HN?)

kazinator · on June 6, 2018

> how does one format code in HN?

Two space indent.

rini17 · on June 6, 2018

1. Avoid sizeof, lest someone comes and changes the array into a pointer. Use constant parameter instead:

#define BUFLEN 8

char buffer[BUFLEN];

strncpy(...etc... BUFLEN-1);

2. Check length of the string to be copied BEFORE copying and if too long, fail (using assert, exit or so) instead of silent truncation or worse.

Why it is so hard and instead new "safe" string functions must be invented?

kazinator · on June 6, 2018

There is no absolutely foolproof way to code in C such that no matter how someone changes the program, they will be spared from making a mistake.

If we just program for today, sizeof buffer is much better than proliferating a preprocessor constant that may or may not correctly reflect the object being overwritten.

For the silly mistake of changing an array to a pointer without taking care of sizeofs, GCC gives us some diagnostics:

       -Wsizeof-pointer-memaccess
           Warn for suspicious length parameters to certain string and memory
           built-in functions if the argument uses "sizeof".  This warning
           warns e.g.  about "memset (ptr, 0, sizeof (ptr));" if "ptr" is not
           an array, but a pointer, and suggests a possible fix, or about
           "memcpy (&foo, ptr, sizeof (&foo));".  This warning is enabled by
           -Wall.

       -Wsizeof-array-argument
           Warn when the "sizeof" operator is applied to a parameter that is
           declared as an array in a function definition.  This warning is
           enabled by default for C and C++ programs.

professorTuring · on June 6, 2018

This link has nothing different to offer than "man 3 strncpy"...

Worse, some examples are invalid.

charlchi · on June 6, 2018

People complaining about memory safety in functions whose man pages explicitly state memory safety is not provided...

Hello71 · on June 6, 2018

In addition to tlb's point, the article's description of strncat is not correct.

> As with strcat(), the resulting string in dest is always null-terminated. > Therefore, the size of dest must be at least strlen(dest)+n+1.

jwilk · on June 6, 2018

They wrote:

  // XXX: after this buf may not be null-terminated
  strncat(buf, another_buffer, BUF_SIZE - strlen(buf));

This code is indeed bad, but not because the result won't be null-terminated, but because it's an off-by-one buffer overflow.

Despite this misunderstanding, the proposed fix is correct:

  // OK: ensure buf is null-terminated after concatenation
  strncat(buf, another_buffer, BUF_SIZE - strlen(buf) - 1);

jacquesm · on June 6, 2018

Bad advice is worse than no advice at all.

beeforpork · on June 6, 2018

Well, this is not very helpful advice, because strncpy(a, b, sizeof(a)) is in no way more safe than strncpy(a, b, sizeof(a)-1), because the latter is not 0-terminated either. And from malloc(), as in the examples, comes no 0-termined buffer, but random garbage memory. What would be safer is to alway 0-terminate the buffer after copying, and using the simplest copy possible:

    strcpy(a, b, sizeof(a));
    a[sizeof(a)-1] = 0;

But this is more boilerplate and hence more error-prone.

Even safer, use strlcpy() (if available) or snprintf() which both 0-terminate (except under Windows, maybe). (But beware when preparing something for copying from trusted to untrusted: strncpy() clears the rest of the buffer while strlcpy() and snprintf() do not, so you might leak info via uninitialised memory behind the end of the string if you copy out that buffer across a trust boundary. Actually, the authors 'sizeof()-1' solution is less secure in this context.) So, use:

    snprintf(a, sizeof(a), "%s", b);

And don't tell me anything about speed, please. Your main concern with C is not micro optimisations but robustness and avoiding undefined behaviour (and that snprintf() is not too slow).

And for multiple concats, use multiple snprintfs(), like so:

    char *i = a, *e = a + sizeof(a);
    i += snprintf(i, e-i, "%s", b1);
    i += snprintf(i, e-i, "%s", b2);
    i += snprintf(i, e-i, "%s", b3);

This is the most concise way I know to write this that works without buffer overflow (your main enemy, even more vile than missing 0-termination), without thinking too much, without writing too much boilerplate, and that is relatively robust against breaking in code restructuring (like, appending more stuff in the middle). The idiom also resembles a bit old style C++ iterators ('i' and 'e').

Oh, and a truncated string is usually not good anyway, be it 0-terminated or not. So you do need to check for that after all that stringing stuff:

    if (strnlen(a, sizeof(a)) >= sizeof(a)-1) {
        /* ... error ... */
    }

Don't miss that '-1' there. Off-by-one is another enemy to know well. And dispite that check handling missing 0-termination, do not be tempted to fall back to strcpy(), because missing 0-termination is bad(tm).

Phew!

C is bad with strings. The above resembles C++ iterators ('i' and 'e') and works fine with any good snprintf implementation (i.e., probably not under Windows).

And do not copy structs with memcpy, just assign them! memcpy() is for arrays only. This is not going to go away, is it?

nneonneo · on June 6, 2018

Mostly good advice, but the multiple `snprintf` example is wrong - it contains a buffer overflow. snprintf returns the number of characters that would be written, so when you do

    i += snprintf(i, e-i, "%s", b1);

i would end up past e if b1 is overlong. Then in the next line

    i += snprintf(i, e-i, "%s", b2);

e-i is negative, but snprintf takes a size_t so this will overflow badly.

The best solution, AFAICT, is the following:

    int res;
    res = snprintf(i, e-i, "%s%s%s", b1, b2, b3);
    if(res >= e-i) {
        /* handle overflow */
        return -1;
    } else {
        i += res;
    }
    /* subsequent snprintf's here */