should be either char dest[8] = {0}; strncpy(dest, src, sizeof(dest) - 1); or ch...

pascal_cuoq · on June 6, 2018

Yes. “snprintf(dest, sizeof dest, "%s", src);” where dest is a char array is almost the programmer-friendly version of strncpy() that does exactly what one would expect, neither more nor less. It always '\0'-terminates its output as long as it is passed a size > 0.

The only issue with that idiom is that its behavior is only defined when src is a '\0'-terminated string. Although it will only write the specified number of characters, it will read from src until the end of the string, and invoke undefined behavior if src does not point to a well-formed string:

https://taas.trust-in-soft.com/tsnippet/t/e0538551

A mnemonic is that snprintf needs to return the length of the string that would have been output if there had been enough room (not counting the terminating '\0'), so it needs to compute the length of the string pointed to by src.
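To make the return-value point concrete, here is a minimal sketch of the idiom with truncation detection. The helper name copy_snprintf and its 0/1/-1 return convention are made up for illustration; only the snprintf call itself comes from the comment above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity cap) using snprintf.
 * Returns 0 if src fit, 1 if it was truncated, -1 if snprintf failed.
 * src MUST be '\0'-terminated: snprintf reads all of it in order to
 * compute the length it returns, even when only cap-1 bytes are written. */
static int copy_snprintf(char *dst, size_t cap, const char *src)
{
    int n = snprintf(dst, cap, "%s", src);

    if (n < 0)
        return -1;                    /* output or allocation error */
    return (size_t)n >= cap ? 1 : 0;  /* n is the untruncated length */
}
```

Copying "0123456789" into a char[8] with this helper would leave "0123456" in the buffer and report truncation.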

A close variant, below, avoids this issue but instead has the problem that the printf star argument is typed as an int, which is narrower than size_t on a typical 64-bit compilation platform.

    snprintf(dest, sizeof dest, "%.*s", (int)(sizeof dest)-1, src);

loeg · on June 6, 2018

> but instead has the problem that the printf star argument is typed as an int, which is narrower than size_t on a typical 64-bit compilation platform.

If you are using >2 GB buffers and anticipate >2 GB strings, it probably makes sense to track lengths explicitly and use memcpy() etc instead of string routines anyway.

pascal_cuoq · on June 6, 2018

I was more thinking of the case where >2GiB strings are not useful for normal use and the programmer does not anticipate them, but a malicious user can cause such strings to happen, for instance by sending them over the network in minutes or hours, causing unforeseen behavior.

loeg · on June 6, 2018

Sure, that's a good point.

However, while it may also be possible, in some code, for a malicious user to control the buffer size, the int precision argument, as used in this construct, derives from the buffer size, and not the input string.

If the user can control the buffer size, then yes, we get the very undesirable buffer overflow via overflow from positive to negative[0]:

    A negative precision is taken as if the precision were
    omitted.

[0]: snprintf(3) man page from linux
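A rough sketch of the "%.*s" variant that sidesteps the negative-precision case by clamping before the cast. The helper name copy_prec and the choice to clamp at INT_MAX are assumptions for illustration, not something proposed in the thread.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy at most cap-1 bytes of src into dst.
 * With an explicit precision, snprintf reads no more than that many
 * bytes, so src does not have to be '\0'-terminated as long as it
 * holds at least cap-1 readable bytes.
 * Returns the number of bytes written (excluding '\0'), or -1. */
static int copy_prec(char *dst, size_t cap, const char *src)
{
    size_t max;
    int prec;

    if (cap == 0)
        return -1;                    /* no room even for the '\0' */
    max = cap - 1;
    /* Clamp so the int precision can never wrap to a negative value,
     * which printf would treat as "precision omitted". */
    prec = max > (size_t)INT_MAX ? INT_MAX : (int)max;
    return snprintf(dst, cap, "%.*s", prec, src);
}
```

Note that the return value here is the truncated length, so unlike the plain "%s" idiom it cannot be used to detect truncation.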

wahern · on June 6, 2018

An annoyance with snprintf is that it has a failure mode, and in particular it can fail with ENOMEM under memory pressure.[1] This isn't merely theoretical as glibc implements snprintf by reusing the stdio machinery--effectively instantiating a temporary FILE structure, buffer, etc. glibc tries to stack allocate these objects using magic constants, but the code is incredibly complex. It's been a while since I dove into the code, but IIRC snprintf could fail if you try to compose strings longer than the magic internal constants and malloc fails.

strlcpy() has the same semantics as snprintf(buf, sizeof buf, "%s", src) but without the failure mode. Because glibc is so stubborn, and rather than trying to wrap snprintf() to abort on OOM, I invariably include this simple implementation in a common header,

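  /* Needs <string.h> for strlen() and memcpy(). */
  static size_t
  aux_strlcpy(char *dst, const char *src, size_t lim)
  {
    size_t len, n;
    len = strlen(src);  /* src must be '\0'-terminated */
    if (lim > 0) {
      /* open-coded minimum, since MIN() is not standard C */
      n = (len < lim - 1) ? len : lim - 1;
      memcpy(dst, src, n);
      dst[n] = '\0';
    }
    return len;  /* len >= lim means the copy was truncated */
  }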

which is substantially simpler than the canonical version from OpenBSD.
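For reference, a sketch of what treating truncation as an error might look like on top of such a function. sketch_strlcpy repeats the semantics of aux_strlcpy() only so the block compiles on its own, and copy_or_fail is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Same semantics as aux_strlcpy() above, repeated so that this
 * sketch is self-contained. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t lim)
{
    size_t len = strlen(src);

    if (lim > 0) {
        size_t n = (len < lim - 1) ? len : lim - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Hypothetical helper: 0 if src fit entirely, -1 if it was truncated.
 * The return value of a strlcpy-style function is the source length,
 * so anything >= lim means some bytes were dropped. */
static int copy_or_fail(char *dst, const char *src, size_t lim)
{
    return sketch_strlcpy(dst, src, lim) < lim ? 0 : -1;
}
```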

A related problem with snprintf is the mixture of signed and unsigned types for communicating object size. Mixing signed and unsigned types is, IME, error prone, so if I have code that uses snprintf more than a few times I usually wrap it in a function that separates the status from the [logical] size return values. People complain that strlcpy is problematic because it similarly communicates status (i.e. truncation) and object size through the same channel. But 1) it's not nearly as error prone as mixing signed and unsigned types and 2) idiomatic use of strlcpy is easy to code and mentally parse and I've never felt the urge to wrap strlcpy in a helper routine. If truncation is always a failure then I'll simply use a two-line routine that returns an error code. But more often than not I rely on silent truncation. IMO, C code shouldn't be doing complex string operations using native C strings[2], and where it does make sense to use C strings it's usually things like configuration values where garbage in is garbage out; if an overlong property name is truncated it's no different than if it was misspelled. (People seriously overestimate the utility of supporting dynamic object sizes everywhere, and underestimate the inherent complexity it causes, which is significant even in low-level languages like C++ or Rust that make it safer and more convenient.)

[1] Linux's overcommit doesn't save you from, e.g., process resource limits.

[2] In fact, IMO complex parsing and composition shouldn't even be done with any kind of generic string API. If you get to the point where you're parsing and composing highly structured text, you should be using proper techniques with specialized data types. Writing ad hoc string munging code is error prone and a maintenance nightmare in any language, including scripting languages. But if I must, then not only do I avoid C, I avoid any low-level, statically typed language. Scripting languages were literally invented for writing ad hoc string munging code.
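One possible shape for the snprintf wrapper mentioned above, with the status in the return value and the logical size in an out-parameter. This is only a sketch: the name fmt_sized and its exact contract are assumptions, not the commenter's actual code.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper around vsnprintf.
 * Status: returns 0 on success (including truncated output), -1 on failure.
 * Size:   *need receives the untruncated length as a size_t, so callers
 *         never compare a signed int against sizeof. */
static int fmt_sized(char *dst, size_t cap, size_t *need, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(dst, cap, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;        /* formatting or allocation failure */
    *need = (size_t)n;    /* safe: n is known to be non-negative here */
    return 0;
}
```

A caller would then test `need >= cap` for truncation using unsigned arithmetic only.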

wahern · on June 6, 2018

Another important caveat to snprintf: because glibc's snprintf might use dynamic memory allocation it's not async-signal-safe. OpenBSD notably rewrote their snprintf implementation to be async-signal-safe (with the exception of floating-point formatting). They did this so extension functions like dprintf(), commonly used in signal handlers for debugging or logging, would be easier to implement, and also because a lot of software assumes that (or doesn't even consider whether) snprintf is async-signal-safe.

I wonder whether glibc's dprintf() is async-signal-safe....

masklinn · on June 6, 2018

Can't you just strncpy sizeof(dest) if you're going to zero-out the last byte of the buffer unconditionally?

loeg · on June 6, 2018

Yeah. You've "wasted" an extra byte copy, but that is usually so cheap as to be virtually free.
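A minimal sketch of that variant, with a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strncpy over the whole buffer, then terminate
 * unconditionally. If src is shorter than cap, strncpy pads the rest of
 * dst with '\0' bytes; if it is longer, the final store overwrites the
 * last copied byte with the terminator. */
static void copy_strncpy(char *dst, size_t cap, const char *src)
{
    if (cap == 0)
        return;              /* nothing can be stored, not even '\0' */
    strncpy(dst, src, cap);
    dst[cap - 1] = '\0';
}
```

Unlike snprintf or strlcpy, this gives no indication of whether truncation happened.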

MBCook · on June 6, 2018

But just adding the null byte isn’t necessarily safe if you’re dealing with Unicode, right?

loeg · on June 6, 2018

Correct. You can't just truncate a utf-8 stream in an arbitrary byte position and expect it to remain well-formed.
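A rough sketch of truncating on a code point boundary instead. The helper name utf8_copy_trunc is made up, and the code assumes src is valid, '\0'-terminated UTF-8. It can still split a grapheme cluster such as a base letter followed by a combining mark.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most cap-1 bytes of src into dst without
 * cutting a multi-byte UTF-8 sequence in half.
 * Returns strlen(src), like strlcpy. */
static size_t utf8_copy_trunc(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(src);
    size_t n;

    if (cap == 0)
        return len;
    n = (len < cap - 1) ? len : cap - 1;
    if (n < len) {
        /* src[n] is the first byte that will be dropped. If it is a
         * continuation byte (10xxxxxx), the cut is inside a sequence,
         * so back up until the cut lands on a lead byte or ASCII byte. */
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len;
}
```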

BadThink6655321 · on June 6, 2018

The first is inefficient. The declaration statement writes eight zeros to dest. strncpy() then copies the string, stopping early if a NUL is found in src, and fills whatever remains of the len bytes with zeros.

tlb · on June 6, 2018

Yes, snprintf does the right thing.