
should be either

  char dest[8] = {0};
  strncpy(dest, src, sizeof(dest) - 1);
or

  char dest[8];
  strncpy(dest, src, sizeof(dest) - 1);
  dest[sizeof(dest)-1] = '\0';

On the same note I always wondered what "snprintf" does. That is - will it ALWAYS zero terminate (given non-NULL buffer, of size > 0)?



Yes. “snprintf(dest, sizeof dest, "%s", src);”, where dest is a char array, is almost the programmer-friendly version of strncpy() that does exactly what one would expect, neither more nor less. It always '\0'-terminates its output as long as it is passed a size > 0.

The only issue with that idiom is that it is only defined when src points to a '\0'-terminated string. Although snprintf will only write the specified number of characters, it will read from src until the end of the string, and it invokes undefined behavior if src does not point to a well-formed string:

https://taas.trust-in-soft.com/tsnippet/t/e0538551

A mnemonic is that snprintf needs to return the length of the string that would have been output if there had been enough room (not counting the terminating '\0'), so it needs to compute the length of the string pointed to by src.
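A minimal sketch of that return-value behavior, assuming a conforming C99 snprintf. The helper names `bounded_copy` and `was_truncated` are made up for illustration and are not from the thread:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst (capacity lim) with snprintf.
 * src must point to a '\0'-terminated string. Returns what snprintf
 * returns: the length src would need, or a negative value on error. */
static int bounded_copy(char *dst, size_t lim, const char *src)
{
    return snprintf(dst, lim, "%s", src);
}

/* Hypothetical helper: interpret the return value of bounded_copy. */
static int was_truncated(int ret, size_t lim)
{
    return ret < 0 || (size_t)ret >= lim;
}
```

With an 8-byte dest and a 10-character src, `bounded_copy` returns 10 while dest holds the first 7 characters plus the terminator, which is why the whole source string has to be read.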

A close variant, below, avoids this issue but instead has the problem that the printf star argument is typed as an int, which is narrower than size_t on a typical 64-bit compilation platform.

    snprintf(dest, sizeof dest, "%.*s", (int)(sizeof dest)-1, src);
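A sketch of how the precision variant might be wrapped so that the size_t-to-int narrowing is checked instead of silently cast. The name `copy_bounded_read` is hypothetical, and the sketch assumes src has at least lim - 1 readable bytes or an earlier '\0':

```c
#include <limits.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy at most lim - 1 bytes from src into dst.
 * Because the precision caps how many bytes %s may read, src does not
 * need a terminator within its first lim - 1 bytes.
 * Returns snprintf's result, or -1 if the precision would not fit in int. */
static int copy_bounded_read(char *dst, size_t lim, const char *src)
{
    if (lim == 0 || lim - 1 > (size_t)INT_MAX)
        return -1;  /* refuse rather than let the cast go negative */
    return snprintf(dst, lim, "%.*s", (int)(lim - 1), src);
}
```

Note that the return value here is capped by the precision, so unlike the plain "%s" form it cannot be used to detect truncation.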


> but instead has the problem that the printf star argument is typed as an int, which is narrower than size_t on a typical 64-bit compilation platform.

If you are using >2 GB buffers and anticipate >2 GB strings, it probably makes sense to track lengths explicitly and use memcpy() etc instead of string routines anyway.


I was more thinking of the case where >2GiB strings are not useful for normal use and the programmer does not anticipate them, but a malicious user can cause such strings to happen, for instance by sending them over the network in minutes or hours, causing unforeseen behavior.


Sure, that's a good point.

However, while it may also be possible, in some code, for a malicious user to control the buffer size, the int precision argument, as used in this construct, derives from the buffer size, and not the input string.

If the user can control the buffer size, then yes, we get the very undesirable buffer overflow via overflow from positive to negative[0]:

    A negative precision is taken as if the precision were
    omitted.
[0]: snprintf(3) man page from Linux
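The quoted rule is easy to check in isolation. A small sketch, with `print_with_precision` as a made-up name; it assumes src is '\0'-terminated, since a negative precision removes the bound on how far %s reads:

```c
#include <stdio.h>

/* Hypothetical helper: format src with an explicit precision and return
 * the length snprintf reports. A negative prec behaves as if no
 * precision had been given, so the whole string is consumed. */
static int print_with_precision(char *dst, size_t lim, int prec, const char *src)
{
    return snprintf(dst, lim, "%.*s", prec, src);
}
```

With a precision of 3, "hello world" comes out as "hel"; with a precision of -1 the full 11 characters are printed. The size argument still bounds the write into dst, so what is lost is the bound on the read from src.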


An annoyance with snprintf is that it has a failure mode, and in particular it can fail with ENOMEM under memory pressure.[1] This isn't merely theoretical as glibc implements snprintf by reusing the stdio machinery--effectively instantiating a temporary FILE structure, buffer, etc. glibc tries to stack-allocate these objects using magic constants, but the code is incredibly complex. It's been a while since I dove into the code, but IIRC snprintf could fail if you try to compose strings longer than the magic internal constants and malloc fails.

strlcpy() has the same semantics as snprintf(buf, sizeof buf, "%s", src) but without the failure mode. Because glibc is so stubborn, and rather than trying to wrap snprintf() to abort on OOM, I invariably include this simple implementation in a common header,

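  #include <string.h>  /* strlen, memcpy */

  static size_t
  aux_strlcpy(char *dst, const char *src, size_t lim)
  {
    size_t len, n;
    len = strlen(src);  /* src must be '\0'-terminated */
    if (lim > 0) {
      n = (len < lim - 1) ? len : lim - 1;  /* MIN(lim - 1, len) */
      memcpy(dst, src, n);
      dst[n] = '\0';
    }
    return len;  /* length of src; >= lim means truncation */
  }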
which is substantially simpler than the canonical version from OpenBSD.
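For readers unfamiliar with the strlcpy convention, here is a sketch of the usual truncation check. The routine above is repeated so the block compiles on its own, and `copy_or_fail` is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Same logic as the aux_strlcpy above, repeated for self-containment. */
static size_t aux_strlcpy(char *dst, const char *src, size_t lim)
{
    size_t len = strlen(src);
    if (lim > 0) {
        size_t n = len < lim - 1 ? len : lim - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Hypothetical wrapper: 0 if src fit in dst, -1 if it was truncated.
 * dst is '\0'-terminated either way (when lim > 0). */
static int copy_or_fail(char *dst, size_t lim, const char *src)
{
    return aux_strlcpy(dst, src, lim) < lim ? 0 : -1;
}
```

The check works because the return value is the length of src, so any result of lim or more means the terminator did not fit.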

A related problem with snprintf is the mixture of signed and unsigned types for communicating object size. Mixing signed and unsigned types is, IME, error prone, so if I have code that uses snprintf more than a few times I usually wrap it in a function that separates the status from the [logical] size return values. People complain that strlcpy is problematic because it similarly communicates status (i.e. truncation) and object size through the same channel. But 1) it's not nearly as error prone as mixing signed and unsigned types and 2) idiomatic use of strlcpy is easy to code and mentally parse and I've never felt the urge to wrap strlcpy in a helper routine. If truncation is always a failure then I'll simply use a two-line routine that returns an error code. But more often than not I rely on silent truncation. IMO, C code shouldn't be doing complex string operations using native C strings[2], and where it does make sense to use C strings it's usually things like configuration values where garbage in is garbage out; if an overlong property name is truncated it's no different than if it was misspelled. (People seriously overestimate the utility of supporting dynamic object sizes everywhere, and underestimate the inherent complexity it causes, which is significant even in low-level languages like C++ or Rust that make it safer and more convenient.)
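One possible shape for a wrapper that separates status from size, as a sketch only. The name `fmt_status` and its status codes are assumptions for illustration, not the commenter's actual helper:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper around vsnprintf.
 * Status comes back as the return value: 0 = fit, 1 = truncated, -1 = error.
 * The logical (untruncated) length comes back through *outlen as a size_t,
 * so callers never compare a signed result against an unsigned size. */
static int fmt_status(char *dst, size_t lim, size_t *outlen, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(dst, lim, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;
    if (outlen != NULL)
        *outlen = (size_t)n;
    return (size_t)n >= lim ? 1 : 0;
}
```

A caller that treats truncation as an error can test for a nonzero return; one that tolerates truncation can test only for -1.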

[1] Linux's overcommit doesn't save you from, e.g., process resource limits.

[2] In fact, IMO complex parsing and composition shouldn't even be done with any kind of generic string API. If you get to the point where you're parsing and composing highly structured text, you should be using proper techniques with specialized data types. Writing ad hoc string munging code is error prone and a maintenance nightmare in any language, including scripting languages. But if I must, then not only do I avoid C, I avoid any low-level, statically typed language. Scripting languages were literally invented for writing ad hoc string munging code.


Another important caveat to snprintf: because glibc's snprintf might use dynamic memory allocation it's not async-signal-safe. OpenBSD notably rewrote their snprintf implementation to be async-signal-safe (with the exception of floating-point formatting). They did this so extension functions like dprintf(), commonly used in signal handlers for debugging or logging, would be easier to implement, and also because a lot of software assumes that (or doesn't even consider whether) snprintf is async-signal-safe.

I wonder whether glibc's dprintf() is async-signal-safe....


Can't you just strncpy sizeof(dest) if you're going to zero-out the last byte of the buffer unconditionally?


Yeah. You've "wasted" an extra byte copy, but that is usually so cheap as to be virtually free.
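A sketch of that variant, with `copy_terminated` as a hypothetical name. It copies the full buffer size and then overwrites the last byte unconditionally:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strncpy the full capacity, then force termination.
 * If src is too long, strncpy leaves dst unterminated and the final
 * assignment replaces the last copied byte with '\0'. */
static void copy_terminated(char *dst, size_t lim, const char *src)
{
    if (lim == 0)
        return;
    strncpy(dst, src, lim);
    dst[lim - 1] = '\0';
}
```

The result is the same as copying sizeof(dest) - 1 bytes and then terminating; the only difference is the one redundant byte written.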


But just adding the null byte isn’t necessarily safe if you’re dealing with Unicode, right?


Correct. You can't just truncate a UTF-8 stream at an arbitrary byte position and expect it to remain well-formed.
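One way to truncate on a code point boundary, as a rough sketch. It assumes src is valid, '\0'-terminated UTF-8, and `utf8_copy_trunc` is a made-up name. It does not try to keep combining sequences or grapheme clusters together:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity lim), cutting only
 * between code points. Returns the number of bytes copied, excluding
 * the terminator. */
static size_t utf8_copy_trunc(char *dst, size_t lim, const char *src)
{
    size_t len = strlen(src);
    size_t n;

    if (lim == 0)
        return 0;
    n = len < lim - 1 ? len : lim - 1;
    /* If the first byte left out is a continuation byte (10xxxxxx), the
     * cut falls inside a multi-byte sequence: back up to its lead byte
     * so the whole sequence is dropped. */
    while (n > 0 && n < len && ((unsigned char)src[n] & 0xC0) == 0x80)
        n--;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

For example, copying "a" followed by a two-byte character into a 3-byte buffer keeps only the "a" rather than half of the second character.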


The first is inefficient. The initializer in the declaration writes eight zeros to dest. strncpy() then copies the string and, if it finds a NUL in src before reaching len bytes, fills the rest of the len bytes with zeros, so those bytes get written twice.
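The padding behavior of strncpy() can be observed directly. A small sketch, where `strncpy_zero_count` is a hypothetical helper and n is capped at an arbitrary 64 bytes:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strncpy src into a poisoned scratch buffer with
 * limit n and count how many of the first n bytes ended up as '\0'. */
static size_t strncpy_zero_count(const char *src, size_t n)
{
    char buf[64];
    size_t i, zeros = 0;

    if (n > sizeof buf)
        n = sizeof buf;
    memset(buf, 'X', sizeof buf);  /* poison so untouched bytes stand out */
    strncpy(buf, src, n);
    for (i = 0; i < n; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

A 2-character source with n = 8 yields 6 zero bytes, so all 8 bytes are written. A 10-character source with n = 8 yields none, which is the unterminated case the explicit terminator guards against.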


Yes, snprintf does the right thing.



