Hmm, I've always viewed K&R as kind of a paragon of C perfection -- I first learned C from K&R when I was 16 or so and have used it as a trusty companion to systems C programming all the way through grad school. Not only do I view it as a good book to learn C, I also view it as one of the best computer science textbooks ever written, partially because it manages to encompass so much of the C and UNIX style along with the programming language itself.
So I guess my question is: why not K&R? Learn Python and Learn Ruby always made sense to me because there's a serious lack of definitive texts on learning those languages, especially if you've never programmed before. However, in my opinion C is not the best first language and there already exists a fairly definitive text on it. So I would love it if someone could let me know what I'm missing.
Wrong. I give a clear example of the copy() function being broken, demonstrate fuzzing it to break it, and show how to do it yourself. And if you think the copy() function is valid, then you also think the strcpy() function is valid, and therefore you don't know what you're talking about. Everyone who is aware of secure C coding knows strcpy() is buggy and the cause of most buffer overflows.
For higher level languages where we have intelligent string objects, yeah, bounds checking is assumed--but this is something originating in assembly-level stuff. If you call it with broken memory, of course it won't work correctly.
You're doing a good job spreading knowledge--don't spread misinformation.
> Everyone who is aware of secure C coding knows strcpy() is buggy and the cause of most buffer overflows.
If strcpy() was truly buggy and unpredictable in its implementation, it wouldn't be nearly so useful as an attack vector. Be accurate--strcpy() is unsafe, not buggy. Sheesh.
He's right though. This really is a silly argument. Your life will almost certainly be better if you forget strcpy exists.
Really, C strings are broken in general. Computing their length is O(n) when it could be O(1), and a missing null terminator results in undefined behavior (a crash if you're lucky). Worse, copying an oversized string into an undersized buffer results in a buffer overflow, which gets you a nice mention here: http://www.kb.cert.org/vuls.
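To make the length point concrete, here's a rough sketch of the length-prefixed alternative being alluded to (the lstring type is made up purely for illustration):

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical length-prefixed string: the length is stored up front,
       so reading it is O(1) and no terminator is required. */
    struct lstring {
        size_t len;
        const char *data;   /* need not be null-terminated */
    };

    int main(void) {
        struct lstring s = { 5, "hello" };
        /* strlen() has to scan for the terminator: O(n), and undefined
           behaviour if the terminator is missing. The stored length is
           just a field read. */
        printf("strlen: %zu  stored: %zu\n", strlen("hello"), s.len);
        return 0;
    }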
strcpy is about as safe as a hand grenade: you'll be fine if you know what you're doing, but God help you if you don't!
Realistically, if you're reading "Learn C The Hard Way," you don't know what you're doing yet. Don't use strcpy.
For that matter, you're human. You screw up sometimes. We all do. Don't use strcpy without a damn good reason.
Is there ever a legitimate reason to use strcpy over strncpy?
No, it's defective and buggy. You can't prove logically that the while loop inside will end given any random input. An implementation with a length will end given any input. That is a bug. If you wrote code like that then I'd flag it as a bug, so how is it that strcpy is somehow reclassified as "unsafe" but yeah totally not a bug?
It's so poorly designed that it should be considered wrong, buggy, defective, unsafe, and removed. To go around saying "it's totally alright" when even Microsoft, bringer of more virii than a whore house, has deprecated it:
Is idiotic thinking. Go ahead and come up with all the distinctive classifications you want; it's a defect to use strcpy because it's buggy, and copy() in that code is the same.
> You can't prove logically that the while loop inside will end given any random input.
C is all about layers, and at the bottom you just assume that "all input given to this function is safe". If you're passing random input (without any error checking) to pretty much any of the built-in functions, You're Doing Something Wrong.
Whether you call strcpy/copy buggy or unsafe doesn't really matter. The implementation on all platforms pretty much follows the standard, with its well-known issues. Sometimes it's the right tool for the job; sometimes not.
It's also important to remember that strcpy_s doesn't just magically solve all your problems. Someone might think that this code is safe:
strcpy_s(dst, 10, src)
But if dst is shorter than 10, you'll have a problem.
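A minimal sketch of that pitfall, assuming the optional C11 "Annex K" strcpy_s is even available (MSVC ships it; most other libcs don't):

    #define __STDC_WANT_LIB_EXT1__ 1
    #include <string.h>

    int main(void) {
        char dst[4];
        const char *src = "longer than dst";

        /* strcpy_s(dst, 10, src);   <-- looks safe because a size is passed,
           but 10 lies about dst: the array is only 4 bytes, so this can still
           write past the end. The size argument only protects you if it
           actually describes the destination buffer: */
        strcpy_s(dst, sizeof dst, src);   /* src too long: the runtime-constraint
                                             handler fires instead of overflowing */
        return 0;
    }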
Going from ASCIIZ to length-encoded strings isn't something you can just do in the middle of a function; it requires changes all over the place. K&R was written with ASCIIZ and low-level programming in mind. There's nothing inherently wrong about this; it has both its advantages and disadvantages. Your book is written with length-encoded strings in mind. (Which I think is the best way to teach today).
I love the concept of this chapter, and I pretty much agree with your criticisms of strcpy/copy, but suddenly you go from "this code has issues" to "this code is bad, bad, bad; never use this code; only silly people would write this code" and so on (at least that's how I feel when I'm reading it). I think you should place more emphasis on "this code was written to only work with certain input; look how badly it performs under fuzz-testing; see how easily we can fix it!".
I'm going to have to side with Zed here. If I'm teaching an introductory class on Chemistry, you'd better believe that when I reach the section on cyanide I'm going to tell students: "bad, bad, bad; never use this chemical"! If those introductory students were to take a more advanced class, then I would probably tell them: "well, ok, cyanide isn't going to kill you instantly and is actually really useful for a wide range of applications".
Part of being a good teacher is recognizing that there are limits to how much you can expect a student to learn at a given level, and then making sure their knowledge is as "complete" as possible within those limits.
> If I'm teaching an introductory class on Chemistry, you'd better believe that when I reach the section on cyanide I'm going to tell students: "bad, bad, bad; never use this chemical"!
I agree. So would I. However, what Zed is doing in the last chapter is showing code written by other people. If you taught your students about an experiment done by other (widely regarded) researchers, would you say "bad, bad, bad; they should never have used these chemicals"?
I would say: "See, they used it here, but only because they were very, very, very careful. Let's explore different ways this breaks down. … As you can see, I'd recommend you to not do what these people did."
Teaching people (especially in natural sciences!) to immediately disregard something, just to be safe, is not conducive to learning. Sure, at later levels, you can teach them the truth of things, but what if they don't get there?
Then you have a bunch of people freaking out over incomplete information that they learned in school, and influencing dumb policies (e.g., hooplah about nuclear plants).
This is a borderline straw-man argument. Size (length) is one of the few things that you must be very aware of and control when programming C. You cannot "malloc" without knowing the length to malloc. You cannot create an array on the stack without knowing its size. In each case, you may not use the full memory allocated, but at least you can set an upper bound on the memory that you own. Thus, however you created or ingested the string to copy (and the memory to copy it to), you will have an upper bound on how much memory it is safe to copy.
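A rough sketch of what that looks like in practice (bounded_dup is a made-up name, and strnlen is POSIX/C23 rather than C89):

    #include <stdlib.h>
    #include <string.h>

    /* However the source buffer came into being, its size was known when it
       was created, so a bounded copy is always possible. */
    char *bounded_dup(const char *src, size_t src_block_size) {
        size_t len = strnlen(src, src_block_size);  /* never scans past the block */
        char *dst = malloc(len + 1);                /* we picked this size, so we know it */
        if (dst == NULL)
            return NULL;
        memcpy(dst, src, len);
        dst[len] = '\0';
        return dst;
    }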
This is about maintaining invariants, and I interpret the "any random input" from the parent question as "input that breaks the preconditions the function expects of its inputs."
With str* functions the assumptions are that the string is null-terminated and stored within a memory block of large enough size.
When providing string+length, "any random input" means that, e.g., the length may be arbitrary; maybe it became garbage as a consequence of a series of unfortunate integer overflows (when did you last check for those when checking that len1+len2 < buffer_size?).
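For anyone who hasn't been bitten by that one, a sketch of the wrap-around and the usual rearrangement that avoids it:

    #include <stdbool.h>
    #include <stddef.h>

    /* With unsigned size_t arithmetic, len1 + len2 can wrap and sail under
       buffer_size even though the true sum is enormous. Rearranging the
       comparison avoids the addition entirely. */
    bool fits(size_t len1, size_t len2, size_t buffer_size) {
        /* naive check:  len1 + len2 < buffer_size   -- wraps on overflow */
        return len1 < buffer_size && len2 < buffer_size - len1;
    }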
So, how DO you write a string-handling function that can safely handle ANY random input?
It would've been nice if C came with real string support, but it didn't. Instead, we're stuck mucking around with character arrays. All the built-in functions expect null-terminated strings, but many functions don't guarantee they'll generate these strings in all cases.
Look at strncpy. If the string you're copying fills the destination buffer completely, the function won't write the null terminator; the resulting string will blow up several C standard library functions.
If you're using null-terminated strings, it is your job to make damn sure those strings are always null-terminated.
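A sketch of that strncpy pitfall and the usual one-line fix (copy_field is just an illustrative name):

    #include <string.h>

    void copy_field(char *dst, size_t dst_size, const char *src) {
        if (dst_size == 0)
            return;
        strncpy(dst, src, dst_size);
        /* If src has dst_size or more characters, strncpy fills dst completely
           and writes no terminator, so anything later expecting a C string
           runs off the end. Terminate it yourself, every time: */
        dst[dst_size - 1] = '\0';
    }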
> It's so poorly designed that it should be considered wrong, buggy, defective, unsafe, and removed.
Look, think about the context here. Consider that the language is glorified assembly against a (somewhat) defined virtual machine.
The functional description of the routine is "Hey, so, look at this byte in memory and copy it to this other byte, leave if the byte copied was zero, otherwise increment both addresses and do it again."
There is nothing there that need be "proven"--it's all right there! If you have a sequence of bytes with a null, it terminates; if you don't, it doesn't. Boom. Done.
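For anyone following along, the routine under dispute is essentially this (the textbook K&R-style formulation, not any particular libc's source):

    char *my_strcpy(char *dst, const char *src) {
        char *start = dst;
        while ((*dst++ = *src++) != '\0')
            ;   /* copy a byte; stop once the byte just copied was zero */
        return start;
    }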
> You can't prove logically that the while loop inside will end given any random input.
It's not intended to be used with any random input. Zed, programs are not written to turn arbitrary data into what you want--that entire notion is absurd. You have to make assumptions somewhere. strcpy() et al. are written with the knowledge that those assumptions are being made elsewhere in the system. With that being the case, the routine can be written as short as is needed to perform its task.
> If you wrote code like that then I'd flag it as a bug, so how is it that strcpy is somehow reclassified as "unsafe" but yeah totally not a bug?
Perhaps because your bug detector is broken, and everyone else knows that tools can be dangerous and useful?
It is not a bug because it will do exactly what it claims to do, and nothing different--it is deterministic. It will copy bytes from one place to another until it hits a null (or segfaults). This is not a bug--the domain of behavior is well known. The implementation is correct and the algorithm is simple.
It is unsafe because if you are incorrect in fulfilling the preconditions for using it (i.e., your source buffer isn't null-terminated or your destination buffer is too small) it may have undesirable behavior.
It is unsafe because in the real world you get strings from other libraries whose authors may or may not have done a good job of terminating them in all annoying use cases.
It is unsafe because in the real world you can have another part of the program screw up and corrupt the string pointer it is being given, landing it in the middle of the heap far away from a saving terminator.
~
There are plenty of very good reasons not to use strcpy() when you don't control the entire flow into and out of it. You don't have to complain about it being buggy (which it isn't, at least in the distro of libc I'm using) or defective (which it isn't, as it does exactly what the spec says it should do).
You could just point out these issues and then show how to wrap them. The use of strncpy() would be an acceptable substitute here, as another user has pointed out, as it addresses the safety concern.
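One way to do that wrapping, sketched under the assumption that src really is a C string (safe_copy is a made-up name for illustration):

    #include <string.h>

    int safe_copy(char *dst, size_t dst_size, const char *src) {
        size_t len = strlen(src);
        if (dst_size == 0)
            return -1;
        if (len >= dst_size) {
            memcpy(dst, src, dst_size - 1);
            dst[dst_size - 1] = '\0';
            return -1;                    /* truncated */
        }
        memcpy(dst, src, len + 1);        /* includes the terminator */
        return 0;
    }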
~
Stop encouraging new programmers to be afraid (blindly and without reason!) of their tools, and stop setting a bad example for being pigheaded and imprecise in the language of software development.
I don't really get what you're trying to say. I would understand if you said that strncat is broken (since it can output non \0 terminated strings, unlike the non-standard strlcat, breaking the principle of least astonishment) but I fail to see the problem here.
It's just garbage-in/garbage-out: strcpy expects a C string, and the char array you give it is not a C string, so it fails by triggering undefined behaviour. The function has no way to detect this problem, so it's not even laziness.
Pascal strings embed the count within the string, but then you could forge a bogus pascal string with an incorrect size and trigger the same kind of problem.
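A sketch of that forgery point (pstring is a made-up type for illustration; the deliberately wrong length is the whole point):

    #include <string.h>

    struct pstring {
        size_t len;
        char   data[16];
    };

    void forged_copy(char *dst) {
        struct pstring p = { 1000, "hi" };   /* bogus length: only 16 bytes exist */
        memcpy(dst, p.data, p.len);          /* same overflow, different encoding */
    }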
He isn't aggressive, he simply is the inimitable Zed Shaw :)
> The function has no way to detect this problem, so it's not even laziness.
His point: therefore you should not use this function, but strlcpy instead. Of course, K&R has the excellent excuse of predating just about every secure alternative to strcpy.
> Pascal strings embed the count within the string, but then you could forge a bogus pascal string with an incorrect size and trigger the same kind of problem.
I don't think that you could easily forge such a string using Pascal's standard functions, but the last time I wrote Pascal there was neither syntax highlighting in the editor nor a hard drive in my PC.
Anyway, strlcpy should be immune, because it will truncate whatever string of bytes to the required length, see:
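A rough illustration of that truncation behaviour, assuming a BSD-style strlcpy is available (it's not in ISO C; on glibc you'd need libbsd or a local copy):

    #include <stdio.h>
    #include <string.h>   /* strlcpy lives here on the BSDs */

    int main(void) {
        char dst[5];
        size_t wanted = strlcpy(dst, "far too long", sizeof dst);
        /* dst is now "far " -- always terminated, never overflowed -- and the
           return value (12 here) says how much room the full string wanted. */
        printf("%s (%zu)\n", dst, wanted);
        return 0;
    }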
The charge of being excessively clever is, I think, spot on, and exactly points to pedagogical issues. I think a lot of the examples in K&R are more amusing to those glancing through it who already know C, than to the novice trying to learn it.
I've always been OK with no braces for one-line statements. But they have to actually fit on one line and not be another nested block. I recently found a style guide for JavaScript that recommended only leaving braces out for statements that could fit on the same line, e.g.:
if (something) doSomething();
for (...) repeatSomething();
What I will disagree with is that braces are completely free. I believe that code which is vertically succinct is a good thing. It allows you to see more context at once, which is particularly useful with well-designed, well-structured short methods and classes. I've always been opposed to excess white space, like the random blank lines within methods that programmers seem to put arbitrarily with no rhyme or reason, other than their personal aesthetics of course. Also spaces around arguments, and putting the opening brace on its own line (which I find terribly redundant; anyone can see when a block has started).
I've always liked the K&R style aesthetically, but again I'll agree that Zed has found a decent example of where not using braces just because you can is taken too far.
I don't think it's intuitively clear to a non-programmer and maybe not even to a non-C-like-language programmer why that while loop encompasses the first if and not the second.
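A made-up example of the kind of ambiguity being described (not the actual K&R listing): the indentation suggests both ifs are inside the while, but without braces only the first one is.

    #include <stdio.h>

    void sketch(FILE *fp) {
        char line[256] = "";
        int text_lines = 0, blank_lines = 0;

        while (fgets(line, sizeof line, fp) != NULL)
            if (line[0] != '\n')
                text_lines++;
            if (line[0] == '\n')      /* looks like part of the loop, but isn't */
                blank_lines++;

        printf("%d text lines, %d blank (only the final line was checked)\n",
               text_lines, blank_lines);
    }

Modern compilers will even warn about layouts like this (GCC/Clang's -Wmisleading-indentation), which says something about how common the mistake is.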
I'll admit that I don't think I've ever used K&R C code verbatim[1] so I never saw how outdated all the stylistic code was. I would never recommend this primitive style in modern projects. Thanks for your insight Zed!
[1] Really, if you start dealing with the complexities of C at all K&R will no longer do it. First you'll need to read the ANSI C specification to understand what's supposed to be going on, then you'll need to read your compiler's documentation to figure out how they have actually done what the specification says they should do.