C Craft (stanford.edu)
148 points by fogus on Dec 18, 2009 | 58 comments



I think it's telling that the very first example to use dynamic allocation has a memory corruption bug.

  struct {
    int n;
    char c[0];
  } *foo = malloc(16); /* sizeof(*foo) probably 4, so 12 bytes follow */

  foo->n = 16;
  for(int i = 0; i < 16; i++) { /* puts 16 bytes after foo->n. oops. */
    foo->c[i] = 'a' + i;
  }
Not that I don't love C for terseness and power or appreciate the systems one can build in it, but seriously.


I noticed that too, and what's weird is that it's not at all how you should do it. Since char c[0] has zero size, you can take sizeof on the struct and size the malloc correctly.

  struct thing {
    int n;
    char c[0];
  };

  int i_need = 16;
  struct thing * foo = malloc(sizeof(struct thing)+i_need);
  foo->n = i_need;

  for (int i = 0; i < foo->n; i++ ) {
    foo->c[i] = 'a' + i;
  }
And if your compiler won't let you write char c[0], you can use char c[1] and either blow a byte or do the calculation correctly. Note that you have to be careful with this sort of allocation if you are on a machine with strict alignment requirements (e.g. Sun).
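
For what it's worth, C99 later standardized this idiom as the "flexible array member", written with empty brackets, which sidesteps the char c[0] vs. char c[1] question. A minimal sketch, assuming a C99 compiler (make_thing is just an illustrative name):

  #include <stdlib.h>

  struct thing {
    int n;
    char c[];  /* flexible array member: not counted in sizeof(struct thing) */
  };

  struct thing *make_thing(int i_need) {
    struct thing *foo = malloc(sizeof(struct thing) + i_need);
    if (foo != NULL)
      foo->n = i_need;
    return foo;
  }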


Well, being pedantic, that should be

  struct thing *foo = malloc(sizeof(struct thing) + sizeof(char) * i_need);
since sizeof(char) isn't guaranteed to be 1 byte on all systems.


Actually, the C standard guarantees that a char is exactly one byte.


More to the point, it defines a byte in terms of a char. sizeof(char) will always be 1, but UCHAR_MAX need not be 255.
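
A quick way to see this on your own machine, if you're curious (just a sketch):

  #include <limits.h>
  #include <stdio.h>

  int main(void) {
    /* sizeof(char) is 1 by definition; CHAR_BIT is the number of bits
       in that byte, which need not be 8 on exotic hardware. */
    printf("sizeof(char) = %zu, CHAR_BIT = %d, UCHAR_MAX = %u\n",
           sizeof(char), CHAR_BIT, (unsigned)UCHAR_MAX);
    return 0;
  }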


His warning against using #ifdef is also misplaced. I have written assembler (GAS), using the GAS preprocessor to write portable x86/x86-64 assembly code in one file. That's far simpler and shorter than writing the same code in two files.

Basically, whenever he says "never" do something, he means he doesn't like to do it. For example, his answer to the lack of a multi-level break is a nested helper function. God no. That's one of the few times when it's correct to use goto.
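
For what it's worth, here is the kind of multi-level break I mean, as a minimal sketch (find_pair is just an illustrative name):

  /* Break out of nested loops with a single goto, instead of flag
     variables or a nested helper function. */
  int find_pair(int a[], int n, int target) {
    int found = -1;
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        if (a[i] + a[j] == target) {
          found = i;
          goto done;  /* multi-level break */
        }
      }
    }
  done:
    return found;
  }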

Edit: Ahh, he also advocated always using mmap instead of the standard file IO functions. Wow that's bad.


I'm curious: are the standard IO functions that much more efficient than a mmap'd file? What makes you choose one over the other?

I usually just use standard IO, unless I have a large file that might be randomly accessed by multiple processes. In that case I'll use mmap. But I don't code in C very much anymore, so I was wondering how much of a difference there really is.


I did benchmarks on OS X (10.5) comparing asynchronous IO, mmap, stdio, and plain Unix open/read (with caching disabled) -- it all came down to about the same after I had fiddled with page size, prefetch size, etc. to optimize each.

This of course depends on your OS (and version). My advice would be to pick an API based on "semantics" (or the need for portability). If you need to read through a file sequentially, use the API intended for that, because mmap will not be optimized for that access pattern, but stdio will.


Mostly because mmap is not portable. If you're writing a Linux-specific version of your code, then mmap is fine. If you foresee running your code on other platforms, then you shouldn't use it.

Basically, there's a reason why it's called "standard" IO. Generally you shouldn't use system calls unless you are absolutely, 100% sure that your code will only ever need to work on one system.


I've worked on a project that successfully uses memory mapping on all the major OSes.

mmap is (allegedly) portable across Unixes:

http://opengroup.org/onlinepubs/007908775/xsh/mmap.html

and you can do the same thing (with a few more hoops) on Windows.
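
For reference, a minimal POSIX sketch of mapping a file for reading (error handling abbreviated; no Windows path shown):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int dump_file(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    fwrite(p, 1, st.st_size, stdout);
    munmap(p, st.st_size);
    return 0;
  }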


Many people may not care, but mmap() often doesn't exist on hardware without an MMU.


There's no guarantee that it will be portable and act the same (even across POSIX systems). I'm not saying you should never use it, just that it shouldn't be your automatic answer to everything.


I'm curious about this too - why is mmap so bad?


It's not always bad -- very often it's the right thing to do. Just not "always".


Thank goodness. I haven't programmed C in a long time, and thought I was missing something when I saw that code.


Give a man enough rope to hang himself, and he'll shoot himself in the foot.


The criticism of the Java language is very much to the point (re: verbosity, inheritance). There's some confusion where he calls anonymous classes implementing interfaces "abstract classes with virtual methods"; that's not to say it isn't an awkward approach to closures and function pointers. There's also criticism of synchronized and the Thread class, but no mention of the strong points (Doug Lea's java.util.concurrent).

However, the criticism of the JVM reads as if it were written in 1999 about the Blackdown JVM (Josh Bloch, in his Coders at Work interview, attributes much of the early distrust of Java at Google to experience with that platform). HotSpot can often be within 10-15% of C's performance. In many cases code running on the JVM will outperform C++'s runtime (and particularly Boost).

That's not to say there aren't issues: garbage collection pauses, and the stack-based nature of the JVM is -- rightly -- pointed out as an impediment, though it's odd that LLVM isn't mentioned. Still, excluding the JVM on performance grounds (especially when it's used with modern languages, e.g. Scala, Clojure, Ruby, JavaScript) seems short-sighted. See Cliff Click's post on JVM performance:

http://blogs.azulsystems.com/cliff/2009/09/java-vs-c-perform...

Repeat after me: HotSpot is not Blackdown, and 1.5+ is not 1.2 (you have non-blocking I/O -- even though NIO is still painful to use -- you have autoboxing, you have foreach, you have -- for all their limitations -- generics, you have annotations), and you can run more than Java on the JVM. Even if you do choose Java (which has many flaws, not the least of them being that it's a marketing-driven language aimed at enterprise C++ developers rather than a language made by hackers for themselves), you don't have to use Spring and J2EE. In most cases performance is a MacGuffin when it comes to Java and the JVM.


Thanks for the link, I always like an excuse to read what Cliff writes.


C is not simple and brief for all things, though - for example, details of resource management leak across library interfaces, complicating things. Sometimes using C is a reasonable trade-off, sometimes not. Still, it's small and simple, and its weaknesses are reasonably well known. It's a useful tool, though prototyping and exploratory programming are not its strong suit.

I find I'm happiest working in a language that allows very easy integration with C, but provides garbage collection, some kind of module system, better strings, a REPL, etc. You have more flexibility while prototyping, but can rewrite the parts in C that need it, and all system calls (fork, dup2, etc.) are still accessible. Having a pervasive associative array ("dict" or "table") type in the language helps, too - it's an incredibly versatile data structure.

Lua works particularly well for this (and it's simple and clean, like C), though Python and Lisps/Schemes that compile to C work well too. (I can't vouch for Ruby here, since I already know Python and Lua and haven't bothered with it.)
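
To give a feel for how thin that C boundary can be, here's a minimal Lua-callable C function (a sketch against the Lua 5.1-era C API; l_add is just an illustrative name):

  #include <lua.h>
  #include <lauxlib.h>

  /* Callable from Lua as add(2, 3) once registered. */
  static int l_add(lua_State *L) {
    double a = luaL_checknumber(L, 1);
    double b = luaL_checknumber(L, 2);
    lua_pushnumber(L, a + b);
    return 1;  /* number of results pushed */
  }

  /* Somewhere after creating the state: lua_register(L, "add", l_add); */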

See also: Andrew Koenig's "C Traps and Pitfalls" (http://www.literateprogramming.com/ctraps.pdf) and the book, which expands on the paper.


I think that what's particularly hard about C is not the details of pointers, the lack of automatic memory management, and so forth, but the fact that C is at the same time so low level and so flexible.

So basically if you want to create a large project in C you have to build a number of intermediate layers (otherwise the code will be a complete mess full of bugs and 10 times bigger than required).

This continual design exercise of creating additional layers is the hard part about C. You have to get very good at judging when to write a function or not, when to create a layer of abstraction, and when it's worth generalizing versus when it's overkill.

Of course the same thing happens in other languages as well, even in higher-level languages, but usually a C project requires more layers of abstraction than a similarly complex project written in a higher-level language.

Another difference from higher-level languages is that in C the abstraction layers at the "bottom" usually have to deal with low-level details that require some practice and knowledge. For instance, it's common to implement your own data structures and some memory-management machinery such as reference counting.
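
As an illustration, the bottom layer of such a project often contains something like this hand-rolled reference count (a sketch; the names are made up):

  #include <stdlib.h>

  struct buffer {
    int refcount;
    size_t len;
    char *data;
  };

  struct buffer *buffer_new(size_t len) {
    struct buffer *b = calloc(1, sizeof *b);
    if (b == NULL) return NULL;
    b->refcount = 1;
    b->len = len;
    b->data = calloc(1, len);
    return b;
  }

  struct buffer *buffer_retain(struct buffer *b) {
    b->refcount++;
    return b;
  }

  void buffer_release(struct buffer *b) {
    if (b != NULL && --b->refcount == 0) {
      free(b->data);
      free(b);
    }
  }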

This is the reason why I think programmers should learn C and try to write at least one large project in it: it's a good exercise.


To an extent, this is the hard part about any programming language. As soon as you're writing programs that don't completely fit into the given system and language libraries, you will have to start writing those layers.

C just starts requiring these layers at a slightly earlier point in the abstraction continuum. On the other hand, the low-level abstractions are the easy ones: in my experience, in a medium-size project and up, you quickly build your own vocabulary and primitives in C, and you get to the meat only a bit later than with some higher-level language.

I don't mean that C is all you need, but that it's not the big problem in practice. I've seen so many large C projects where most of the code is about the problem domain itself and only a minor part is dedicated to overcoming C's lack of features and primitives.

I actually see it as a good thing about C that it forces the programmer to build these layers early. You'll have to do it eventually, and if you're already used to doing it, you'll have more brain left for the actual problem itself.


Indeed. FWIW, _C Interfaces and Implementations_ by David Hanson touches on this (and it's a great C book besides).


The first section I looked at contains a buffer overflow:

http://crypto.stanford.edu/~blynn/c/ch03.html#_when_are_size...

Great C craft :-)


Why is this downvoted? It is a serious bug in the example code.


It might be because it says the same thing as another comment, which was probably posted earlier. As such, it's noise in the conversation.


I'm a huge proponent of C programming, but I have to say this article is not very compelling. Object-oriented programming is just as important in C as it is in C++; it's just that the syntax is different.

I think my high regard for C comes from work experience: the C projects I've worked on have been less disastrous messes than the C++ projects I've worked on. It's hard to pinpoint why, because on a small scale C++ is very nice. But I've noticed that once more than 10 people get involved, C++ projects become more bug-prone and difficult to understand and fix.

I especially love ADTs in C. C++ encourages you to declare public, private, and protected class members in a single class declaration in a single header file. Yes, you can work around this (e.g., have a single private member m_priv), but I haven't seen it done much in practice. In C, on the other hand, it's much easier to separate implementation details from the public interface: put "typedef struct Foo *FooHandle;" in the public header and the structure definition in the .c file. This is common practice in C and very elegant. As a coworker once put it, "I don't like the way people put private members in header files in C++... it's like seeing the class's underwear".
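
A minimal sketch of that opaque-handle pattern (the Foo names are just illustrative):

  /* foo.h -- the public header exposes only the handle */
  typedef struct Foo *FooHandle;
  FooHandle foo_create(int n);
  int       foo_get_n(FooHandle f);
  void      foo_destroy(FooHandle f);

  /* foo.c -- the "private members" live here, invisible to callers */
  #include <stdlib.h>

  struct Foo {
    int n;
    char secret[32];
  };

  FooHandle foo_create(int n) {
    FooHandle f = calloc(1, sizeof *f);
    if (f != NULL) f->n = n;
    return f;
  }

  int foo_get_n(FooHandle f) { return f->n; }

  void foo_destroy(FooHandle f) { free(f); }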


This is great. Every developer should be familiar with writing C and how to hook into it from higher-level languages (Ruby, Python, etc). This is a VERY powerful combination. C isn't appropriate for everything these days, but it can kick ass and take names at very specific tasks that need performance or access to OS primitives.


Or you can just write your performance-sensitive stuff in OCaml or Common Lisp or Forth, or, if it really has to be super-fast, in assembler.


I think the author of this document would really appreciate a lot of things in Go, given his wish for := as the assignment operator, CSP semantics, and some other things...


The author works for Google, and his most recent blog post is actually about the Go language:

http://benlynn.blogspot.com/2009/11/it-go-time_6644.html


And links back to the topic of this thread:

"Most of my wishes for C have been granted."


C provides my daily bread and I absolutely love the language.

That said, sometimes (I mean often) it does hurt when I have to work at a higher abstraction level in some fancier language and I keep thinking about what is going on behind the scenes.


Running a profiler can reassure you that while yes, your HLL code may be doing decadently inefficient things for the sake of developer convenience, most of the time it has a negligible effect on overall performance. Just rewrite those parts in C if you have to and you can get the best of both worlds.


If only it were that simple. Rewriting in C means a lot more than coding in C. It means that you're dealing with an entirely different deployment scenario and different platform dependencies.

Your build system needs to support it. Your testing procedures need to support it. In some cases the platform does not support it at all (Google App Engine).

But most importantly, how many Ruby/Python/PHP developers are able to write safe C code? How many of them would you trust to write server side C code that could bring down the system or introduce hard to find memory leaks?


Honestly, OOP isn't that bad. This would be a lot better of a read if he focused on how C is good, not how he hates OOP. OOP is a valid way to solve any problem, though certainly not the only valid way.

Incidentally I wrote an article in defense of OO: http://www.zideck.com/blog/article.php?id=1


From http://www.zideck.com/blog/article.php?id=1:

> Well, not everything is an object. And I'm not talking about Java primitives, either, which should be objects. For instance, functions are not objects in most languages, and when they are it is confusing. See Javascript for further info on how this doesn't work.

....

I stopped reading here. It works very well.

(how the heck do we do preformatted text on HN?)


Indent 2 spaces at the start of the line on everything you want preformatted, and put it on a new paragraph

  preformatted code


> This would be a lot better of a read if he focused on how C is good

Then it would be a Twitter post, not a blog post.


This isn’t a blog post. It’s the preface to a longer work – contents at the left.


My comment should have read, "there isn't much good about C."


Really, you should have just skipped it, because it’s a tired cliché that adds nothing to the discussion. There’s plenty to learn from C, if only in how to drive adoption of a language by making it an indispensable part of a useful system (in this case, UNIX). But the other side of this argument is also pretty tired after decades, so I’ll leave it there.


Marketing a language is of little use to hackers writing internal applications. What matters is good programming practice, which is often ignored in favor of "but ls uses it". This is not good.


The footnote on the entry page is telling: "This classic shows why the design of the data structures is the true heart of programming, and why language choice is almost irrelevant. (In hindsight, this is obvious after considering how we program humans: textbooks can be written in any language; the tricky part is presenting the content clearly.)"

I recently concluded the same thing. I suppose all programmers reach that conclusion at some stage in their education. In context, though, it's pretty entertaining: a programmer explaining why C is better -- because for him, certain benefits of C outweigh certain deficiencies -- while simultaneously pointing out that programming is the art of manipulating data, which is largely an abstraction that lies above the particular representations in language X (well, mostly).

That being said, I thought he did a good job. I would have been proud to write this page.


> This classic shows why the design of the data structures is the true heart of programming, and why language choice is almost irrelevant

The `almost' is important. E.g., on the one hand, you won't use any of Okasaki's beautiful purely functional data structures in a language without garbage collection, like C; on the other hand, you will shy away from data structures that require mutation in a pure language like Haskell.


Well, he mentions hating OO languages because they make it hard for him to program in a style similar to C. That's a bit like saying you hate hammers because they don't work very well with screws.


No, his problem with OO is that inheritance is problematic as a primary means of abstraction. It sounds nice on paper, but it can lead to horribly unmaintainable code and time wasted arguing over the specifics of tangled inheritance hierarchies. (This is a bigger issue in statically typed OO languages - being able to just send a message and get an exception if the recipient doesn't support that interface helps quite a bit.)

Also, he says that OO languages (really, only some of them) have weak support for first-class functions and closures, since they place so much emphasis on objects.

His criticisms sound more accurate for Java and C++ than, say, Python or Smalltalk. (He mentions Eiffel specifically, but I have no experience with it.)


The point still remains that he arrived at this conclusion by misusing his tools. The problems of inheritance are not a universally recognized truth. There are plenty of OO advocates who believe that inheritance is perfectly fine. Maybe the issue with inheritance in OO languages is people misusing their language.


The problems of C are not a universally recognized truth. There are plenty of C programmers who believe that C is perfectly fine. Maybe the issue with C is just people misusing the language. /snark

Whether a truth is universally recognized has no bearing on whether it is true, quite irrespective of what OO advocates believe or who/what they choose to blame. The big issue with C? It's tough to use correctly. The criticism that is being made against OO? It's tough to use correctly. Why does OO[1] just get to opt-out of the criticism by blaming the user, but C can't?

[1] Not even OO, but inheritance in particular.


Perhaps because it isn't a coding issue but a design issue, and because advocates of any stripe tend to gloss over the problems with whatever they are advocating.


I agree with you, but designs are often constrained by what the tools expose or consider important. Believe me, you don't get consulting jobs by saying that Java inheritance is overrated and that maybe a more functional, less OO approach would be a better choice.


Do you mean to imply that you get the consulting jobs by saying the opposite -- or by being more intelligent than either of those statements altogether?


Working on any reasonably large project in a (mainstream) OO language makes it incredibly likely that you'll have to deal with someone's code that misuses inheritance, and design errors in class hierarchies inflict their problems on everything they touch. OO has been sold as a tool to reduce complexity, but it can become a major source of it as a project develops.

Also, reading code with several levels of inheritance (more than three, perhaps) means dealing with several levels of spaghetti code - the code becomes a mess of "this does that, but no, wait, it doesn't anymore here, though here it does that except it does this first."


So design errors and language-feature misuse are less likely in procedural languages like C? Come on, that's ridiculous. You can't judge a language by the people who misuse it; otherwise nobody would be using JavaScript and HTML these days.

Sometimes complexity is intractable. There are no silver bullets.


Sometimes complexity is intractable, sure, but I'd argue that on the balance, inheritance makes it worse. Programming to object interfaces (whether or not they're known at compile time) brings most of the same benefits without having to pigeonhole everything into trees and potentially rewrite tons of code when the design changes.


Ah, but hammers actually work pretty well on screws. It all depends on the depth of the thread.


Sounds to me like he was trying to do low-level programming using an OO language, and he finally "comes home" to a good low level language.


C is for simple low-level system libraries. For all other kinds of computation one has Scheme.


The -2 is harsh, although smugness isn't required to extol the virtues of Scheme.



