There's nothing magic about learning C versus any other language.
- Read an authoritative source (K&R is good; there are better ones)
- Read a bunch of good code (I mostly read tools and kernel sources)
- Write crappy code and get better
As a rule of thumb, I want to write about 10K lines of code in a new language before I can say I don't suck at it. It varies with the language and paradigm: going to C++ from C took me something like five years (figuring out OOP, and painful lessons on what to avoid in C++), and apparently I'm never going to understand Haskell.
I'm a self-taught programmer. I started with Pascal (what seems like an age ago), then C, then VB5/6, then C# and JS and a bunch of other programming languages. By that point I felt like I could pick up just about any mainstream programming language very quickly.
And then I tried to tackle Haskell. That was about five years ago, and I think I'm still at beginner/intermediate level with it. But I realized something. These days I have a career and family. Back when I was learning Pascal, I had a lot more time on my hands and a lot more mental focus. None of the programming languages I've learned since (besides Haskell) were fundamentally different from Pascal.
Haskell is fundamentally different. I definitely feel like my past programming experience helped me learn Haskell faster than if I had no experience, but that past experience was also a handicap. I had to discard a lot of assumptions and get back to a more basic mindset of what the art of programming entails. And like a lot of other folks, I found it to be a very rewarding (and useful) experience.
Modern C# does provide some FP constructs (in the form of LINQ, lambdas etc.). I know that having used them a lot made it easier for me to grasp Scala.
> Read an authoritative source (K&R is good; there are better ones)
K&R is a classic, but is it a good way to learn proper C programming technique?
This makes one wonder -- from Zed Shaw's online version of "C The Hard Way" [0]:
I myself believed that until I started writing this book. You see, K&RC is actually riddled with bugs and bad style. Its age is no excuse. These were bugs when they wrote the first printing, and the 42nd printing. I hadn't actually realized just how bad most of the code was in this book and recommended it to many people. After reading through it for just an hour I decided that it needs to be taken down from its pedestal and relegated to history rather than vaunted as state of the art.
He goes into more detail in his critique, e.g.
Where K&RC runs into problems is when the functions or code snippets are taken out of the book and used in other programs. Once you take many of these code snippets and try to use them in some other program they fall apart. They then have blatant buffer overflows, bugs, and problems that a beginner will trip over.
There's more -- and he goes through some K&R code too.
Otherwise, it's just a different way of thinking about problems. Instead of building things structurally as an object of objects, and then creating functions that relate, transform, or describe the relationships between the object structures (like taking a pixel object, altering its x coordinate, and returning a new pixel object with only the x coordinate altered), you have functions that can be composed with other functions to produce more complex structures. A single base unit with a sequence of functional compositions can describe a very fancy, complex unit, with many fancy, complex characteristics.
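Translated into plain C, the pixel example might look something like this minimal sketch (the Pixel type and the move_x/move_y helpers are hypothetical, just to show small pure functions composed by nesting calls):

    #include <stdio.h>

    /* Small pure transformations: each returns a new Pixel instead of
       mutating the caller's copy. */
    typedef struct { int x, y; } Pixel;

    static Pixel move_x(Pixel p, int dx) { p.x += dx; return p; }
    static Pixel move_y(Pixel p, int dy) { p.y += dy; return p; }

    int main(void)
    {
        Pixel p = {1, 2};
        Pixel q = move_y(move_x(p, 3), 4);   /* compose: first move_x, then move_y */
        printf("(%d, %d)\n", q.x, q.y);      /* prints (4, 6) */
        return 0;
    }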
>> Monads have also been explained with a physical metaphor as assembly lines, where a conveyor belt transports data between functional units that transform it one step at a time.[2] They can also be seen as a functional design pattern to build generic types.[3]
Before learning Haskell, try learning lisp through SICP. Haskell is wonderful but it's mostly (not really but bear with me) syntactic sugar over lazy application, which is explained in SICP (as opposed to eager evaluation used almost everywhere else). Syntactic sugar makes the whole thing appear like dark magic but it's not. It's a few minimalistic rules applied over and over (at both language and meta levels).
I've been programming (dabbling, to be honest) in LISP since about 1980; I've never shipped any products in it. I wrote a couple of toy interpreters in college (after reading Allen's Anatomy of LISP) and some small projects, but nothing massive.
I read SICP when it first came out and did most of the exercises. Yet Scheme was a terrible language to ship software in (commercial implementations were basically toys). I assume that Common LISP was a lot better, but the real packages were expensive.
I guess the real problem is that I didn't have cow-orkers who understood my obsession with LISP, nor did I work on any projects that I could really use it for. Could have switched companies, I suppose, but I liked working on operating systems . . .
Aight, to each his own. A lot of Haskell mystery vanished for me when I saw it explained in terms of simpler concepts (pattern matching, lazy evaluation, currying, etc.), but apparently that's not what's bothering you.
I think FP languages' reputation of being hard to learn is partly owed to intimidating terminology that makes certain concepts appear more difficult than they actually are.
I really think C is much easier to get good at than other languages. There is no mystery about what's going on in the computer when you learn C: the built-in functions, how big things are, how everything works, etc.
When you learn higher-level languages there is significant mystery in everything. It means you can write a lot of code having no idea what's going on under the hood, which is good; that's the point of a higher-level language. But it is certainly something you SHOULD know in order to call yourself an expert. These mysteries are sometimes what cause good programmers to get hung up on odd edge cases.
Depends on the quality of the runtime. For instance, C# and Python are really good and well-documented. Your Java environment can vary quite a bit, but you probably have control over it.
Howlers like PowerBuilder and InstallShield are just hopeless amateurville and leave me giggling with awe at how fantastically bad they are; they are worth looking at if you want to get a sense of how lucky you are.
10K lines is very fast to make it to proficiency. It takes me a lot longer to 'not suck' at a new language, I'm not at all sure what a reasonable value here is but I don't usually consider myself slow. Probably I'll have to adjust that self image now ;).
Those don't seem like "these days" things. In fact, some of them are much more "old days".
Glib is just another library like any other, and POSIX is just a library plus specification of the details any platform would have to specify. And if you're not on something that tries hard to look like a *nix, POSIX mastery isn't going to help you.
Autotools is a plague that is slowly being eradicated; the problems it ostensibly solved have become far less serious, to the point that they can be effectively dealt with by less insane means.
Make is increasingly replaced or supplemented with other tools (cmake, scons, ninja, ant if you're crazy), especially when builds start getting complex.
Heavy macro usage is not some modern idiom. In fact, "these days" it's indicative of laziness, poor design, premature optimization, or a religious refusal to use an OO language for OO code. Sadly there is a lot of this code out there, but yours doesn't have to be part of it. If writing your application involves many and/or complex macros, you should reconsider your approach.
I wouldn't regard POSIX as "just another library". It is the interface between your program and the operating system. In order to gain a healthy understanding of things like memory mapping, signals and process control, you have to go beyond the corresponding man pages.
Most higher level languages do have wrappers around these calls, but you rarely have to worry about things such as async-signal-safety or EINTR, for example. Knowing why such things are problematic isn't of much interest to the average Python programmer, while essential for writing correct and safe C code.
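For instance, here is a minimal sketch of the usual EINTR retry idiom around read(); read_retry is just a hypothetical helper name:

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Retry a read() that was interrupted by a signal. EINTR is not a
       real error; it just means "try again". */
    ssize_t read_retry(int fd, void *buf, size_t count)
    {
        ssize_t n;
        do {
            n = read(fd, buf, count);
        } while (n == -1 && errno == EINTR);
        return n;
    }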
I wasn't talking about "higher level languages", really. Still, you have made a strange statement. Most obviously, if you don't worry about EINTR in Python, your code crashes. This was, in fact, the subject of a recent draft PEP (475) that generated a fair bit of discussion but not much consensus.
To the extent you're better off in Python or any other high-level language than in C, it's because things are removed from your control. Obviously, if you can't do something in a language, you can't do it wrong. That is uninteresting.
And of course EINTR and much signal behavior is POSIX-specific -- as in, you may not even have it (or necessarily anything like it) on other platforms. This, again, is not a "C" thing.
You make an excellent point. Perhaps more so than with other languages, getting to grips with making C actually useful relies a lot on quite extensive knowledge of the build environment.
You also have to know how computers work, how they represent numbers, how and where memory is laid out and stored, and how computers process information. You have to know about interrupts and low-level OS stuff, networking protocols, physical hardware limitations, and debugging low-level byzantine failures. You have to know a lot more than just C syntax. I haven't even mentioned computer science, best practices, working with others, architecture, and UI topics.
It's not only the build environment. Mastering C goes way beyond the superficial level of just learning the language and the libraries. Because C is a very thin abstraction over the machine, you will inevitably encounter quirks which can only be explained from the perspective of a compiler writer or a hardware designer. Also, POSIX is not just another library; it's the interface between your program and the operating system. Knowing what to call and how to call it ultimately requires a good foundation in OS theory.
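As a small illustration of the kind of OS-level interface meant here, a sketch of mapping a file into memory with mmap (illustrative only, with error handling kept to a minimum):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a file read-only and print its first byte. */
    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;

        int fd = open(argv[1], O_RDONLY);
        if (fd == -1)
            return 1;

        struct stat st;
        if (fstat(fd, &st) == -1 || st.st_size == 0) {
            close(fd);
            return 1;
        }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return 1;
        }

        printf("first byte: 0x%02x\n", (unsigned char)p[0]);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }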
How is knowing useful libraries for C any different from knowing useful libraries for other languages? Perl (about the only other language I have any experience with) seems to depend on modules that you sometimes struggle to compile through CPAN.
Not mentioned, about being an expert C programmer: knowing the pitfalls of C (cf. undefined behavior), and reading and knowing the ANSI C standard.
Of course, just knowing the language in and out is not enough, you also have to be a good programmer in general (algorithms, "design patterns", software architecture, software engineering, etc).
But writing C code without undefined behavior, and avoiding its numerous pitfalls is absolutely necessary.
"Not mentionned, about being an expert C programmer, is knowing the pitfalls of C, (cf. undefined behaviors)"
I am an expert in C; I have spent decades writing in it and other languages, and managing teams of coders. We created a company that used it a lot.
I can't understand what undefined behaviors C has, because it is the most simple and well-defined language I know of. I have lots of experience writing assembler, Fortran, Lisp, C++, Python, and Objective-C, and I also use C#, Java, JavaScript, and other web and functional languages from time to time.
C++ and Java can be extremely undefined because the behavior depends on conventions, committees, and implementations. E.g., we had to change some code because different compilers interpreted the standard differently.
But C? C is basically portable assembler. If you understand how computers work, it is extremely reliable, bar none. We have ported years of work by a dozen programmers in one day. With C++, and with Java's "write once, run everywhere" marketing, we spent months.
It was also terribly frustrating for the team. The FontForge author had a similar experience writing C++ compilers, and it was so traumatic that FontForge's interface is as ugly as it is because he won't touch C++ with a ten-foot pole.
We try to avoid pitfalls more in high-level languages because the programmer has the ability to write code without understanding what is really happening in the processor, or even in the program. The population of recipe programmers is growing a lot these days.
That wasn't part of the scope of the language. It's a double-edged sword, but I could see it being a good thing too: if you want raw speed and know what you're doing, it can be OK not to have overflow guards.
> I can't understand what undefined behaviors C has, because it is the most simple and well-defined language I know of.
C has lots of undefined behaviors because the language was designed to be easy for compiler writers to implement. As such, a lot of decisions were left to the compiler writers, which is what "undefined" means. Here is a (probably partial) list of undefined behaviors in C. (For a full list you'd have to search through the standard): http://blog.regehr.org/archives/213
Edit: Linked to a C++ article by mistake, so I changed it.
Sure, there is a lot of code you could write that results in undefined behaviour. However, none of it is code that you should write, nor is it code that an expert would write.
Your sentence can be true, depending on how we define "expert", but the problem is that there are not going to be many experts if, to be an expert at C, one has to be able to write useful code that does not accidentally invoke undefined behavior.
- A famous undefined behavior was found in OpenSSL recently, and there were and remain plenty more. Sure, the authors of OpenSSL aren't experts; the problem is, they are the people writing and maintaining OpenSSL.
- The Linux kernel has had its share of undefined behaviors. Usually the developers blame it on the compiler, of which the kernel admits only one (GCC, at least as of recently). There was the time when GCC was blamed for taking advantage of strict aliasing rules, and the time when GCC removed a NULL test on an execution path that had already dereferenced NULL (sketched just after this list). If the developers wrote for more compilers, they would realize that the optimizations practiced by GCC are practiced by other compilers too, because they are justified by undefined behavior in the source code.
- I could go on.
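To make that NULL-test example concrete, the pattern looks roughly like this (a hypothetical sketch, not the actual kernel code):

    struct device { int flags; };   /* hypothetical type, for illustration */

    int get_flags(struct device *dev)
    {
        int flags = dev->flags;  /* dereference happens first...              */
        if (dev == NULL)         /* ...so the compiler may assume dev != NULL */
            return -1;           /* and silently delete this "dead" check     */
        return flags;
    }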
I have cited three pieces of useful, widely deployed C software that have contained undefined behavior, and likely still contain more. Can you name one nontrivial C program, written by an expert according to your definition, that you are confident does not invoke undefined behavior?
Undefined behavior amounts to "it does what it apparently does, with no guarantees about future behavior". So code that you should write, and that an "expert" would write, but that depends in a non-obvious way on undefined behavior (and it is surprisingly easy to do that), will possibly break in hard-to-detect ways (if the breakage is detected at all, which is the more dangerous case) at some point in the future when said undefined behavior changes.
The only way to work around that is not just by becoming an expert at the language but also by becoming an expert at the implementation details of the language and that's a domain that not many programmers are comfortable in. That way you can make sure you stay away from the undefined behaviors as much as possible.
Bad car analogy: not only do you have to learn how to drive the car, you also need to know that the combination of wind across the left front+rain+loud music on the stereo will sometimes cause a wheel to fall off. This is not usually considered a useful level of knowledge and so C programmers (experts too) all find that their knowledge of the limits of C has a partial overlap with the real limits of C as implemented by their particular toolchain. And in between the cracks lots of nasty stuff can happen.
Dead code elimination and different sequences of optimization phases (changing from one compiler to the next) can introduce very subtle bugs in your code, and I highly doubt "most experts" would know how to distinguish some very innocent-looking code that will almost certainly bork on some standards-compliant compiler with optimization set to 'on' or 'max' from similar-looking code that will work just fine.
Then the only 'expert' is a computer, for even the best human will occasionally make undefined behavior inducing mistakes. And 'occasionally' is frequent enough to be a problem.
I wouldn't call myself an expert, but I avoid calling C portable assembler. I tend to think of C as having low level data manipulation and high level flow.
It might not be applicable anymore, as compiler optimizations have become advanced to the point that it is hard to predict the generated machine code. This is one of the most important points with regard to toolchain-specific knowledge. The other is proper usage of the debugger.
> I can't understand what undefined behaviors C has, because it is the most simple and well-defined language I know of.
"Undefined behaviour" is a very simple and well-defined term when it comes to languages like C. Undefined behaviour is code who's behaviour is specified to be arbitrary. And languages like C is famous for having them.
(I'm assuming that you were being honest about not knowing about undefined behaviour means in C, and not trying to re-frame the word to mean something else.)
An example is overflow on signed integers. Strictly speaking it does not help that you know "how computers work" since you have no guarantees about what will happen if an overflow happens in your program. You can't assume that a wrap-around will happen and base your program on that, since that might be a faulty assumption on some compilers.
Undefined behaviour allows the compiler to assume that it never happens and introduce optimizations based on that assumption. For example, use of uninitialized variables is undefined. This allows the compiler to avoid having to initialize variables (like arrays) to some default value when they are declared.
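Two tiny sketches of what that means in practice (hypothetical functions, purely for illustration):

    /* Signed overflow is undefined, so the compiler may assume it cannot
       happen and fold this whole function to "return 1". */
    int will_not_overflow(int x)
    {
        return x + 1 > x;
    }

    /* Reading an uninitialized local is likewise not something you can
       reason about from "how computers work": there is no guaranteed
       garbage value to rely on. */
    int read_uninitialized(void)
    {
        int x;
        return x;
    }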
Now in a case like that, it is not defined whether g() or h() is going to be executed first; the evaluation order, I mean. Precedence is well defined for operators, but there are certain things which I still haven't figured out yet.
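The snippet being referred to isn't shown above, but the classic case looks something like this (g and h are hypothetical):

    int g(void);
    int h(void);

    int f(void)
    {
        /* Operator precedence is fixed, but the order in which g() and h()
           are called here is unspecified; if both modify the same object,
           the behavior is undefined. */
        return g() + h();
    }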
> But writing C code without undefined behavior, and avoiding its numerous pitfalls is absolutely necessary.
I rarely (as in unicorns) see C code that consistently checks for unsigned integer overflow. Or signed integer overflow for that matter, but that's defined behavior.
Point being, I also encourage people to avoid undefined behavior, but whether and how one does that has subtleties in practice.
Signed overflow is undefined, unsigned overflow is defined. But I agree with you. Not enough people take integer overflow seriously.
Unsigned underflow, perfectly well-defined behaviour, is actually a worse problem in practice, IMO. I think too many C programmers think about unsigned types like they're a bounded type, e.g. they think "this variable can never be negative, therefore I'll give it an unsigned type". But unsigned types have a cliff of surprising behaviour right where they're commonly used - most integer values in programs are low. Whereas you really need to work on a signed value to get it to overflow.
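A minimal sketch of that cliff (assuming a 32-bit unsigned int):

    #include <stdio.h>

    int main(void)
    {
        unsigned int items = 0;

        /* "items can never be negative" -- but going below zero wraps
           around to a huge value instead of going negative. */
        items -= 1;
        printf("%u\n", items);   /* 4294967295 where unsigned int is 32 bits */

        /* The same cliff bites countdown loops: i >= 0 is always true
           for an unsigned i, so this would never terminate:
           for (unsigned int i = n - 1; i >= 0; i--) ...            */
        return 0;
    }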
If you want to become an expert C programmer, you have to know C As She Is Spoke: Compiler quirks, inclusive of bugs and non-standard enhancements, which can either trip you up or give you a lot more expressiveness, if you're tasteful about where and when you use them. For example, the GNU C typeof() operator makes certain macros a lot cleaner, but it's nowhere in the actual standard. I'm sure other compilers have similar extensions.
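For example, a max macro in that style relies on two GNU extensions, typeof and statement expressions (a sketch of the idiom, not code from any particular project):

    /* GNU C only: typeof and statement expressions (neither is standard C)
       let the macro work for any operand type and evaluate each argument
       exactly once. */
    #define MAX(a, b) ({           \
        typeof(a) _a = (a);        \
        typeof(b) _b = (b);        \
        _a > _b ? _a : _b;         \
    })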
Id's Quake and Doom sources are a classic in this matter. I looked a bit at wsw, a fork of Quake 2. Now, my code is full of structs with function pointers. It's like I just internalized the more elaborate syntax and now it's all Hammers and Nails. I mean, function pointers to functions returning function pointers, for example.
    typedef int (*foo)();    /* foo: pointer to a function returning int */
    foo *bar();              /* bar: a function returning a pointer to foo */
    foo *(*p)() = bar;       /* p: pointer to a function like bar */
Not rocket science, but ambiguous at first sight when compressed to one line. I hope that kind of syntax is as complicated as it gets, because I'd like to think C semantics aren't really difficult. That is, until you start implementing customized data types on top.
As mentioned in another comment, learning the toolchain is just as important, and it's not without reason that Don Knuth takes forever writing about compiler design.
Never mind the ventures into the algorithmic side of things.
For me C is kind of a weird language. Weird in that the language itself is incredibly simple; I think anyone could learn C syntax and usage in about two days max. But I think where most people get hung up on C is the concepts: you need to know about compiling and linking, static vs. dynamic libs, lots of details about how computer architecture and memory work, and to get anything done you need to know POSIX and the concepts behind any libraries you use. It's not like Ruby or other dynamic languages where you could basically have no idea how HTTP works and still write a webapp.
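As a concrete taste of those concepts, here is a sketch of loading a shared library at run time through the POSIX dl* interface (the library and symbol names are just illustrative; on glibc you'd link with -ldl):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        void *handle = dlopen("libm.so.6", RTLD_LAZY);
        if (!handle) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        /* Look up a symbol and call it through a function pointer
           (the cast is the usual dlsym idiom). */
        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (cosine)
            printf("cos(0) = %f\n", cosine(0.0));

        dlclose(handle);
        return 0;
    }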
Developing a (good) web server in pure Python likely still requires knowledge of language internals; if you depend on libraries, then it just depends on the libraries. Web frameworks with "batteries included" seem rather rare on the C shore, though, maybe because statically compiled languages are in general not used as frequently in web development. Therefore it's an unfair comparison.
To leverage C's power, libraries should also tend to stay general enough, no?
Most compilers or interpreters are complicated under the hood. Fixing a bug in the compiler to get your code working (and submitting a patch), that's mastery.
C is still my favourite language. Sure, it has issues, but it works how my brain works... or is it that my brain works how C works? Either way, it just makes sense to me.
In my experience, one of the easiest ways is to just get employed by a company, where coding is done in C. You will have to code in that language a lot and the colleagues can help you. I also learned a few languages myself in my spare time (PHP, Python), but that is not nearly as effective.
If you want to really be an expert on C (or in anything imho), you just have to do it fulltime.
> Always, always, always write code as if it will last 30 years.
Well said. But if I do 'git blame' and see that some code is like 30 years old, the first thing that comes to my mind is that it must be some kind of crappy old legacy code.
I know it is wrong; it is the substance that matters. But lately I have somehow programmed myself to like newer code more than older code. I need to step away from that misconception.
"DBell is one of the best programmers on the planet."
How would one verify this?
I tried a couple of his sample C programs.
One of them was one source and one header file and compiled easily and quickly. A+. But then I looked at what the program did and realized I had written several iterations of the same utility myself years ago, using only the shell, sed, tr and ed or vi. I guess maybe his point of writing this in C is that he envisions a system that lacks those programs?
Then I tried another sample, which was a little more complex.
It failed to compile. Looks like he assumed Linux but failed to state the program is not portable. F.
I am always on the lookout for truly great C programmers.
Look at some of DBell's code:
http://www.tip.net.au/~dbell/
http://www.isthe.com/chongo/tech/comp/calc/index.html
Learn from it. DBell is one of the best programmers on the planet.
I have nothing against Mr DBell, whom I don't know, but... don't learn from it, please.
Weather 1.9 - Java application to plot weather observations from the Australian Government's Bureau of Meteorology (BOM) web site.
I did look at it.
Controller.java is over 1200 lines long. Don't learn from people who create classes that big (aka God objects).
SunriseSunset.java contains nearly 100 lines of commented out code. That's another bad practice.
What's more, it consists mostly of a main method commented as "Simple main to test the class". This is a so-called poor man's unit test: it doesn't use assertions, it just prints some results out to be verified manually by the programmer. Which is another antipattern.
His FileInfo.java (part of his FileSelection 2.0) contains tautological comments like:
//
// Get the suffix path.
//
String suffixPath = fullPath.substring(prefixPath.length());
Gee, you don't say. Redundant comments, another antipattern.
isDirectoryEmpty (in the same file) happily ignores an exception, meaning that if an IOException occurs (for whatever reason), the method will tell you that the directory is empty, even if that's not true.
Setting stream to null after closing it makes no sense whatsoever (it's a local variable, it's not going to survive exiting the method).
Closing it in the try block makes no sense to me either, because Utils.safeClose is going to execute either way; that's how the finally clause works. It looks like the author assumed that finally only executes if an exception happens. No, it is run in either case.
So, what's the point of closing the stream twice? And if it does serve some purpose, now that would require leaving a comment, because it's totally unobvious. Not "getting the suffix path", which is blatantly obvious.
I could go on. This is poor-quality code. I'm hardly a great programmer (working on it!), but this is not up to my production standards.
If the man is "one of the best programmers on the planet", the planet can't be Earth.
The C sources are quite baroque as well -- inconsistent naming, lots of repetition. Things that could have been neatly done with a lookup table are coded into enormous repetitive switch statements. No goto for error handling, hence a repetition of the same error handling code before each return.
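For contrast, the usual goto-based cleanup idiom looks something like this (a generic sketch, not code taken from any of the projects mentioned here):

    #include <stdio.h>
    #include <stdlib.h>

    /* Acquire resources in order, clean up in reverse order via goto,
       instead of repeating the same error handling before every return. */
    int process(const char *path)
    {
        FILE *f = NULL;
        char *buf = NULL;
        int ret = -1;

        f = fopen(path, "rb");
        if (!f)
            goto out;

        buf = malloc(4096);
        if (!buf)
            goto out_close;

        if (fread(buf, 1, 4096, f) == 0)
            goto out_free;

        ret = 0;    /* success */

    out_free:
        free(buf);
    out_close:
        fclose(f);
    out:
        return ret;
    }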
If you want to see examples of consistent and disciplined C programming, look at Git and Nginx.
The whole point of a finally block is to have the cleanup code in it that should be called regardless of whether or not an exception occurred in the first place.
So the first stream.close should not be there if the 'finally' block already contains one, and it will be executed twice if an exception does not occur.
Of course, but the second one DOES execute whether the exception occurs or not. So the first call is redundant.
If no exception occurs (which is most of the time, probably), the code will call stream.close(); stream = null; and after that it will pass that null stream to Utils.safeClose, which can't do anything to it at this point, as the stream is closed by now and we've dropped the reference to it anyway (by setting it to null in meantime) :)
The code reads as if its author was under false impression that finally only executes if catch executes. But that's not true, and it would make no logical sense (why would the language distinguish between "catch" and "finally" at all in such case?)
The article advises learning to play music because a lot of good programmers play music. A lot of programmers are also white. Please become white if you can.
That's true, but the point is that this correlation could be pure coincidence, or at least not causative. (If it actually occurs to start with - it would be interesting to create a poll and verify whether most reputed programmers are indeed more musical than general population.)
Many programmers have stereotypical geek interests like RPG, Star Wars etc. but it doesn't mean that these contribute to their programming skills in any way. Being socially awkward doesn't make you Sheldon Cooper :)