The 5-minute Guide to C Pointers (denniskubes.com)
110 points by denniskubes on Aug 16, 2012 | 71 comments



30 second guide to C pointers, from Through the Looking-Glass:

`It's long,' said the Knight, `but very, very beautiful. Everybody that hears me sing it -- either it brings the tears into their eyes, or else -- '

`Or else what?' said Alice, for the Knight had made a sudden pause.

`Or else it doesn't, you know. The name of the song is called "Haddocks' Eyes."'

`Oh, that's the name of the song, is it?' Alice said, trying to feel interested.

`No, you don't understand,' the Knight said, looking a little vexed. `That's what the name is called. The name really is "The Aged Aged Man."'

`Then I ought to have said "That's what the song is called"?' Alice corrected herself.

`No, you oughtn't: that's quite another thing! The song is called "Ways and Means": but that's only what it's called, you know!'

`Well, what is the song, then?' said Alice, who was by this time completely bewildered.

`I was coming to that,' the Knight said. `The song really is "A-sitting On A Gate": and the tune's my own invention.'

The song that the Knight is referring to is this one: http://en.wikipedia.org/wiki/Haddocks%27_Eyes


Over the past four months I wrote a library in C. In said library exists the following chain:

typedef hidden pointer -> object -> dictionary -> attribute object -> linked list -> object

Then the whole thing can start again from the last object. At both the linked list and dictionary points the pointers have been cast to void *. During development there was a linked list in place of the dictionary. Since this was before I wrote output, debugging involved manually walking the two linked lists.

Elsewhere a function takes a void * array of structs yet needs to access the contents and thus requires a pointer to an accessor function.

I love C but it can get confusing. My plan is to spend the next few weeks in recovery learning RoR.


You might want to look at Go. It always felt like a revised version of C with a great standard library and sane defaults.


Every time someone tries offering a simplified explanation of pointers, I've countered with the old Buddhist saying that, "The pointing finger is not the moon," followed by a brief foray into syntax and operators, e.g.,

  moon* finger = &luna;
As often as not, enlightenment occurs.


To me your line of code seems a bit too clever and complicates the concept of pointers. But we all learn differently so it's cool if it flips a switch for people who are, perhaps, more philosophical than I. I tend to use more boring examples like a card catalog vs a book when I'm explaining the idea of pointers.


Why do you say it's clever/complicated? The example he mentioned seems like a fairly basic example of a pointer.


Well, mainly because it uses a kinda abstract/poetic concept to explain something concrete. In order to understand the example you have to understand the little puzzle about the moon & the finger as well. Maybe I do understand the saying fully - maybe I don't fully understand the depth of that saying..? Now I'm confused about both!

Also, using the terms moon & luna: are they meant to be the same thing? Where was luna mentioned in the saying? Is luna representing the moon?


I agree that the Buddhist saying confused me at the start..

For the moon vs luna thing: For the line to be valid, 'moon' must be a datatype in C, so it can't be the name of the variable. It also makes sense literally because a moon is a generalized body orbiting a planet whereas luna refers specifically to Earth's moon.


void* finger = &luna;

Don't force the finger to only point to the moon.


Pointers are sometimes not arrays. :)

    extern int *x;
    extern int y[];
x is a pointer to an int and y is an array of ints of unspecified size. Declaring one where the other is defined is equivalent to saying float x and then extern int x somewhere else. They're type mismatched.

If you said 'x is a char pointer', you mean lookup symbol table for address of x, then do a memory address dereference and then get the contents of the memory at dereferenced address.

If you said 'x is a char array', you mean use the symbol table to get the address, then calculate offset and get contents from that address.

When you define it one way and then declare it another, you end up doing both of the above: get the contents of x, then get the value of the offset, add it to the contents of x, then get the contents of the resulting address+offset.

The issue is the declaration which can happen many times and the definition which occurs just once.

Now, you can have an array and a pointer be equivalent if it is used in an expression because the compiler converts array references to pointers.


Arrays and pointers are completely different entities in C. The only reason people think there's any sort of equivalence is because:

1. The "array indexing operator" [] actually works only on pointers.

2. Arrays are automatically converted to a pointer to the first array element whenever necessary.

3. Arrays are really uncommon in C, and many things we think of as arrays are not arrays as C considers them. Declare a function parameter with []? That's a pointer, not an array. Malloc a bunch of memory? Not an array.
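A minimal sketch of points 2 and 3, for anyone who wants to check them on their own compiler. The function names are made up for illustration. It compares `sizeof` on a real array with `sizeof` on the pointer it converts to, and shows that a parameter declared with `[]` can be incremented, which a real array cannot.

```c
#include <stddef.h>

/* sizeof applied to a real array reports the whole array. */
size_t array_bytes(void)
{
    int values[10] = {0};
    return sizeof values;          /* 10 * sizeof(int) */
}

/* Once the array has been converted to a pointer, sizeof only sees the pointer. */
size_t pointer_bytes(void)
{
    int values[10] = {0};
    int *first = values;           /* automatic conversion to &values[0] */
    return sizeof first;           /* sizeof(int *) */
}

/* A parameter declared with [] is really an int *, so it can be incremented.
   Doing the same to a true array would not compile. */
int second_element(int values[])
{
    values++;
    return *values;
}
```

Calling `second_element` on an array holding 1, 2, 3 should hand back the 2, since the increment moves the pointer copy and not the caller's array.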


> Imagine an array variable like a pointer that cannot be changed that holds the memory address of the first element of the array it points to.

Nope. Arrays are not pointers; pointers are not arrays. This is perhaps the most common misconception about C, and a "Guide to C Pointers" should not propagate it.

For more information about why this is wrong, read section 6 of the [comp.lang.c FAQ](http://www.c-faq.com).


Well for a quick sum up:

An array is like a constant pointer, it's always pointing at the first element of the array. To try to point it at something else is an error.

    array[2] 
is short for

    *(array + 2*sizeof(<type of array>)) 
which takes the address of the first element and adds the appropriate number of bytes to it in order to access the requested element, then it dereferences the "pointer" and you get the value of the element.

If this isn't 100% correct, please let me know.


As RegEx implies, pointer arithmetic implicitly includes the sizeof(<array type>) term, i.e. given

  int* array;
then

  array + 1
points to the next int, not the next byte. And so array[2] == *(array + 2). (The fact that addition commutes means that using 2[array] instead is valid and works in C.)
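Here is a small sketch of that, with hypothetical function names. The first function measures how many bytes `p + 1` actually moves for an `int *`; the second checks that the three spellings of the subscript agree.

```c
#include <stddef.h>

/* Number of bytes between p and p + 1 when p is an int *.
   The compiler scales the +1 by sizeof(int) on its own. */
size_t int_step_in_bytes(void)
{
    int array[4] = {10, 20, 30, 40};
    int *p = array;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Returns 1 if array[2], *(array + 2) and 2[array] all name the same element. */
int three_spellings_agree(const int *array)
{
    return array[2] == *(array + 2) && *(array + 2) == 2[array];
}
```

Note that no `sizeof` appears in the subscript expressions themselves; it only shows up when the pointers are viewed as `char *` to count bytes.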


From the C standard:

> The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

So array[2] == *(array + 2)


Just as with pointer arithmetic (because that's what this actually is) you don't multiply by the size of its elements. This:

  array[2]
is the same as this:

  *(array + 2)
The compiler knows what type of data the pointer refers to and can produce the byte offset itself. Also:

  "...it's always pointing at the first element of the array"
Eh... an array can degrade into a pointer when needed, but what does the following produce?

  char arr[10];
  ??? x = &arr;
Is "x" a pointer to pointer to char? From your assessment it would seem so, but in reality the type of "x" is

  char (*)[10]
i.e., pointer to array of 10 char. An array is an array, and arrays can degrade into pointer types.
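One way to see that the two types really differ is to step each pointer by one and count the bytes. This is only a sketch with invented function names. Both pointers start at the same address, but they move by different amounts.

```c
#include <stddef.h>

/* Stepping a char (*)[10] skips one whole array of 10 char. */
size_t step_of_pointer_to_array(void)
{
    char arr[10] = {0};
    char (*x)[10] = &arr;          /* pointer to array of 10 char */
    return (size_t)((char *)(x + 1) - (char *)x);
}

/* Stepping the converted char * skips a single char. */
size_t step_of_element_pointer(void)
{
    char arr[10] = {0};
    char *first = arr;             /* arr converts to &arr[0] */
    return (size_t)((first + 1) - first);
}

/* &arr and &arr[0] hold the same address; only the types differ. */
int same_starting_address(void)
{
    char arr[10] = {0};
    return (void *)&arr == (void *)&arr[0];
}
```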


Additionally, (as you know, but merely pointing out for the curious), `sizeof arr` returns the size of arr in bytes, not the size of a pointer to the first element of arr.


  > ??? x = &arr;
Does this compile? I thought an array was treated like a constant pointer, nonexistent in memory, so you cannot take its address, increment it, or assign it another value. Although the point made by RegEx about sizeof, which I didn't remember, convinced me that an array is not actually a constant pointer.

To make my thoughts clear, if arr is equivalent to &arr[0], then wouldn't &arr be equivalent to &&arr[0]?


When printing pointers with printf you need to cast them to (void *). %p only prints void * and there is no guarantee two pointer types have the same size.
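A minimal sketch of the cast, using a hypothetical helper that formats into a buffer instead of printing. The text that `%p` produces is implementation-defined, so the sketch does not assume any particular output.

```c
#include <stdio.h>

/* Formats the address held in p into buf.
   Returns what snprintf returns: the length of the formatted text,
   or a negative value on error. */
int format_int_address(char *buf, size_t size, int *p)
{
    /* %p expects a void *, so the int * is cast explicitly. */
    return snprintf(buf, size, "%p", (void *)p);
}
```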

I also think it is a mistake to bring up memory addresses so early. The first paragraph is incredibly confusing.

A pointer is simple. It's a variable that points at something. You can change where it points and you can change the value that it is pointing to. (&) is the pointer-to operator and gives you a pointer to something.

I think that is roughly all that needs to be said. All the talk of locations, memory, and addresses is just confusion.


Pedantic notes:

- "The cast isn’t explicitly needed in C but it does make things more readable" - I think many people would disagree with this, including me. Casting is ugly, error prone, and conceals errors. Don't do it, there is no need. If you think it is helping you because you can see what type things are then your functions are too large and incoherent, fix that problem instead.

- "Here we print out the values of our uninitialized and NULL pointers. Notice that our uninitialized pointer has a memory location" - This behaviour is completely undefined and it is doing injustice to those who would benefit from this blog post. All code people run on an example site should be well-defined. "Worst case the program crashes badly" - No, worst case is you have no idea what could happen! Old versions of gcc ran nethack when it detected undefined behaviour at compile time. In practice, crashing is actually the best case since you can see the error. In worst case it silently corrupts data in your program and you can't find it until it's so far from the error site that it's near impossible to track down.

- "using the NULL keyword" - NULL is not a keyword, it's a macro defined in stddef.h (and also provided by stdlib.h, stdio.h, and several other standard headers).

- "doing so will cause a segmentation fault" - No, it's undefined behaviour. On DOS systems memory address 0 was valid and writable.

- "That being said an array variable does point to the memory address of the first element of the array." - No, it IS the first element. It decays to a pointer to its first element under various operations, though.

- "char * fullname = "full name";" - const char * . Modifying string literals is undefined and the fact that the compiler does not require fullname to be a const char * is merely a historic oversight.

I understand the desire to learn-by-writing, but I don't feel this tutorial offers much to the discussion and it confuses some of the same things the many other pointer tutorials confuse. I don't think someone new to pointers will actually come out with a superior understanding of pointers after reading this. Much of the content is not technically wrong, but the wording and method of introduction aren't any less confusing.
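On the string literal note above, a small sketch of the safe pattern (the function name is invented for illustration): point at the literal through a `const char *`, and if the text has to change, initialise a `char` array from the literal so there is a private, writable copy.

```c
/* Returns 1 if editing the private copy worked and the view of the
   literal still reads as before. */
int edit_copy_not_literal(void)
{
    const char *fullname = "full name";  /* read-only view of the literal */
    char editable[] = "full name";       /* separate array, safe to modify */

    editable[0] = 'F';
    /* fullname[0] = 'F'; would be rejected by the compiler thanks to const */

    return editable[0] == 'F' && fullname[0] == 'f';
}
```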


Here's another one with more information on pointers in C.

http://www.thegeekstuff.com/2012/01/advanced-c-pointers/


Another tutorial about pointers that I found quite nicely written is http://boredzo.org/pointers/.


Yet another broken C pointers guide out there to confuse people.


What is broken about it? I am happy to make corrections.


Minor points:

- & is called the "address of" operator, and saying "The & is the reference operator and is used to reference a memory address" sounds just strange.

- In C, a reference is just a way to indirectly access an object, which means that a pointer is a reference. The C language doesn't define references like C++ does (as another name for an object). In C, references are much more informal (the conceptual definition is used; as in a means to access something else).

I am going to be honest with you, I read the beginning. Found it strange, then looked at the topics. Saw you had a "pointers and arrays" section. Read it (it's the part where people get confused more). You seem to make the same mistakes. I didn't read the rest.

These are not minor problems, IMO.

Here is a quote from the pointers and arrays section:

"Imagine an array variable like a pointer that cannot be changed that holds the memory address of the first element of the array it points to. Even though the array variable holds a memory address, you cannot assign a pointer variable to an array variable, even if the pointer variable actually points to the same or a different array. You also cannot assign one array variable to another."

Arrays are not pointers. And they don't hold memory addresses. An array is a contiguous block of memory holding as many objects as you specified, all of the same type. You should explain "value context" and "object context" and tell people that when an array name is used in a value context, the value you get is a pointer to its first element. This is not due to "arrays are pointers" or "arrays hold pointers". Arrays are not pointers; and they only hold pointers if the element type is of a pointer type (however, in this case, the pointer(s) is(are) its element(s)).

You should explain what "array values are pointers to their first elements" means, and focus on "value". If you apply the sizeof operator or the address-of operator, they operate in object context (not in value context), and then you will "reveal" the "true nature" of the array.

To really be able to explain pointers, you should explain other things first. Which is very hard to do in a small blog post. I suggest you make a series of posts; and try to make them correct, w/o getting bogged down with definitions of C terminology (which idk how to avoid). Terminology such as lvalue, rvalue, name, value context, object context, value of something, object, identifier (in C, identifiers and names are 2 different terms), and some others.

Out of curiosity... Can I ask you why you're posting about C?


You have some valid points. I clarified the arrays section. I am just attempting to show some simple examples. Wasn't trying to do anything in depth. Not the easiest things with pointers of course.


Very few people have reasons to use C. As far as I know, most people do not want to use C. Those who want or have reason to, should go in depth. If you're trying to avoid going deep on the subject, then, forgive my intrusion, maybe you should be focusing on something else.

Some languages you can use by just learning a little bit here and there, superficially. C is not one of these.


I'd wager a good number of people don't want to use C because many intro tutorials are too pedantic, too clever, and/or too intimidating. C doesn't have to be hard. I agree that the OP should work to retain 100% accuracy in his article, but I disagree that he should flesh out "the truth" to its fullest extent for a beginner. There's always a level deeper you can go in learning how pointers/memory/hardware/transistors work. Deciding the extent of information you wish to impart before letting the beginner do some programming is a very delicate decision.


One major problem, which I think is related to what you're saying, is that C is used as if it were a beginner's language, but it isn't.

C was made so that it could be simply compiled to assembly. Just as important, it had high portability "in mind". The result of mixing these two design goals is that compilers could exist for many platforms. And not just that: it's a language that was also made so that a large part of the UNIX operating system could be rewritten in it.

Additionally, although you can interpret C, that's not how it's usually used. In teaching/learning, people use optimizing (professional) compilers such as GCC, which leads to a much less interactive environment.

Put these 2 design goals together and you're bound to get a language which is not beginner friendly, because lots of things are there, or not, for reasons of (a) helping the compiler writers; (b) portability to some machine of the 70's or 80's; and (c) being a language for OS development. Add to this the fact that the tools around the language are not so interactive, and you get something really beginner unfriendly.

C is not simple to learn, to use or to get good at. Compilers take advantage of what was put in the standard in very bizarre ways. Which means that if your code is wrong, even a little bit, it's likely that you get bugs, terrible ones. The optimizing compilers (good for professional use; bad for learning) are usually the ones which will lead to the strangest bugs. Again, not beginner friendly.

It's often that people think they're good at C, and that they write portable C code: they usually are not and don't. C is very good at deceiving people.

Unfortunately, C is not a language that you can just pick and start solving your problem right away. Many more ordinary statements can fail for non-obvious reasons, which is not so common in other languages.

I am not implying that you don't know all this. I am though, implying that you should take this into consideration when you talk about learning C, and beginner programmers approaching C, and even beginners in C from other languages.

There are good books which are not difficult, and do not sacrifice precision, like K&R2.


"C is not simple to learn, to use or to get good at." K&R is a very pedagogical book. It is really easy to read. The exercises at the end of each chapter are excellent. Even a beginner can reach a very good level in C by studying this single book.

There are many tricky areas (most of them result from recent standards), but beginners should simply avoid them by using a sane programming style.


Honestly, why don't you re-iterate the known tutorials about these things? Make a blog post saying "here are these old C tutorials, which are amazing; and some people seem to have forgotten these things" and post them. Lots of comp.lang.c posts are interesting; Chris Torek's stuff is interesting. If you go to the ##c channel at irc.freenode.com, they have a wiki link in the topic. That wiki has links to many tutorials. Re-iterate these things. And, please, recommend people use something other than C =)


It would be super cool if you would just point to what you think is the better stuff with a link.


Sorry. I don't know why I didn't do it.

http://www.iso-9899.info/

This is that wiki page I mentioned. It has links to all sorts of places. All articles I was talking about, I found in there.

It has an articles page:

http://www.iso-9899.info/wiki/C_gotchas

And it has some recommendations on many things, including other articles.

http://www.iso-9899.info/wiki/Usenet

Check out the "Additional materials" part:

http://www.iso-9899.info/wiki/Main_Page#Additional_materials

Anyway, wander around that page a little bit. The book recommendations are great, even though they recommend the Deitel & Deitel C book, which is not bad, but it's sort of shallow.


Thanks, I appreciate that.


This Chris Torek guy has a bunch of articles. Series called "C for smarties". It's __very__ good.

http://web.torek.net/torek/c/

The link is in there in the wiki somewhere, but I think it's worth placing it separately.

The wiki mentions some names in the Usenet page. Google these names.


Some thoughts:

>> A pointer is a variable that holds, literally points to, a memory address

What is a memory address?

How does a pointer literally point to a memory address?


C assumes some sort of object space addressable to the byte (each object is composed of one or more bytes; there is no such thing as a 0-byte object in C). It's pretty much the memory. Afaik, what an address is is not defined as part of the language. I guess people should use the knowledge they got from their "how computers work" classes =D. A pointer holds an address. I think that's as deep as you can get with this as clarifying goes.

To help learners, though, you can say that the address of an object is a value which can be used to indirectly access the object. You could say that it's a value that you can keep with you, to access the object later, indirectly.

And, as in any language that I am aware of, things are usually defined abstractly (in terms of what you want to be true about them). C is no different. You won't find concrete definitions (things may look concretely defined, but they're not). As far as we're concerned, addresses are these things that

- the address-of operator returns

- are held by pointers

- can be indirected

- ...

It could be represented as a character string for all we know.


It's the guide to pointers, not the guide to memory addresses—I think readers are expected to google unfamiliar terms.


I think if I understand memory addresses and assembly language, then I understand pointers. If the guide doesn't explain why pointers are useful and efficient on real hardware, then it's missing the point. Which was my point.


Thanks to everybody for the suggestions. I made some changes to denote addressof and better explain arrays. Also stated that the explicit cast wasn't needed.


A couple more things:

- There's nothing special about uninitialized pointers in C. Any uninitialized variable, whether it's a pointer, int, double, etc., contains unpredictable bytes left over from whatever was previously in that memory location, and should not be referenced. Not sure why this belongs in an article that's specifically about pointers.

- You should proofread the whole thing and fix the typos that a spelling checker won't pick up, e.g., "garage" should be "garbage".


Exactly, as soon as I saw "pointers can be null or uninitialized" my thought was "this is a really confused soul who wants to teach others."

I made my own compilers for a living but I find the text very confusing. I'm glad I didn't have to learn anything from this text. I still believe people should learn C from this book:

http://en.wikipedia.org/wiki/The_C_Programming_Language

It's not so big and it's really well written.


> On lines 15-16 we assign our void pointer back to our castptr int pointer. Notice the explicit cast needed.

The explicit cast is not needed in C.


Correct. This is also why you do not need to cast the result of malloc(). An assignment of a pointer to void to another pointer initiates an implicit conversion.


And it is actually somewhat dangerous to cast the return value of malloc. Aside from being redundant, it can hide an error on compilers which implement an older version of the standard (not uncommon; anything pre-C99).

If you forget to include stdlib.h, the cast hides the error and malloc will be assumed to be a function which returns int. Makes for interesting runtime errors.

Never cast the return value of malloc in C, and don't write redundant code. C is not C++.
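For reference, a sketch of the cast-free style being recommended here. The function names are hypothetical. The first function allocates with `sizeof *ptr` and no cast; the second shows a `void *` converting to and from `int *` without any cast at all.

```c
#include <stdlib.h>

/* Allocates n ints set to zero. Returns NULL if the allocation fails. */
int *make_counters(size_t n)
{
    int *counters = malloc(n * sizeof *counters);   /* no cast needed in C */
    if (counters == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        counters[i] = 0;
    return counters;
}

/* Round-trips an int * through void * using implicit conversions only. */
int read_through_void(int *p)
{
    void *generic = p;      /* int * to void *: implicit */
    int *back = generic;    /* void * to int *: implicit in C (not in C++) */
    return *back;
}
```

The caller owns the result of `make_counters` and is expected to pass it to `free` when done.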


That is exactly what I was thinking when I started reading the link. But in fact I have found only really minor points.

I still think there is already far too much beginner material on C.


Nice tutorial!

Maybe it shows my age, but I am yet to understand why so many developers nowadays have such a hard time grasping pointers, regardless of the language being used to teach them.


I learned BASIC, then Pascal, then C. I had a terrible time understanding pointers, until one day it clicked, and ever since that point I have a terrible time understanding how anyone could not understand pointers.

I don't think having difficulty understanding pointers is anything new. I was struggling with them in the early 90s, and there was plenty of literature out there talking about how to understand them and how people had trouble with them. But I can't help you understand why people have trouble, because I don't know myself.


I had the same opinion at one point. Like you said, the concept of pointers is very simple, but it can get confusing in application, as you start dealing with more levels of indirection. Try coding something/understanding code that requires triple star pointers, then you'll see why they're confusing.


Easy, just grab a pencil and paper, draw the data structures and you're done.

Again, maybe it just shows my age, as I started coding with BASIC and Z80 assembly.


My first programming language where I used pointers was 68000 assembly and they were easy for me to grasp, but for some reason it was harder when I started using C, and every time I get back to C or C++ I need a little refresher.

For me it is the syntax used by C for pointers that makes it confusing.


> For me it is the syntax used by C for pointers that makes it confusing.

Personally I don't see much difference between * (C way), ^ (Pascal way) or ref (Algol way).

Or do you mean the declaration order ?

int* ptr; (right to left)

ptr = ^Integer /ref int ptr (left to right)


Yes, it is probably the declaration order which confuses me.


> The * operator is used to both declare a pointer variable and to dereference a pointer depending on where it appears.

You seem to be suggesting that this is an inconsistency, it isn't:

  int *ptr;
means that

  (*ptr)
is of type ptr.

  *ptr = 3;
Means the same, and assigns 3 to (*ptr).


Don't you mean

  (*ptr)
is of type int?


yes, of course.


I expected a repost of a repost of this: http://www.youtube.com/watch?v=f-pJlnpkLp0


thanks, also recommend to read "What do people find difficult about C pointers?" http://stackoverflow.com/questions/4025768/what-do-people-fi...


I never really had much of a problem with the syntax, but the point at which the behavior of pointers really clicked for me (after a night of many segfaults of course) was when I realized that when you pass a pointer to a function, you're sending a copy of that pointer, just like with any other primitive type. E.g:

    void foo(int *p)
    {
        /* assignment won't be persistent after foo() returns; need to send **p */
        p = (int *)malloc(1024 * sizeof(int));
    }
Not sure why my brain had decided to make an exception for pointers for the rule that all variables are passed as copies when I first learned C, but after that I never had any problems.
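To make the contrast concrete, here is a sketch of both versions side by side. The names are made up, and the first function frees its allocation only so that the sketch does not leak.

```c
#include <stdlib.h>

/* Receives a COPY of the caller's pointer. The assignment changes only
   the copy, so the caller's pointer is untouched after the call. */
void alloc_into_copy(int *p)
{
    p = malloc(1024 * sizeof *p);
    free(p);    /* released here, since nobody else can reach it */
}

/* Receives the ADDRESS of the caller's pointer, so writing through pp
   updates the caller's variable. Returns 1 on success, 0 on failure. */
int alloc_into_caller(int **pp)
{
    *pp = malloc(1024 * sizeof **pp);
    return *pp != NULL;
}
```

With `int *buf = NULL;`, calling `alloc_into_copy(buf)` leaves `buf` as NULL, while `alloc_into_caller(&buf)` leaves it pointing at fresh memory that the caller must later free.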


I experienced the same frustration. The exercise that gave me the "ah-ha!" moment I needed was prepending to a linked list via the head instead of the tail: You need to pass the address of `head` into the function so that the new head is reflected in the calling environment when you say `head = np`.


... or have the list operation return the new list head, which can make it way cleaner. See glib's list APIs, for instance.
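Both styles, sketched for comparison. The struct and function names here are invented for this example and are not glib's API; the only thing borrowed from glib is the idea of returning the new head.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Style 1: the caller passes &head and the function updates it in place.
   Returns 1 on success, 0 if allocation fails. */
int prepend_in_place(struct node **head, int value)
{
    struct node *np = malloc(sizeof *np);
    if (np == NULL)
        return 0;
    np->value = value;
    np->next = *head;
    *head = np;
    return 1;
}

/* Style 2: the function returns the new head and the caller reassigns.
   On allocation failure the old head is returned unchanged. */
struct node *prepend(struct node *head, int value)
{
    struct node *np = malloc(sizeof *np);
    if (np == NULL)
        return head;
    np->value = value;
    np->next = head;
    return np;
}

/* Releases every node in the list. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

The second style reads as `head = prepend(head, 2);` at the call site, which avoids the double pointer but makes it easy to forget the assignment.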


Well I was simply meeting exercise requirements, which aren't necessarily the most practical.


Just like any other type. Everything is passed by value in C. Everything.


Nothing on pointer pointers :( so sad.


I added a pointers to pointers section. :)


You rock.


So... What do you use them for?


There are many uses. It greatly helps to understand the difference between the heap and the stack, and to understand the difference between static and dynamic allocation.

Functions in C are "pass by value" rather than "pass by reference". To simulate "pass by reference", the "value" you would pass is the memory address of the data in question, ie the pointer.

Also, if you put data on the heap, you must keep a reference to its memory address, otherwise you won't be able to access it again. You use a pointer for this. The following bit of code allocates some space on the heap, sticks an integer (5) on it and then returns a reference to that memory address, which is stored in a pointer variable named x.

  int * x = new int(5);
To understand why you would do the above instead of "int x = 5", you really need to understand what the stack and heap are, and why/when to use them.

Also, you can create read-only versions of variables this way. Eg:

  int x = 5;
  const int * y = &x;
At this point, y points to the address containing x, but can only be used to read. printing out "x" and "* y" will both give you the same result. If you want to update it, you can do "x=6", but "* y=6" will cause a compile time error, because of the const.
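The same idea as a compilable C sketch (the function name is made up). The write through `y` is left commented out because it would stop compilation.

```c
/* Returns the value read through a pointer-to-const after the
   underlying variable has been changed directly. */
int read_through_const_view(void)
{
    int x = 5;
    const int *y = &x;   /* y may read x but may not write it */

    x = 6;               /* fine: x itself is not const */
    /* *y = 7; */        /* compile-time error if uncommented */

    return *y;           /* sees the updated x */
}
```

Note that the const applies to access through `y` only; `x` stays an ordinary, writable int.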

There are probably loads of other uses too. I only started learning C/C++ last month, in my spare time. Hmm, I seem to have learnt a few things.


Correct me if I'm wrong, but I'm pretty sure C doesn't use the 'new' keyword, it uses 'malloc'.


You are correct, as you know. That was a C++ example, as you know.


The nontechnical guide to C pointers:

They're post office box numbers.

In this analogy, the post office is all your RAM. The big place where my analogy breaks down a bit is that, in this little world, the post office will store things bigger than one box in multiple adjacent boxes, so to get anything into or out of the post office you have to specify which box you want them to start with and how many boxes they'll have to use.

The star (dereference) operator takes as its argument a post office box number and instructs the machine to go to that post office box and start taking things out of enough boxes beginning with that box to satisfy the type of the pointer. The only things that only take one box are char values; the number of boxes everything else takes up depends greatly on the specific kind of hardware.

It's possible to put a slip of paper containing a number into a post office box; in 32-bit x86, a number long enough to encode a post office box number takes up four boxes. In 64-bit x86, it takes up eight boxes.

Using two stars means 'Go to this box, take out enough stuff from the next few boxes to make a pointer, go to the box specified by that pointer, and take enough stuff out from that box (and possibly the next few) to satisfy the type of that pointer.' Using three stars involves another go-to-box step, using four stars another, and using five stars is usually a sign of gross mental derangement.
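In code, the two-star walk looks roughly like this. The variable names follow the analogy and are purely illustrative.

```c
/* Follows two levels of indirection to change the original value. */
int follow_two_stars(void)
{
    int value = 42;       /* the contents of some boxes */
    int *box = &value;    /* a slip of paper with value's box number */
    int **slip = &box;    /* a slip of paper with box's box number */

    **slip = 43;          /* go to slip's box, then to box's box, then write */
    return value;
}
```

The claim that char values take exactly one box is the one size the language guarantees: `sizeof(char)` is 1 by definition, while every other size depends on the platform.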

(Oh, and if you're running under an OS much more featureful than MS-DOS, the post office box is a total lie told to your application by the OS. Getting into that gets a bit complicated.)



