Actually, this is how arrays are handled in C.
A C array is a set of consecutive memory addresses.
The first value is pointed to by a pointer.
int a[] = {1, 2, 3}; // create an array
// a is just a pointer to the first element...
No, "a" is an array, not a pointer. Defining an array object does not create a pointer. The expression "a" is implicitly converted to a pointer to the array's first element in most but not all contexts.
If "a" were nothing more than a pointer to the first element of the array, then "sizeof a" would yield the size of a pointer rather than the size of the array object.
This is all explained very well in section 6 of the comp.lang.c FAQ, http://www.c-faq.com/.
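A quick way to see the difference (a minimal sketch; the exact sizes depend on the platform):

#include <stdio.h>

int main(void) {
    int a[] = {1, 2, 3};
    int *p = a;                /* here "a" does convert to a pointer */
    printf("%zu\n", sizeof a); /* size of the whole array, e.g. 12 */
    printf("%zu\n", sizeof p); /* size of a pointer, e.g. 8 */
    return 0;
}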
It's also worth pointing out that there are no array parameters. If you write
void foo (char bar [42]) {
// ...
}
then within the scope of foo, bar will be a pointer to char, not an array of char. You can, on the other hand, write
void foo (char (*bar) [42]) {
// ...
}
in which case bar will be a pointer to array of 42 char (and notably, not a pointer to a pointer to char!).
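A small sketch of the difference (the sizes printed are typical for a 64-bit platform, not guaranteed):

#include <stdio.h>

void foo(char bar[42]) {           /* really: char *bar */
    printf("%zu\n", sizeof bar);   /* sizeof(char *), e.g. 8 */
}

void foo2(char (*bar)[42]) {       /* pointer to array of 42 char */
    printf("%zu\n", sizeof *bar);  /* 42 */
}

int main(void) {
    char buf[42];
    foo(buf);    /* buf converts to char * */
    foo2(&buf);  /* &buf has type char (*)[42] */
    return 0;
}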
C does some very strange things. There usually is (or at least was) a good reason for it to do so, but especially if you're a beginner, you need to be on your toes if you really want to use it well.
I think the problem with learning C is that you need to learn stuff like make, autoconf, how the compiler + preprocessors work (what do all those flags even mean!?), how making "cross-platform" stuff works, how to pull in and use "libraries", C-isms, how to test, etc. C itself is a very small and simple language, but the tooling and patterns are old and mysterious.
In college I learned how to program embedded systems with C (and ASM). Once I knew how to debug, build, and push it onto the device, it was surprisingly easy and beautiful. There's no confusing and magical abstractions, it's just you and the hardware. You pull up the microcontroller's manual, and as long as you have an idea of what the lingo means, you can make it dance to your will.
Something I disliked was that with both microcontrollers I used, there was dark magic involved in building and uploading your binary. One of the devices provided examples for a specific IDE, so I imported that example and used it as a base. I could keep modifying it and keep adding files, but I never managed to set up a "new" project. The other device was similar, but instead of using an IDE it provided a Makefile, which was nicer.
Does anyone know any good resources on learning and writing real-world C? I've looked at C projects here and there over the years, with the most interesting being GNU Coreutils... But even if I can eventually understand what some code does, how do I learn why it does it that way?
I'd love for a guide that showed things like: "do X and Y because of this and that", "test X and Y by running the following", "write tests using XYZ", "debug X using Tool A, and Y using Tool B", "pull in this library by copying these files into these places, and use it by adding the following lines to the following files", "generate docs by pulling in this tool and set it up by doing the following", etc.
In the web world there's a massive set of problems, but it tends to be easy to find "boilerplate" generators that will get you up and running. And after you've tried out a few of them, you can usually pick up how people are mixing and matching different tools.
EDIT: Another comment linked to "Learn C The Hard Way" [0], and after browsing through the chapters it seems to cover a lot of the topics I'm interested in.
Javascript has the same problem, but I think it's worse. At least there are platform standards in C (autotools in GNU, msbuild on Windows).
Trying to figure out how Grunt/Gulp/Broccoli, LESS/SASS/Stylus/Jade, Coffeescript, Uglify, Bower, Browserify, Require.js, AMD/CommonJS, NPM etc all work together is a nightmare.
It's all too hard, so people added Yeoman, Brunch, or other things to generate application configs - but now you need to decide which skeleton/generator you want, which takes you down the JS and CSS framework rabbithole.
Of course the only sane response to all this is to write another build system that doesn't repeat everybody else's mistakes.
I'm a C guy, and I never "worry" that if I have to recompile a project I worked on 1 or 2 years ago that I'm going to have to fight the toolchain to get it going again. Make will be make, and it will work.
I have a huge concern when I do anything in Javascript about what happens in 1 or 2 years from now when I have to modify a project I built using some Yeoman scaffolding. Is it still going to work? Will those node modules still be there? Can I update them without it breaking everything?
It's really weird for me, coming from the world of C, CVS/Git, Make, Linux, etc. where it's almost _unthinkable_ to introduce changes that would break older versions. Hell, even the word "old" generally means 5+ or 10+ years.
I don't understand. You can configure your node module to use specific versions of libraries just like a C project. Surely you've heard of problems people have with dynamic linkage and their programs not working? This is why distros spend so much time getting things to work with each other - I think you're just sweeping that under the rug.
The tools you're working with aren't different in this regard - it's not like your tool will magically recode itself to work differently.
Yep. I'm more comfortable with Make than I am with JSPM/NPM and System.js. The process to minify my javascript and CSS and then replace the paths in the HTML so my pages actually work seems to be needlessly complex.
I'm seriously considering whether I can use Make for my production website builds - the only issue is Windows support.
Try out webpack. Webpack makes handling all of your assets trivial. I converted around 3,000 lines of Gulp files into a 200 line webpack config. (Most of which ended up being aliases, and this is a reasonably big project.)
For an example of what makes it so awesome:
var img = require('./foo.png')
// "/output-path/0dcbbaa701328a3c262cfd45869e351f.png"
Webpack will copy this file (foo.png) to your output folder, and rename it using the file hash, so it does cache-busting.
You don't need to use other build tools, you can just use the webpack CLI. I personally use npm scripts.
Webpack also allows you to setup aliases for modules, as well as load pretty much anything you can imagine. You wanna pull in a module that doesn't use CommonJS and instead exports a global? Webpack has global-loader. AMD is supported as well.
Oh, and this includes a sane development environment, with reload on save, as well as hot loading assets that support it. (Check out react-hot-loader: http://gaearon.github.io/react-hot-loader/, but it works with css as well.)
And you can require css files in your components, and then add the extract-text-webpack-plugin so it'll rip the css from the generated JS bundle!
Aaaaand it handles SourceMaps, so you don't have to worry about some plugins (looking at you gulp) not playing well together.
Finally, it also handles minification, either through a CLI option or in the config.
You have a couple of options with make under windows.
There are native versions of make, where you use DOS shell commands to do stuff. MinGW provides POSIX-compatible native commands (you can use ls, cat, etc.). You can use MSYS, which gives you a POSIX-compatible build environment. Finally, something like Cygwin provides both a POSIX-compatible build environment and a POSIX-compatible runtime.
I never found C's transition from source code to hardware to be confusing. I struggle to comprehend why so many people, some longtime professional programmers, have trouble understanding what a linker does.
On the other hand, I downloaded the CUDA SDK. I couldn't even figure out where the GPU compiled code even resided. I suppose I was just supposed to take it as it "just works" (and it did), but it all left me highly uncomfortable.
The OS is big and scary. When you're working directly with the metal, it's SUPER simple, there's no magical abstractions. But when you're using libraries and doing system calls you don't really know what's going on. Sure, you can dig into em sometimes, but it's more than a bit daunting.
I don't think you need to learn the build system until after you're comfortable with single-file programs, where "gcc yourfile.c" is enough. Then add compiler options, and get to know Makefiles after your programs grow large enough to require multiple files.
GNU coreutils (and in general, a lot of the GNU projects) are rather excessively complex and certainly not what I'd advocate "learning by example" from. Take a look at the BSDs' standard utilities for simpler, more straightforward code.
> But even if I can eventually understand what some code does, how do I learn why it does it that way?
I believe that the best way to learn "why" is to ask "why not". You will see that a lot of programmers, unfortunately (IMHO), really don't know why and are just doing what they were taught to. If you don't do X, then either [1] it doesn't matter and you don't actually need to, or [2] it does, and you realise the reason why when you see how X makes things simpler/more efficient for either the programmer or the machine, or both.
Definitely. Not only is it worth doing the wrong thing first to understand why it's wrong, it's also often worth revisiting after you have more experience with the other alternatives.
Especially when it comes to programming paradigms and stuff like 'best practices'. There's a lot of cargo-culting in programming culture, and you really shouldn't take it as dogma.
Personally, the hardest part for me was keeping track of the size of everything. Coming from a higher level language, keeping track of the bits and bytes takes some getting used to. Working with arrays in C is much tougher especially when the compiler will compile almost anything you give it, and even a small mistake is catastrophic.
Before C, I was pampered and took everything for granted.
Now I appreciate my life more after C and feel blessed every time my IDE gives me a warning.
My pet hate is #include files. The whole way that C handles multiple source files just seems archaic to me, having worked in higher-level languages. I wish C had a proper package system that was standard, so I don't have to mess around with things like include file path order (or my favorite, the C++ template definitions having to be in the header files thing I only recently learned about).
Interestingly, I first started with C (although I haven't written a line of C code for a long time), and when I first moved to higher-level languages, I disliked the fact that I have no idea where the file I just imported is. More so when I'm playing with an obscure/new language: if I could just import whatever files I wanted (rather than importing at the package level), it seems it would be much easier to hack on the language/stdlib itself.
A sufficiently long include path can give you this problem anyway. I recently tripped over this when I created a "reason.h" and discovered that Windows had a file of the same name deep inside MFC.
I don't quite get this lament about being pampered in higher level languages. To me, it feels like someone saying they feel pampered for having indoor plumbing or running water.
Frankly, I don't use a lot of crazy libs or IDEs when writing C code. Most of my projects consist of one or two external libs and a few simple makefiles. I use Vim and clang/gcc for compiling and lldb/gdb for debugging.
As for compiler flags, the only ones I ever worry about are `-O`, `-g`, `-c`, CFLAGS and LDFLAGS.
What I've learned is that the way C includes other files/libs is extremely simple. The header files are just a mapping of the code in the `.so` or `.a`. If it's an `.so`, you can't make it a static executable, and if it's an `.a`, you have to.
I've never worked directly with low-level microcontroller programming, and the extent of my electronics knowledge is some fiddling with arduino. When I did that I used ino (http://inotool.org/) to compile and upload code to the board.
EDIT:
Projects that have good C code include: http://suckless.org/ and the linux kernel (and other stuff by Linus Torvalds). Avoid anything with GNU (most of it's over-engineered)
I recommend "The UNIX Programming Environment" by Kernighan and Pike. Partly because it was written so soon after UNIX and C themselves, it has very little of the modern 'cruft' in it. It's at the level of "cc program.c -o program".
Make is fairly easy to learn, at least in its basic form. Autoconf is horrendous.
> Make is fairly easy to learn, at least in its basic form. Autoconf is horrendous.
I completely agree. Make is extremely flexible on its own. I don't understand the need to abstract the build system to generate thousand-line makefiles that are impossible to hand edit.
As benwaffle says in adjacent comment, make is the 80% solution that works most of the time. Autoconf is the 100% solution that's supposed to work everywhere, no matter how weird or long-dead your UNIX is. In order to do that it does a vast number of compatibility tests. The result is complex enough that simple substitution of makefiles doesn't quite cut it.
Of course, that imposes the cost of 100% compatibility on every developer, when most would be happy to just build on today's Linux and call it a day.
> Autoconf is the 100% solution that's supposed to work everywhere
> Of course, that imposes the cost of 100% compatibility on every developer, when most would be happy to just build on today's Linux and call it a day.
The reality is that when you use programs built with autotools on systems that aren't mainstream, you'll have troubles. Because the scripts aren't right and were only tested by Linux developers on Linux.
And much of the time, I find it faster and easier to fix a broken makefile than to fix broken autohell.
> And much of the time, I find it faster and easier to fix a broken makefile than to fix broken autohell.
Exactly. 99% of the time, a broken makefile simply has an incorrect linker path or cflag. When it's not a path, the Makefile is structured in a way that makes sense and is easy to fix. If an autoconf project is broken, I just scrap it and don't even bother trying to build it.
The other issue with autoconf is that it's not standardized. So many of the autoconf projects I've seen have shell scripts (to install it or download deps) mixed in that only add more confusion. Some of them have a configure script. Some have a configure.in, so you have to generate the configure yourself.
100% of the makefiles I've seen have a build, install and clean task. Sure, it's not required, but everybody does it. You can't say the same for autoconf.
The idea is to make it work on all platforms, only requiring minimal POSIX compatibility. It also checks for any requirements you specify, and sets up stuff like make install, make distcheck, make check. It handles compiling your code into a library regardless of the OS.
Unit tests are more effective than using a debugger. I use a debugger a couple of times a month; the rest of the time it's unit tests with sprinkled asserts and debug prints in the code.
BTW: just want to say that you have a fascinating blog! I've just spent last hour only skimming through some of the articles and bookmarking them for later. (Link for the lazy: https://nickdesaulniers.github.io/)
I switched to CMake for all of my C and C++ projects and i never looked back. Takes care of a bunch of makefile issues for you once you get used to it.
C is quirky, flawed, and an enormous success. While accidents of history surely helped, it evidently satisfied a need for a system implementation language efficient enough to displace assembly language, yet sufficiently abstract and fluent to describe algorithms and interactions in a wide variety of environments.
-- Dennis M. Ritchie (in "The Development of the C Language" [1])
These slides were full of dangerous, elementary errors when they were submitted to /r/c_programming a couple of days ago. The author doesn't know C enough to have worthwhile views on the language.
I was quite impressed how many important concepts are covered in such a small presentation. He explicitly doesn't cover the things in C that programmers from other domains will understand easily (functions, conditionals, loops etc.) and goes straight to the things that make C different.
Hey author of the slides here, glad you liked it :D
I used http://remarkjs.com/ to make the slides. All you have to do is include the script, add a textfield with your content in markdown and it automatically converts to a slide show.
I was hoping this would actually tell me how to accomplish something in C. I know all about pointers and memory, but I don't know anything about the current state of C development. What libraries do people use? What are common memory management strategies? Etc.
Libraries depend on what you need to do. A lot of work is done with just the std lib.
For domain specific stuff you use domain specific stuff. For generic stuff, well, you tend not to want the generic data structure libs that come with other languages' stdlibs, because it's hard to do those both efficiently and cleanly (you can only pick one). It's easy enough to hack up a dynamic array or a hash table with a fixed capacity that can only insert and get, maybe delete, so you tend to do this when you need it. (Also, a sorted array usually does very well in place of a hash table, with not much code.)
Memory management depends on what you're doing. I use a lot of arenas and pools, and as a result very rarely have to worry too much about memory management. Some things become harder here, like dynamically sized arrays, but you can do this with chunks of fixed length arrays. (or whatever)
I have to do very little string processing most of the time (and when I do, the strings usually have small, known maximum lengths), and I imagine this memory management technique would work less well for that.
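For what it's worth, here is a minimal sketch of the kind of arena I mean (the names and the alignment choice are made up for illustration; a real one would handle growth and alignment more carefully):

#include <stdlib.h>

typedef struct {
    char  *base;
    size_t used;
    size_t cap;
} Arena;

int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = cap;
    return a->base != NULL;
}

void *arena_alloc(Arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;            /* crude 16-byte alignment */
    if (n > a->cap - a->used) return NULL; /* out of space */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

void arena_reset(Arena *a)   { a->used = 0; }   /* "free" everything at once */
void arena_destroy(Arena *a) { free(a->base); }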
The issue with undefined behavior is it's kind of hard to... well... define. It's an odd combination of unsafe memory, float/integer rollover, and rules with memory allocation. I wasn't quite sure how to clearly state it.
All you really need to say is: There are some rules in C which neither the compiler nor the runtime is required to check if you've broken. On the contrary, the implementation is allowed to assume that you haven't broken them - which means that if you do break these rules, all bets are off as far as the behaviour of your program goes, which may lead to all sorts of strange and apparently inconsistent results.
Yeah I tend not to get hung up on the different ways it can trigger. It's more like, if you trigger it your program might run fine for now and then suddenly act up one day. It might die before the undefined behavior. It might launch nukes. Still sure you want to get into this? No? Want to go back to Ruby? Well, Ruby is built atop C. That's the five states of grief I try to get them through.
Unfortunately, UB extends far beyond that. To the point of being (very) logically inconsistent.
People not familiar with C would naively expect that, for instance, `x[10] == x[10]` is always true even if 10 is out-of-bounds for x (or rather: it may crash, or it may be true). But compilers can - and will - treat it as false if that makes the code faster.
This sort of thing is my major pain point with C - you quite literally have to know the entire code to make any judgements about any one piece of code, even trivial ones. You end up having to fight the compiler at every turn.
Not only can the compiler make it false. It can assume that the code would never make the comparison in the first place, and optimize away any cases where the undefined behavior is guaranteed to be triggered.
So in a sense undefined behavior can travel back in time :O
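A hedged illustration of what that can look like (whether a given compiler actually does this depends on the optimizer, but it is allowed to):

int bump(int *p) {
    int x = *p;      /* if p is NULL this is undefined behaviour, so the */
    if (p == NULL)   /* compiler may assume p != NULL and delete this    */
        return -1;   /* check (and the early return) entirely            */
    return x + 1;
}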
What exactly do you mean by modular code in your context?
edit: Oh, you probably meant modules integrated directly in the language. I was surprised to read that since modular design is highly successful and useful in C.
command line utility development, writing a replacement to systemd
Those don't require direct use of C, necessarily - but rather some form of FFI to POSIX and the syscall interface. Hell, if one is writing a systemd replacement, I'd encourage use of OCaml. It'd spare you lots of boilerplate because you're writing relatively high level userspace logic, anyway.
Red Hat's Richard Jones (who is also often commenting on Hacker News) does a lot of system programming in OCaml (mostly VM management-related I think):
Why do presentations always get interpreted vs compiled wrong? It's not a property of a language; it's a property of the runtime. For example, Java is interpreted on old versions, JIT compiled on the desktop, and compiled AoT on Android...
Compiled vs. interpreted is intended there to mean something a little different than runtime behaviors.
Compiled - source files are run through a compiler which produces some output file that's then run. With C or Java you run the compiler on the .c or .java file. C compiles to an executable, Java compiles to byte code.
Interpreted - source is fed to an interpreter which typically turns it into some kind of representation that's immediately run with no intermediary step on the part of the user/programmer. Perl, PHP, Ruby, Python, JS, TCL, etc. are typically run this way.
The terms are inherently a little muddy, since you can compile some interpreted languages to bytecode (common in PHP, and how JRuby/Jython are often run), and in theory could write a dynamic loader for C / Java to run them as interpreted - no idea if someone has been so perverse as to do it.
The behavior of a JRE or other runtime in loading bytecode/whatever to execute - JIT, AOT, or whatever has to do with how a runtime handles bytecode that's already been compiled from source and turns it to machine code to execute. The terms show up there also, but have a different meaning.
That's not what people are typically discussing when they talk about a compiled vs. interpreted language, and the author was correctly using the terms when referring to languages to draw the distinction in what a developer does (rather than what a runtime does, which is basically irrelevant to C).
In your explanation you even state "an interpreter which typically...". You talk about compilers and interpreters, which aren't part of the language.
The way you describe it, compiled or interpreted is a transitive property of a language, based on a popular way to utilize it.
If the definitions are "inherently a little muddy", then they're not very useful. Javascript and compile-to-JS languages have a ton of compilers written for them. Does that make them compiled or interpreted languages? How popular does compiling a language have to be for it to become compiled instead of interpreted?
It's no wonder that the parent prefers the strict definitions which lack this ambiguity.
I think the slide was correct in expressing the idea it wanted to express using terms that are commonly used in the sense they used, while the parent was being pointlessly pedantic in wishing the terms had some strict sense that was the only true definition.
Sometimes terms do have very strict definitions that are only correctly used in a limited sense. In the case of compiled vs. interpreted that's not the way things are and criticizing an introductory doc. for not adhering to an arbitrarily selected strict definition is pedantic and counter-productive.
No, seriously. All the lecture had to say to be correct and at least as useful is that C requires a build step unlike a lot of high level languages which don't.
Nothing about compilers vs interpreters; none of that is relevant, and the fact that it's incorrect is just icing.
This is not a terminology debating society, you're welcome to look up the definitions on wikipedia or take an introductory CS course if you're not sure what a compiler is or the difference between a language and a compiler/interpreter for it.
You're right that if the slide said C requires a build step unlike a lot of high level languages it would be correct. The terms "interpreted language" and "compiled language" have a sense in normal usage that I described, and I just verified that Wikipedia doesn't agree with you. There are no standards bodies that define formal definitions for those terms, common usage is how they are defined. All this is pointless pedantic quibbling, though, so it wasn't really worth my time, nor is it worth yours, really.
Agreed, interpreted/compiled shouldn't necessarily be seen as part of a language definition. However, I see two counter arguments.
When using preprocessor macros in C/C++, these only make sense with a compilation step. But they are part of the language, are they not?
In interpreted languages, you generally have an eval function/command that lets the interpreter execute any dynamically constructed code. That eval function is arguably perceived as part of the language, but only works in an interpreted environment.
int my_var = 3; // It's an int!
my_var = "abc"; // COMPILER ERROR! You clearly stated that it was an int!
"abc" gets interned and has a memory address. The statement my_var = "abc" tries to assign that address to my_var, but since it's not a pointer to char it gets cast to int instead, possibly truncating the value.
The program still compiles, just printing a warning.
As someone who just reread "The C Programming Language", in chapter 1 they teach you how to count occurrences of characters, so I'm not sure I understand the sarcastic tone of "you're not going to count foo in a file". C can be used for many things; I don't think the author of the slides did a great job of describing "when" C is the right tool for the job.
I think he is just saying that C isn't worth learning if you just need to do really simple things that a ton of modern languages can do in one or two lines.
As someone who occasionally dabbles in C code, I am interested in knowing what is the modern take on `goto`s.
I know they are "harmful" but I often come across code riddled with goto statements [1] and I personally feel that as long as it makes the code readable without significantly obfuscating the logic, goto is a perfectly fine way of doing things (although popular opinion and consideration for best practices have more or less forced me to remove goto from my list of C tools). Also, the fact that it maps almost directly to asm makes it easier to reason about the generated machine code (although if that's a significant reason for using goto is questionable).
Goto is fine. The 'harmful' style was using them in favor of ifs and loops.
The things I've seen people do to avoid a goto are pretty awful though. If you ever use a 'do {} while (0)' just to break out of it, you should feel bad. Goto is much clearer and cleaner than nonsense like that.
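For comparison, the two shapes side by side (a sketch; do_a, do_b and do_c are hypothetical helpers, with do_a/do_b returning 0 on failure):

int do_a(void);
int do_b(void);
void do_c(void);

void with_do_while(void) {     /* the do {} while (0) trick */
    do {
        if (!do_a()) break;
        if (!do_b()) break;
        do_c();
    } while (0);
}

void with_goto(void) {         /* the same thing with goto */
    if (!do_a()) goto out;
    if (!do_b()) goto out;
    do_c();
out:
    return;
}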
I think one uses gotos for two, sometimes three, reasons.
1. Sometimes one feels the need to abuse exceptions. But C doesn't have exceptions. So one abuses the old goto instead.
2. Sometimes the code is far more readable if you use a goto to short-circuit a complex block of code, which would become insufferably more complicated if it had to, say, keep track of a trivial case: every branch ends up sprouting an extra "and not the trivial case" condition.
3. You can use long jumps to do exception handling stuff. I've never had to actually do this.
One comment. I remember trying to read ancient code that abused goto's mostly because the programmer was desperately trying to fit everything into 4k of prom in languages that didn't support structured code. That was the kind of stuff Dijkstra was bitching about, not uses 1, 2, and 3. And actually since C has always had modern control structures goto just is not abused much in practice. Probably the opposite.
Side note.
int my_var = 3;
my_var = "abc";
Just usually generates a warning when compiled. If run my_var will usually get loaded with the address of "abc". If you follow it with the statement
printf("my_var=%s\n", my_var); // this will throw a warning
The convention of the project is that every function that has to deal with an error condition has an "error" label. These macros are used to jump to that label to clean up the function before returning. Here's an example:
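(A sketch of the shape of it - the CHECK macro and the resources here are illustrative stand-ins, not the project's actual macros:)

#include <stdio.h>
#include <stdlib.h>

#define CHECK(cond) do { if (!(cond)) goto error; } while (0)

int load_widget(const char *path) {
    FILE *f = NULL;
    char *buf = NULL;

    f = fopen(path, "rb");
    CHECK(f != NULL);

    buf = malloc(4096);
    CHECK(buf != NULL);

    CHECK(fread(buf, 1, 4096, f) > 0);

    /* ... use buf ... */

    free(buf);
    fclose(f);
    return 0;

error:
    free(buf);           /* free(NULL) is a no-op */
    if (f) fclose(f);
    return -1;
}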
I think that this use of gotos is more clear than the alternative of a lot of nested if statements checking for success or having the clean up logic duplicated in a lot of places.
Personally, though, I prefer using C++ and destructors (and, where appropriate, exceptions):
result_t foo ()
{
// Allow exceptions to propagate
auto r1 = get_r1 ();
auto r2 = get_r2 ();
auto r3 = get_r3 ();
return get_result (r1, r2, r3);
}
Exceptions aren't always appropriate, but the C++ still usually ends up a little cleaner. I would love to see something like Haskell's Maybe monad that lets you write code like the above, but returning status information instead of throwing exceptions.
Why is writing a compiler in C a premature optimization? Ignoring performance is just a bad approach to writing software. And the "premature optimization" phrase is overused, even when it's not relevant to the discussion.
I misread the title as "C for high level programming" and was a bit confused. Of course, long time ago, C was considered a HLL.
The presentation is quite good. I might point people to it in the future so they might at least have an idea in what dangerous place they want to venture.
So as someone who has an unhealthy love of C I have to say this is amazing and will be forwarding it on to some friends trying to learn C. The humor is just fantastic IMHO.
May I suggest that you also send them a link to Zed Shaw's "Learn C The Hard Way" online book [0], which presents lessons in modern C programming, including the use of tools like Valgrind, etc.?
I just wrote a comment [0] about my problems with learning C, would you say this book covers the issues that I raised and is worth reading?
EDIT: Well, I just looked over the chapters in this book and I'm now extremely excited to give it a read. It seems to cover most of the topics that I'm interested in, so thank you SO MUCH for sharing!
The thing about pointers that I don't get is why do you need the memory address of the variable? Is that the only way to get the value when you want it? Like, every variable has to have a pointer in order to make use of the variable?
This comes from a restriction of almost all existing computer architectures. You have a small number (16 on amd64) of 'variables' called registers that you can directly work with. Additional variables have to be loaded from and stored in memory, which is slower and requires you to know that variable's address (pointer) -- which is just an integer with special meaning. In C, variables you never take a pointer to might live exclusively in a register, and all local variables are stored at a fixed offset to a special 'stack pointer' which is kept in a dedicated register.
Some architectures try to have a more sophisticated approach and have some sort of 'fat pointer', in which pointer values have a special tag and are subject to special rules so they can only point to valid objects. The exact rules used and what constitutes 'valid' are specific to the architecture. Intel has introduced MPX on newer processors to check array bounds with such a scheme, and older architectures such as Lisp machines had much stronger (but less efficient) schemes.
If you want arrays of variables, you need pointers. It's not enough to know the value of an int (the first element of the array), but you need the pointer to an int (pointer to the first element of the array). Now you can increment the pointer to go to the next element. You couldn't do this with a simple value.
Also, suppose you want to pass a variable of a large data type, like an image, to a function. Instead of copying the entire variable, just pass the cheap pointer. The analogy is giving someone a URL vs the source code of the site for them to paste into the browser (you can't fit the latter into a QR code, for instance, but you can the former).
Of course the problem is, if you're passing a pointer to a data structure to a function, the function doesn't know the size of the data structure unless you pass that as another argument.
You meant to say, "if you're passing a pointer to an array to a function, the function doesn't know the size of the array unless you pass that as another argument".
When passing a (pointer to a) data structure to a function, in 99.99% of cases there's only one data structure you'd pass, and you build this into the function's prototype, e.g.,
int myfunction( struct my_structure *x )
instead of
int myfunction( void *x )
and so, yes, the function does know the size of the structure. And in the case of arrays, often it's enough to mark the end of the array (with '\0' in the case of char arrays or NULL in the case of pointer arrays), I'd only roll my sleeves up and worry about minimizing length calculations if I had actually done some profiling and determined that such nitty-gritty optimization was needed (it rarely is).
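For example, walking a NULL-terminated array of pointers needs no separate length argument (a minimal sketch):

#include <stdio.h>

void print_all(const char **names) {  /* the end is marked by NULL */
    for (; *names != NULL; names++)
        printf("%s\n", *names);
}

int main(void) {
    const char *names[] = { "alice", "bob", "carol", NULL };
    print_all(names);
    return 0;
}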
You don't need the address of a variable: you can use the plain variable just fine, and most C code does a lot of that.
What a pointer does is add a level of indirection: so instead of having a value "an integer" you can have a value which is "the location of an integer". A variable holding such a value can be assigned the location of any integer variable, and importantly can also be reassigned the location of a different integer variable.
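Concretely, a trivial sketch:

#include <stdio.h>

int main(void) {
    int a = 1, b = 2;
    int *p = &a;             /* p holds the location of a */
    *p = 10;                 /* writes through p: a is now 10 */
    p = &b;                  /* reassign p to point at b instead */
    *p = 20;                 /* b is now 20, a is untouched */
    printf("%d %d\n", a, b); /* prints "10 20" */
    return 0;
}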
The additional indirection also means that you can link one data structure to another without including one as an integral part of the other.
And why is this useful? Well, for high-level folks, pointers are used for roughly the same thing as reference variables in other languages.
For low-level folks, sometimes you need to be able to read from / write to a specific address in memory. So if you have, for instance, a system clock device that always gives you the current time if you read address 0x1234, you might do something like this:
uint64_t system_time;
volatile uint64_t *system_time_device = (volatile uint64_t *) 0x1234; // A pointer to the system time device...
system_time = *system_time_device; // read the contents of the memory at address 0x1234 to get the time
Speaking from a C++ perspective, the item pointed to may not be a simple type (like an integer) but a complex object.
Copying an object around all over the place (into functions, out of them) would be expensive. It would be like copying an entire ledger every time you wanted to make a change to the ledger. Far better would be to hand the ledger around, or when looking for it ask "where is it?" and be pointed to where it is now.
It also makes it simpler to make sure all your data is in one place, which is a good thing for program design.
If you don't have a copy of the value, you need some way to find it, right? That's a pointer, or a reference, or a handle. (These are all approximate synonyms for some way to "address" the data.) In the olden days, there was a fixed mapping from a number to a physical location in storage, but now there are many levels of indirection, such as virtual memory, pools, etc.
These slides are great, I was thinking of presenting something similar to a bioinformatics lab group I am a part of. Could I adapt some of these slides?
I remember someone using LINQ to do fancy things on the results of an SQL query. It was stupid. Far better would be to have written the SQL properly in the first place, instead of grabbing loads of data and doing cartwheels client-side.
Teaching C and not checking malloc is a bad idea. Using realloc the way it's used in this "tutorial" can cause memory leaks: realloc should be checked before being reassigned, because it can return NULL, and that would overwrite the previous valid memory address.
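The safe pattern is to assign realloc's result to a temporary first; a minimal sketch (grow and its parameters are hypothetical names):

#include <stdlib.h>

/* grow the buffer behind *items; returns 0 on success, -1 on failure */
int grow(int **items, size_t *cap) {
    size_t new_cap = *cap ? *cap * 2 : 16;
    int *tmp = realloc(*items, new_cap * sizeof **items);
    if (tmp == NULL)
        return -1;   /* *items is still valid and still owned by the caller */
    *items = tmp;
    *cap = new_cap;
    return 0;
}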
Not checking malloc's return value can easily lead to security vulnerabilities, particularly in bytecode interpreters and things like that.
The basic plan is simple: Trick the target program into allocating an impossible huge block of memory (e.g. 3 gigabyte on a 32 bit system). Malloc will return NULL but the program blindly assumes the allocation has succeeded. Now use carefully chosen indices to read and write whatever memory you want.
An issue which is not Linux-specific: when calculating the size of an array allocation, you can overflow size_t. So when mallocing arrays, you're supposed to check the computed size for overflow (or use calloc(3) or OpenBSD/libbsd's reallocarray(3) instead).
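The usual check looks something like this (a sketch; calloc(3) and reallocarray(3) do the equivalent for you):

#include <stdint.h>
#include <stdlib.h>

void *alloc_array(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;             /* nmemb * size would overflow size_t */
    return malloc(nmemb * size);
}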
If "a" were nothing more than a pointer to the first element of the array, then "sizeof a" would yield the size of a pointer rather than the size of the array object.
This is all explained very well in section 6 of the comp.lang.c FAQ, http://www.c-faq.com/.