
This fix leaves a variable uninitialized, and depends on funny logic to later give it a value.

The general structure should be

  FILE *fp = stdin; /* sane default */
then override with a new pointer from fopen if necessary.
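
That is (a sketch, error handling elided):

  FILE *fp = stdin;         /* sane default */

  if (*argv)
    fp = fopen(*argv, "r"); /* override when a filename was given */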

In fact, looking at this more closely, note how fp is scoped to the body of the for loop. That's where it should be defined:

   for (firsttime = 1; ; firsttime = 0) {
     FILE *fp = stdin;

     /* logic flips it to fp = fopen(...) as necessary */

   }
[Note: the C compiler Joy was working with probably didn't support block scope declarations except at the top of the function. This comment applies to merging the fix in today's environment.]

Note how the fclose at the bottom of the loop ends up closing stdin in the case that fp == stdin! It ensures that this only happens once, but you have to analyze it to see that this is true. I would write this as:

     if (fp != stdin) /* defensive! */
       fclose(fp);
Making it an infinite loop terminated by calling exit is unnecessary. There is already a continue in the loop, and a break in an inner loop; another break instead of the exit, and a "return status;" at the bottom wouldn't hurt. Better yet:

  /* Enter loop whenever there are args left to
     process, or even if there are no args if it is
     the first time through (to process stdin instead). */

  for (firsttime = 1; firsttime || *argv; firsttime = 0) {
    FILE *fp = stdin;

    if (*argv) {
       /* have args; do the fopen to replace stdin */
       /* set status = 1 and continue if cannot fopen */
    }

    /* do head logic */

    if (fp != stdin)
      fclose(fp);
  }

  return status;
C itself was only a couple of years old when that code was written originally, and the systems were small. Whereas I've been using it since 1989. Not criticizing the original!



I would argue that your insistence on giving the variable a "sane default" is a matter of opinion. I personally am much happier seeing a variable left uninitialized than ever seeing it double-assigned: to me, a fundamental semantic logic error has occurred if you are overriding the value of this kind of variable, especially as part of a conditional where it might not be clear whether the override happened. I really would prefer to look at the variable being assigned and think "ok, now we know the value". The compiler is entirely able to determine whether the variable has been initialized in all possible control flows for a case like this (at least once we move the scope of the variable inside the for loop, of course), so there is no safety benefit to hoisting that default assignment out, nor does it save a meaningful amount of code: it only serves to obfuscate the "default" control flow by turning what had been an explicit if/else into an implicit if/default :/.
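
Concretely, the style I am arguing for looks something like this (a sketch; with GCC's -Wmaybe-uninitialized or Clang's -Wsometimes-uninitialized enabled, any path that misses the assignment gets flagged):

  FILE *fp; /* deliberately left uninitialized */

  if (*argv)
    fp = fopen(*argv, "r"); /* explicit: a filename was given */
  else
    fp = stdin;             /* explicit: no arguments, use stdin */

  /* every use of fp below is provably preceded by an assignment */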


Always initialize your variables. This is not an aesthetic argument: there are multiple software engineering arguments against failing to initialize variables.

For starters, you don't have prescience. You can't prevent future versions from growing new alternate code paths. As the code evolves, checking that every initialization-free variable still gets initialized is a burden and easily overlooked. Then there are more potent evolutions that will make it just impossible: the code will get generalized, or moved into a library, or gain a context object so it can run multiple instances. Or it will grow a state machine to handle complex combinations of options.
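
A contrived sketch of the kind of evolution I mean (the opt_* flags and run_* helpers are hypothetical): a new option grows a new path, and the assignment is forgotten on it.

  int status; /* relies on every path below assigning it */

  if (opt_quiet)
    status = run_quiet();
  else if (opt_dry_run)   /* path added later; assignment forgotten */
    run_dry();
  else
    status = run_normal();

  return status; /* garbage on the dry-run path */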

You're relying on the compiler telling you the variable will get initialized. But the compiler may change. Or the human compiling it might not look at warnings.

OTOH, double-initialization is mostly a myth. The cases where the compiler can tell that a variable may not get any initialization, and warn you, are exactly the cases where the optimizer will be able to see the double init and optimize it out. I've looked at generated code and seen compilers properly eliminate redundant stores.
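
For instance, with the pattern under discussion, compilers at -O1 and up generally don't emit a redundant store at all: in SSA form the two assignments merge, and the default store is dead on the path that overwrites it.

  FILE *fp = stdin;         /* initial store... */
  if (*argv)
    fp = fopen(*argv, "r"); /* ...dead on this path, so it is eliminated */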

You're better off betting on the optimizer and getting bitten by the occasional double store than getting bitten by uninitialized variables and giving system penetrators yet another free lunch.


My issue is not performance: I believe that you will have more coding errors due to the semantic error inherent in the variable being initialized twice. In some languages... in many languages... this is in fact impossible to even type, and the arguments why are due to semantics, not performance.

If it makes you feel better, move this code to a function that returns a single value and initialize the variable once in one place (and then mark the value const!). I actually also agree this is a better option, but semantically the if/else is clearer than the if/default, so I would not fall back to if/default over if/else.
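
That refactoring might look like this (open_input is a hypothetical helper name):

  static FILE *open_input(const char *name)
  {
    return name ? fopen(name, "r") : stdin;
  }

  /* ... */
  FILE *const fp = open_input(*argv); /* initialized once, then immutable */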

As for relying on the compiler: turn that warning on manually, and turn that warning into an error manually. (Do not turn all warnings on, as upgrading your compiler can break your build; but turn on the warnings that you actively want to verify your code against and have cleared your code against, including this one.)
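
With GCC and Clang that looks roughly like the following (a sketch; flag names as of current releases, so check your compiler's documentation):

  /* cc -Wuninitialized -Werror=uninitialized file.c
     GCC also has -Wmaybe-uninitialized, and Clang has
     -Wsometimes-uninitialized, for conditionally-missed paths. */

  int status;
  if (argc > 1)
    status = 0;
  return status; /* warning: 'status' may be used uninitialized */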

I will also argue that you should treat your static analysis similarly to a dynamic unit test of your code: it doesn't matter if some people are using a compiler that doesn't have this feature (though that seems unlikely, as I believe all major compilers support it), as the maintainer will have it, in the same sense that running unit tests sometimes requires more dependencies than running the underlying code.

Again, though, if you are very used to the if/default control flow, I can appreciate wanting to use it: I just would rather see people take advantage of static language analysis that is present to deal with silly errors, and then arrange their code to better handle high-level logic errors.


If you use static analysis, defensively initializing to a wrong value may mean that your tooling doesn't tell you when you fail to correctly set it from that wrong value.


I have seen this argument numerous times before.

It doesn't hold water, because potentially initializing variables to wrong values is just as much a consequence of using that static tool as it is a consequence of an "always initialize" coding habit.

The static tool produces a report of uninitialized variables, or potentially uninitialized ones. The developers react to this by fixing all those instances, so that the tool shuts up. Once that happens, they have a code base for which the tool doesn't report anything, just like the other developers who always initialize.

There is no reason to believe that either team has a lower or higher rate of incorrect-initial-value bugs. Or is there?

I would argue that the developer who is writing the code, at that moment, quite possibly has a better idea of what the correct initial value should be, than someone who is running a tool across a large code base and following up (by him or herself).

Really, the coding convention should be "always initialize, with the correct value which doesn't need to be overwritten later."

Even in imperative languages, we should program as "functionally" as is reasonably possible.

Imperative code that mutates local variables should only do so if it implements iteration. If the code has no loops and is setting local variables conditionally, that shows that it's trying to influence the behavior of some common code where the control flow paths merge --- instead of putting it into a function!

In the specific program we are looking at, for instance, the function is trying to use the head processing block at the end, but over different input streams. This can be done by putting it into a function, so then there is no assignment to the fp variable:

   if (!*argv) {
     head_processing(stdin);
   } else {
     FILE *fp = fopen(*argv, "r");
     /* check for fopen failure ... */
     head_processing(fp);
     fclose(fp);
   }
We can eliminate assignments out of loops, too, but that requires tail recursion:

   /* pseudo code */
   head(int argc, char **argv, int first_time)
   {
     if (argc == 0 && first_time) {
       return head_processing(stdin);
     } else if (argc == 0) {
       return 0;
     } else {
       FILE *fp = fopen(*argv, "r");
       /* head_processing checks for null */
       int status = head_processing(fp);
       fclose(fp);
       
       /* tail call, if status was good */
       return (status != 0) 
              ? status : head(argc - 1, argv + 1, 0); 
     }
   }
C programmers will tend not to be comfortable with this style, and it requires tail call support in the compiler to not generate O(N) stack. So for pragmatic reasons, you can't make a coding convention out of this.


No tool will help you much if you don't know how to use it. In this case, one failure mode of using the tool wrong puts you in the same situation as not using the tool and defensively initializing. That is a very poor argument that the tool does no good.


That argument isn't that the tool does no good. The tool is useful even if you follow the practice of always initializing: it helps catch accidental deviations from that practice.


Well sure. My point is that it basically devolves into checking that practice if you treat it poorly, and can be more useful if you don't. If you make a habit of initializing variables with the wrong value, it's less useful.


My other comment was in response to the first half of your comment, before an edit added more about the structure of the code. I'd say I partly agree, but I think another consideration is that in C there is a limit to how much you can reasonably fit in an expression. What in, say, Haskell might look like...

    let f = case something of
                Foo -> 10
                Bar -> 22
cannot reasonably take the same form in C, despite there not being a "real" update going on.

This means in C you wind up writing things like,

    int f;

    switch(something) {
        case Foo:
            f = 10;
            break;
        case Bar:
            f = 22;
            break;
        default:
            /* ... handle error ... */
    }
And the flow is basically the same, but with an additional degree of freedom to make a mistake, and there some static checking can have your back.

Incidentally, on GCC and Clang I recommend building with -Wswitch-enum, which will help point you at the above when you add something to the enum that contains Foo and Bar.
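
The reason to prefer it over plain -Wswitch: a default label silences -Wswitch, but -Wswitch-enum still reports the missing enumerator. So when Baz joins a hypothetical enum containing Foo and Bar:

    enum thing { Foo, Bar, Baz /* added later */ };

    switch(something) {
        case Foo:
            f = 10;
            break;
        case Bar:
            f = 22;
            break;
        default:
            /* ... handle error ... */
    }
    /* -Wswitch-enum: enumeration value 'Baz' not handled in switch */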


The compiler will issue a warning if a code path uses an uninitialized value.


As I understand it, you can fclose(stdin) without issue on almost all POSIX systems.


Sure, but it's surprising, and in code, 'surprising' usually translates to 'wrong'.


Yes, I know this, and in a throwaway little program where you are sure you are the last one to ever touch stdin (because you're calling exit right there in the loop, for instance) it is okay.


In old sysvinit-style daemons, closing stdin/stdout/stderr (or redirecting them to /dev/null) was a necessary part of the daemon initialization process.
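
Roughly this, early in startup (a sketch, error checking elided):

    #include <fcntl.h>
    #include <unistd.h>

    /* detach stdio from the controlling terminal */
    int fd = open("/dev/null", O_RDWR);
    dup2(fd, STDIN_FILENO);   /* 0 */
    dup2(fd, STDOUT_FILENO);  /* 1 */
    dup2(fd, STDERR_FILENO);  /* 2 */
    if (fd > STDERR_FILENO)
        close(fd);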


Not sure that scoping to the block + unconditionally doing the initialization actually helps readability, though I do think doing one or the other would.

Also, the original is super efficient, which is pretty cool, and (at least when correct) it is arguably more readable (though probably not as extendable), since there seems to be no way to look at the code and think you know what it's doing when you actually do not.



