Misconceptions about loops in C (acm.org)
166 points by matt_d 7 months ago | hide | past | favorite | 182 comments



If you're wondering why the paper seems to complicate "simple" constructs by converting them to control flow graphs — that's because many other kinds of static analysis, like tracking when variables are initialized and overwritten, are even more puzzling when trying to analyze very stateful code directly from the syntax tree. The graph representation allows splitting the code into smaller blocks and removes a lot of statefulness from them.
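The lowering can be sketched in C itself (a hand-made illustration, not the paper's notation): a `while` loop becomes a handful of labeled basic blocks connected by `goto`s, which is essentially what the control flow graph records.

```c
#include <assert.h>

/* A while loop lowered by hand into basic blocks: each label is
   a CFG node, each goto or fall-through an edge. The hypothetical
   function sums 0..n-1. */
int sum_to(int n) {
    int i = 0, total = 0;       /* entry block */
header:                         /* loop header: the only test */
    if (!(i < n)) goto done;    /* edge to the exit block */
    total += i;                 /* loop body */
    i++;
    goto header;                /* back edge */
done:
    return total;
}
```

Each block now reads as straight-line code with no hidden loop state, which is what makes dataflow analyses like initialization tracking tractable.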


So this is specifically from the point of view of someone analyzing control flow, and how that can be tricky or ridiculous based on how many features of C you combine. Not what I expected but quite interesting. Especially the difficulty in defining a loop.


LLVM tries to canonicalize all loops into one of two canonical forms[1] before doing any useful analysis and transformations. It has been quite effective. Not all loops can be shaped into a canonical form, though.

[1]: https://llvm.org/docs/LoopTerminology.html
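One such canonicalization, loop rotation, can be illustrated at the source level (my hand-written approximation of the shape LLVM's rotation pass produces, not actual compiler output): a `while` loop becomes a guard branch plus a `do-while`, so the exit test sits in the latch block.

```c
#include <assert.h>

/* Original form: the test is at the top and runs n+1 times. */
int count_up(int n) {
    int i = 0;
    while (i < n)
        i++;
    return i;
}

/* Rotated form: a guard plus a do-while. The loop test now ends
   the latch block, which simplifies later analyses. */
int count_up_rotated(int n) {
    int i = 0;
    if (i < n) {               /* guard: skip the loop entirely */
        do {
            i++;
        } while (i < n);       /* exit test in the latch */
    }
    return i;
}
```

Both functions compute the same result; only the control-flow shape differs.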


Best footnote for June:

returns should be thought of as domesticated gotos but domesticated in the same way that cats are domesticated


Good footnote but crap take

As even their example mentions error handling

Goto sucks but not having proper exception handling sucks more


So returns are gatos?


Back in the 80's it was normal for C compilers to recognize `for` loops and do optimizations on them.

I had attended a summer class at Stanford on Data Flow Analysis, and took advantage. My compiler would break all the C flow control constructs into `if` and `goto` statements. Then, the optimizer would reconstruct the loops using graph algorithms.

Then, no matter what snarl of for, while, break, do-while, continue, switch, or goto statements the programmer wrote, the optimizer could find all the loops, identify the loop variables, the loop header, etc., and then optimize it.


While good in general, that does have its downsides. I've had a case of adding an "if (undesired input format) { change input format; goto place_that_decides_method; }" resulting in clang now considering most of a massive function as being a loop body, and thus a bunch of computation from various branches being extracted to be always calculated outside "the loop" (despite me knowing that at no point should any single path (beyond the small dispatching part) be executed more than once).


All optimizations have cases where they are deleterious. The assumption I used for loop optimizations is that the loop executes 10 times. It works reliably well.


GCC does the same thing to this day. As a result, the GCC AST hasn't much S in it, except for some OpenMP constructs (which are preserved in the AST). It's not great if you want to write syntax-driven tools.


Back in the late 80s, the Mac MPW C compiler would create a stack frame every time you used a brace bracket.

As a 68k coder (from the Amiga) I often looked at the generated code and rewrote the pinch points in assembler.

A Boyer-Moore text-searching algorithm being one of them. The C code was nicked from Dr. Dobb's.


I imagine that would also be a good idea, if you wanted to (re-) optimise something like Risc-V assembly?


This reminds me of the decompilation stuff I worked on long ago, which was essentially a large recursive pattern-matching algorithm. Flows that couldn't be matched as loop constructs just got left as gotos, and breaks would be identified as such where possible, but for the most part it worked pretty well with the (very) stupid compiler output of the time, and even with much of the handwritten Asm that was far more common than it is now.


For the first time, I used a C-style loop in a Bash script today. What a coincidence


These are all obvious but this looks very useful as a checklist if you are writing a static analysis tool, which I think is the intended purpose.


It’s great to see this write up - should be mandatory reading for anyone getting started with compilers. But it’s not news for the experienced folks in the field.


I wish people would quit pretending that a research paper should only have the most novel/freshest/unique results. It would not only dramatically improve researchers lives it would actually improve the average quality of research output because you would no longer have to exaggerate claims.


It’s a terminology issue for me.

This paper is documenting reality that is known to folks who practice the craft so in that sense it is not research as I would normally think of it. But it is useful documentation and so worth having written down.


Literature reviews are still research. As are other publications that are not just completely novel observations. I think your terminology is in error here.


> folks who practice the craft so in that sense it is not research

I "practice the craft" (and I'm fully, gainfully, highly employed to practice the craft), I have a PhD in the "craft", I have published papers about the "craft", and I'm here to tell you IDGAF about what's terminologically research and what's not. I only care about well-written, detailed articles that explain a complex technical issue, problem, or solution.


There are lots of well written detailed articles that don’t call themselves research.


There are also lots of horribly written, superficial articles that do call themselves research. Hence my overarching point: let's quit with the dog and pony show.

EDIT:

"Research is 'creative and systematic work undertaken to increase the stock of knowledge'. It involves the collection, organization, and analysis of evidence to increase understanding of a topic, characterized by a particular attentiveness to controlling sources of bias and error."

The crucial/key words/phrases here are: "systematic", "increase understanding of a topic", "controlling sources of bias and error". Nowhere do I see "completely brand new never before heard of".

https://en.wikipedia.org/wiki/Research

EDIT2:

> It’s a terminology issue for me.

PLDI, the premier compilers/PL conference, which you must put a lot of stock in since you care about the current institutionally supported notion of "research", disagrees with you.


I don’t put any stock in PLDI.

I’ve been published there, twice, but academic understanding of PL implementation is probably at least a decade (if not more) behind where the top folks in industry are. If you’re good at abstract interpretation, IR design, GC, etc then you’re going to be making good money and having a blast shipping that shit to people, usually in some open source context with a friendly and nontoxic community around you. So much better than the hatchet-job style reviewing and publish-or-perish of the PLDI crew.


> I’ve been published there, twice, but academic understanding of PL implementation is probably at least a decade (if not more) behind where the top folks in industry are.

My friend you're just driving more nails into your own coffin; yes I'm very well aware of this and hence all the more reason to dispense with the idea that what PLDI publishes needs to be cutting edge novelty.

> So much better than the hatchet-job style reviewing and publish-or-perish of the PLDI crew.

Again: you're literally contradicting yourself. The reason PLDI and all other "top" venues are so toxic is because of people like you on review committees that demand utmost novelty. Relax your requirements (i.e. encourage them to relax their requirements) and you will magically transform academic research into a fun, friendly, and vibrant community.


I’m not on the PLDI review committee. Haven’t ever been.


> i.e. encourage them to relax their requirements


Yes, amen

There's not having to exaggerate and moreover there's value in rehashing known but tricky stuff


I wouldn't call this a research paper. It doesn't include any research.


PLDI disagrees with you. I generally don't care about institutional pedigree but I think in this case I'll defer to them over u/slashdave.


This is not published at PLDI but at the co-located SOAP workshop.


Literally who cares? It's again the same thing: some kind of weird elitism about what qualifies as real research and who qualifies it. It's so simple in my mind: people did good work and they're willing to share it with the world without much in return (just a venue and 30 minutes to talk about it). Cool great I'm all for it no matter whomever/wherever/whatever. There is exactly zero scarcity here anymore that necessitates ranking anything at all.


I don't care where papers are published (in fact I mostly agree with you, I think that the idea that research needs to be novel to everyone currently in the field is harmful and leads to gate-keeping, self-censorship and unpublished folklore). I am merely pointing out that appealing to the authority of PLDI as the premier PL research conference to qualify the paper as research does not work since PLDI did not publish it.


> I am merely pointing out that appealing to the authority of PLDI as the premier PL research conference to qualify the paper as research

I wasn't actually - I was pointing out the contradiction in supporting gatekeeping and then second-guessing the gatekeepers.


I didn't say it wasn't good work, I just said it wasn't research


> [...] should be mandatory reading for anyone getting started with compilers.

Well, only for those folks who you taint with C before you let them loose on compilers. If you follow something like https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_... as your introduction, I doubt you would benefit from worrying about pitfalls of C.


But then you’d want your compiler to infer loops from tail recursion and then you’d have all of the same problems as this paper cites.

The paper’s title is misleading in that way. It’s describing problems with loops, not problems with C loops.


That's a good point. I guess I would say the article (or something like the article) is good for someone just a bit beyond 'getting started with compilers', but that's probably just a matter of exactly where you draw the line between beginner topics and slightly more advanced ones.


First author is named as 'Martin Brain'. I wonder if that's a case of nominative determinism at work.


it was probably originally Martian Brain


I think it's really unfortunate that most programmers are taught C-style loops before tail recursion. But this is also a symptom of standard C's problem of not allowing one to define functions inside of functions. A loop can always be expressed with tail recursion. While the reverse is also true, there are many problems for which this is not straightforward, given that a tail-recursive function can have multiple exit points and these have to be coalesced into a single flag for the loop.
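The equivalence is easy to see in C (a toy sketch; whether the recursive version actually compiles down to a loop depends on the compiler performing tail-call optimization, which the C standard does not guarantee):

```c
#include <assert.h>

/* Iterative factorial: the familiar loop form. */
unsigned long fact_loop(unsigned n) {
    unsigned long acc = 1;
    for (unsigned i = 2; i <= n; i++)
        acc *= i;
    return acc;
}

/* The same computation as a tail call: the accumulator argument
   is what turns the recursion into a loop -- if the compiler
   applies TCO. Called as fact_tail(n, 1). */
unsigned long fact_tail(unsigned n, unsigned long acc) {
    if (n < 2)
        return acc;
    return fact_tail(n - 1, acc * n);   /* tail position */
}
```

The accumulator plays exactly the role of the loop's mutable state; without it the recursion is not a tail call and cannot be rewritten as a loop so directly.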


Without having a way to tell the compiler to fail if it can't TCO (f# finally got this with a function attribute in 8.0), teaching tail calls first is only going to lead to problems because screwing them up to pile up on the stack is likely when you don't know what you're doing.



I was pretty sure OCaml had it but I'm less than a newbie with it (touched dune a little bit and built tiny apps but nothing real) so didn't want to bring up something I wasn't sure about.

And Scala I haven't looked at in like a decade, but thinking about it I am not surprised it has similar protections. I feel like Clojure also had a keyword to recur as tail but probably been 12 years since I used that language so my memory could be completely faulty


How is this unfortunate?

Most programmers learn about loops pretty much at the absolute start of their development experience, where they don't yet have a way to talk about recursion. Don't even start about tail recursion or tail recursion optimisation.


> where they don't yet have a way to talk about recursion.

I'd like to know how its unfortunate as well, I'm not sure I agree with this though.

    int a = 0
  begin:
    if a == 10 {
      jump :end
    } else {
      a = a + 1
      jump :begin
    }
  end:
The programmer will have learnt that programs have a beginning and an end, they will have some notion of a variable, its type, and manipulating their values. They will even likely have learnt conditional branching logic. The only new concept here is that of jumping between areas of code.

If you next introduce methods you can clean it up and illustrate it more cleanly:

  myFunc(int a) {
    if a == 10 {
      return
    } else {
      a = a + 1
      return myFunc(a)
    }
  }

  myFunc(0)
Finally you can explain the programmer "hey, there's this shortcut we can take called a loop that expresses this more succinctly":

  int a = 0
  while (a != 10) {
    a = a + 1
  }
Nice simple-looking code. Yet this concept requires being able to grok much more than the relatively simple tail-recursive definition.


I suppose the order you introduce those examples in ultimately comes down to whether it's more important for the student to first understand what the computer is doing or how it is doing it.

Most people don't start out thinking like computers, so I think it's probably more important for a new student to understand how code describes a particular series of operations and then help them translate that into how a computer "thinks" about those operations.


> Most people don't start out thinking like computers

Everyone who has followed a sequential list of instructions would strongly disagree.


Which, FWIW, are almost never written using recursion: they are written with either imperative loops or the moral equivalent of a goto statement.


I think most people learn about loops before they've learned about functions because your first example maps obviously to your last.

I don't think your middle example would be obvious to most people learning about functions until they've had quite some time to get their heads around what functions do, even with the context of the top piece of code.


> I think most people learn about loops before they've learned about functions

That may be so, however I was taking issue with the idea that they couldn't possibly understand tail recursive functions first. Many (if not all) of the concepts that loops introduce also get introduced with functions (scoping, regions of code, stack parameters, jump return statements e.g. break/return). The programmers just may not have words for all of them with loops, however these concepts are usually explicitly covered with functions.

Loops are particularly useful in applications of batch processing or indirectly for parallelization (again, actually functions are more useful here), so they may learn them for a business use case first, but that doesn't mean they couldn't learn to master both functions and tail recursive variants first. As a sibling commenter pointed out, if you were to come from math's background you might even naturally prefer this construct.


I feel like there's a semi-philosophical question somewhere here. Recursion is clearly a core computer science concept (see: many university courses, or even the low-level implementation of many data structures), but it's surprisingly rare to see it in "day to day" code (i.e. I probably don't write recursive code in a typical week, but I know it's in library code I depend on...)

But why do you think we live in a world that likes to hide recursion? Why is it common for tree data structure APIs to expose visitors, rather than expecting you write your own recursive depth/breadth-first tree traversal?

Is there something innate in human nature that makes recursion less comprehensible than looping? In my career I've met many programmers who don't 'do' recursion, but none who are scared of loops.

And to me the weird thing about it is, looping is just a specialized form of recursion, so if you can wrap your head around a for loop it means you already understand tail call recursion.


> but it's surprisingly rare to see it in "day to day" code

I rarely use them because I became tired of having to explain them to others, where I've never had to explain a simple while loop that accomplishes the same thing with, usually literally, a couple more lines of code.

From all of my experience, recursion is usually at the expense of clarity, and not needed.

I think it's related to the door memory effect [1]: you lose the sense of state when hopping into the function, even though it's the same function.

[1] https://www.scientificamerican.com/article/why-walking-throu...


Not "doing" recursion as a principle is often a sign the person has not been exposed to functional languages or relational programming like Prolog. It often points to a lack of experience with what is perhaps not so mainstream.


Or the person is sensibly trying to make the code easier for other people to understand.

I am tech lead and architect for large financial systems written in Java but have done a bunch of Common Lisp and Clojure projects in the past. I will still avoid any recursion and ask people to remove recursion from their PRs unless it is absolutely the best way to get readable and verifiable code.

As a developer your job is not to look for intellectual rewards when writing code and your job is not to find elegant solutions to problems (although frequently elegant solutions are the best ones). Your job is taking responsibility for the reliability, performance and future maintenance of whatever you create.

In my experience there is nothing worse than having bright engineers on a project who don't understand they are creating for other less bright engineers who will be working with it after the bright engineer gets bored with the project and moves on to another green field, rewarding task.


The stack traces when something goes wrong are inscrutable under recursion. Same when looking at the program state using debuggers.

Fundamentally, the actual state of the program does not match the abstract state used when programming a recursive function. You are recursively solving subproblems, but when something goes wrong, it becomes very hard to reason about all the partial solutions within the whole problem.


> The stack traces when something goes wrong are inscrutable under recursion.

Hmm. This is a real issue, for the simple case. If tail recursion is not optimized correctly then you end up with a bunch of stack frames, wasted memory...

I propose this is partially a tooling issue, not a fundamental problem with recursion. For tail recursion the frames could be optimized away and a virtual counter employed.

For more complex cases, I'd argue it matters less. Saving on stack frames is still preferable, however this can be achieved with judicious use of inlining. But the looping construct is worse here, as you cannot see prior invocations of the loop, and so have generally no idea of program execution without resorting to log tracing, while even the inlined recursion will provide some idea of the previous execution path.


I don't agree with your last statement. I've been programming forever, and I understand recursion, and I use it, but I never equate it with loops in my mind (unless I'm deliberately thinking that way, like now) and I always find it more difficult to think about than loops.


This is more about people feeling they are clever. It's the CS equivalent of "everything is an eigenvector".


It is unfortunate, because many programmers develop some feeling of recursion being weird, more difficult or in any way worse than a loop, while this is mostly based on unfortunate language design and lack of familiarity with recursion. It also often comes along with fear of recursive data structures, which is holding programmers back and the products they make.


Most people learning programming have already been exposed to function calling in math (f(x)=x+1). Recursion is not a very big jump semantically from this. Conditional loops are a (relatively) big jump.


> Conditional loops are a (relatively) big jump.

I'd be very shocked if anyone past the age of 4 or 5 had never heard (and learned to understand) statements like "Wash your hands until they're clean" which is a conditional loop (wash your hands, check if they're still dirty, repeat if they are, stop otherwise). If a teen or adult learning to program has trouble with conditional loops, I'd be very very surprised. The translation into programming languages (syntax) may be a challenge, the correct logical expressions for their intent may be a challenge, but the concept should not be.


I happen to know (due to job), that many adults have problems grasping for loops (in Python) when learning programming. It is one of the main points where problems arise in programming introductions. It may all depend on how it is explained for what person, as different people understand different explanations better than others. Or it may be syntax related. Or that people for the first time fathom how they can make the computer do things in such a time saving manner. Who knows.


If you're teaching someone c, tail recursion is just handing them another gun to shoot themselves in the foot with.


> there are many problems for which this is not straightforward given that a tail recursive function can have multiple exit points and these have to be coalesced into a single flag for the loop.

Loops don't require coalescing the exit condition to a single flag (or even a single conditional expression). You can also use break (or your language's equivalent) to allow multiple exits from the same loop.


Break also considerably complicates loop semantics. For example you can no longer rely on the loop condition being false on loop termination. Instead you have to do a flow analysis of every loop and conditional leading to the break to determine the established postcondition.
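A small C illustration of the point (a hypothetical search helper, not from the article): after a normal exit the loop condition is false, but after a `break` it may still be true, so the caller can't infer `i == n` from termination alone.

```c
#include <assert.h>

/* Returns the index of key in a[0..n-1], or n if absent.
   On the break exit, the loop condition (i < n) is still true;
   only on the normal exit do we know i == n. Reasoning about
   the result therefore needs both exit paths, not just the
   loop condition. */
int find_index(const int *a, int n, int key) {
    int i;
    for (i = 0; i < n; i++) {
        if (a[i] == key)
            break;   /* early exit: different postcondition */
    }
    return i;
}
```

Without the `break`, `i == n` would hold unconditionally after the loop; with it, the postcondition becomes "either `i == n`, or `i < n` and `a[i] == key`".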


"Well, I'm done with this now" is a very nice, simple, and most importantly obvious addition to a mental model that will translate to things they've been doing since they were toddlers.

It has consequences, but it's first week sort of thing because it's so easy/relatable.


True, but the person I responded to wants multiple exits in tail-recursive functions which introduces the same complications as multiple exits in loops, non-recursive functions, and non-tail-recursive functions.


> For example you can no longer rely on the loop condition being false on loop termination. Instead you have to do a flow analysis of every loop and conditional leading to the break to determine the established postcondition.

I rarely write loops where the relationship between the loop control variables has anything meaningful to say about what the iterations did.


Simple example: Get input from the user until the input matches a condition, often "is it a number?".


Even if that’s true (and you may be more often than you realize), you still rely on a great deal of code that does, like searching and sorting.

It is a little disappointing how few programmers know how to construct a nontrivial loop that both always terminates and does what it’s supposed to for all inputs.

Binary search is a classic example. If you think it’s trivial to get right, you probably aren’t. Knuth has some interesting things to say about it in The Art of Computer Programming.
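For reference, here is one careful version in C (a standard formulation, not Knuth's exact presentation); the classic pitfalls are computing the midpoint as `(lo + hi) / 2`, which can overflow, and off-by-one errors in the bounds that break termination.

```c
#include <assert.h>

/* Returns an index of key in sorted a[0..n-1], or -1 if absent.
   Loop invariant: if key is present, its index lies in [lo, hi].
   The interval shrinks every iteration, so the loop terminates. */
int bsearch_int(const int *a, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids (lo+hi) overflow */
        if (a[mid] == key)
            return mid;
        if (a[mid] < key)
            lo = mid + 1;               /* key, if present, is right of mid */
        else
            hi = mid - 1;               /* key, if present, is left of mid */
    }
    return -1;  /* loop exits only when lo > hi: key is absent */
}
```

Note how the correctness argument rests on the invariant plus the exit condition together, which is exactly the kind of loop reasoning the parent comments are discussing.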


Binary search is a great example of why you can't just look at the loop control variables to understand what the body does across the iterations.

You have to look at both. If it turns out that the body's behavior can be easily described in terms of the loop control variables (or the reverse), great, but assuming that it does and trying to fix up things when that gets complicated is a disaster.

Of course, if you dislike break, you can fake it with continue and a flag, as long as your loop control guarantees termination. However, that complicates understanding the loop's behavior in terms of said control.


But for a programmer break is wonderful. Maybe you have a loop condition for a normal exit and use break if an unusual condition (eg EOF) occurs.


If you're trying to do control flow analysis on an AST, you're going to have a bad time. The article is a blinking red advertisement to not try to analyze loops at the source level.


I did spend about half the article going "... why are you trying to do analysis based on the AST instead of using a control-flow-graph-based IR?"


I don't think many people consider tail calls more intuitive than a while loop. The opposite is more obvious.


> I don't think many people consider tail calls more intuitive than a while loop. The opposite is more obvious.

This depends a lot on the background of the respective person. For someone who is more trained in classical mathematics, tail calls are more intuitive (or rather: nearer to their knowledge base) than while loops.


While the formal concept of applying a function that calls itself may be more familiar to someone trained in classic mathematics, everyone is somewhat familiar with while loops, from basic life things like 'while hungry, eat' or 'salt to taste -> while (not_salty_enough()): add_salt()'


I've successfully explained a while loop to many a child, and adult, with no experience in programming, within about 5 minutes.

I'm absolutely sure the people who think tail recursion is any way similar, in complexity, to a while loop have never attempted to teach someone how to program.


add salt; not salty enough?; do same again!

The explanation is just as simple translatable, even saving one the words like "while" or "until".


I mean "for" and "while" loops are effectively used in natural language all the time, even to the youngest of kids.

"Do this action 10 times" "Put one item in each box"

I fail to see how recursion is /more/ intuitive, in pretty much every (english) language construct I can think of the condition is separate from the action.


That's like saying "it's unfortunate kids are first taught elementary arithmetic before integrals". C-style loops are natural and easy to understand, tail recursion is not.


Or like saying they should start their study of addition and multiplication from the perspective of group theory - since the integers (Z) under addition forms a group because it satisfies closure, associativity, identity, and inverses, while Z under multiplication does not form a group because, although it satisfies closure, associativity, and identity, it fails to provide inverses for all elements.


I'd like to remind everyone that the computer doesn't care. The computer knows JMP (e.g. GOTO).

Everything else is just abstractions we've built up to make our lives easier.

If we always jump to the top of the function, or if we jump someplace else, the machine cares not. Just don't blow the stack.


C loops are natural and easy to understand for programmers who were trained on loops. I doubt that tail recursion is any harder to get for students who don't have that background. Would be an interesting experiment though.


Iteration is more natural for most people than recursion. There are a few aliens for whom recursion is more natural, but iteration matches more closely daily life.


People that see programming as math get really excited about recursion.

Everyone else doesn't care, shouldn't care.


Hey, other kinds of nerds can and should care. Recursion is a very interesting perspective to apply to a problem and gratifying in its own right.


There's this summation notation that these people who see programming as math and get excited about recursion forget about too.


I think recursion is just another tool in the toolbox: sometimes, some solutions are easier to think of recursively, and so it makes sense to use it, at least for the first prototype.

But yeah, using it systematically (in a non-functional language) is unwise.


Recursion is a form of iteration.


Worse: both are theoretically equivalent.

But this doesn't change the fact that humans seem to intuitively perceive iteration as more natural.

Like, "space" and "time" can't be understood as two distinct notions, per relativity. But we still perceive them as such on a daily basis.


I think its fairly clear that loops are more natural and obvious for most people for most tasks.

There are examples of other communities which have invented something pretty much the same as loop syntax; for example, you get things which are basically loops in knitting patterns or (less regularly) in recipes. I have never seen an example of recursion outside computer science/programming.


> for example you get things which are basically loops in knitting patterns

Also in crochet. A pattern will have instructions like

* (2tr, 3ch, 2tr) all in next 2ch sp; repeat from * 6 times more

where statements like 2tr mean 2 treble stitches and the bit between the * symbols is repeated 6 times


You guys should write these crochet patterns in continuation passing style!


Rinse and repeat.


This is what happens when we eliminate shop class. Sand this table leg. Good. Now do that three more times. Kids used to know loops before they ever touched a computer.


C doesn't have a "do this 3 times" loop though. It has an "initialize counter to 0 first, increment counter after each operation, and compare the counter to 3 (or is it 4?)" loop. Kids definitely don't get this without being taught.


  // Precondition: need 4 legs, have none.
  int counter_of_sanded_legs = 0;
  do {
      sand_a_leg();
      counter_of_sanded_legs++;
  } while (counter_of_sanded_legs < 4);

Need four legs, have none. Doing while I don't have enough. How many legs do I have? 1. Doing. How many legs do I have? 2. Doing. How many legs do I have? 3. Doing. How many legs do I have? 4. Done.

Alternatively:

  for (int i = 0; i < 4; i++) {
      // how we start; whether we repeat; how we update our counter
      sand_leg();
  }

You only need to sand a leg when you don't have enough legs.

These aren't that far off of actual things you'd do moving about. In fact, not realizing that you're distilling something physical into the symbolic is a leading cause of confusion in my experience.

A "do x times" dedicated looping construct would be syntactic sugar that detracts from understanding what the underlying semantics are; which if you're teaching how to program is a bad idea. The idea is to have your understanding precede the machine. Not lag behind.

You can start screwing around with abstractions after you get the basics. As once you have the basics, you can compose them into any form you want.

The rest is just increasing levels of crippling mental illness.


Exactly. Go make a table with 4 legs and also every leg has 4 legs. Now you get recursion and a very steady table.

You were a kid in shop class. Now you want your kids to take shop class. Because, recursion.


> Go make a table with 4 legs and also every leg has 4 legs.

If you have to make up nonsensical examples to prove your point, then you probably don't have a point.

> You were a kid in shop class. Now you want your kids to take shop class. Because, recursion.

That's not recursion.


The first paragraph was more of a parody of your just-so argument than a point per se; but the second paragraph is literally recursion.


How is it not recursion?

  def spawn():
    take_shop_class()
    return spawn()


Recursion is more beautiful.

But take a newbie, and "do this 3 times" in a loop is far more clear than the recursive equivalent.


How do you measure beauty? You can't: "beauty" is subjective. And even if you try to, e.g., count the times you use recursion vs. iteration, that metric is subjective and not grounded in reality.

Sometimes recursion does allow you to reason about code more easily or come to a working solution faster, sometimes it does not.

Measure the concrete: CPU time and memory consumed. Iteration will likely trump recursive methods w.r.t. both of these metrics. If it doesn't, you can likely transform your iterative algorithm into one that utilizes SIMD (not always).


> How do you measure beauty? You can't: "beauty" is subjective

Let me try: in classical dance, martial arts, or even skateboarding, advanced skills manifest as effortlessness: the movements come naturally, they're not forced, things just flow.

If you compare a typical functional factorial (recursive + pattern matching, though the point would stand even with a fold) with an imperative one (for loop), the functional approach is more effortless: you have to be less explicit about what's going on. It's more eloquent.

However as you seem to imply, when we're programming, the focus should be on delivering something that works as expected; this particular kind of aesthetic is secondary at best.


> How do you measure beauty?

Using the beholder's eye, of course.


Often I get the impression that beauty is defined as "how far is this from what the computer can actually do efficiently?"


This is a painfully stemlord response. 'Aesthetic qualities cannot be objectively measured, therefore they do not exist'. Christ. It's fine to pick a metric to evaluate one thing against another; it's another to deny that there are other ways to value things.


I never understood anyone who used such strong language like "beauty" when talking about something mundane like recursion. I suspect it's just in-group language like how a cult understands the meaning of their own words, while to outsiders it would appear nonsensical. I have seen other groups appropriate the word "beauty" too, like people who chronically overeat.


https://en.wikipedia.org/wiki/Mathematical_beauty

Simplicity is actually a hallmark of this sense of beauty.


Nope, it just has its own beauty. The material blew me away in school.

What culty group even exists around recursion anyway? Can you get me an invitation?


Too bad C doesn’t have a “do this 3 times” loop.


Recursion is more natural and deeply-engrained than people realize. Natural language has a recursive grammar:

My dog => My friend's dog => My father's friend's dog => My father's sister's cousin's nephew's teacher's friend's plumber's dog's chew toy.

We can construct sentences like this with a completely arbitrary level of recursive nesting. It's so natural that people do it without thinking about it. Recursion only becomes unnatural when we prime people on it and get them thinking it's scary and complicated.


I'm surprised you consider that construction "simple". Recursive relationships like that are used for brain teasers and even a famously absurd joke in Spaceballs.

    Dark Helmet: I am your father's brother's nephew's cousin's former roommate.

    Lonestar: What does that make us? 

    Dark Helmet: Absolutely nothing.
In fact, standard English tends to avoid recursive relationships with specialized kinship terms based on ordinals like first cousins, first cousins once removed, great-, and so on.


> I'm surprised you consider that construction "simple".

It's VASTLY simpler than the alternative. See how difficult it is to express the same relationship without using the recursive grammar.

Anyway, give anyone a family tree diagram containing all of these relationships and they can follow the chain from that sentence to the destination. This is the essence of why we use recursion in computer science in the first place: it's the best tool for navigating trees.
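As a toy illustration of that "navigating trees" point, here's a hedged C sketch (the struct layout and all names are made up for the example) where following a possessive chain is exactly a recursive descent:

```c
#include <stddef.h>

/* hypothetical family-tree node: a name plus up to four relatives */
struct person {
    const char *name;
    struct person *rel[4];
};

/* Follow a chain of relative indices, one possessive at a time,
   exactly like parsing "my friend's dog": each recursive call
   consumes one link of the chain. */
static const char *follow(const struct person *p, const size_t *path, size_t len) {
    if (p == NULL)
        return NULL;
    if (len == 0)
        return p->name;                      /* end of the chain */
    return follow(p->rel[path[0]], path + 1, len - 1);
}
```

The recursion mirrors the grammar: one call per possessive, terminating when the chain runs out.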


> Anyway, give anyone a family tree diagram containing all of these relationships and they can follow the chain from that sentence to the destination. This is the essence of why we use recursion in computer science in the first place: it's the best tool for navigating trees.

This is because English sucks at tree relationships. Other languages are much better at this.

For example, Mandarin Chinese has unique words for each side of the family tree (e.g. a unique word for grandma on the mother's side vs. grandma on the father's side), and also a rather logical system to describe how you navigate the tree.

Chinese isn't even unique in this, it is just that English is really really bad.


Is having a unique word that much of an improvement over "paternal grandmother" vs "maternal grandmother"?

Or is it that you're referring to other relations than those two having unique words? If so then that would seem to introduce its own problems in ballooning the vocabulary.

Maybe English is just happier with the ambiguity?


One word helps make it easier.

I got very tired, very quickly, even as a kid, of saying "Grandma on my Dad's side".

I had two Uncles with the same name, again, "Uncle <foo> on my Dad's side" is a PITA after awhile.


I don't think this is true at all. I remember vividly when I learned about loops ("oh, it is just the statement multiple times") vs. learning recursion (spending three days on an assignment from uni, scratching my head, and not getting how the heck this makes sense).


So natural, yet the specific example you provide is broken


Using recursion for loops, then letting the compiler work out that it's not actually recursion and is actually a loop, is more for people caught up in the pageantry of programming than people who want to make programs run.


Loops are more intuitive for similar reasons that ∑ and ∏ are more intuitive than the equivalent recursive formulation.


Loops themselves are a design mistake. Look how wordy loop notation is. This is a sign of an internal struggle: the real thing does not fit a Procrustean concept. In C we have five loop-related keywords: 'do', 'while', 'for', 'break', and 'continue'. All of them are syntactic sugar for 'goto'. Yet all that sugar is not sufficient for certain cases of control flow. And these cases are not even exotic. We all know how to loop forward in C:

    for (i = 0; i < n; ++i)
Now let's try to loop backward:

    for (i = n; i--; )
Why is this so asymmetric to the forward case? And doesn't it, perhaps, start to become a little too idiomatic? Now, if we try to rewrite them with 'goto':

        i = 0;
    a:  <body>
        ++i;
        if (i < n)
          goto a;

        i = n;
    b:  --i;
        <body>
        if (i > 0)
           goto b;
Isn't this more symmetric? Can we now see that the chief reason for the asymmetry is that 'n' is not a valid value for 'i' but '0' is?

'Goto' is, of course, an implementation detail and is not convenient for reasoning. Thing is, loops are not necessary for reasoning either. They are an implementation detail mistakenly dressed in reasoning clothes.
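For what it's worth, the symmetry the goto versions expose survives in plain C if both directions are written in the do-while shape they imply. A sketch (assuming n >= 1, as the goto versions do; summing an array just to give the bodies something to do):

```c
/* forward: body sees i = 0 .. n-1, mirroring the "goto a" version */
static long sum_fwd(const long *a, long n) {
    long s = 0, i = 0;
    do { s += a[i]; ++i; } while (i < n);
    return s;
}

/* backward: body sees i = n-1 .. 0, mirroring the "goto b" version */
static long sum_bwd(const long *a, long n) {
    long s = 0, i = n;
    do { --i; s += a[i]; } while (i > 0);
    return s;
}
```

The increment sits after the body going forward and before it going backward, which is exactly the asymmetry the comment attributes to 'n' being one past the valid range while '0' is in it.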


I also think that some algorithms can be expressed better using goto than with a combination of nested loops, break, and continue. For example, Donald Knuth talks about the "basic backtrack" algorithm in fascicle 5 [0], which he uses to implement a solution to the N-queens problem.

It's possible to express it as a set of while loops (which someone on Stack Overflow showed [1]), but it's, IMO, less readable than the goto version. Of course, the recursive solution is the most readable anyway.

Edit: Also, both the goto version and while loop version of that algorithm on the Stack Overflow post have an out of bounds index when col and row are both zero. That algorithm was meant for 1-indexed arrays and was not properly translated.

[0]: https://cs.stanford.edu/%7Eknuth/fasc5b.ps.gz

[1]: https://stackoverflow.com/questions/78614303/can-knuths-algo...


This is absolutely true. Even advanced loop syntax implies nesting; but control flow in general is not hierarchical. Simple loop syntax in C is even more limited, because it uses a fixed place for the loop condition test: either at the start or at the end of the body. As a result, if we need to do something, then test the condition, then do something extra, we are stuck. For example, we may want to print a list of strings separated with commas:

    void printManyStrsWithCommas(char *s[], int n /* trusted to be positive */) 
    {
        int i = 0;

    a:  printOneStr(s[i]);
        ++i;
        if (i < n) {
            printOneStr(", ");
            goto a;
        }
    }
Here the loop condition is 'i < n', but when it is true, we need to do one more thing (print a comma) before resuming the loop. If we are to stick with non-'goto' syntax we have the following options:

- Somehow cram it into the loop expression with the comma operator. This is very limited because it is an expression. For example, what if 'printOneStr' can err?

- Somehow use 'while(1)' with 'break'. It may even compile to the same code as with 'goto'. But why, then, do we have to use an unbounded loop syntax for a clearly bounded sequence?

- Somehow add additional variables and tests and either lose efficiency or trust the compiler to lead us out.

With 'goto' the code does only what is necessary. What 'goto' does not do is that it does not immediately convey a clear idea that the code repeats certain steps.
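Spelling out the 'while(1) with break' option as compilable C, for comparison (joining into a buffer instead of printing, purely so the behaviour is easy to check; the function name is illustrative):

```c
#include <string.h>

/* same control shape as the goto version: emit an element, advance,
   test, and only emit the separator when the loop will continue */
static void joinWithCommas(char *out, size_t outsz, char *s[], int n /* positive */) {
    out[0] = '\0';
    for (int i = 0; ; ) {
        strncat(out, s[i], outsz - strlen(out) - 1);
        ++i;
        if (i >= n)
            break;             /* the loop test sits mid-body, as with goto */
        strncat(out, ", ", outsz - strlen(out) - 1);
    }
}
```

It does compile to essentially the same thing; the objection in the comment stands, though: the `for (;;)` header promises an unbounded loop that the body immediately contradicts.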


In this particular case I think you could do one of the following (ignoring the "while (true)" + "break" combination):

1:

  while (printOneStr(s[i]), ++i < n)
      printOneStr(", ");
2:

  goto start;
  do {
      printOneStr(", ");
  start:
      printOneStr(s[i]);
      ++i;
  } while (i < n);
But for this kind of code in general, I have thought that the "do while" statement should be extended to take an extra statement after the "while (cond)" part. It could still be an empty statement, with just a semicolon, like "do ... while (cond);", exactly like it's already used; but in cases like yours, you would be able to write this:

  do {
      printOneStr(s[i]);
      ++i;
  } while (i < n) {
      printOneStr(", ");
  }
As a bonus, the current mandatory semicolon after do-while statements would stop being a special case for the grammar of C.


You are writing it differently with the goto. You can do the same with the for loop and it becomes symmetric again

  for (i = 0; i < n; ++i);
  for (i = n; i >= 0; --i);
(Note: not sure if intended that backward should run more often and include i = n)


To be symmetric, the loops should read like this:

  for (i = 0; i < n; i++)
  for (i = n; --i >= 0; )
The second one could also be this:

  for (i = n - 1; i >= 0; i--)
I say this because your backwards loop accesses the inclusive range [0, n] while your forwards range accesses [0, n - 1].


  for (i = 0; i <= n-1; i++)
  for (i = n-1; i >= 0; i--)


Yeah, that way of writing them definitely makes them look more symmetric. You could also do this, though it's a bit less pretty:

  for (i = -1; ++i != n; )
  for (i = n; --i != -1; )


Your post is a good example of how backwards iteration is so error prone and fragile.

Your reverse iteration would result in an out of bounds access as well as unsigned integer overflow which results in an infinite loop.
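The usual fix is the idiom where the decrement happens inside the condition, so the body only ever sees indices in [0, n-1] even with an unsigned type (a sketch):

```c
#include <stddef.h>

/* Sums a[n-1] .. a[0]. Safe with an unsigned index: "i-- > 0" tests
   i before decrementing, so a[i] is only read for i in [0, n-1].
   i does wrap to SIZE_MAX on the final decrement, but only after the
   test has already decided to exit, so the wrap is harmless. */
static long sum_backwards(const long *a, size_t n) {
    long s = 0;
    for (size_t i = n; i-- > 0; )
        s += a[i];
    return s;
}
```

It also handles n == 0 correctly: the very first test fails and the body never runs.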


If we’re going to fundamentally alter the way programmers are educated to handle repetitive tasks, the first lesson should be on map and reduce, not counters or recursion.


C compilers may not optimize tail recursion properly, so it's not necessarily safe to teach C programmers to do that.


In my experience, modern optimising compilers can turn many naive tail-recursive functions into loops.

Actually learning tail recursion and writing loops for it is a lot less necessary today than it was in the 1980s. It isn't necessary to teach it to beginners anymore because the compiler largely does the heavy lifting for us.

These days, tail recursion is a very advanced performance tuning trick, only used when you've positively identified your compiler version's implementation of your loop as the cause of slowness and you're willing to incur the increased code complexity for the increased performance.


How can tail recursion have better performance if they both compile to the same machine instruction - a conditional jump?


Because it results in a tail call and you don't have to build a new stack frame.


A loop didn't have to build a stack frame in the first place. Once the optimizer has done its job, both the TCO-optimized tail call and a basic loop will have the same instructions, hence the performance will be the same.


Yes, that's exactly the point I am making.

Modern compilers optimise this for you, exactly as you said.

Back in the 80s compilers didn't do this. There really was a difference between the machine code emitted for a naive loop and a tail call loop.

A lot of advice to rewrite naive loops to do tail calls is based on that 1980s compiler behaviour, not on the optimising 2020s compiler behaviour.
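To illustrate, here's the kind of pair being discussed: an accumulator-style tail call that GCC and Clang at -O2 will typically compile to the same jump-based code as the explicit loop (a sketch; the C standard does not guarantee this elimination, which is why relying on it for deep recursion is still risky):

```c
/* tail-recursive: the recursive call is the very last thing the
   function does, so the compiler can reuse the current frame */
static long sum_rec(long n, long acc)
{
    if (n == 0)
        return acc;
    return sum_rec(n - 1, acc + n);   /* candidate for tail-call elimination */
}

/* the loop a compiler would reduce it to */
static long sum_loop(long n)
{
    long acc = 0;
    while (n != 0) { acc += n; --n; }
    return acc;
}
```

With optimization off, the recursive version really does push a frame per call, which is the 1980s behaviour the advice dates from.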


I think recursion is easier to read than C-style for-loops because you don't need to memorise syntax.

for (int i = 0; i < 5; i++) {...}

vs

  let rec func_name counter =
    if counter = 5 then print_endline "blast off!"
    else func_name (counter + 1)

I don't really need to memorise the order of operations compared to a for loop (initialise, exit condition, increment) and I don't need to memorise if the exit condition means "execute while this is true" or "break the loop if this is true". That's just me though. I think the syntax for recursion is easier but loops might conceptually be easier to understand.

I think, better than both of those for the general case (and a good introduction) are "fold" functions and "foreach"-style loops, which iterate over every element in a list. I think those are used more often, and the reason you may want to do this is clearer to a student.


With beginners, there's a severe "working memory" limitation from the fact that they have no compression that comes from tying things together.

The for line ends up being one independent line they can grok, then free from their memory when looking inside the loop, knowing that i gets bigger for a while. From my experience, something like the recursion will blow away a beginner's working memory, because they can't piecewise it. But the biggest problem is that the for loop is trivially expanded with the exact same format: you just shove stuff into the body. The recursion method requires significant reworking to expand. Beginners appreciate simple templates that they can understand and modify, because they're still putting it all together.


Aren’t you leaving out the initialization in your recursive example though?


You're right. I didn't notice that, but I guess the initialisation would be the caller function's responsibility or there would be a wrapper function with a default initialisation value (maybe with the recursive example nested inside the wrapper).


Tail or not, I find recursion to be too unsettling for my non-math orientated brain to deal with on the daily. It's like looking at oneself between facing mirrors. I find the loop (in for or while form) to be more comfortable to reason about. I believe I'm not alone in this situation.


Perhaps they just want to write programs that work without becoming confused?


IMO a functional first language should be the first language people learn. Much easier to go from FP to OOP and even PP rather than the other direction.


How do you know it’s easier?


There seems to be this weird notion going around that the only reason imperative languages seem easier to people is because they're taught them first, and that if we teach Haskell or Scheme to everyone first then they would find imperative programming unintuitive. I honestly think it's a load of rubbish. The fact that humans easily understand "a list of instructions executed one after the other" over monads and tail recursion is not because they are taught C first, but because that's just the basic way we communicate actions. Recipes, directions, instructions for how to tie your shoe laces, etc. They're all imperative.


It's a complex topic. The thing is that sequential mutable state doesn't scale, and you'll quickly have to introduce routines, modules, objects of whatever to have some control over side effects. The natural aspect of a list of instructions is exhilarating for many, but is also rope to hang yourself with, because you won't try to separate concepts and scopes since everything is accessible. So much stuff just doesn't happen when you don't use this paradigm. Instead of drowning in state variables improperly synchronised, you suddenly rise above and start rewriting trees.


>you'll quickly have to introduce routines, modules, objects of whatever to have some control over side effects

that claim seems absurd. all the functional languages I've come across also frequently use routines, modules, and 'objects of whatever' to manage the complexity of huge amounts of code.

objects/entities are just a pattern for hiding complexity behind a simple interface. they can be useful in any language, regardless of mutability.


I think, because of the mention about controlling mutability/side-effects, "object" in that post means OOP-style objects with mutable state. It's not an issue with immutable objects/structs (or I guess "records", which is the term I'm used to with functional languages).


functional languages still have to deal with storing and manipulating changing value and state, and it is still often advantageous to hide the complexities of that storage and manipulation behind simple interfaces throughout the codebase, using opaque values with helper functions to manipulate them.

OOP as a pattern, rather than a language construct.

the great advantage here is that all manipulations are local. there is no spooky action at a distance in an immutable language.

as long as you stay out of the IO monad, anyways.


I just use simple tail-recursion in an event loop to help manage state.

  let rec eventLoop state =
      let input = getInput ()
      let newState = update input state
      drawState state
      eventLoop newState

The update function takes an existing state and an input record (key events, etc.), returning a new state (which may have nested records within itself but it is all immutable).

New state is kept by the recursion (passing new state to the same function). The only IO is getting input and drawing, which are inherently side-effecting operations.

I do have nested records, but (at each stage) they get joined into a new parent record. No mutability there.


Functional programming is not just about monads. I bet it’s easier to explain

    numbers.map(x => x * x)
than

    let result = [];
    for (let i = 0; i < numbers.length; ++i) {
      result.push(numbers[i] * numbers[i]);
    }


Perhaps, but this is a somewhat contrived example. No side effects, no complicated data structures, etc. Sure, in the specific case of "parallelised" operations on an array it is simple. But anything more complicated and suddenly you're in a bad situation.

For example, what if you now want to write to a log before each multiplication, or you want to time how long each multiplication takes? Suddenly, the latter seems far preferable.


    numbers.map(x => {
      console.log(x);
      return x * x;
    })
Still easier to understand than the for loop.


Okay but now this is just imperative lol, it's no longer functional at all, since your function is now just a sequence of instructions, i.e. it's a procedure rather than a pure function. You've just proved my point


It’s a mix of functional and imperative style. Most functional languages (other than Haskell) allow you to easily write code like this.


Haskell also allows you to easily write like this

    for numbers $ \x -> do
        Console.log x
        return (x * x)


Okay and now you’re using monads which I pointed out are not easy to understand


Monads are not hard though. You use them every time you put a ? on something in Rust.


I'm not sure if this is a joke, but no: using something that happens to satisfy the monad laws doesn't mean you need to know what a monad is in general. Many features of imperative languages can be described using category theory, but only in Haskell do you need to actually understand the generalisation. Why is that? Why can't you just "use monads and not worry about it"? Well, the problem is that in Haskell there is a huge leveraging of very general functions that apply to these entities generically, for example "fmap", ">>=", ">>", "<>", etc., and those are the simple examples. Therefore understanding the rules and definition of the generalisation is required.

It's not enough to just point at an example of a monad and say "look, the example is simple, therefore the generalisation is simple". As many philosophers have pointed out, particulars are easier to point out than universals. The latter requires a lot of thought.


I meant it sincerely, not as a joke.

> As many philosophers have pointed out, particulars are easier to point out than universals. The latter requires a lot of thought.

Fair enough, that’s a good point.

You don’t have to use >>= and >> if you don’t want to though. Use the do notation instead and the code will look much more familiar. You also mention fmap; that’s `.map` in Rust, etc. in other languages.

All I’m trying to say is, there are many parallels between Haskell and other languages at the level of particulars. The concepts behind >>= and >> are not unique to it anymore. So there is hope if you want to reconstruct the universal concepts in your mind from those particulars, especially if you use fitting abstractions like the do notation.


> All I’m trying to say is, there are many parallels between Haskell and other languages at the level of particulars.

Of course there are parallels; there must be for Haskell to be able to do what imperative languages can do. However, that doesn't mean the abstractions are simple. "do notation" is another complicated generalisation, and there's no way any Haskell programmer is going to be able to simply pretend that their do notation is a normal imperative language.

You seem to be asserting that I'm claiming that Haskell simply can't do the things that imperative languages can do, and I'm not at all and I don't understand how you inferred that.


> You seem to be asserting that I'm claiming that Haskell simply can't do the things that imperative languages can do

No, that’s not my intention. I responded to your view that monads are hard to understand by saying that they show up in other languages too in non-trivial ways.

I also don’t mean to say that Haskell is like imperative languages. Just that monads by themselves show up in imperative languages too, so they are not as esoteric as the operator names (>>=, >>) may make them seem.

I appreciate that you followed up in good faith. The point of view that you expressed makes sense to me.


Yes I know but that's not the point of my original post, which is about people who claim that a pure functional style would be considered easier if it were taught first. I'm saying that falling back on imperative is almost always easier than sticking with pure functional


Falling back on imperative is orthogonal to what to teach first. My original argument was that, when just starting programming, it is better to go with FP, as it is easier to learn PP or OOP additionally.

While, when starting with PP or OOP, it is incredibly hard to then learn FP.


I wasn't responding to your specific point. I was addressing this general idea that the only reason people find imperative programming easier is because they were taught it first


I write this in my Haskell day job every day.


You don't have to worry; people have been saying this for more than a decade now and I see no action about it.


He likes it more.


Because I know what opinion I have.


> A loop can always be expressed with tail recursion.

If I were writing the code in assembly, I would just use a loop, because that's what the machine is optimized to perform.

> But this is also a symptom of standard c's problem of not allowing one to define functions inside of functions.

There are extensions that do this; however, they cannot create a scoped closure for you so nobody cares to use them for anything.

> and these have to be coalesced into a single flag for the loop.

Or just use 'goto'.


Do you mean a conditional jump? Because assembly doesn't know about loops. And recursion is basically a jump too, even more so if you manually do the TCO. They are literally equivalent.


Depends on the assembly and depends on the loop. x86 has multiple instructions that hint at knowing about loops: e.g. the `loop` instruction that does a decrement and jump, and the rep prefixes that let you implement memcpyish functions in a single instruction.


Fair point. The higher level languages have been leaking back down into assembly.


For a very long time. Even the PDP-7, on which Unix was created, had ISZ: Increment the memory operand and Skip the next instruction if the result is Zero.


> And recursion is basically a jump too, even more so if you manually do the TCO.

The recursion in TCO can be implemented with a jump but it requires an explicit stack frame and ultimately it must return a value through that stack frame. Basic loops have no such requirements which allows you to do things like jump from the middle of one loop into the middle of a different one.

Granted, this is rare, and rarely useful in practice, but it occasionally is and the burden of restructuring these forms into TCO would actually reduce their clarity and performance.

> They are literally equivalent.

They are /functionally/ equivalent. The literal differences are meaningful and have definite performance implications.



Why the fuck do they still write papers in 2 columns? It's a pain for humans and computers alike. Here is the link to the PDF: https://dl.acm.org/doi/pdf/10.1145/3652588.3663324



