Fun with Go Iterators (xnacly.me)
161 points by xnacly 26 days ago | 137 comments



I work with Go a lot at my job, and I definitely prefer functional programming so I chafe at the language a bit. I was excited to incorporate iterators into our code. Unfortunately, I'm also working in an environment where we are memory constrained, so on our hot path we need to not allocate at all. I tried playing with the iterators a bit, and couldn't get them to produce something that didn't allocate. I got close, but as much as I tried I couldn't get below 1 allocation per loop (not per loop iteration, per loop). Which in any other setting would be fine, but not for our use case.


This is probably one of the areas where Zig shines. I'm mostly a Gopher but reading Zig is just as easy while ensuring that no allocations are hidden


I've seen issues in Go codebases a couple times where a _lot_ of effort has been spent trying to track down allocations and optimize memory usage. It sounds like the parent comment is describing writing new code that strives to avoid allocations, which probably isn't something Go makes much harder than similar languages do, but I feel like it's one of the more ambiguous languages in terms of the amount of context needed to identify whether a given variable is allocated on the heap or not. A pointer might be a pointer to the stack, or a pointer to the heap, or a pointer to an interface that might _also_ have a heap allocation for the concrete-typed value behind it. If you see a slice, it might be a heap allocation, or it might be a reference to a static array, or it might be a reference to another slice...which has the same possibility of being either a heap allocation, a reference to a static array, or just another link in the chain to a different slice.
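To be fair, the compiler will tell you what it decided if you ask. A minimal sketch (the function names are made up), checked with `go build -gcflags=-m`:

    package main

    // `go build -gcflags=-m` prints the escape-analysis decision for each of these.
    func stackOnly() int {
        v := 42
        p := &v // does not escape: the pointer never leaves this frame
        return *p
    }

    func escapes() *int {
        v := 42
        return &v // "moved to heap": the pointee outlives the frame
    }

    func main() {
        _ = stackOnly()
        _ = escapes()
    }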

This is a place where I feel like the kind of simplicity Go touts isn't actually optimizing for the right thing. Having a single type for all pointers certainly has a sort of abstract simplicity to it, but I feel like it doesn't actually make things simpler in the long run. My perspective is that "conceptual" simplicity is a balancing act between not having too many concepts but also not having concepts that are too confusing individually, and I'm surprised that Go is used for domains like needing to completely avoid allocations in a hot path when the language doesn't really feel like it's designed to make that easy.


> I've seen issues in Go codebases a couple times where a _lot_ of effort has been spent trying to track down allocations and optimize memory usage.

That is a clear sign that Go was not the right tool for the job.


I don't disagree, but for whatever reason I've continued to see projects using Go in places where a large amount of memory usage becomes a bottleneck. The best rationale I can come up with is that people measure whether it's a reasonable choice based on the theoretical floor of memory usage that might be possible to reach by writing perfectly optimal code, rather than the amount of memory that would be expected from the code that will actually be written, and then underestimate the amount of work needed to optimize the former into the latter.


There is a myriad of reasons. Usually it is because the rest of the project is in Go, so adding a new language just increases complexity. Go is fast to prototype in as well. So in general there is nothing wrong with it, but once these questions start popping up (memory and whatnot), it is the perfect time to appropriately refactor the code, likely into a different language that can better utilize the hardware for the performance needs of the application itself.


I have not looked at Go's iterator range loop implementation yet. So somebody tell me if I'm wrong here.

My guess is that Go is probably wrapping the body of the range loop into a closure, and passing that closure into the iterator function as the yield function. A break in the body of the loop becomes a "return false" in the closure.

The allocation is probably the closure environment struct (giving access to variables prior to the range loop).

This closure might escape through the iterator function so Go can't just put the environment struct onto the stack, it has to escape to the heap.
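Roughly, the lowering looks like this sketch (an illustration of the idea, not the compiler's actual output):

    package main

    import "fmt"

    func main() {
        // a push iterator: iter.Seq[int] is just func(yield func(int) bool)
        seq := func(yield func(int) bool) {
            for i := 1; i <= 5; i++ {
                if !yield(i) {
                    return
                }
            }
        }

        // What `for v := range seq { ...; break }` roughly becomes:
        // the loop body turns into the yield closure, break becomes "return false".
        seq(func(v int) bool {
            if v > 3 {
                return false // break
            }
            fmt.Println(v)
            return true
        })
    }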

The cost is small but it's not free. Usually, not having to think about the allocation is an advantage. In the rare case you can't afford the iterator, do it differently.

Go is great.


I know complaining about downvotes is not the right thing, but the above is someone else's comment, not mine. Why was it downvoted (I see it grayed)? It's useful information without zealotry or "we-know-betterism".


Same, I work in Go most of the time in performance and memory sensitive areas, and the lack of usable abstractions grates so hard. I'm increasingly fed up with the long-hand everywhere and the bug volume and frequency it inherently produces, particularly under change. I was worried this would be the outcome of the iterator proposal and am not surprised at all to find that yes, it is in practice. I was similarly disappointed with rand/v2, where I provided feedback that the interface in the middle of it had been seldom used and could have been eradicated, and the choice was to preserve that part of the API because the compiler should in the future eradicate these costs. I see no actual plan for how that will be done though, and I'm skeptical it will happen in practice. I'll finish with the comment that will lead to downvoting, but I derive tons of value for resource constrained work from Rust's lazy iterator design combined with the aggressively optimizing compiler folding nice, low-maintenance-cost abstractions into tight, sometimes even vectorized or unrolled loops - it makes the work I do demonstrably easier.


A question for both you and the parent: If you are heavily performance and memory constrained, why are you using a language that gives you relatively little control over allocations?


In my case it was a choice made long ago, when the exact consequences weren’t fully apparent. I don’t think when making those initial decisions people understood how important low memory usage would turn out to be, or that Go would be an obstacle to that. And we’ve got so much code in Go at this point it would be a huge lift to switch languages. The language does have some nice features that make things easier. It’s just in certain portions of the code, in the really hot loops.


Preexisting choice and strong holding bias in the organization


This is usually the case with C#'s equivalent as well. Enumerables and LINQ are nice options to concisely express logic, but you won't see them in hot paths.


Unfortunately, the baseline allocation cost is hard to avoid due to IEnumerable<T> being an interface which all LINQ methods return (save for scalar values), and IEnumerable<T> itself returning an interface-typed IEnumerator<T>. Even with escape analysis, the iterator implementation selection logic is quite complex, which ends up being opaque to the compiler, so at most it can get rid of the IEnumerable<T> allocation but not the enumerator itself, and only when inlining allows it.

There are community libraries that implement similar API surface with structs that can be completely allocation-free and frequently dispatched statically.

Moreover, with the advent of `T where T : allows ref struct` you can finally write proper LINQ-like abstraction for Span<T>s, even if it's a bit less pretty. I have been playing with a small toy prototype[0] recently and it looks like this:

    // Effectively C's array constant
    var numbers = (ReadOnlySpan<int>)[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    var iter = numbers
        .Where((int n) => n % 2 == 0)
        .Select((int n) => n * 2);

    // Inspired by Zig :)
    using var vec = NVec<int, Global>.Collect(iter);
The argument types for lambdas need to be provided to work around C# lacking full Hindley-Milner type inference, but this iterator expression is fully statically dispatched and monomorphized save for the lambdas themselves. Luckily, the JIT can profile the exact method types passed to Funcs and perform further guarded devirtualization, putting this code painfully close to the way Rust's iterators are compiled.

At the end of the day, .NET's GC implementations can sustain 4-10x the allocation throughput of Go's (it's not strictly better - just different tradeoffs), with further tuning options available, so one allocation here and there is not the end of the world. Not all LINQ methods allocate in the first place, and many of them allocate very little thanks to optimizations made in that area in recent releases.

[0]: https://github.com/neon-sunset/project-anvil/blob/master/Sou...


Sounds pretty interesting. If you were to write up how you tested this, I’d definitely read it.


Not terribly difficult. The standard library's benchmarking framework records allocations in addition to latency, so you can just look at that. Fun fact, you can use this to even make unit tests that assert that certain benchmarks don't allocate, or only allocate a certain amount.
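A minimal sketch of both, using only the standard `testing` package (the summing code is just a stand-in):

    package demo

    import "testing"

    func sum(nums []int) int {
        s := 0
        for _, n := range nums {
            s += n
        }
        return s
    }

    func BenchmarkSum(b *testing.B) {
        b.ReportAllocs() // report allocs/op alongside ns/op
        nums := []int{1, 2, 3, 4}
        for i := 0; i < b.N; i++ {
            _ = sum(nums)
        }
    }

    // A regular test asserting the hot path performs zero allocations.
    func TestSumDoesNotAllocate(t *testing.T) {
        nums := []int{1, 2, 3, 4}
        allocs := testing.AllocsPerRun(100, func() { _ = sum(nums) })
        if allocs != 0 {
            t.Fatalf("expected 0 allocations, got %v", allocs)
        }
    }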


Call me crazy, but I don't like any of this. Make more named functions. Keep your logic flat and explicit. I believe go wants you to code this way as well. Imagine the horrors this kind of function chaining creates. Actually, you don't have to. It's JavaScript.


Since you asked, I’ll call you crazy.

This sort of “chaining” syntax is pretty standard in many languages (Java, JS, Elixir, etc). Especially for streams/iterators/lists. You can have pretty well named and flat logic too. I think it’s just poorly written demo code. To me, this “functional” style of chaining is great. It highlights intent when reading the chain (you read “filter” as a step instead of a for loop with a conditional inside). It’s also really easy to recompose or reorder, and the code-review diffs are super easy to reason about when that happens.

I don’t think it really generates anything conventionally called “horrors” either - you can still use named functions and everything you love, this just makes it easier to use. It may encourage more well-written code too.

Imagine a simple example - get all files in some directory, filter out non-json files, perform some name-manipulation (map) function, and then return a new list. The “old” way would require a series of for loops that make and fill slices passed to each. You then wrap that whole thing in a new method called “GetRenamedJsonFiles(path string) []File”.

With the iterator chaining, you can still wrap it in a named method, but now you can replace the repeated for loops and intermediary slices with: “return GetFiles(path).Filter(isJsonFunc).Map(updateFileNameFunc).Collect()”. It’s probably easier to read, easier to change up later if requirements change, and easier to validate intent when reviewing, etc. It even encourages smaller, dedicated, easy to update or share methods - it encourages named methods for the intermediary steps (getFiles, isJson, updateName).


You can still refactor the original function:

    func GetRenamedJsonFiles(path string) []File {
        files := GetFiles(path)
        jsonFiles := KeepJson(files)            // or `Filter(files, IsJson)`
        renamedFiles := RenameFiles(jsonFiles)  // or `Map(jsonFiles, RenameFile)`
        return renamedFiles
    }
In fact, a long chain of function calls is often hard to read and has to be split into several parts anyway. I can even claim that this "old" style forces you to name the outcome of each step. Also, it is unclear from its name whether `GetFiles` returns a lazy iterator or a plain slice (I guess it's lazy, but only because you have said `Collect()` there).

It is not as if "map" and "filter" have no place in this style; in fact, function chaining is just a concise way to rephrase it! You can write in a functional style without any function chaining, because the style is all about immutability and the composability that results. Mutable operations tend not to compose; any such combination results in something more complex. As long as that can be eliminated, anything would work.


One potential downside I see with this approach is that it forces you to store and name intermediate results which may or may not be meaningful on their own.

Consider a slightly more complicated example that filters on multiple conditions, say file type, size, and last modified date. These filters could be applied sequentially, leading to names like jsonFiles, then bigJsonFiles, then recentlyModifiedBigJsonFiles. Or alternatively, names which drop that cumulative context.

Of course that can be extracted out into a standalone function that filters on all criteria at once or we can use a combinator to combine them or apply any number of other changes, but generally naming intermediate states can be challenging.


Your presented alternative is basically the same thing with named step outputs. I don’t see how chaining is something crazy or problematic in comparison. The only difference is chaining reduces syntactic “noise”. It’s easier to read.

One-time use variables that are consumed on the next line aren’t valuable. Chaining exists as a widely used pattern because it’s useful to not generate a ton of intermediary variables.

In fact, the typing point you bring up is actually less clear. In the iterator chain, the type is "iterator", which is a standard type that can be produced from many sources as business requirements change. In your example, the types need to be compared to the function parameters, and methods need to be written to explicitly use the same types.


> I can even claim that this "old" style forces you to name the outcome of each step.

Yeah, but that does not necessarily make something easier to understand. Which function do you think is easier to understand?

    func quadraticFormula(a, b, c float64) float64 {
        return (-b + math.Sqrt(b*b-4*a*c)) / (2 * a)
    }

Or naming intermediate steps (since we only get to apply one function at a time, and math operators are just functions, why should they be special)?

    func quadraticFormula(a, b, c float64) float64 {
        minus_b := -b
        squared_b := b * b
        four_times_a := 4 * a
        four_times_a_times_c := four_times_a * c
        inside_of_sqrt := squared_b - four_times_a_times_c
        sqrt_result := math.Sqrt(inside_of_sqrt)
        numerator := minus_b + sqrt_result
        denominator := 2 * a
        return numerator / denominator
    }

Sometimes naming intermediate steps makes code easier to read. Sometimes it doesn't. It depends a lot on whether the intermediate step has a semantic meaning or not, and whether you want the reader to pause and do a mental checkpoint or it makes more sense to keep going. Kind of like when you are writing English and you decide to stop a sentence or keep going: you don't want your sentences to be too long, but also not too short. That's why good code should follow guidelines, but not hard rules. The programmer should write it in a way that is easier for the reader to understand, and that depends on the context of what is being done.


Since you asked, I find the terse version to be completely unreadable*. If I wanted to understand it, I’d end up writing out all the steps for myself anyway.

There’s an asterisk here because you cherrypicked a function that is already widely known and understood. All we really need to understand that function is its name. If you chose complex niche business logic, or The Wave Function, it would have made for a more fair and instructive example.


When Go eventually lets all this horrible syntax sugar into the standard library, we will meet again in the Zig community.


Even JS doesn't really use functional styles that much. In fact, the whole reason for functional styles is the decomposability: any language with easy function chaining like `x |> f |> g |> h` will also have an easy extraction of any part of the chain (say, `x |> fg |> h where fg = f . g`). It is not a good idea to use chaining when the language supports no such feature, as it would be much harder to work with such chains then.


You're right, this functional style 'clever' programming is exactly what Go discourages (or did discourage... historically...). This is exactly the kind of code I don't want to see. I want to see clear, easy to read, obvious functions that do exactly what they say on the tin.


For lots of people, unnecessary for loops make the code less readable. When using filter and map, all the looping is abstracted out so the reader can concentrate on what's being filtered and transformed. The loops and their temporary variables just clutter up the code.


For me, the JS/LINQ style is much more readable than a bunch of for loops.


It is a sad fact that Go is becoming more and more like JavaScript (and deviating from C).


There's always C if you prefer your language to work like C.

Go is supposed to be an improvement over C for most programming tasks.


> Go is supposed to be an improvement over C for most programming tasks.

This is not true. s/Go/Zig is true. That is why I'm switching to Zig now.


.Net seems to handle it pretty well with LINQ?

https://medium.com/@sanjanasw99/an-in-depth-guide-to-linq-in...


LINQ is a very dense syntactic sugar for some selected `IEnumerable` methods. It will surely look like chaining in typical uses and thus is useful, but different from an arbitrary chaining.


I absolutely love it when we can take advantage of Go's type system and add additional traits and behaviors to existing types like this.

That said, I noticed something odd here. In order for a module like this to really shine, I think all these operations need to be functionally pure. Right now, some of these mutate the iterator's `iter` method mid-stream, which is about as side-effect-ful as you can get.

    func (i Iterator[V]) Map(f func(V) V) Iterator[V] {
        cpy := i.iter
        i.iter = func(yield func(V) bool) {
            for v := range cpy {
                v = f(v)
                if !yield(v) {
                    return
                }
            }
        }
        return i
    }

Unless I'm misreading that, `i.iter` has new behavior after this call. A better way would be to return a new _iterator_ with the custom iter behavior instead.

    func (i Iterator[V]) Map(f func(V) V) Iterator[V] {
        // create a fresh iterator around a custom closure (NewIterator() is hypothetical in this case)
        return NewIterator(func(yield func(V) bool) {
            for v := range i.iter {
                v = f(v)
                if !yield(v) {
                    return
                }
            }
        })
    }


Might have been written in a hurry, and maybe the author was thinking that not allocating a new Iterator, which is in fact a wrapper around iter.Seq, would produce less heap garbage. But I guess it's the same amount of garbage, because the size of Iterator is the same as iter.Seq, which is allocated anyway; it's just a different type of object being discarded.


If they allocated, people would complain about that. If they don’t, people complain about mutation. :shrug:

Personally, the article’s implementation seems fine to me. The iter is a private field of a throwaway struct created on the fly in order to support chaining. If anyone is then relying on the (private) contents of that struct, I think that’s user error. I can’t see personally why you’d do that.


This flagged for me right away too. I would be badly surprised if a Map-style chained method mutated the memory of the receiver.


Go people will do this and they'll be content:

  a := []int{1,2,3,4}
  it := slices.All(a)
  it = slices.Reverse(it)
  it = slices.Map(it)
  it = slices.Filter(it, func(i int) bool { return i % 2 == 0 })
  slices.ForEach(it, func(i int) { fmt.Println(i) })
I don't judge the Go enjoyers, but I prefer writing TypeScript to Go which says it all.

Type-inferred arrow lambda for function arguments would go such a long way in making this code nicer... And not make compilation slower at all.

  it = slices.Filter(it, i => i % 2 == 0)
  slices.ForEach(it, i => fmt.Println(i))


No, Go programmers would write

    a := []int{1, 2, 3, 4}
    out := []int{}

    for idx := range a {
        val := a[len(a)-idx-1]
        if mappedVal := Map(val); mappedVal % 2 == 0 {
            out = append(out, mappedVal)
            fmt.Println(mappedVal)
        }
    }
In modern Go I might write a reverse iterator, that index is a bit hairy, which would cut out the 'val :=' line, but as there is not yet a standard library option for that I'll leave it out.


I think the killer case for this is not hiding the computational complexity of mapping/reversing/etc.

I write a lot of typescript and golang and I often notice in typescript my chains end up iterating the same array N times when it could've been done in one smarter iteration.

I don't think I care though -- both styles are way more than performant enough. You can be more careful about it in cases it is not.


By that logic, calling any function with a loop or recursion should be forbidden, as it hides computational complexity at the call site. I.e. should we get rid of `slices.Sort` (etc.) because it hides computational complexity?


Iterating an array 10 times doing 1 operation on each element each time is equivalent to iterating it 1 time while doing 10 operations on each element.


Found the theoretical informatician... Nope, not in real world software engineering. Especially not in JIT runtimes like V8. Try it!

I tried it, code: https://pastebin.com/cA8YkE8R

Result:

   process1: 2s 251.327833ms
   process2: 1s 537.721625ms


Funny enough, the iterator version is much faster in rust because the compiler can more easily optimize iterator chains than custom for loops:

https://play.rust-lang.org/?version=stable&mode=release&edit...

(I'd recommend running it on your own machine as the rust playground limits memory and will likely kill this program)

Output from my machine:

    $ cargo run --release
        Finished `release` profile [optimized] target(s) in 0.05s
         Running `target/release/iterator`
    Process 1 returned 18270843109002848788 and took 64.58175ms
    Process 2 returned 18270843109002848788 and took 308.969083ms


But this is apples and oranges. Process 1 creates and calls lambdas, etc.

A proper comparison could be:

  # Case 1.
  for n in range(len(data)):
    data[n] = transform1(data[n])
    data[n] = transform2(data[n])

  # Case 2.
  for n in range(len(data)):
    data[n] = transform1(data[n])
  for n in range(len(data)):
    data[n] = transform2(data[n])


It illustrates how this kind of code is used in the wild - array functions vs a for loop. What you did is not even a fruit, if I go along with your metaphor.


It's a little bit faster (on my machine at least) if you combine the filter and map into a flatMap (it's still not as performant as the imperative solution though).

  function process3(input) {
    return input
      .flatMap((n) => (n % 2 === 0 ? n * 2 : []))
      .reduce((a, b) => a + b, 0)
  }


If it needs to be fast use oldskool for() loops.

  function processfor(input){
    let sum = 0
    for (let i = 0; i < input.length; i++){
      if (input[i] % 2 !== 0){ continue }
      sum += input[i] * 2
    }
    return sum
  }
https://jsfiddle.net/gaby_de_wilde/y7a39r15/5/


Yeah the comment I was originally responding to included a for loop (in the Pastebin link). My point is that if you're set on going down the functional route you don't need separate map and filter steps, you can just use flatMap which is effectively a filter and map combined (returning an empty array filters out the current value, since an empty array gets flattened into nothing).

Of course, if you want the most performant solution an imperative for loop is faster (which is what I said in my last comment).


input.reduce((a, b) => b % 2 !== 0 ? a : a + (b * 2), 0)

or if you want it really silly.

input.reduce((a, b) => a + (b % 2 && b * 2), 0)

dont ask me why but for(a of b) is slower than for(i=0;i<b.length;i++)


Array.prototype.reduce can basically do almost anything a for loop can do, since it gives you access to state from one iteration to the next. The only reason I didn't remove the flatMap in my original example and convert it all to reduce, is because there's no longer any method chaining which was the point of the original comparison between the Go and JS examples.

> dont ask me why but for(a of b) is slower than for(i=0;i<b.length;i++)

Probably because for of loops use the iterator protocol. So I'm assuming under the hood the JS engine is actually invoking the Symbol.iterator method (which is slower).

  #! /usr/bin/env node --experimental-strip-types
  
  function processIterator(input: number[]) {
    let sum = 0
    for (let i = input[Symbol.iterator](), r; (r = i.next()); ) {
      if (r.done) return sum
      if (r.value % 2 === 0) sum += r.value * 2
    }
  }
  
  function processfor(input: number[]) {
    let sum = 0
    for (let i = 0; i < input.length; i++) {
      const value = input[i]
      if (value % 2 === 0) sum += value * 2
    }
    return sum
  }
  
  
  const input = Array.from({ length: 1_000_000 }, (_, i) => i)
  
  console.time('normal for loop')
  console.log(processfor(input))
  console.timeEnd('normal for loop')
  
  console.time('iterator for loop')
  console.log(processIterator(input))
  console.timeEnd('iterator for loop')


In an ideal machine you are absolutely correct! But when executed by a real physical CPU, if the array doesn't fit in the cache, the iteration strategy matters!


Not in this way, not significantly. At worst, for each new loop through the array you have an extra cache miss. If the array is large enough, 10 extra cache misses will barely even be measurable.

It's only for small arrays actually that the cost of the loop infra and the extra cache miss per loop matters.


This isn't true outside the theoretical best case. Definitely not true in most interpreted languages. Consider the performance difference of .reverse() (in-place or copy) vs using a descending for loop. Generally with interpreter chaining there are more "hidden" allocations/operations.


You're comparing apples and oranges. Of course modifying an array or copying it takes way more time than looping. The comparison was not about that, it's about one loop with 10 instructions VS 10 loops each with a single instruction, for the exact same instructions.


There's no way to do this (a single-instruction loop vs 10 loops each with a single instruction) in JS or Golang.


I don't mean assembly-level instructions, just general instructions in the language:

  for i := range arr {
    foo1(arr[i])
    ...
    foo10(arr[i])
  }
Vs

  for i := range arr {
    foo1(arr[i])
  }
  ... 
  for i := range arr {
    foo10(arr[i])
  }


Until the array no longer fits in your cache :)



Many Go programmers would claim this is cleaner and more understandable; unfortunately, I'm not one of them.


The original version relied on no fewer than five different imported functions from a package that, if you're not already familiar with them, you would have to go read to see exactly what they do.

The updated version has just one such function.

A function gets less readable as you have to go context switch to more and more different functions to understand what it's doing.

Yes, the `slices` functions are named such that you can make a decent educated guess as to what they're doing, but if there's a problem you're trying to debug that doesn't save you from having to dive into each one.


> A function gets less readable as you have to go context switch to more and more different functions to understand what it's doing.

This is true.

A function also gets less readable every time you have to mentally translate from low-level implementation code, to a high-/human-level description of "what is this actually doing?" (e.g. `val := a[len(a)-idx-1]` - "ohhh, ok, that's iterating in reverse"). Extracting out common patterns to methods with well-known names like `Filter` or `Map` or `ForEach` short-circuits that repeated parsing.

> that doesn't save you from having to dive into each one.

99% of the time, it really does. If you can't trust something as basic and well-defined as `Map` to do what you expect it to do, you've chosen a poor library on which to take a dependency.


> 99% of the time, it really does. If you can't trust something as basic and well-defined as `Map` to do what you expect it to do, you've chosen a poor library on which to take a dependency.

The original comment said:

     it = slices.Map(it)
I understand Map taking an array and a function, as in the article. I don't understand Map taking simply an array.

Then I went to look it up, and turns out that slices.Map does not in fact exist. https://pkg.go.dev/slices

Personally I feel like this exchange is a vivid illustration of exactly my original point.


The facts that GoLang is missing such a fundamental tool from its standard library, and that it's easy to make typos when coding not in an IDE, hardly support your point.

In fact, the very fact that you originally thought of the correct interpretation despite it having been misrepresented is a great example of why common shorthands are useful.


I suppose there are different sorts of programmers who prefer different levels of guardrails pre-made for them.

Some people consider `Map` to be a fundamental tool. Like the original commenter mentioned, some people also prefer Typescript to Go.

If I was interviewing someone for my (primarily golang) company and they mentioned that not having `Map` to be a downside of Go that hindered its readability, that would be a strong mark against them. I would not want to try to dig through the N levels of abstraction they would create in the codebase to hide what was happening with all its bugs.

Other companies don't mind this, and use other languages, like Javascript or the latest versions of Python.

Python in particular is great for what you mention, as nearly every slightly common programming idiom has its own keyword with its own special usage, so you can always use that instead of writing the fundamentals yourself.

Personally I hate Python because I prefer not to have to learn a dozen new obscure tools and keywords with each minor language release. I dislike trying to read packages written in Python, because they use whatever 20% subset of the language the author was aware of. I like Go because I can look at the source code for nearly any function in nearly any library and understand it immediately. Nobody would import "leftpad" in Go.

Different languages for different people.


I really like how you've presented this (despite disagreeing as a point of personal preference on almost every point you make :) ). I'm baffled by the level of repetition and verbosity that Gophers seem to prefer - but I also can't deny that they've created some truly astonishing tools and projects, despite having to reimplement core concepts over and over again! Clearly, it works for them.

As you say, different languages for different people. Best of luck to you, and thank you for an insightful and civil discussion :)


The bigger problem is that it's not efficiently composable. If your caller needs to additionally filter `out`, that's another loop with a copy.


In most imperative languages, writing .Map().Map().Filter().Map() is another full copy for each call anyhow. As a sibling notes, it is possible to "fix" that, but the fix is not even remotely free. It is quite complicated to do it generically. (Although there are other benefits; the most natural approach to that largely fixes my grumblings about refactoring I mentioned in another post.)

Plus, in a for loop approach, it is not true that the caller may need another loop with a copy. They may just loop over the result and skip over the things they don't need. They only need a copy if they are going to pass that on to something else.

A drum I can not stop banging on is that you can not just take a neat technique out of one language and slam it into another without examining the end result to make sure that you haven't murdered the cost/benefits tradeoff. You can slam together all the maps and filters and reduces you want in Haskell, and applicatives and monads and all the fun, due to a combination of the laziness and various safe optimizations like loop fusion. In an eager context that lacks loop fusion, going .Map().Map().Map().Map() has radically different performance implications. For instance, "take 10 $ map f list" in Haskell will only call "f" 10 times. .Map().Take(10) in most implementations will create the full array, however large it is, and slice ten off the end after that.

In imperative languages, contrary to frequent claims from the functional programming crowd, for loops are actually often better in practice. The solution to their pathologies is to be aware of them and not do them. But it is far, far easier to get good performance out of a for loop in an imperative language than to contort one's code into a pale parody of functional programming.
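For what it's worth, the Go 1.23 iterators under discussion land on the lazy side of this: stages only run for values that are actually requested. A sketch of a lazy Take over iter.Seq (Take is an assumed helper here, not part of the standard library):

    package seqs

    import "iter"

    // Take yields at most n values from seq; once it stops yielding,
    // the upstream Map/Filter stages stop running as well.
    func Take[V any](seq iter.Seq[V], n int) iter.Seq[V] {
        return func(yield func(V) bool) {
            taken := 0
            for v := range seq {
                if taken >= n || !yield(v) {
                    return
                }
                taken++
            }
        }
    }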


> In most imperative languages, writing .Map().Map().Filter().Map() is another full copy for each call anyhow.

Which imperative language? In Java and rust, two languages I know, all these operations are lazy until the final collect. So no copy is made


That's why I specified the code. If you're writing .Map().Map().Map().Map(), you are usually getting a lot of intermediate arrays.

If you have a .Collect() call, you're in the deforestation branch. This has its own issues with stressing the inliner (turning simple, direct for loops over large collections into traversals that include several indirect method calls per item in addition to the payload is pretty easy), but that's still generally better than stressing the RAM.

Rust's map doesn't operate on arrays at all from what I can see but operates on iterators directly. This is good and generally more correct. However there's a lot of languages that don't support that. Rust is generally also going to be more reliable about compiling it all away than a lot of other languages where it will be really easy to spill over what the inliner can handle. Those long compile times in Rust do have their benefits.

There's also languages that sort of split the difference, e.g., it is not that difficult in Python to use itertools and generators to correctly write something that will not generate a lot of intermediate arrays, but it is also easy to write a series of calls and list comprehensions to write otherwise equivalent code that will create a lot of intermediate arrays.

I expect as we continue to build new languages over time they're going to all look like Rust here. It's pretty obvious that conceiving of loops as iteration over some sequence is the way to go. However, that is a result that we got to precisely because of our experiences with a lot of languages that don't support it as well, or support it inconsistently, or as is actually quite common, the language nominally supports it but the ecosystem tends to assume concrete values a lot more often than it should, and all these languages are still around.

Writing in this style correctly in imperative code is more difficult than a lot of people jumping up and down about how we should rewrite all our loops as maps and filters tend to account for. It can be done, but it's often harder than it looks, in at least one of the writing and the performance if not both, and the harder it is, the more the costs stack up on the costs side, and the harder the costs/benefits analysis becomes. And I still don't like how it refactors in most cases.


Nor in Python. Nor in C++20's std::views. The very point of iterators is to proceed one element at a time, avoiding intermediate copies of collections.

One case where the for-loop would be much more efficient is a simple transformation like incrementing each element of an array, or adding two arrays element-wise. A serious C compiler could unroll such a loop into several vector instructions which work on several elements at once. Maybe even LLVM can recognize something like that in Go code.


> In most imperative languages, writing .Map().Map().Filter().Map() is another full copy for each call anyhow

This is incorrect. In fact, the only such mainstream language that I can think of is JavaScript. Python, Java, C#, even C++ all have proper abstractions for lazy sequences, which would get used in such circumstances - and their whole point is that they are composable.

> Plus, in a for loop approach, it is not true that the caller may need another loop with a copy. They may just loop over the result and skip over the things they don't need. They only need a copy if they are going to pass that on to something else.

The point is that your function doesn't know what the caller needs. Yet by making it an eager loop with a copy, you are making that decision for the caller.
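A minimal sketch of that with Go 1.23's iter.Seq (Evens/BigEvens are made-up names for illustration):

    package seqs

    import "iter"

    // Evens returns a lazy sequence; no intermediate slice is built here.
    func Evens(nums []int) iter.Seq[int] {
        return func(yield func(int) bool) {
            for _, n := range nums {
                if n%2 == 0 && !yield(n) {
                    return
                }
            }
        }
    }

    // The caller layers its own filter on top, still without a copy.
    func BigEvens(nums []int) iter.Seq[int] {
        return func(yield func(int) bool) {
            for n := range Evens(nums) {
                if n > 100 && !yield(n) {
                    return
                }
            }
        }
    }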


The proper term is deforestation https://en.wikipedia.org/wiki/Deforestation_(computer_scienc..., and I have seen Go libraries which do this


ah, that's what it's called! I did this in my compile-to-Go language, but I didn't know the name for it: https://github.com/lukechampine/ply?tab=readme-ov-file#suppo...


Go programming style is like programming in 2005, before functional programming became widespread (Google Guava, Clojure). If you showed functional programming to programmers in 2005, many of them would find it difficult to read.


They might indeed. Or

  a := []int{1,2,3,4}
  it       := slices.All(a)
  itRev    := slices.Reverse(it)
  itMap    := slices.Map(it)
  itFilter := slices.Filter(it, func(i int) bool { return i % 2 == 0 })
  slices.ForEach(it, func(i int) { fmt.Println(i) })
No fun to write. Is it better to read? Yes, if you want to know exactly what's going on. Not really, if you are skimming through a lot of code, you're used to cruising through chains of func calls, and the above feels like speed bumps.


> I don't judge the Go enjoyers, but I prefer writing TypeScript to Go which says it all.

The functional style is fine for simple filter/map algorithms, but once you get into zipping multiple iterators together, classic for-loops are often easier.

My company's hiring coding task had this "trap", at one point you needed to write an algorithm to merge consecutive "empty" cells in a data structure that is used to represent a simple table. This is dead easy with a regular for-loop, but writing it in a "functional" style could easily become extremely hairy and/or have O(N^2) complexity.


You can use loops in typescript if you need to.


Sure. I'm just saying that the functional style is not always the best.


> it = slices.Map(it)

There is no such function as "slices.Map()".

https://pkg.go.dev/slices@go1.23.2


Nit - I think you've missed the `func` argument in the `slices.Map` line.

(But - yes, bravo, correct, you're right and you should say it)


I did, but I believe some people understood what I meant anyway.


Unfortunately the chain approach breaks down when you need to `map` to a different type, for example. Go does not allow generic typed methods.


You can actually (almost) make it work, you just need to add an extra type parameter to keep Go's silly limitations happy [0]:

  strs := []string{"abc", "defgh", "klmnopqrst"}
  ints := From[string, int](strs).
           Map(func(a string) int { return len(a) }).
           Filter(func(a int) bool { return a >= 4 })
  Iterator[int, float32](ints).
           Map(func(a int) float32 { return float32(a) }).
           Each(func(a float32) { fmt.Printf("%v\n", a) })
  //prints 5, then 10
If they allowed the postfix cast syntax to work for non-interface types too, it could have been a single chain, actually (you could do `.(Iterator[int, float32])` inline instead of needing the extra variable).

Note that the original implementation in the article modifies the collections in place, in which case this issue doesn't come up at all: you can't put strings in an array of ints. My implementation creates copies of these collections so that the concept of mapping to a new type actually makes sense.

[0] https://go.dev/play/p/ggWrokAk7nS


I never understood why that limitation exists.

Can someone explain the reason for this?




It is a consequence of (a) Go's implicit relationship between a concrete type and the interfaces it implements and (b) Go's "heterogeneous" translation of generics (like C++, unlike Java). Together, this means you can't know which methods you need a priori. All proposed solutions to date essentially compromise on (a) by limiting the generic "implements" relation to things known at build time, or on (b) by making generic methods "homogeneous" (aka boxed) and thus slow (see https://github.com/golang/go/issues/49085#issuecomment-23163... for an elaboration of this approach); or, they involve dynamic code generation.


It's broadly similar to the reason why Rust won't allow generic methods on traits used to construct trait objects. It seems superficially like a reasonable thing to want, but actually isn't when you consider the details. (The sister comments link to the specific reasons why in the case of Go.)


I recall reading some details around this on https://blog.merovius.de. Maybe it was this article: https://blog.merovius.de/posts/2018-06-03-why-doesnt-go-have.... Compilation speed plays a big factor when deciding which features are added to Golang and I think they'd have a hard time maintaining the current compilation speed if they remove this limitation.




Might be interesting to make a library that competes with https://github.com/samber/lo?


It's not interesting. Boring people did this a million times, the result is always the same: just use for loops.


> Tips for lazy developers
>
> I cannot recommend it, but in case you are too lazy for repeating lo. everywhere, you can import the entire library into the namespace.

    import (
        . "github.com/samber/lo"
    )
TIL. Crazy


I just hate the fact that Go is super simple and clear but people try to make it complex with this kind of stuff. :( Makes me sad.


Back when the Go team announced generics, I had a go (pun intended) at it: https://github.com/gtramontina/go-extlib -- might resurrect it one day. `lo` is pretty comprehensive though.


Here's one: https://github.com/szmcdull/glinq It doesn't do function chaining though.


> My issue with the go way of iterators is, you can’t chain them like you would in JavaScript

You are not supposed to chain them. This addiction to try and chain everything everywhere all the time is so freaking weird and has been for a very long time.

Not only are you completely losing your grasp of what is going on and writing code prone to errors, but you are also making it unreadable for the other people who will be maintaining or just reading your code, and who will come along long after you are gone from the company or have abandoned your library.

This is where Go's simplicity approach and splitting each action into its own for loop or block of code is a godsend for maintainability.


It's really not. For example, a map function tells you that there will be exactly as many outputs as inputs. A for loop doesn't have any guarantees. You have to read each line inside the loop to understand what it's doing. In practice, having to be so explicit causes many more issues. I've never experienced as many mishandled errors on Java projects as I have in Go.


The Reverse implementation seems off to me -- it runs through the iterator twice, once collecting into a slice, and then a second time filling the same slice in reverse. (So basically the first Collect call is only being used to find the length of the iterated sequence.) I'm not sure about Go conventions†, but I imagine it would be considered better form to only run through the iterator once, reversing the collected slice in-place via a series of swaps.

(† Are iterators even expected/required to be reusable? If they are reusable, are they expected to be stable?)
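For reference, a sketch of the single-pass version, reusing the article's Iterator/Collect/From names as quoted elsewhere in this thread:

    func (i *Iterator[V]) Reverse() *Iterator[V] {
        collect := i.Collect() // run through the iterator once
        for l, r := 0, len(collect)-1; l < r; l, r = l+1, r-1 {
            collect[l], collect[r] = collect[r], collect[l] // swap in place
        }
        return From(collect)
    }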


That eliminates one of the main reasons to use this approach. Function chaining as most people write it is awful for performance because it involves creating a separate array for each step in the chain. Given that most programs are actually more memory-blocked than CPU-blocked, this is a bad tradeoff. Composing Go iterators, and composing iterators in general, is preferable because it doesn't have to create all the intermediate arrays. A bad reverse wrecks that.

Still, you need the option, and while reverse is one of the more common iterators, it's usually still avoidable if you need it to be. But if at all possible I'd suggest a "reverse" type-specialized to slices, and, as necessary and possible, type-specialized to whatever other types you are using, to actually crawl a value "backwards" rather than collecting a full iterator into a slice.

(Then again, I'm not a fan of this approach in imperative languages in general, due to the generalized difficulty in refactoring code written in this style and the fact that observationally, people just don't refactor it once written and it affords a style rife with repetition. One of the most important considerations about code is how easily it can be refactored. In Haskell, this approach is excellent, precisely because it refactors very, very well in Haskell. In imperative languages, it tends not to, thus, serving as another example of why you can't just blindly bring over nice things from one language into another without verifying they haven't become sour in the process.)


Worth comparing to Swift’s LazySequenceProtocol [0]. Various Sequences have a .lazy property that returns some LazySequenceProtocol<Element>, so you can opt in to this sort of fluent-style method chaining where it makes sense.

[0] https://developer.apple.com/documentation/swift/lazysequence...


Yeah, that all makes sense! But I think it's not relevant to the code in question, which is this:

    func (i *Iterator[V]) Reverse() *Iterator[V] {
        collect := i.Collect()
        counter := len(collect) - 1
        for e := range i.iter {
            collect[counter] = e
            counter--
        }
        return From(collect)
    }
So this code creates a slice from the iterator in the call to Collect(), and then fills the slice again in reverse by running the iterator again, which I think is wrong (or at least not-ideal).

(Your broader point about wanting to avoid creating an intermediate array at all for iterators and using type information to intelligently reverse "at the source" definitely still stands, in a broader context, though.)


Oh, yes, sorry. The level of error was more than I expected.

Edit: I should add that a "Reverse" that takes an iterator has no choice but to manifest the list and then reverse on the list. (Goodness help you if you try to reverse an infinite list.) But you can do type-dependent reverse iterators that don't have to if the data structure doesn't force it, and if you can do that you should, in general. This paragraph isn't about Go; this is the iterator protocol itself.


I guess if you're doing something like the original post's "type Iterator[V] struct" you could somehow extend it to capture the extra information needed to make an efficient Reverse work.

(FWIW, in C#, the IEnumerable extension method Reverse() just captures the whole sequence to an array and then iterates it in reverse. And yeah, woe unto you if you try to reverse an infinite sequence.)


> Function chaining as most people write is awful for performance because it involves creating a separate array for each step in the chain

Not in some languages. Clojure and Google Guava are lazy, nothing is created, only transformed and only as needed. Swift has lazy but it doesn't default to lazy.


Why reverse the slice at all? Collect the input into a slice, then return an iterator that navigates the slice in backward order.

What this sort of thing lacks is any ability to optimize. For example, let's say the operation is Reverse().Take(20). There's no reason for the reverser to keep more than 20 elements in its buffer. But to express that, you have to make sure the iterator can be introspected and then rewritten to merge the operators and maybe unroll some of the loops to get better cache locality. This is what Haskell can achieve via Stream Fusion, which is pretty neat. But not so viable in Go.
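A minimal sketch of that over a plain iter.Seq (Reverse here is an assumed helper, not something in the standard library):

    package seqs

    import (
        "iter"
        "slices"
    )

    // Reverse collects seq once, then yields the slice back to front,
    // without filling it a second time.
    func Reverse[V any](seq iter.Seq[V]) iter.Seq[V] {
        return func(yield func(V) bool) {
            s := slices.Collect(seq)
            for i := len(s) - 1; i >= 0; i-- {
                if !yield(s[i]) {
                    return
                }
            }
        }
    }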


Julia does similar optimizations with the broadcast operator. The results can be mind bogglingly fast.


Thanks, good points.


According to https://pkg.go.dev/iter#hdr-Single_Use_Iterators , most iterators are expected to be reusable, but some are single use. A function returning a single-use iterator is expected to document that with a comment.

So the Reverse implementation in this article wouldn't work with single-use iterators. The iter package isn't clear whether a function that requires a multi-use iterator needs to be documented as such.
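For illustration, a typical single-use iterator wraps something that can itself only be consumed once, such as a channel (FromChan is a hypothetical helper):

    package seqs

    import "iter"

    // FromChan returns a single-use iterator: values received from ch are
    // consumed, so ranging over the result a second time will not replay them.
    // Per the iter docs, that property should be called out in the doc comment.
    func FromChan[V any](ch <-chan V) iter.Seq[V] {
        return func(yield func(V) bool) {
            for v := range ch {
                if !yield(v) {
                    return
                }
            }
        }
    }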


> My issue with the go way of iterators is, you can’t chain them like you would in JavaScript

Because it’s not JavaScript, and that is a good thing.


It always bugs me when I see that pattern in JavaScript, because each `map`, etc. call is an array allocation. Yeah, yeah, I know that the VM's memory allocators are likely optimized for fast allocation, but that doesn't make the allocation completely free.


But not because you can't easily chain functions. That's just a deficiency in the language design.


No. It’s a feature. Chaining makes some of the worst unreadable code.


I’m trying to understand whether this is intended to make Go seem bad or whether it’s just coming across that way to me.


I looked at the code and it was giving me a headache. I've been messing around with Go for a decade, and I do not like this.


Yeah, same, used Go fulltime professionally for 10 years, not into this newer stuff.


People have been trying to do this sort of thing in Go for as long as I remember; it's nothing new and has not and most likely never will gain much uptake.

The stages of learning a language are something along the lines of:

1. Force language patterns from previous language

2. Frustration

3. Write library to make it easier

4. Anger

5. Acceptance


Shameless plug. I had experimented with Go iterators a while ago and wrote https://github.com/gomoni/it

It was updated to 1.23, so it is as idiomatic as I can get. And yes, it has a map method between two types. Just a single simple trick is used.


Nice rewrite for 1.23. Btw, just sent you a PR for a typo in the readme.


I like how the author uses a test to run arbitrary code, this is exactly how I do it too!


I just use play.go.dev


Life without REPLs.


We just released a go pkg that uses the new iter pkg. We were so excited by the interface in large part because of how simple iterators are to use.

https://github.com/picosh/pubsub/blob/main/pubsub.go#L18

We have seen the power of iterators in other languages like JS and Python, and we are happy to see it in Go.


Since the article is making a not-so-subtle jab at python being unable to do chain operations, I'm making my annual rounds to point out that implementing simple, straightforward chain functionality in python is as simple as a two-line function definition:

  def chain( Accumulant, *Functions_list ):
      for f in Functions_list: Accumulant = f( Accumulant )
      return Accumulant
https://sr.ht/~tpapastylianou/chain-ops-python/


I think it would be more idiomatic to use statements, not expressions. That is, it’s ok to use local variables for intermediate values in a pipeline.


You often end up with a lot of extraneous variables with no useful names if you do that. A lot of the time the intermediate results in a pipeline are almost meaningless.


The main thing is not to do things that would be fine in other languages when they result in complications in the one you’re using. Some people want to write an entire library rather than writing statements. Why?

Also, local names can sometimes be useful documentation when the function names don’t really get the point across (perhaps because they’re at a different level of abstraction). Or alternatively, in Go it’s idiomatic to keep them short.


> Some people want to write an entire library rather than writing statements. Why?

Because you want to make code more readable, and getting rid of extraneous intermediate results is one way of achieving that.


This reminds me of RxJS (https://rxjs.dev/)


1. That's a good looking Hugo theme!

2. Implicitly chain everything all the time!

In Factor, you might do it as:

    reverse [ sq ] [ even? ] map-filter [ . ] each
Or with a little less optimizing:

    reverse [ sq ] map [ even? ] filter [ . ] each
The least obvious thing is that the period is the pretty-print function.


It's funny the author throws a dig at Python for its syntax that actively discourages this kind of code. Like… my guy you're not taking the hint. Python makes things it doesn't want you to do ugly as sin. It's why lambda is so awkward and clunky.


I get that, but I still don't like to write f5(f4(f3(f2(f1())))) instead of writing f1().f2().f3().f4().f5()


Yes. The following is a lot more concise:

    a = [1, 2, 3, 4]
    print([v*v for v in reversed(a) if v*v % 2 == 0])


I think the above is a good idea of what's wrong with python (and Go), because in your example the list comp is evaluated in what seems to be this order:

    FOURTH-> print([THIRD-> v*v for v in FIRST-> reversed(a) if SECOND-> v*v % 2 == 0])
Which is all over the place. I'd rather see:

    a = [1, 2, 3, 4]
    a = reversed(a)
    a = [v*v for v in a]
    a = [w for w in a if w % 2 == 0]
    print(a)


This.

I often use generator expressions for the intermediate values (so I don't allocate a new list for each step), but I find this to be much more readable.



