Clojure Don’ts: Lazy Effects (2015) (stuartsierra.com)
108 points by lsh on March 3, 2018 | 41 comments



This isn't just a Clojure "Don't". It's a general programming "don't".

On my old team we had a semi-serious production bug result from this (it didn't actually kill anyone, but it stopped the system from being used): we had an expression with a side effect inside a debug logging statement. The code worked perfectly in a debug build, but in the release build the DEBUG_PRINTF() equivalent was removed by the C++ preprocessor (logically equivalent to putting "!debug && " in front of every such line), so a crucial value never got updated. It took some tense minutes of debugging on site in front of the client before we figured it out... lesson learned!


It's an obvious-enough "don't" that it's one of the stated reasons for universal laziness in Haskell. Simon Peyton Jones has said that he's committed to keeping everything in GHC lazy to keep them honest about side effects. I think the idea is basically that if something has side effects, it will become painful to use immediately due to Haskell's laziness.


I’ve had a Haskell program fail in exactly the same way, though - removing a debug statement caused a program to start crashing - even though the lazy functions were completely pure.

We had a numeric counter that was enumerating the frame number of a simulation. When we removed the debug statement, we didn’t realize that no other code ever evaluated the counter - so instead of resolving each X+1 into an integer, Haskell kept it as an ever-growing nested series of unevaluated thunks: 1+1+1+1+1+1+1+1+1+1+..., and our efficient little program started filling up all its available RAM and crashing.
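
The canonical minimal reproduction of that kind of leak is foldl versus foldl' (a sketch, not the original program) - the lazy left fold builds exactly this sort of nested thunk chain, and forcing the accumulator at each step fixes it:

    import Data.List (foldl')

    -- foldl builds ((((0+1)+1)+1)+...) without ever evaluating it,
    -- so the accumulator grows as a chain of thunks until forced.
    leaky :: Int
    leaky = foldl (+) 0 (replicate 10000000 1)

    -- foldl' forces the accumulator at every step: constant space.
    fixed :: Int
    fixed = foldl' (+) 0 (replicate 10000000 1)

    main :: IO ()
    main = print fixed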


Just, wow. I'm curious whether there are mechanisms that could guard against growing chains of unevaluated thunks. It would be an interesting GC problem - one where you free up space by forcing thunks past a maximum depth. (You'd obviously have to treat "infinite structure" computations differently from the "do this on top of that" accumulations. But that shouldn't be impossible, should it?)


Yeah, this is Haskell's biggest footgun by far. The language gives you tools to control evaluation order (including some neat libraries that let you parallelize things without much disruption to the logic of your program), but there are no silver bullets.

A good rule of thumb is to mark fields of basic types like Int, Bool, etc. as strictly evaluated by default, and leave larger structures (trees, lists, etc.) lazy. But you still need to be careful.
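
For example (an illustrative sketch, not from the thread), with strictness annotations on record fields:

    -- Strict (!) fields for small scalars are forced when the record is
    -- constructed, so no thunk chain can build up behind them; larger
    -- structures stay lazy so they can stream or go entirely unused.
    data Event = Tick | Tock

    data Frame = Frame
      { frameNumber :: !Int
      , frameTime   :: !Double
      , frameEvents :: [Event]  -- lazy: possibly large, possibly unused
      }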

The compiler trying to silently fix things is probably a bad idea. The current behavior is at least easy to understand; I'd hate to have a program that's working because the compiler could figure out that it could make something strict, and then I bump something mostly unrelated such that the optimizer can't be sure anymore, so I get a space leak.

Also, unintended evaluation can cause high memory use as well (e.g. forcing [1..1000000]), so the compiler also has to be careful not to introduce excessive memory use.

The compiler does do some strictness analysis, but it's a hard problem.

I like the way Idris does things -- strict by default, laziness controlled by the type system, and some nice support for automatic coercions.


To be clear, I was not voting for the compiler to do this, but the runtime. It would need to be instrumented, and it would likely be an iterative process at times, with people tweaking how it does things.


Relevant blog post and HN comment thread about tracking space leaks in Haskell.

https://news.ycombinator.com/item?id=10263964


If no one ever references the counter, and it's known to be pure, then shouldn't the compiler just remove it entirely?


Yeah, purity doesn't always make laziness easy to reason about. But it usually does.


For those not in the know, this is called a “space leak”.


Haskell generally doesn’t consider computation a side effect, which is a bit tragic.


In Haskell it is still of course pure functional programming to have a lazy stream of IO actions. The types make it clear when and where the IO is happening, which makes reasoning about effectful streaming much easier.


I don't think it really makes sense to call the way side effects happen in Haskell "lazy".

The way I think about it is that it's not so much that your program executes a lazy stream of side effects, it's that your program doesn't execute side effects at all. Instead, your program returns a pure structure which describes how side effects should be executed, and the runtime executes a program based on this description. By the time the runtime is executing that program, the pure structure which describes how the side effects should be executed has already been forced, so the side effects happen eagerly within that context.
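
A small illustration of that view (a sketch): merely computing an IO value performs nothing; only what the runtime reaches from main gets executed:

    main :: IO ()
    main = do
      let unused = putStrLn "never printed"  -- a value describing an effect
          used   = putStrLn "printed once"
      used  -- only actions wired into main are executed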

That said, I'm not much of a Haskell-er, so I'd be interested to hear from more experienced Haskell programmers on whether there's a better way to think about this.


Haskell does have lazy side effects (unsafeInterleaveIO and its derivatives), which are a major source of complaints about the Haskell standard library (lazy IO). There is a huge library ecosystem (conduit, pipes, streaming, machines, what have you) to solve this problem.

For example, this code will fail because the file handle is closed prematurely by withFile: hGetContents returns immediately without actually reading the file, deferring that until the data is forced.

    import System.IO

    main :: IO ()
    main = do
      contents <- withFile "foo.txt" ReadMode hGetContents
      print (length contents)  -- forces the read here, after the handle is closed
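
One minimal fix (a sketch - distinct from the streaming libraries mentioned above) is to force the result before withFile closes the handle:

    import Control.Exception (evaluate)
    import System.IO

    main :: IO ()
    main = do
      -- evaluate forces the length, and hence the full read,
      -- while the handle is still open
      n <- withFile "foo.txt" ReadMode (\h -> evaluate . length =<< hGetContents h)
      print n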


I'm assuming there is a more idiomatic (nay, correct?) way of doing this in Haskell?


One possible solution is to take a "beautiful fold" http://hackage.haskell.org/package/foldl-1.3.7/docs/Control-... and connect it to an effectful stream of bytes http://hackage.haskell.org/package/pipes-bytestring-2.1.6/do... inside the callback of "withFile".

Not unlike the Streams/Collectors framework in Java.
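
A sketch of what that wiring might look like, assuming the current Control.Foldl and pipes-bytestring APIs (here just counting bytes for concreteness):

    import qualified Control.Foldl    as L
    import qualified Data.ByteString  as BS
    import qualified Pipes.ByteString as PB
    import qualified Pipes.Prelude    as P
    import System.IO

    -- The Fold is a pure description of the accumulation; the Producer
    -- is the effectful byte stream. Everything runs inside the withFile
    -- callback, while the handle is still open.
    main :: IO ()
    main = do
      n <- withFile "foo.txt" ReadMode $ \h ->
             L.purely P.fold (L.premap BS.length L.sum) (PB.fromHandle h)
      print (n :: Int)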


But... does that work? Doesn't that leave you open to having something evaluate the end of the "withFile" before you evaluated something in the callback?

That is, the only reason what you said is accurate is that by "effectful stream of bytes" you meant "eagerly effectful." Right? If you do all of that lazily, you can just as easily shoot yourself in the foot here.

(Please nobody take this as a damning criticism of Haskell. I'm not intending it that way.)


The streams/sinks work in the IO monad and that induces an ordering of the effects, yes. Within the callback, you can still do things like filtering or taking only a part of the stream.

Returning the stream itself from the "withFile" and trying to run it later is still a bug.

I like this approach better than conventional Haskell lazy I/O, which I personally find quite confusing.


I guess it just surprises me that having to worry about the state of a stream is a thing I have to do. More, it sounds like I can't really use the type system to help here. At least, not from the sound of it.


In practice a streaming library like pipes, streaming or conduit manages the file handles for you, so application code is safe. Safe-ish in the case of streaming, but that is a conscious tradeoff between convenience and safety.

You could argue that libraries shouldn't contain unsafe code either and go with a substructural type system, use indexed monads, fake regions or something similar. Whether the extra complexity is worth it is arguable, though.


> I don't think it really makes sense to call the way side effects happen in Haskell "lazy".

I didn't actually say this. I was suggesting a lazy chain of IO actions for such problems. For example, ListT IO a, instead of a regular lazy list with hidden side effects. The Haskell types help make the distinction clear.

> Instead, your program returns a pure structure which describes how side effects should be executed

Yes exactly, the pure structure can still be computed lazily on demand, providing IO actions.

My favourite Hackage streaming library for Haskell is "streaming" and it can work exactly this way.


Worth noting that ListT's bind isn't associative for most base monads, including IO, and it is deprecated for a reason.

Also, by its nature ListT can't stream IO unless you use lazy IO aka lazyRead aka unsafeInterleaveIO, which deserves its name.


I am assuming "ListT done right", not the deprecated one (which is not even a chain of actions). Here is a definition, to be completely clear:

    newtype ListT m a = ListT (m (Maybe (a, ListT m a)))

It is a nice simple example of an effectful streaming type.
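
For instance, a sketch of producing and consuming with it (the helper names are mine):

    newtype ListT m a = ListT { runListT :: m (Maybe (a, ListT m a)) }

    -- Produce stdin lines on demand: no line is read until requested.
    stdinLines :: ListT IO String
    stdinLines = ListT $ do
      l <- getLine
      pure (Just (l, stdinLines))

    -- Consume at most n elements, running only that many effects.
    takeListT :: Monad m => Int -> ListT m a -> m [a]
    takeListT 0 _ = pure []
    takeListT n (ListT step) = do
      mx <- step
      case mx of
        Nothing      -> pure []
        Just (x, xs) -> (x :) <$> takeListT (n - 1) xs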

> Also, by its nature ListT can't stream IO

This is only true of the deprecated one, which is also misnamed. The community is rightly re-using the name for the behaviour shown above. See list-t on hackage, or the implementation that comes with Pipes.


How does laziness keep you honest about side effects exactly?

If anything, eager evaluation will reveal the bug faster.


Side effects aren't a bug in most programs, but they are impure (and purity is the main goal of Haskell). If Simon Peyton Jones or someone else introduces code that causes side effects in an eagerly-evaluated pure language like ML, it might not be obvious, because programs that use the impure code can still behave as expected. But if you introduce side effects into a lazy language like Haskell, they happen at the wrong time, or not at all, making it immediately obvious that your code has side effects. Then SPJ can replace the impure code with pure code that returns a description of the side effects, which the runtime then executes - the way Haskell intends side effects to be handled.


This is a sound article - Clojure's laziness is a common source of gotchas for new users of the language.

It's often not clear what is lazy and what isn't - concat, for instance, is lazy. It isn't externally obvious which of the very common operations are lazy; filter being lazy has bitten me in CLJS a few times too.

Adding to the confusion, the REPL will often realize your lazy sequences (and thus run your side effects), which then never get evaluated in your runtime environment.

One of the errors I used to make earlier on was:

    (map insert-into-db table-rows) ; lazy - the inserts may never run
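
The eager alternatives, as the replies note, are run! and doseq:

    ;; run! eagerly applies the side effect to each element and returns nil
    (run! insert-into-db table-rows)

    ;; doseq, when you want binding forms or multiple seqs
    (doseq [row table-rows]
      (insert-into-db row))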


Always mapv when doing I/O on the backend, and always an extra vec when returning sequences in the frontend.


run! and doseq are for IO on each element of a sequence. Functions like mapv are intended for data transformation, not side effects. This is why run! and doseq return nil.


I don't find myself doing sequential side effects so often anymore, but when I do, doseq has been my go-to - though I will confess I didn't know about run! until this thread.


This is part of the reason for monads: they specify the order of operations, which is necessary in a lazy language like Haskell. The `bind` or `flatMap` operation in a monad specifies "what happens next". Once you have a defined order of operations, you can start to reason about effects.


It's worth noting that while monads themselves naively introduce a data dependency, that doesn't necessarily force evaluation order. If the compiler is smart enough and sees something like:

    foo >>= \_ -> bar

It is well within its rights to evaluate bar and then foo, or do both in parallel, or not evaluate foo at all, as long as it can guarantee that the resulting value is the same. The big thing it needs to be careful of is not to introduce nontermination.

What makes this do the right thing for effects is what values of type IO actually are, and what bind means specifically for IO. It's helpful to think of an IO value as code in another (imperative) language. Bind takes two fragments of code in that other language, and stitches them together into a script that executes one after the other.

The key thing is that the order in which you compute parts of the script is entirely orthogonal to the order the commands appear in the script.
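
A toy model of that intuition (explicitly not GHC's real IO, just an illustration):

    -- A "script" in the imperative target language.
    data Script = Done | Print String Script

    -- Stitch two scripts together: run the first, then the second.
    andThen :: Script -> Script -> Script
    andThen Done        t = t
    andThen (Print s k) t = Print s (andThen k t)

    -- The script's command order is fixed by its structure, regardless
    -- of how lazily (or in what order) the Haskell values are computed.
    example :: Script
    example = Print "hello" Done `andThen` Print "world" Done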

In Haskell, evaluation does not cause side effects, period. `main` is a value of type IO, which is executed when the program is run. The effects of that execution are independent of how the value `main` is computed.

Obviously, the computation itself takes time and space though.


Something that is especially dangerous is mixing laziness and dynamic vars. If you bind a dynamic var and then return a lazy seq whose elements are generated using that var (e.g. `binding` over `map`, with the bound var used inside map's `fn`), you may get different results depending on when the seq is realized (which is not determined).

This made me waste a whole afternoon once. I was even ready to submit a compiler bug!
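
A minimal sketch of that gotcha (the var and values are illustrative):

    (def ^:dynamic *factor* 1)

    ;; The lazy seq escapes the binding unrealized...
    (defn scaled []
      (binding [*factor* 10]
        (map #(* *factor* %) [1 2 3])))

    ;; ...so it is realized later, under the root value:
    (first (scaled)) ;=> 1, not 10

    ;; Forcing realization inside the binding behaves as expected:
    (binding [*factor* 10]
      (mapv #(* *factor* %) [1 2 3])) ;=> [10 20 30]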


> In my opinion, the presence of doall, dorun, or even “unchunk” is almost always a sign that something never should have been a lazy sequence in the first place.

I’d say it’s a good rule of thumb, but it’s sometimes justified. For example, `line-seq` returns a lazy seq of lines read from a given Reader; the appeal being you can process them one by one, without keeping them in memory all at once. But if you just want them all, you wrap the `line-seq` in a `doall` in a `with-open`.
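
E.g., a sketch:

    (require '[clojure.java.io :as io])

    (with-open [rdr (io/reader "big.log")]
      (doall (line-seq rdr)))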

My scraping library, Skyscraper [1], has a similar justification for laziness around side effects: scraping a site returns a lazy sequence of elements, each corresponding to one page. It's terrifically useful to have that sequence be lazy, and there's unchunking code in Skyscraper to enforce full laziness. Incidentally, I'm rewriting it to be based on core.async, though that has a less functional feel to it.

[1]: https://github.com/nathell/skyscraper


Does mapv not do exactly what the author needs? Ie an unlazy map? https://clojuredocs.org/clojure.core/mapv


It can, but using it that way is considered non-idiomatic in Clojure. The intent of using variants of map is to produce a collection with each element transformed. Someone who reads the code might reasonably expect not to find side effects inside any variety of map.

Clojure provides doseq and run! for side effects on collections, and both return nil. One might get the impression that these design choices are intended to discourage the programmer from complecting transformations of sequences with performing side effects on their elements.

Most of the time, you can replace one variety of map with another, such as pmap (lazy, parallel), and have the program behave the same way. Using mapv for side effects breaks this assumption.
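
For instance, with a pure function the swap is invisible:

    (map inc [1 2 3])  ;=> (2 3 4)
    (pmap inc [1 2 3]) ;=> (2 3 4)
    ;; but with side effects inside the fn, swapping map variants
    ;; changes when - and whether - the effects run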


This problem is magnified in ClojureScript when doing React.js rendering. React.js renders breadth-first in strata (compared to a call stack, which is depth-first). Everyone gets bitten by it once.


(2015)


That is valid clojure too ;)


No way, a number does not implement IFn :-)


Curious... are numbers open to extension? So I can (1024 2) => 10


Updated. Thanks!



