What you call "deduplication" is usually called memoization. https://en.wikipedi...

vidarh · on April 5, 2019

I don't think this is what they mean above.

The 'deduplication' approach is often used with side effects, and is then often described as idempotent. The implication is that we're pretending that f(x) -> y with side effects is really f(x, world) -> (y, world'), and using memoisation over a subset of the input to make that illusion hold by executing the side effects only once (e.g. the typical case would be recording the ids of already processed messages so that duplicates or retries do not get processed twice).
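
A minimal sketch of that typical case, assuming an in-memory store. `DedupingHandler`, `handle`, and the message payloads are hypothetical names for illustration; a real system would need a durable, shared record of processed ids:

```ruby
require "set"

# Hypothetical handler that runs a side effect at most once per message id.
class DedupingHandler
  attr_reader :log

  def initialize
    @processed = Set.new # ids of messages already handled
    @log = []            # stands in for the real side effect
  end

  # Returns true if the side effect ran, false if the id was a duplicate.
  def handle(id, payload)
    # Set#add? returns nil when the id is already present.
    return false unless @processed.add?(id)

    @log << payload
    true
  end
end

handler = DedupingHandler.new
handler.handle(1, "charge card")
handler.handle(1, "charge card") # retry: skipped
handler.handle(2, "send email")
```

Note that this sketch records the id before the effect runs, so a failure between the two steps would leave that message unprocessed on retry.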

theoh · on April 5, 2019

I read and replied to didibus's comment pretty carefully. You are just muddying the waters.

vidarh · on April 5, 2019

The waters are muddy. This strongly suggests that their point was what I outlined:

> Deduplication is really hard to get right. And it prevents retrying an operation. When someone says, this is safe to retry, its actually saying, you can't retry this, subsequent calls are noop.

The comment you replied to was seeking opinions on whether there is a better way to refer to the case where an operation with potential side effects is guarded against executing twice for a given input.

This is a common use of idempotent in software development, no matter how much people with a maths background might dislike it. But it does muddy the waters, hence the question that was raised.

theoh · on April 5, 2019

Yes, I don't think we are communicating well here.

The wikipedia page for idempotence (https://en.wikipedia.org/wiki/Idempotence) is very good, however.

I was trying to respond specifically to didibus's obvious misunderstanding: they thought an increment function might be designed to be idempotent. That doesn't make any sense, whether you know the CS meaning of idempotence or not.
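
To make the definition concrete: a function f is idempotent when f(f(x)) = f(x) for every x. A rough check over a few sample values follows; the helper `idempotent_on?` is a hypothetical name, and passing on samples is evidence, not proof:

```ruby
# True if f(f(x)) == f(x) for each sample x.
def idempotent_on?(samples, &f)
  samples.all? { |x| f.call(f.call(x)) == f.call(x) }
end

samples = [-2, 0, 3]

increment_ok = idempotent_on?(samples) { |x| x + 1 } # false: x + 2 != x + 1
abs_ok       = idempotent_on?(samples) { |x| x.abs } # true
set_to_5_ok  = idempotent_on?(samples) { |_x| 5 }    # true: "set to 5" can be repeated
```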

One significant issue here is the question of a priori vs. a posteriori properties. When thinking algebraically about software, it's often a case of defining things in an a priori way in order to get the right laws to apply to operations. Idempotence is one such law.

The other angle on this is the a posteriori angle, where you note that something behaves a certain way and use an appropriate word to describe that behaviour.

My reading of didibus's comment was that it was basically about imperative programming, and that there was some confusion around the situations in which the concept of idempotence applies (incrementing a variable isn't one of them).

I think you are trying to introduce the State monad and the technique of modelling a stateful program by chaining pure functions. That's fine, but it does introduce a level of complication that is not implied by a straightforward comparison of memoized functional vs. idempotent imperative code.
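
For the straightforward comparison, a sketch with hypothetical names (`SQUARES`, `Door`): memoization caches the result of a pure function per argument, while an idempotent imperative operation leaves the state the same however many times it runs.

```ruby
# Memoized pure function: the cache only saves recomputation.
SQUARES = Hash.new { |cache, n| cache[n] = n * n }

# Idempotent imperative operation: repeating it does not change the state further.
class Door
  attr_reader :open

  def initialize
    @open = true
  end

  def close!
    @open = false
  end
end

first  = SQUARES[12]
second = SQUARES[12] # served from the cache, same value

door = Door.new
door.close!
door.close! # the second call leaves the door closed, as the first did
```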

vidarh · on April 6, 2019

I was not really trying to introduce the state monad as much as trying to reference it as a way of illustrating what we're often pretending we're doing when we talk about making something idempotent, when what we're really doing is what didibus referred to as "deduplication". Something like this (untested):

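    # world is [val, memo]: the counter and the ids already applied.
    def increment_with_memo(world, id)
      val, memo = world
      if memo.include?(id)
        # Already applied for this id: return the world unchanged.
        [world, id]
      else
        # Build a new memo rather than mutating the caller's array.
        [[val + 1, memo + [id]], id]
      end
    end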

The above should give the same results (barring any stupid bugs) if you pass in the same state, by making the changes it makes to the world explicitly part of both input and output.
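
As a usage sketch, here is a compact, non-mutating restatement of the same idea, folded over a stream of ids that contains retries. `apply_once` and the sample ids are hypothetical, and it returns only the new world:

```ruby
# world is [val, memo]; returns the next world without mutating the input.
def apply_once(world, id)
  val, memo = world
  memo.include?(id) ? world : [val + 1, memo + [id]]
end

initial = [0, []].freeze
ids = [:a, :b, :a, :c, :b] # :a and :b are retried

final = ids.reduce(initial) { |world, id| apply_once(world, id) }
# final is [3, [:a, :b, :c]]; initial is still [0, []]

# Replaying an id that is already in the memo returns an equal world.
replayed = apply_once(final, :a)
```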

But usually we're of course not doing that, but cheating and not actually storing the world state, because we're doing something messy and imperative with side effects, and so can't store the entire relevant part of the world.

We both agree that the way didibus described the "fixed" Increment() operation, it is still not idempotent. I gave my example to point out that the problem with their version could in theory be fixed by explicitly serialising the world state. But the point is that "fudging" that and pretending is illustrative of the abuse of the term that they were asking about.

Your comparison about a priori vs. a posteriori makes sense, with the added caveat that, as I understood it, didibus recognised that this is not the correct use of the term, and so is asking if there is a better name for this system behaviour (their version, without the explicit serialisation of world state).

I honestly don't know - I unfortunately think that it's largely too late to get people to talk about this as something other than idempotence. What is hopefully not too late is to get people to understand when they're using it as a shortcut and to understand that there is a more formal definition.

Especially because sometimes the shortcut is unnecessary and it can be helpful to find a way of restating the problem as an operation over a limited world where you can make it actually idempotent.
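
One possible restatement, as a sketch: instead of a counter that gets incremented, keep the set of ids seen and derive the count from it. Adding a given id to a set is idempotent in the formal sense. The names `record` and `count` are hypothetical.

```ruby
require "set"

# The "limited world" is just the set of ids seen so far.
def record(seen, id)
  seen | [id] # set union: returns a new Set; adding an existing id changes nothing
end

# The count is derived from the world instead of being stored and incremented.
def count(seen)
  seen.size
end

seen  = Set.new
once  = record(seen, "msg-1")
twice = record(once, "msg-1") # applying it again yields an equal set
```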