I've been struggling lately with distinguishing between things that are inherently idempotent and things that are made to be using deduplication.
For example, assignment is inherently idempotent. Assign the same thing twice, and the world is still in the same state as before. Calling assign over and over with the same arguments thus results in the same outcome. For example:
a = 10;
a = 10;
But, incrementing is not:
a++;
a++;
That said, you can dedupe the incrementing operation:
increment(a, id1);
increment(a, id1);
Where increment keeps track of if it has already been called for a and id1, and cached the result. Thus subsequent calls dedupe on the cache and return the same result.
In my opinion, these are different. While increment is made idempotent, it requires a deduplication strategy to achieve it. It is not an inherent property of incrementing.
Deduplication is really hard to get right. And it prevents retrying an operation. When someone says, this is safe to retry, its actually saying, you can't retry this, subsequent calls are noop.
Is there a better vocabulary for these two things? I don't like how idempotency is ambiguous and refers to both. I'd like to be able to know if something is idempotent by its design, or if it is through the use of deduplication.
The problem is the increment operator now gives two possible answers to a++. This can't be done without relying on some notion of state. That state has to be stored somewhere.
The distinction you're looking for is "does this operator store the information required for idempotence in the output value, or is it stored elsewhere?" If it's stored in the output value, good. If it's stored elsewhere the operator is not a pure function.
In which case your complain doesn't actually have anything to do with idempotence. Your complaint is that the function which is claiming to be idempotent is not even a function.
You can't claim your code has the property f(f(x)) = f(x), if your code doesn't even have the property f(x) = f(x).
I'm not sure operators are a good example, they can be idempotent or not depending on the implementation. The operator is a function that takes two values and return a result: result = op(left, right).
Is assignment even truly idempotent? Maybe for mathy systems, but not necessarily for programming languages. It would depend on the implementation of the language, and could, for example, introduce global side effects. It kind of depends on what you consider the "value" of the operation. The return value? The global state?
Even a classic GET request on a REST api, is only idempotent if you ignore the fact that the data changes over time.
I guess what I'm saying is, outside of formal systems, I think it's all deduplication.
Assuming you don't count side effects like running out of memory, assignment is definitely idempotent. It is, of course, not immutable. Now, it is completely possible to implement a non-idempotent assignment if you want. For example, if you wrote an assignment function that returned true if the value changed when you assigned it, or false if it didn't, then that would not be idempotent.
And, of course, you could be nit-picky about side effects. For example, a language might implement assignment such that the first time you assign a value, it allocates the memory and the second time it simply overwrites the memory. That's obviously not idempotent. However, you could make similar arguments for a lot of compiler optimisations: they can take an idempotent function and make it not idempotent when viewed from outside the context of the source code.
While there is some value in considering those issues, it's a bit self-defeating as well. You take a useful abstraction and throw it away. I think it makes sense to have conversations like: is this idempotent from the context of the source code? Running out of memory when you assign to a variable that's allocated on the stack is not something anyone normally prepares for (and if you do, then God help you ;-) ). The concept still provides value to the conversation, which is the reason for having words in the first place. You may prefer a different word, but if it's going to be used in every single case I would question the utility of the redefinition.
Assignment is not idempotent if you include ordering in the system.
These two operations have different results:
1. A=5
2. A=6
vs
1. A=6
2. A=5
In such scenarios, whenever an operation is performed, the assignment operator is no longer binary. You also need to include the ordering information (e.g. randomly generated) in the transaction, such as:
ASSIGN(0, A, 6)
ASSIGN(1, A, 5)
Then operations are performed in the order depending on the first value in the tuple.
As you can see, idempotency of assignment is only true if it's ordering value is the same.
It only works for pure functions (functions which have no side effects). It is unaffected by concurrency for that reason. Idempotency doesn't come into the picture because the state of the world doesn't affect the return value.
Assignment is an operation with a side effect: it changes the state of the world (as well as possibly returning a value, depending on the language). So assignment of the same value more than once is idempotent only in single-threaded code. If there's more than one thread in play, you might find that assigning twice has a different result that assigning once.
So, the rule is that whenever the return value of a function can be cached (memoization or what you are calling deduplication) then it is implicit that the function has no side effects. There is no overlap between that category of functions and idempotent operations with side effects.
NB there is another meaning of idempotence which relates to function composition, e.g. abs(abs(x)) = abs(x). You can probably see that in that case idempotence refers to a single expression rather than the state of the whole program.
The 'deduplication' approach is often used with side effects, and is often then described as idempotent with the implication that we're pretending that f(x) -> y with side effects is really f(x, world) -> (y, world') and use memoisation over a subset of the input to make that illusion hold by only executing the side effects once (e.g typical case would be recording the ids of already processed messages so that duplicates or retries does not get processed twice).
The waters are muddy. This, strongly suggest that their point was what I outlined:
> Deduplication is really hard to get right. And it prevents retrying an operation. When someone says, this is safe to retry, its actually saying, you can't retry this, subsequent calls are noop.
The comment you replied to was seeking opinions on whether there is a better way to refer to the case where we an operation with potential side effects are guarded against executing twice for a given input.
This is a common use of idempotent in software development, no matter how much people with a maths background might dislike it. But it does muddy the waters, hence the question that was raised.
I was trying to respond specifically to didibus's obvious misunderstanding: they thought an increment function might be designed to be idempotent. That doesn't make any sense, whether you know the CS meaning of idempotence or not.
One significant issue here is the question of a priori vs. a posteriori properties. When thinking algebraically about software, it's often a case of defining things in an a priori way in order to get the right laws to apply to operations. Idempotence is one such law.
The other angle on this is the a posteriori angle, where you note that something behaves a certain way and use an appropriate word to describe that behaviour.
My reading of didibus's comment was that it was basically about imperative programming, and that there was some confusion around the situations in which the concept of idempotence applies (incrementing a variable isn't one of them).
I think you are trying to introduce the State monad and the technique of modelling a stateful program by chaining pure functions. That's fine, but it does introduce a level of complication that is not implied by a straightforward comparison of memoized functional vs. idempotent imperative code.
I was not really trying to introduce the state monad as much as trying to reference it as a way of illustrating what we're often pretending we're doing when we talk about making something idempotent, when what we're really doing is what didbus referred to as "deduplication". Something like this (untested):
def IncrementWithMemo(world, id)
val, memo = *world
if memo.member?(id)
[world, id]
else
[[val+1, memo.concat([id])], id]
end
end
The above should give the same results (barring any stupid bugs) if you pass in the same state, by making the changes it makes to the world explicitly part of both input and output.
But usually we're of course not doing that, but cheating and not actually storing the world state, because we're doing something messy and imperative with side effects, and so can't store the entire relevant part of the world.
We both agree that the way didibus described the "fixed" Increment() operation, it is still not idempotent. I gave my example to point out that the problem with their version could in theory be fixed by explicitly serialising the world state. But the point being that "fudging" that and pretending is illustrative of the abuse of the term that they were asking about.
Your comparison about a priori vs. a posteriori makes sense with the added caveat that as I understood it didibus recognised that this is not the correct of the term, and so is asking if there is a better name for this system behaviour (their version, without the explicit serialisation of world state).
I honestly don't know - I unfortunately think that it's largely too late to get people to talk about this as something other than idempotence. What is hopefully not too late is to get people to understand when they're using it as a shortcut and to understand that there is a more formal definition.
Especially because sometimes the shortcut is unnecessary and it can be helpful to find a way of restating the problem as an operation over a limited world where you can make it actually idempotent.
What behavior can’t be deduplicated, if deduplication is defined as: first call executes, subsequent calls noop?
I don’t see why idempotent functions couldn’t also be deduplicated
In which case, we’re really just back to “idempotent” and “not idempotent” (knowing that you can turn any operation idempotent, by using this one trick..!)
Could it be that either deduplication or idempotency are a sub category of the other? Or perhaps describe different aspects of the operation?
Like, you could argue that deduplication is a form of idempotency. Or that deduplication is an effect of idempotency.
I've always thought that an idempotent operation could actually continue an operation where it left off, rather than repeating the whole operation. Deduplication is a side effect of that process.
Is reentrant the right word? Like, you can repeat or continue this operation. This may mean no-op calls, or may mean it repeats everything but still achieves deduplication.
For example, assignment is inherently idempotent. Assign the same thing twice, and the world is still in the same state as before. Calling assign over and over with the same arguments thus results in the same outcome. For example:
a = 10; a = 10;
But, incrementing is not:
a++; a++;
That said, you can dedupe the incrementing operation:
increment(a, id1); increment(a, id1);
Where increment keeps track of if it has already been called for a and id1, and cached the result. Thus subsequent calls dedupe on the cache and return the same result.
In my opinion, these are different. While increment is made idempotent, it requires a deduplication strategy to achieve it. It is not an inherent property of incrementing.
Deduplication is really hard to get right. And it prevents retrying an operation. When someone says, this is safe to retry, its actually saying, you can't retry this, subsequent calls are noop.
Is there a better vocabulary for these two things? I don't like how idempotency is ambiguous and refers to both. I'd like to be able to know if something is idempotent by its design, or if it is through the use of deduplication.