Hacker News new | past | comments | ask | show | jobs | submit login
What Is Idempotence? (lispcast.com)
107 points by tdurden on April 4, 2019 | hide | past | favorite | 51 comments



I've been struggling lately with distinguishing between things that are inherently idempotent and things that are made to be using deduplication.

For example, assignment is inherently idempotent. Assign the same thing twice, and the world is still in the same state as before. Calling assign over and over with the same arguments thus results in the same outcome. For example:

a = 10; a = 10;

But, incrementing is not:

a++; a++;

That said, you can dedupe the incrementing operation:

increment(a, id1); increment(a, id1);

Where increment keeps track of if it has already been called for a and id1, and cached the result. Thus subsequent calls dedupe on the cache and return the same result.

In my opinion, these are different. While increment is made idempotent, it requires a deduplication strategy to achieve it. It is not an inherent property of incrementing.

Deduplication is really hard to get right. And it prevents retrying an operation. When someone says, this is safe to retry, its actually saying, you can't retry this, subsequent calls are noop.

Is there a better vocabulary for these two things? I don't like how idempotency is ambiguous and refers to both. I'd like to be able to know if something is idempotent by its design, or if it is through the use of deduplication.


The problem is the increment operator now gives two possible answers to a++. This can't be done without relying on some notion of state. That state has to be stored somewhere.

The distinction you're looking for is "does this operator store the information required for idempotence in the output value, or is it stored elsewhere?" If it's stored in the output value, good. If it's stored elsewhere the operator is not a pure function.

In which case your complain doesn't actually have anything to do with idempotence. Your complaint is that the function which is claiming to be idempotent is not even a function.

You can't claim your code has the property f(f(x)) = f(x), if your code doesn't even have the property f(x) = f(x).


I'm not sure operators are a good example, they can be idempotent or not depending on the implementation. The operator is a function that takes two values and return a result: result = op(left, right).


Is assignment even truly idempotent? Maybe for mathy systems, but not necessarily for programming languages. It would depend on the implementation of the language, and could, for example, introduce global side effects. It kind of depends on what you consider the "value" of the operation. The return value? The global state?

Even a classic GET request on a REST api, is only idempotent if you ignore the fact that the data changes over time.

I guess what I'm saying is, outside of formal systems, I think it's all deduplication.


Assuming you don't count side effects like running out of memory, assignment is definitely idempotent. It is, of course, not immutable. Now, it is completely possible to implement a non-idempotent assignment if you want. For example, if you wrote an assignment function that returned true if the value changed when you assigned it, or false if it didn't, then that would not be idempotent.

And, of course, you could be nit-picky about side effects. For example, a language might implement assignment such that the first time you assign a value, it allocates the memory and the second time it simply overwrites the memory. That's obviously not idempotent. However, you could make similar arguments for a lot of compiler optimisations: they can take an idempotent function and make it not idempotent when viewed from outside the context of the source code.

While there is some value in considering those issues, it's a bit self-defeating as well. You take a useful abstraction and throw it away. I think it makes sense to have conversations like: is this idempotent from the context of the source code? Running out of memory when you assign to a variable that's allocated on the stack is not something anyone normally prepares for (and if you do, then God help you ;-) ). The concept still provides value to the conversation, which is the reason for having words in the first place. You may prefer a different word, but if it's going to be used in every single case I would question the utility of the redefinition.


Assignment is not idempotent if you include ordering in the system.

These two operations have different results:

1. A=5 2. A=6

vs

1. A=6 2. A=5

In such scenarios, whenever an operation is performed, the assignment operator is no longer binary. You also need to include the ordering information (e.g. randomly generated) in the transaction, such as:

ASSIGN(0, A, 6) ASSIGN(1, A, 5)

Then operations are performed in the order depending on the first value in the tuple.

As you can see, idempotency of assignment is only true if it's ordering value is the same.


GET is only idempotent in terms of the effect of the client calling the server.

The server is free to change the state of a resource at any time without informing the client.

What a compliant server cannot do is change the state of a resource as a result of receiving the GET request from the client.


What you call "deduplication" is usually called memoization.

https://en.wikipedia.org/wiki/Memoization

It only works for pure functions (functions which have no side effects). It is unaffected by concurrency for that reason. Idempotency doesn't come into the picture because the state of the world doesn't affect the return value.

Assignment is an operation with a side effect: it changes the state of the world (as well as possibly returning a value, depending on the language). So assignment of the same value more than once is idempotent only in single-threaded code. If there's more than one thread in play, you might find that assigning twice has a different result that assigning once.

So, the rule is that whenever the return value of a function can be cached (memoization or what you are calling deduplication) then it is implicit that the function has no side effects. There is no overlap between that category of functions and idempotent operations with side effects.

NB there is another meaning of idempotence which relates to function composition, e.g. abs(abs(x)) = abs(x). You can probably see that in that case idempotence refers to a single expression rather than the state of the whole program.


I don't think this is what they mean above.

The 'deduplication' approach is often used with side effects, and is often then described as idempotent with the implication that we're pretending that f(x) -> y with side effects is really f(x, world) -> (y, world') and use memoisation over a subset of the input to make that illusion hold by only executing the side effects once (e.g typical case would be recording the ids of already processed messages so that duplicates or retries does not get processed twice).


I read and replied to didibus's comment pretty carefully. You are just muddying the waters.


The waters are muddy. This, strongly suggest that their point was what I outlined:

> Deduplication is really hard to get right. And it prevents retrying an operation. When someone says, this is safe to retry, its actually saying, you can't retry this, subsequent calls are noop.

The comment you replied to was seeking opinions on whether there is a better way to refer to the case where we an operation with potential side effects are guarded against executing twice for a given input.

This is a common use of idempotent in software development, no matter how much people with a maths background might dislike it. But it does muddy the waters, hence the question that was raised.


Yes, I don't think we are communicating well here.

The wikipedia page for idempotence (https://en.wikipedia.org/wiki/Idempotence) is very good, however.

I was trying to respond specifically to didibus's obvious misunderstanding: they thought an increment function might be designed to be idempotent. That doesn't make any sense, whether you know the CS meaning of idempotence or not.

One significant issue here is the question of a priori vs. a posteriori properties. When thinking algebraically about software, it's often a case of defining things in an a priori way in order to get the right laws to apply to operations. Idempotence is one such law.

The other angle on this is the a posteriori angle, where you note that something behaves a certain way and use an appropriate word to describe that behaviour.

My reading of didibus's comment was that it was basically about imperative programming, and that there was some confusion around the situations in which the concept of idempotence applies (incrementing a variable isn't one of them).

I think you are trying to introduce the State monad and the technique of modelling a stateful program by chaining pure functions. That's fine, but it does introduce a level of complication that is not implied by a straightforward comparison of memoized functional vs. idempotent imperative code.


I was not really trying to introduce the state monad as much as trying to reference it as a way of illustrating what we're often pretending we're doing when we talk about making something idempotent, when what we're really doing is what didbus referred to as "deduplication". Something like this (untested):

    def IncrementWithMemo(world, id)
      val, memo = *world
      if memo.member?(id)
        [world, id]
      else
        [[val+1, memo.concat([id])], id]
      end
    end
The above should give the same results (barring any stupid bugs) if you pass in the same state, by making the changes it makes to the world explicitly part of both input and output.

But usually we're of course not doing that, but cheating and not actually storing the world state, because we're doing something messy and imperative with side effects, and so can't store the entire relevant part of the world.

We both agree that the way didibus described the "fixed" Increment() operation, it is still not idempotent. I gave my example to point out that the problem with their version could in theory be fixed by explicitly serialising the world state. But the point being that "fudging" that and pretending is illustrative of the abuse of the term that they were asking about.

Your comparison about a priori vs. a posteriori makes sense with the added caveat that as I understood it didibus recognised that this is not the correct of the term, and so is asking if there is a better name for this system behaviour (their version, without the explicit serialisation of world state).

I honestly don't know - I unfortunately think that it's largely too late to get people to talk about this as something other than idempotence. What is hopefully not too late is to get people to understand when they're using it as a shortcut and to understand that there is a more formal definition.

Especially because sometimes the shortcut is unnecessary and it can be helpful to find a way of restating the problem as an operation over a limited world where you can make it actually idempotent.


You're assuming there's no other thread modifying `a` between the assignments. That would make both examples non-idempotent.


Idempotent means "aa = a." "For all b, aba = ba" is a different property.


What behavior can’t be deduplicated, if deduplication is defined as: first call executes, subsequent calls noop?

I don’t see why idempotent functions couldn’t also be deduplicated

In which case, we’re really just back to “idempotent” and “not idempotent” (knowing that you can turn any operation idempotent, by using this one trick..!)


I'm just spitballing here.

Could it be that either deduplication or idempotency are a sub category of the other? Or perhaps describe different aspects of the operation?

Like, you could argue that deduplication is a form of idempotency. Or that deduplication is an effect of idempotency.

I've always thought that an idempotent operation could actually continue an operation where it left off, rather than repeating the whole operation. Deduplication is a side effect of that process.

Is reentrant the right word? Like, you can repeat or continue this operation. This may mean no-op calls, or may mean it repeats everything but still achieves deduplication.


Assign is only idempotent if you're writing out a constant. Consider:

a = !a;


Which makes it a different function `assign_invert(x) = assign(invert(x))`


At 13:45 in this interview from 2003, Sergey Brin and Larry Page try to explain idempotence to Terry Gross (NPR's Fresh Air host):

http://www.npr.org/2003/10/14/167643282/google-founders-larr...

(It does not go very well.)

Edit: I can't find a transcript, so I just transcribed that portion of the interview myself, starting at 13:45:

TG: Now I'll tell you, in preparing for this, I decided, let me Google Google, so I typed in "Google" into the Google search, and I came up with a lot of Google things in the regular search, but in the "Are you feeling lucky?" search, I got nothing.

LP: Well you just got Google itself.

TG: Yeah, I just got Google itself. Oh, I see, Google was giving me itself.

LP: Yeah.

TG: Oh.

LP: In computer science, we call that recursion. [laugh].

TG: Oh, you even have a name for it. [laugh]. I didn't quite get that. I kept thinking it was just repeating itself. I didn't realize it was giving me itself. [laugh].

LP: [laugh]

TG: And what's the name for it?

LP: Uh, recursion. It's... kind of... Sergey is giving me a dirty look.

TG: Why?

LP: It's a loose definition. [laugh]

TG: Lighten up Sergey. [laugh]

LP: It's a loose interpretation of... [laugh]... recursion.

TG: Sergey, what's the more literal interpretation?

SB: The technical term is you got itself back.

TG: Right?

SB: There's not really much beyond that. [laugh]

TG: Okay.

SB: Idempotence. How about that.

TG: Say it again.

SB: Idempotence.

TG: What is it?

SB: That's when you uh... [laugh]... Maybe I should stop while I'm ahead...

TG: ...You're just making this up, aren't you...

SB: ...Before I dig a deeper hole. Idempotence is when you do something and you get the original thing back.

TG: Oh, so that's a real word?

LP: It's a mathematical term.

SB: Yeah, yeah, but it's also just as loose an interpretation as Larry's was of recursion.


I remember listening to that interview way back when and feeling so uncomfortable that I almost turned off the radio. But now, reading that transcript, I just find it satisfying that they said the words recursion and idempotence on Fresh Air. With Terry Gross!


Classic illustration of why conversations between geeky technology people and "normal" people usually and in awkward silence after about two minutes.


I have totally given up on even trying to explain what I do at work. It never leads to anything.


At the same time, I used to do speed dating and too often had to hear the poor guy next to me confuse "what do you do?" with "what exactly do you do day to day in as painstakingly technical detail as possible?" when the other party doesn't even know if they're an astronaut vs full-time dog walker.


Isn’t it a fix point rather?


By the way, one way to describe idempotence is a function whose range is the fixed points of the function.

Fixed point is very close. Your idea here is that google_search_lucky(search_string), where search_string="google" returns itself. Which would be a fixed point.

The rub is that the function google_search_lucky(search_string) would in all likelihood return the URL of a website if it was to return a string (necessary for even the possibility of a fix point), so the proper fix point would be "https://www.google.com".

If I was trying to map this to mathematical ideas, I'd say the point here is that the function google_search(search_string, is_lucky) with the argument ("google", is_lucky= true) returns the _function_ google_search(search_string, is_lucky) . So it's more like the argument ("google", is_lucky= true) is an identity for the function google_search(search_string, is_lucky).


Yeah! It's a fixed point.


Very cringy. :)


Lately I've been trying hard to not just get idempotence but to get a full CRDT. While established CRDTs like an OR-set are great, simply framing a problem in terms of a max value has been really useful. Also useful are fuse values, where you record the first non zero value. Getting commutativity in addition to idempotence is great.

Examples where this is useful tend to be pretty obvious in hindsight (things like accepting partially reported data, or recording a high water mark), and most people have built systems using these properties but taking a more formal approach has made it easier to identify and confirm that the order of operations doesn't need to be strict and at least once delivery is fine.


Actually idempotent comes from algebra, in particular from ring theory. An ideal is idempotent if its generator, say it a, multiplied by itself n times, equals the ring identity.

See also https://en.wikipedia.org/wiki/Idempotent_(ring_theory)

The definition of idempotent used in IT, assumes that n=2.

Another concept stolen from Mathematics and misused in IT is Topology.


> Actually idempotent comes from algebra, in particular from ring theory. An ideal is idempotent if its generator, say it a, multiplied by itself n times, equals the ring identity.

The Wikipedia article you linked to defined it for ring elements rather than for ideals of rings and for n=2. I agree that it is sometimes useful to define it more generally where it is in respect to some integer n, analogous to nilpotence.

You may have encountered this notion first in ring theory which may have led to this impression, but it is by no means an idea originating from this specific context.

For example, another comment points out that in linear algebra, a projection operator is said to have the idempotence property.


I wrote a blog with this exact title from the standpoint of a SysAdmin/DevOps Engineer: https://www.sendthemtomir.com/blog/what-is-idempotence


An analogy from linear algebra might help with the intuition: a projection is idempotent. Projecting a projection still gives you the same output.


I would have thought that pressing an elevator call button is non-idempotent, since it affected something on the backend. It changed state. An elevator button is a POST. I thought POSTs like that that perform an action are non-idempotent.

Where as asking for the current floor is idempotent (i.e. a GET). Do I have this all wrong?


I think you have it wrong. As far as I understand, an operation is idempotent if you can perform it any number of times consecutively (without another operation taking place), and have the resulting state be the same. So for the elevator, you can press it 1 time, or 100 times, and it will have the same result of calling the elevator to your floor, so long as no operation (ex. The elevator arriving at your floor) takes place in between button presses.


I don't agree. Idempotency in terms of REST (or equivalent) is about the effect on the receiver of the call. Pressing the button each time is definitely not idempotent, because it is a change of state of the button.

The fact that the server may choose to ignore this change in terms of the travel of the elevator is not relevant to the change of state of the button transmitted by it being pressed.


So what is an example of a call that is idempotent? Are only requests that don't modify state considered idempotent, by your definition?


...which is weird because in the elevator example, the state isn't being modified. The effect on the receiver, to use their terminology, is always to set the button to "on" or "called" or whatever you want to name it. Hence, the elevator button example is idempotent.

Quick edit: Someone else below makes a good point - in at least one part of the world, a second button press does act as "cancel". Maybe that's the confusion here?


The idempotency in REST is about the State Transfer, not the state itself.

The POST is not idempotent because it transferred a change of state (of the button) between the client and the server. REST is about State Transfer between a client and server.

POST/DELETE send a change of state of a resource.

GET/HEAD/OPTIONS do not.

PUT and PATCH are idempotent iff the server does not change the state of the resource except as the result of the PUT or PATCH.

It is not about the state of a resource on either the client or server, both are free to change that state independently.

Doing a GET on the current state of the button is idempotent because the client doesn't change the state of the button.

But there's nothing to stop the server (ie the elevator electronics) to change the state of the button (show it as "pressed") if something else changes the state (eg, someone in the elevator presses the button for that floor, the button might display as "pressed" on the floor itself).


I think that DELETE is also idempotent. When you call DELETE you are deleting a specific resource. The result is the same if you call DELETE more than once, the resource no longer exists.


There's a similar property of interest - order independence. That is, doing A, then B has the same final state as doing B, then A. Does that have a cool name?


That's the commutative property. It is really nice to have operations that are commutative monoids, where you can re-order the operands, and combine them in any order, and get the same result. Then if you want to perform that operation (ex. Addition) over a set of operands (ex. Integers), you can do it in parallel, where each process can take any set of intermediate results, and perform the operation on them.


The standard term seems to be ‘permutable’, for A(B(x)) = B(A(x)).

One can also say these functions commute with respect to composition. This is slightly different from the classical notion of commutativity, since the operator (composition) isn't generally commutative.


Ah. That's the right term. Thanks.

This is an important property for database updates.



The elevator analogy doesn't work in Vietnam, pressing the button again sometimes cancels the elevator.


After reading this I realized I’ve been conflating indempotence with purity.


Referential integrity is the ~synonym for purity that is often used.


Referential transparency ;-)


Ha. You're right. Man there are too many terms to keep straight. "Replacing a function call with the result of that call does not change the program" is the one I meant.


It's impotence for people living in Deutschland




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: