Hacker News new | past | comments | ask | show | jobs | submit login

As things stand now, the main reason why Python can't do it is because it could potentially break programs.



> As things stand now, the main reason why Python can't do it is because it could potentially break programs.

Since the documented behavior of Python is compatible with what is suggested as a change, any program that relies on the behavior not reflecting as described in the "change" is asking to be broken (and quite possibly already broken across different implementations -- including different versions of the same implementation.) Immutable types in python already are defined with the semantics of values, in that "for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed." [0]

[0] https://docs.python.org/2/reference/datamodel.html


So, which is the case?

(0) Python provides a compound value abstraction, but this abstraction leaks, because we can distinguish between multiple representations of the same compound value.

(1) Python doesn't provide a compound value abstraction.

With a few exceptions (e.g., garbage collection, which pretends memory is infinite, even though it's not), I tend not to consider so-called “leaky abstractions” actual abstractions, but if you know of a good reason to do otherwise, please tell me.


Python provides a value abstraction that includes (but is not distinct for) compound values.

It also provides an independent mechanism for examining physical (rather than logical) identity -- which is object identity; for all compound values (and all but a very narrow subset of simple values) there is no guarantee that the logical identity of values is equivalent to physical/object identity.

I don't see that this makes the value abstraction "leaky", though; for values, value identity is tested by equality. For objects, equality tests object equality but not object identity. Because Python is an "everything is an object" language, you can also test object identity for values, but for values in general there is no defined relationship between object identity and value identity, so this has no defined general logical (as opposed to physical) meaning for value types. (There are some values which are defined to also be singleton objects so that value and object identity are equivalent for these values; but that's a feature not of the value abstraction in Python but of the value-object mapping in Python.)

So I don't see the value abstraction itself as leaky.

OTOH, I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.


> I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.

Isn't that what == is?


> > I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.

> Isn't that what == is?

Not really.

"is" is logical identity for mutable objects (where logical and storage identity are equivalent), and "==" is logical identity for values (where that is equivalent to structural equivalence) but (only) structural equivalence for mutable objects.

What I was thinking of is an operator that would be logical identity for both values and mutable objects.


Somehow Python lets you distinguish “logically identical” things, though.


If you're interested in properties other than logical identity, yes.


By definition identity checking is the finest-grained distinction you can make between things. If two things are identical, they can't possibly be different in any way. This is what philosophers call the “indiscernibility of identicals”, and, if it sounds like a tautology, it's because it is!


> By definition identity checking is the finest-grained distinction you can make between things.

Yes, but there are different identities. "is" implements storage identity checking between python objects (a term broader than the sense in which you use "objects", which includes both what you call "objects" and representations of values).

Logical identity checking between Python values (as well as object equivalence, but not identity, checking between Python "objects" in the sense you use the term) is done by "==".

For most python value representations, the relationship between storage identity of the representation and logical identity of the value is undefined.


> which includes both what you call "objects" and representations of values

This is exactly what I'm saying is wrong: branching on the representations of values.

> For most python value representations, the relationship between storage identity of the representation and logical identity of the value is undefined.

If it's undefined, then how come I can query it?


> This is exactly what I'm saying is wrong: branching on the representations of values.

Well, yes, its wrong in that (except in the cases where value identity and representation storage identity are guaranteed to be equivalent) you generally shouldn't do it (the exception being if you are building code to do something oomphaloskeptic where the purpose of code is to answer questions about what its own implementation is doing.)

But the fact that doing that is possible in Python does not change the fact that Python does, in fact, support values (including compound values) with value-oriented semantics.

> If it's undefined, then how come I can query it?

The relation between physical identity of the storage representations and logical identity of the values they represent is undefined by the language specification.

At runtime, every representation of a value has some storage identity, and you can query the relationship between that and the storage identity of another representation of a value. But the answer you get has no guaranteed correlation to whether the values represented by those representations are the same value (which you can also query.)


> Well, yes, its wrong in that (except in the cases where value identity and representation storage identity are guaranteed to be equivalent) you generally shouldn't do it

A language's semantics doesn't tell me what I “should” do. It tells me what I can do, and what other people who write code that interacts with mine can do.

> At runtime, every representation of a value has some storage identity

Not in the semantics of the source language. Value representations are purely an implementation artifact.


> By definition identity checking is the finest-grained distinction you can make between things

By your definition, perhaps. The rest of us don't always use the word "identity" with this meaning. And since you don't own the English language, you don't get to tell the rest of us how to use words.

> If two things are identical, they can't possibly be different in any way.

Then by your definition, two Python tuples (1, 2, 3) are not identical. Which says absolutely nothing about how I can write programs using them, or what concepts of "identity" the operators in those programs can implement.

> This is what philosophers call...

We're talking about programming here, not philosophy.


> Which says absolutely nothing about how I can write programs using them,

I can write a program that treats all Python objects identically, by simply doing nothing. That doesn't make all Python objects actually identical.

> We're talking about programming here, not philosophy.

Programming is applied logic, which is a branch of philosophy.


> I can write a program that treats all Python objects identically, by simply doing nothing. That doesn't make all Python objects actually identical.

Yes, you can. Thank you for proving my point that Python programs don't have to use your definition of identity.

But that's a silly example. Here's an example that's not silly: in an earlier discussion you said dictionary keys have to be values--which would imply that dictionary key lookup must use your concept of "identity", so only two objects that are identical by your definition will behave as identical keys in a dictionary. But two distinct Python tuples (1, 2, 3), which are not identical by your definition, do behave as identical keys in a Python dictionary. In other words, Python dictionary key lookup does not use your concept of "identity".

To everyone else, this means your concept of identity is simply not relevant. To you, it appears to mean that Python is violating Western logic.

> Programming is applied logic, which is a branch of philosophy.

To you, perhaps. Not to me. And not, I suspect, to most of the other programmers in this discussion.


> But two distinct Python tuples (1, 2, 3), which are not identical by your definition, do behave as identical keys in a Python dictionary.

The numbers 2 and 4 behave identically when passed to a function that tests whether its argument is an even number. Are 2 and 4 identical now?

> To you, it appears to mean that Python is violating Western logic.

This is the third time I'm saying I never said Python is violating Western logic. It's like saying I made a machine that violates gravity - it's literally impossible! Python just doesn't have compound values.


> The numbers 2 and 4 behave identically when passed to a function that tests whether its argument is an even number. Are 2 and 4 identical now?

They are with respect to the even/odd property. So if that's the property I'm interested in, they're identical. They might not be if I'm interested in some other property.

In other words, as dragonwriter has already pointed out, there is more than one concept of identity.


> there is more than one concept of identity.

There's only one: Two entities are equal if nothing can distinguish them. This is the “identity of indiscernibles”.

In a language with abstract data types, such as Standard ML, you could define a new type whose internal representation is an integer, but which provides no operations that would distinguish between two even or two odd numbers. But Python doesn't have this.


>There's only one: Two entities are equal if nothing can distinguish them. This is the “identity of indiscernibles”.

This doesn't exist. If you can distinguish past and present, you can distinguish anything by time. An object in one moment is not identical to an object in another moment because the moment changed? No. You have to decide on what invariants you care about to have a consistent notion of identity. Yours is inadequate to do anything, as you demand the whole universe be invariant in all aspects; such a thing is trivial and vacuous. And it surely is not the structure encoded in any usage of the term "identity", as that word has non-trivial structure.


> If you can distinguish past and present, you can distinguish anything by time.

That's the thing: Objects exist in time. Values don't. Values exist in the language's semantics, which is a timeless mathematical object. Does it make sense to ask when the number 2 suddenly came into existence?

> An object in one moment is not identical to an object in another moment because the moment changed? No. You have to decide on what invariants you care about to have a consistent notion of identity.

You're confusing “identity of indiscernibles” with “indiscernibility of identicals”.

And, obviously, you can't use the temporal properties of objects to determine whether their atemporal identities are equal.

> Yours is inadequate to do anything, as you demand the whole universe be invariant in all aspects; such a thing is trivial and vacuous.

Nope. It just requires you to distinguish between things that exist in time and things that exist independently of time.


Oh right, we're talking about extradimensional computers. I didn't realize.


> There's only one

Maybe to you. Not to me. And not, I suspect, to most of the programmers in this discussion.

> Python doesn't have this.

Sure it does:

  class NerfedInteger(object):
      
      def __init__(self, i):
          self.__i = i
      
      def __repr__(self):
          return "<NerfedInteger({})>".format(self.__i)
      
      def __str__(self):
          return str(self.__i)
      
      @property
      def i(self):
          return self.__i


What you've implemented isn't an abstract type. You can inspect the internal representation anytime, distinguishing between things that were meant never to be distinguished.


This simply doesn't make sense. It violates the indiscernibility of identicals, which is one of the cornerstones of Western logic. I can accept different opinions on several matters (e.g., the extent to which using objects is a good idea), but throwing logic out of the window is just too much.


> It violates the indiscernibility of identicals, which is one of the cornerstones of Western logic.

Oh, please. Python isn't violating any principles of Western logic. It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value. Python does this for some types but not for others. Yet computers manage to run Python just fine, Western logic notwithstanding.

If you want to argue that Python ought to change its implementation to guarantee that, for example, any two references to the tuple (1, 2, 3) must refer to the same in-memory representation (so the 'is' operator would always return True), because that would save memory, or make the runtime faster, or whatever, that's fine. Then we can talk about the tradeoffs involved, such as increasing complexity in the interpreter code. But trying to claim that Python is violating "one of the cornerstones of Western logic" is just too much.


> It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value.

I never said this! There can be multiple representations of the same value. For example, the same ordered set may be represented as two distinct red-black trees, balanced differently.

Values are different from their memory representations. Different memory representations of the same value are okay. Branching on the difference is not. Of course, to hide representation differences from users, you need abstract data types [0], and, sadly, Python doesn't have these.

[0] https://www.cs.cmu.edu/~rwh/introsml/modules/sigstruct.htm

> If you want to argue that Python ought to change its implementation to guarantee that, for example, any two references to the tuple (1, 2, 3) must refer to the same in-memory representation (so the 'is' operator would always return True), because that would save memory, or make the runtime faster, or whatever, that's fine.

I never said this either. I said that it should be a valid optimization, not that implementations absolutely have to do it. As things stand now, it's not a valid optimization, because it would break existing programs.

In actuality, I don't care much about the optimization itself. However, it's a good benchmark for assessing whether Python has compound values. Values can be deduplicated, because they may be represented more than once. Objects can't be deduplicated, because by definition an object exists exactly once in memory.

> But trying to claim that Python is violating "one of the cornerstones of Western logic" is just too much.

It seems appropriate to me. But, to be clear, what I was claiming violates the indiscernibility of identicals is dragonwriter's explanation, not Python itself. A simple explanation that doesn't violate the indiscernibility of identicals is to admit that Python doesn't have compound values. Which takes us back to square one [1].

[1] https://news.ycombinator.com/item?id=12358968


> I never said this!

As Rhett Butler said to Scarlett O'Hara, you gave a very good imitation.

> Different memory representations of the same value are okay. Branching on the difference is not.

Branching on a difference other than a difference in logical identity, if you are intending to test for logical identity, is obviously an error. The fix for that in Python is simple: if you want to test for logical identity, use the == operator.

Branching on a difference other than a difference in logical identity, if you're interested in a difference other than a difference in logical identity, is perfectly reasonable. Maybe you've never had occasion to do that in programs you write, but others might.

> I said that it should be a valid optimization

Consider my comment suitably modified. The rest of what I said still stands: there are tradeoffs involved, and reasonable people can differ on where the tradeoff ends up. You have one opinion; the Python developers have another. That doesn't mean the Python developers are violating Western logic.

> what I was claiming violates the indiscernibility of identicals is dragonwriter's explanation, not Python itself. A simple explanation that doesn't violate the indiscernibility of identicals is to admit that Python doesn't have compound values.

And to me this is a distinction without a difference. Nothing is stopping you from defining "compound values" so that a Python tuple isn't a compound value. But nothing is stopping the rest of us from not caring about your definition. Which, as you say, takes us back to square one.

What would be helpful is if you could give some reason why your definition of compound values is so important, other than talking about identity of indiscernibles and violating Western logic. What makes a language that allows compound values, by your definition, better than a language that doesn't? And if your only answer (other than preserving Western logic) is "better runtime efficiency" (less memory, faster operations), then it seems to me you would do much better to make that argument directly, instead of cloaking it with all this talk about "compound values". But maybe that's just me.


> Branching on a difference other than a difference in logical identity, if you're interested in a difference other than a difference in logical identity, is perfectly reasonable.

It's unreasonable to have a finer-grained distinction than logical identity. If two things are the same, they are the same. If they are not, well, they are not. Again, tautologies!

> That doesn't mean the Python developers are violating Western logic.

I never said Python developers are violating Western logic. I said dragonwriter's explanation of how so-called “compound values” work in Python is inconsistent with Western logic. I have an explanation of how Python works that's consistent with it, but it requires rejecting the assumption that Python has compound values.

> Nothing is stopping you from defining "compound values" so that a Python tuple isn't a compound value.

The definition of compound value is simple: a value that you can tear apart into its constituent parts, then reassemble, getting the original value. You can't get the original tuple object by creating a new tuple object with the original tuple's components. The `is` operator will brush the difference on your face.

> What makes a language that allows compound values, by your definition, better than a language that doesn't?

The easiest way to reason about programs that's completely rigorous (i.e., not just testing on a couple sample inputs) is equational reasoning, which is basically showing that two syntactically different expressions evaluate to the same value. (For example, one expression might be an obviously correct but unacceptably inefficient program, and the other expression might be the program you actually intend to deliver.) In practice, realistic problem domain have to be modeled using compound entities (values or objects), so if you want to use equational reasoning, you need compound values.


> It's unreasonable to have a finer-grained distinction than logical identity.

Maybe to you. Not to me. And not, I suspect, to most of the other programmers in this discussion.

> The definition of compound value is simple

Your definition. Why should I care? See below.

> if you want to use equational reasoning, you need compound values.

Ok, so this would mean that, according to you, it is impossible to use equational reasoning with Python programs. Whereas, according to me, it is only impossible to do this with Python programs that use mutable objects (where "mutable" here includes tuples whose slots point to mutable objects). And of course this wouldn't just apply to Python; the general distinction here could be drawn, in principle, in any language. Or even independently of choosing a language.

So I have another tradeoff here: I can use mutable objects, which can make programs easier to write, but then I can't reason rigorously about them; or I can restrict myself to immutable objects (which means that any time I would have mutated an object writing programs the other way, I instead have to construct a new immutable object with the same logical properties that the mutated object would have had), which makes programs harder to write, but allows me to reason rigorously about them.

Can you show me any real-world examples where using the latter programming style has paid dividends?


> Ok, so this would mean that, according to you, it is impossible to use equational reasoning with Python programs.

You can use equational reasoning on the values that Python actually gives you: small numbers, special constants and object references.

> Whereas, according to me, it is only impossible to do this with Python programs that use mutable objects (where "mutable" here includes tuples whose slots point to mutable objects).

You can in principle use equational reasoning on programs that manipulate mutable objects. What you can't do is treat two distinct objects as equal. But, of course, in practice, you can do whatever you want. Whether your reasoning is sound is a-whole-nother matter.

> I instead have to construct a new immutable object with the same logical properties that the mutated object would have had), which makes programs harder to write, but allows me to reason rigorously about them.

No, this would be wrong. You can't use equational reasoning on objects themselves. You can only use equational reasoning on values. The value itself might be a program that manipulates objects, but you can't equate two distinct objects.


> You can't use equational reasoning on objects themselves.

I didn't say you could. I said you could use equational reasoning on programs that only manipulate immutable objects. "Immutable" means that the object's value can never change, so you can substitute the object's value for the object itself everywhere it appears in the program. Then you can apply equational reasoning to the values so obtained.

In the particular case I described, you would end up using equational reasoning on the value transformation implemented by the code that constructs the new immutable object. You would obtain that transformation, as above, by substituting object values for objects in the syntactic statement of the code.

> You can in principle use equational reasoning on programs that manipulate mutable objects.

How do you reason using an object's value if that value can change?


> "Immutable" means that the object's value can never change, so you can substitute the object's value for the object itself everywhere it appears in the program.

No, you can't. Immutable objects still have physical identities. If you want to do equational reasoning, you need to get a hold of the value itself.

> How do you reason using an object's value if that value can change?

Objects don't have “values”, they have “states”. And I said you can reason about programs that manipulate objects, not about the states of these objects. (Though maybe in a few special cases you can do the latter too.)


Is there some kind of reference that explains the terminology and theory you're using? Because it makes no sense to me as you're explaining it. It would be nice if such a reference also gave some real world examples where the terminology and theory you're using actually pays dividends, as I asked before.


> Is there some kind of reference that explains the terminology and theory you're using?

On values:

(0) “In call by value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function” (https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_va...)

(1) “Most languages use a call by value [evaluation] strategy, in which only outermost redexes are reduced and where a redex is reduced only when its right-hand side has already been reduced to a value - a term that has finished computing and cannot be reduced any further.” (Types and Programming Languages, p. 57)

For the definition of redex, see: https://en.wikipedia.org/wiki/Reduction_strategy_(code_optim...

On objects:

(0) “In computer science, an object can be a variable, a data structure, or a function or a method, and as such, is a location in memory having a value and possibly referenced by an identifier.” ( https://en.wikipedia.org/wiki/Object_(computer_science) ) In some languages, object states aren't values, though.

(1) “object: a cell (unless otherwise explicitly stated)” (p. 325), “cell: a number of contiguous memory fields forming a single logical structure” (Garbage Collection: Algorithms for Automatic Dynamic Memory Management, p. 322)

> It would be nice if such a reference also gave some real world examples where the terminology and theory you're using actually pays dividends, as I asked before.

Compiler authors take advantage of the notion of value all the time. For example, If two expressions are guaranteed to evaluate to the same value, a compiler may emit code that evaluates the expression once, then reuses the result twice.

---

Sorry, I can't reply to you directly because I'm “submitting too fast”, but:

These blog posts show how to do equational reasoning on Haskell programs. (Technically, the subset of Haskell that doesn't contain nonproductive infinite loops.)

http://www.haskellforall.com/2013/12/equational-reasoning.ht...

http://www.haskellforall.com/2014/07/equational-reasoning-at...

http://www.haskellforall.com/2013/10/manual-proofs-for-pipes...

Although Haskell is particularly well suited for using equational reasoning, it can also be used in other languages, provided your program primarily manipulates values, rather than objects.


I'm talking about a reference that explains this whole "equational reasoning" thing and the terminology and theory specific to that.


> > It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value.

> I never said this!

The standards you have set require either

(1) That there is only one in-memory representation of a given value, or

(2) The language has no facilities that allow you to inquire about the in-memory representation used by a particular reference to a value.


> That there is only one in-memory representation of a given value, or

No.

> The language has no facilities that allow you to inquire about the in-memory representation used by a particular reference to a value.

s/reference to// , otherwise yes.


I would argue that, because Python requires its users to reason about storage and interpreter state, we shouldn't consider (a,b,c) and (a,b,c) identical under is unless they're aliases for the same structure in memory. I'm not sure I understand dragonwriter's idea for a third equality operator, however. Isn't == adequate for that purpose?

By the way, I'd like to thank both catnaroek and dragonwriter for an excellent discussion; this sort of thing is why I come to HN.


> we shouldn't consider (a,b,c) and (a,b,c) identical under is unless they're aliases for the same structure in memory.

Which is the entirety of my point.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: