Hacker News new | past | comments | ask | show | jobs | submit login

Nearly every expression in high-level languages relies on multiple hash lookups. This is part of the reason these languages are regarded as "slow." I suppose you could use a tree in place, and get reasonable performance. However the hash table's nature allows you to pretty much throw more memory at the problem in exchange for speed (though this is hardly unique to this particular data structure).

For instance `a = foo.bar.baz` in Python involves 3 hash gets (local scope, foo scope, then bar scope), and a single set operation (local scope). This is part of the reason Python programs can be optimized by assigning a deep attribute lookup to the local scope outside of a loop's scope, and it will yield improved performance relative to doing the deep attribute lookup inside the loop's scope.

  a = foo.bar.baz
  for _ in range(20):
      print(a)
vs

  for _ in range(20):
      print(foo.bar.baz)



High-level is not a synonym for interpreted, dynamic.


>"For instance `a = foo.bar.baz` in Python involves 3 hash gets (local scope, foo scope, then bar scope), and a single set operation (local scope)"

Can you explain where exactly and why a set operation is performed? Thanks.


When the name `a` is assigned the value of `baz`, Python is setting a key name `a` in the local dictionary (hash table) `locals()`.

Basically `a = 1` is syntactic sugar for `locals()['a'] = 1`

  >>> a
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  NameError: name 'a' is not defined
  >>> locals()['a'] = 113
  >>> a
  113
  >>>
One interesting side-effect of this is you can assign names that are not valid Python syntax explicitly.

For example:

  >>> locals()['foo-bar'] = 1
  >>> locals()['foo-bar']
  1
  >>> foo-bar
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  NameError: name 'foo' is not defined
  >>> 
The name `foo-bar` can't be literally referenced, because the interpreter attempts to interpret it as the subtraction operation `foo - bar`.


This only works in global scope. It won't work in a function:

    def f():
        a = 5
        locals()['a'] = 6
        print(a) # 5
Inside a function, accesses/writes of locals use an internal array of local variables, to skip dicts. Same with constants. See:

https://github.com/python/cpython/blob/master/Python/ceval.c...


locals() is presented to the user as a dictionary, but is that the way cpython actually works with it internally? I've run into weird GC issues that imply it's not a normal dictionary and it's just presented to the user as one for programattic access.


No, that's not how it works internally. The Python compiler resolves all local variables at compile-time. You can see this for yourself by defining a simple function f, import dis, and calling dis.dis(f).

Global variables use a dictionary, however. The disassembly actually looks similar for both.


That's a good question, I'm pretty sure it's not a normal dictionary. However, I'd have to go through the CPython source to confirm. Maybe someone who's more familiar with CPython's implementation will chime in.


It's easier if you look at the bytecode produced for the expression, which is:

    1           0 LOAD_NAME                0 (foo)
                2 LOAD_ATTR                1 (bar)
                4 LOAD_ATTR                2 (baz)
                6 STORE_NAME               3 (a)
                8 LOAD_CONST               0 (None)
               10 RETURN_VALUE
The CPython implementation is a stack-based virtual machine. The first instruction here, LOAD_NAME, pushes co_names['foo'] (basically, whatever 'foo' is bound to in local scope) to the top of the stack. That's at least one hash table lookup, since the scope is a hash table mapping names to values. Then LOAD_ATTR replaces the top-of-stack value with the value of its 'bar' attribute. That's another hash table lookup, since attributes are a hash table mapping names to values. Then another LOAD_ATTR to get to 'baz', that's another hash table lookup. Then STORE_NAME binds the value at the top of the stack to the name given as an argument ('a'). That's an insert or update of a hash table.

So, the expression 'a = foo.bar.baz' involves at least three hash table lookups and one insert or update.


Since Python is an interpreter, it keeps a dictionary/hash table of the local and global variables. These can be accesses by the functions `locals()` and `globals()`.

`a = ... ` can be thought of as `locals()["a"] = ...`

(although you can should not actually modify the local variables this way)


The way you phrase that makes it seem as if the language being slow is somehow the fault of hash tables when it's the use of dynamic linking.


That's because it is the hash tables.

You can have dynamic binding (assuming that's what you meant, dynamic linking is something else) with way fewer hash lookups.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: