Why Python is more fun than Java (clapper.org)
25 points by marketer on Aug 16, 2008 | 29 comments



I wish people would stop posting all these crappy blog posts with shallow language comparisons.

Saying language X does Y in 4 lines instead of language Z's 5 lines just doesn't say much unless you put it in a bigger perspective.

(And I like Python very much and hate Java).


That's true, and I also like Python more than Java. What's missing in these comparisons are some important architectural consequences of using Python (or Ruby or PHP) for web apps. The one reason why I haven't been using Python as much as I'd like is the GIL (global interpreter lock).

The GIL requires that you use multiple processes to make use of multiple CPUs or cores. And that means you cannot keep much data in memory, as it would be duplicated in each process. Some applications benefit a lot from keeping most of their data in memory most of the time, and if you have complex data structures (like graphs), memcached (or similar solutions) is no replacement, as you have to rebuild those data structures on each access. Any solution that invalidates process-relative addresses incurs a huge overhead in order to access that data.

I know the debate around threads and how problematic they are and how much better a multi-process model scales. That's all very well, but threads are the only way to use multiple cores and access in-memory data structures fast.
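A minimal sketch of the sharing being described, assuming a hypothetical adjacency-list graph and worker function (neither comes from the thread). Every thread mutates the same dict in place, with no copying or serialisation. Under CPython's GIL these threads still do not execute bytecode on several cores at once, which is exactly the complaint above.

```python
import threading


def run_shared_updates(workers=4, edges_per_worker=100):
    """Let several threads update one in-memory graph and return the edge count."""
    graph = {}                 # hypothetical shared structure: node -> neighbour list
    lock = threading.Lock()    # guards the compound read-modify-write below

    def add_edges(worker_id):
        for target in range(edges_per_worker):
            with lock:
                graph.setdefault(worker_id, []).append(target)

    threads = [threading.Thread(target=add_edges, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All updates are visible here because every thread shared the same object.
    return sum(len(neighbours) for neighbours in graph.values())
```

Calling `run_shared_updates()` counts the edges added by all workers; the point is only that no data crossed a process boundary to get there.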

So using Python, for me, means having to write much of my application in C++ (or Java or C#) and access that backend process from the Python web frontend. This approach comes with a lot of complexity and it's a lot more work than using Java or C# in the first place.

On the other hand, keeping a lot of data in memory isn't necessary for most applications, as the data structures and the parallelism live in the DBMS, which is written in C/C++ anyway.

It's also questionable whether Java is a good solution for my scenario. Java and C# need about twice as much memory as C++ for the exact same data structures, and the garbage collector gets slower the more memory a process allocates.

I know there are more options but this post is getting lengthy :-)


If the data structures are static, you can build them, then fork off a bunch of Python processes. They will all have the data, but it will only be in memory once. Just make sure to never invoke GC after that point... (unless Python's GC isn't compacting). Also make sure to never reference anything, or you will increment its reference count and trigger copy-on-write...
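A rough sketch of that build-then-fork idea, assuming a POSIX system and a made-up read-only graph (`GRAPH`, `count_edges` and `parallel_edge_count` are illustrative names). Each child inherits the parent's pages at fork time and sends back only a small result over a pipe. As warned above, merely reading objects in CPython updates their reference counts, so some pages will still be copied.

```python
import os

# Hypothetical static graph, built once in the parent before any fork.
GRAPH = {node: [(node + 1) % 1000, (node * 7) % 1000] for node in range(1000)}


def count_edges(start, stop):
    """Count outgoing edges for nodes in [start, stop)."""
    return sum(len(GRAPH[node]) for node in range(start, stop))


def parallel_edge_count(workers=4):
    chunk = len(GRAPH) // workers
    children = []
    for i in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: GRAPH is already in memory, nothing was pickled or rebuilt.
            try:
                os.close(read_fd)
                result = count_edges(i * chunk, (i + 1) * chunk)
                os.write(write_fd, str(result).encode())
            finally:
                os._exit(0)  # never fall back into the parent's loop
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            total += int(pipe.read())
        os.waitpid(pid, 0)
    return total
```

This only works for data that does not change after the fork; updates made in one process are invisible to the others.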


The data structures are not static. It's effectively an in-memory database, so the data keeps changing.


PostgreSQL is not multi-threaded and uses System V IPC. Threads are certainly not the only option.


So how would you do it? I have a large graph on which a number of clients run queries, complex analyses and updates at the same time. I've been pondering various solutions like shared memory, memory mapped files, relying on the disk cache, memcached, etc. But they're all either not performing well because they cause a lot of serialisation in and out of process memory, or they cause overly complex designs or both. I do appreciate any advice I can get...

[edit] Traditional DBMS architectures like that of Postgres, to my knowledge, rely on shared memory or memory-mapped files (I'm sure this varies). They swap on-disk pages in and out of memory, and those pages retain their on-disk structure. There is little if any translation into in-memory data structures. Yes, I could do that, but it's incredibly complex. Every single data structure basically has to be implemented on top of a byte array. I wouldn't be able to use any existing libraries for trees, lists, sets, hashtables, etc.
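To give a feel for what "on top of a byte array" means, here is a small sketch that packs an adjacency list into one flat buffer and reads neighbours straight out of it. The layout (node count, then offsets, then targets, all little-endian 32-bit unsigned integers) and the function names are assumptions made up for this illustration, not how any particular DBMS stores pages.

```python
import struct


def pack_graph(adjacency):
    """Pack a list of neighbour lists into bytes: [n][n+1 offsets][targets]."""
    n = len(adjacency)
    offsets = [0]
    targets = []
    for node in range(n):
        targets.extend(adjacency[node])
        offsets.append(len(targets))
    return struct.pack(f"<I{n + 1}I{len(targets)}I", n, *offsets, *targets)


def neighbours(buf, node):
    """Read one node's neighbours directly from the buffer, without unpacking the rest."""
    (n,) = struct.unpack_from("<I", buf, 0)
    start, stop = struct.unpack_from("<2I", buf, 4 + 4 * node)
    targets_base = 4 + 4 * (n + 1)
    return list(struct.unpack_from(f"<{stop - start}I", buf, targets_base + 4 * start))


buf = pack_graph([[1, 2], [2], []])
print(neighbours(buf, 0))
```

Because the buffer contains only offsets and no pointers, it could in principle live in shared memory or a memory-mapped file. The cost is the one described above: inserting an edge means rewriting offsets by hand, and none of the standard containers can be reused.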


Try ZODB/ZEO; you will be able to access native Python data structures (possibly cached in memory) from multiple Python processes.

http://www.zope.org/Products/ZEO/ZEOFactSheet


This does not solve the GIL problem - it reduces it just a little.

ZEO and ZODB are nice, but using them also does not solve the problem - there is still one serialization for every update and one de-serialization for every independent client/dirty-read. I love Zope and Plone and use them for a lot of situations (I won't think twice when someone wants a CMS for their intranet), but I sure would like to be able to set "zserver-threads" to 64, buy a Sun Niagara-based server and just live happy with it watching how all thread units get their fair share of usage.


So write something that gives that bigger perspective.


It's a little surprising how many of the Python-favoring features are already present in C# 3.0: closures, syntactically clean properties, and a nicer array syntax. Between the yield keyword and LINQ, you get something close to generator syntax; reflection is pretty painless; and while there's no dynamic typing, there's a convenient type inference mechanism.

My personal pet peeve against Java is the lack of multiline string literals, also available in both Python and C#.
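For anyone who hasn't used them, a quick sketch of what a multiline literal looks like on the Python side. The SQL text and the `QUERY` name are just placeholders; `textwrap.dedent` strips the common indentation so the literal can be indented along with the surrounding code.

```python
import textwrap

# Triple-quoted literal; the backslash after the opening quotes
# suppresses the otherwise leading newline.
QUERY = textwrap.dedent("""\
    SELECT id, name
    FROM users
    WHERE active = 1
""")

print(QUERY)
```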

Java has some syntactic catching up to do.


I share your peeve. A request for multiline string literals is over 7 years old...

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4472509

Reading the Sun bug database, it's clear who's held back Java's functionality and stability over the years: xxxxx@xxxxx, who on this issue back in 2001 wrote "This is yet another request for a syntactic sugar to save some user from typing. It's not worth it."

I hope for the good of the platform xxxxx has been let go since then.


Agreed. If I ever find xxxxx@xxxxx in a dark alley, he/she better watch out.


Not just string literals - but a better literal syntax for maps and arrays would be really nice.


Interesting article on how to kinda sorta do it using varargs, generics and static imports (looks more like Scala): http://gleichmann.wordpress.com/2008/01/13/building-your-own...


multiline string literals... also available in C (using \).


Deja vu.

(http://news.ycombinator.com/item?id=260014)

p.s. I thought a dupe post got credited to its predecessor. Is there a time limit on this check?


the URL is different


The URL is the same (http://brizzled.clapper.org/id/75).

Also, I have experienced this myself. I submitted an article and found it had been submitted ~1-2hr before/after by another user (so I would usually delete the article I submitted).


No, the url is different, which is why it got past the dupe-detection.

The original post's url is actually (http://www.clapper.org/bmc/blog/id/75), which just redirects to the current post's url. Same content, different url.

But yeah, it's a dupe. There should be a way to report this (similar to "flag"), and then the admins can merge the two topics.



You are right. Thanks.


Sorry but I stopped reading here...

System.exit(0);

It's hardly going to be fair and balanced with things like that in it.


His StringBuffer example is also wrong (the Java compiler would automatically use a StringBuffer in his example).


StringBuilder?


Oh yeah, that's right, since JDK 1.5. It used to be StringBuffer.


TRWTF is this:

  for (String s : args[0].split("\\s+"))
      System.out.println(s);

which is exactly the same as:

  System.out.println(args[0]);

(redacted)


  [jdale@localhost ~]$ java -cp . Test "foo bar baz"
  foo
  bar
  baz


Oh, good point. Cheers.

I then submit my Groovy solution:

  args[0].split().each { println it }
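Since the article is about Python, here is a sketch of the same thing there, wrapped in a hypothetical `split_words` helper so it can be tried without command-line arguments. `str.split()` with no arguments splits on runs of whitespace, much like the `\\s+` regex in the Java version.

```python
def split_words(text):
    """Split on any run of whitespace, ignoring leading and trailing blanks."""
    return text.split()


for word in split_words("foo bar baz"):
    print(word)
```

With a real command line this would be `for word in sys.argv[1].split(): print(word)`.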


What, no decorators mentioned?



