The first stable release of PyPy3 (morepypy.blogspot.com)
299 points by pjenvey on June 20, 2014 | 84 comments



Wow, this is a very exciting moment for the Python world.

And they didn't even reach their funding goal for "py3k in pypy" [1]. This is dedication. I encourage everyone to fund this incredible project!

[1]: http://pypy.org/py3donate.html


I have been checking the py3k branch on Hg every other day waiting for this moment, what a pleasant surprise. Very, very exciting. Thanks all.

I donated a while back, will make another donation soon.

I would like to start using this immediately but I think I'll have to wait until a 3.3 release for "yield from".


Awesome, I hadn't realized this project was quite this far along. If they get PyPy 3.4/3.5 going with NumPy, it will make a really nice package. Fast Python code for the high-level logic, paired with fast low-level number crunching. This could also help speed up the adoption of Python 3.


Looks like they're over 80% of the way to hitting the funding goal for that one too:

http://pypy.org/numpydonate.html


The problem is: even if numpy gets ported we still don't have scipy and a million other packages which require C bindings.


True. Though this will likely make PyPy more mainstream, and thus it'll hopefully attain more community support.


I wish the community would just switch entirely to pypy. Being able to just write slightly performance-sensitive code in Python is a huge win.


It makes sense to use pypy if you're writing pure python code. The second you need a C extension, you're pretty much out of luck. This kills a lot of the appeal for people in the scientific/analytics side of things, who make heavy use of legacy C and Fortran routines.


> The second you need a C extension, you're pretty much out of luck.

In theory, shouldn't CFFI be the foundation of the solution to that problem?


In practice it works pretty well. I am nearing completion of a rewrite of X's XCB-based python bindings in cffi, and it has worked out quite nicely.
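
For the curious, here's a minimal sketch of what a cffi binding looks like; it uses libc's atoi purely as an illustration, and real bindings just declare more prototypes the same way:

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("int atoi(const char *nptr);")  # declare the C signature we need
    libc = ffi.dlopen(None)                  # None loads the standard C library on Unix
    print(libc.atoi(b"42"))                  # prints 42; same code runs on CPython and PyPy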


yes


You can use these C and Fortran routines on PyPy, just not with the CPython C extension API.


There's a lot of established code built around the C API. It's not like it can just be rewritten using CFFI over a weekend.


Well, I'm not talking about every C extension in the world; I'm talking about thin bindings around C routines, which is what numpy has for fft, for example. Someone wrote a basic equivalent for numpypy in a few hours with no prior numpypy/cffi knowledge.


Unfortunately, NumPy uses deep knowledge of the CPython API in quite a few places, which is one of the reasons implementing NumPyPy has been so challenging.


Another approach would be needed for those, but to be clear, I wasn't talking about the entire numpy library; I'm talking about things like numpy.fft.


No, you are wrong. It supports both ctypes and cffi, both of which should be the go-to for calling native code. Using PyObject has been the stupid choice for over 4 years.


> I wish the community would just switch entirely to pypy.

What purpose would that serve?

> Being able to just write slightly performance-sensitive code in Python is a huge win.

PyPy does not work for everybody and everything (e.g. at best it's no slower for Sphinx; it really doesn't like the way docutils works). It's not like PyPy's a magic wand.


> What purpose would that serve?

If PyPy became the official/canonical implementation, PyPy would receive more attention and third-party library compatibility would be a requirement. Complaints about Python's slowness would be somewhat less relevant, and Python might see wider adoption. The RPython toolchain would receive more attention and that could be useful to other languages. There are plenty of reasons, but PyPy is usually a free speedup for your Python application. Who's going to complain about that?

> pypy does not work for everybody and everything

True, but if PyPy were the official implementation of Python, compatibility with it would be a must, and this situation would be greatly improved.


I agree with you, but it will never happen. For one, GvR wants an as-simple-as-possible reference implementation, since he has to maintain it with a volunteer dev team. Also, there's a split in the Python community between people like you and me and the scientific crowd. Until the scientific stuff works 100% in PyPy, you'd lose a significant portion of the Python userbase by dumping CPython.

GvR has done enough damage to Python with Python 3. I don't intend to encourage him to make any more changes. We Python web developers are better off using what we have (non-reference implementations, which don't hurt anyone), or just using Node.js.


I don't think it's fair to put the blame for the unfortunate way things have gone with Python 3 solely on GvR's shoulders. AFAIK, a huge part of the community felt this was the way to go. Unfortunately, it wasn't.


Killing Python 3 isn't something a majority of the community wants and it isn't objectively better either.


I agree entirely. What's kind of a pity is that until NumPy is ported over, all of the scientific stack is basically unusable on PyPy - and right now there are several incredibly good NumPy-specific JITs (numexpr, numba, parakeet).


Maybe moving libraries & code to Python 3 should be the priority.


Convincing distros to package it as the default "python" should be the priority. Until that happens, Python 3 will see limited adoption. The path of least resistance will always have the most traffic.


Ubuntu seems to have that as a near-term goal: https://wiki.ubuntu.com/Python/3


The first step is to get everything python3 compatible, and have it use a hashbang or other mechanism to select the right interpreter. After this happens, the default interpreter has no real meaning: everything will use the right interpreter.
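
For example, a script that names its interpreter explicitly keeps working no matter which version the distro ships as the default "python":

    #!/usr/bin/env python3
    # The shebang pins the interpreter, so the distro default no longer matters.
    import sys
    print(sys.version_info)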


It's a chicken and egg problem. So long as most Python libraries run on Python 2 but not Python 3, distros are going to package Python 2.


Someone has to take the first step and break the cycle to get the chicken-egg problem undone. Arch has had Python 3 as the default Python interpreter for a couple of years now and it's been working pretty much fine. Many libraries now support Python 3, and I've done full sites in Py 3. Almost all Python scripts I write these days are Python 3. I don't think I've had to downgrade a script that started in 3 down to 2 for a couple of years now. It's as ready as it's going to get.

The groundwork is done, and I think everyone who is going to support Py 3 without any extra prodding has already done so. Now we need the distros to come through and give that extra nudge to the maintainers that are still slacking, or encourage people to replace those libraries that refuse to update.


Actually, the majority of PyPI packages are python-3 compatible. For a status overview, see http://python3wos.appspot.com/


That's not what that site is saying.

For my PyCon Russia talk, I pulled down the data for all 44,402 packages (as of May 31). 13.5% of all packages on PyPI support some version of Python 3. 75.5% of the top 200 packages by download count claim to support some Python 3 version (according to their setup.py classifiers). Additionally, 64% of the top 500 support some Python 3 version.

Another interesting thing I saw was that of those 44K packages, 44% of them have seen a release within the last 12 months (representing 82% of the last month's download share), and 22% of those packages released in the last year support some version of Python 3.


How much memory do you have?


For a sufficient performance increase? As much as it takes. Memory is cheap.


On servers and especially on virtualized servers it is absolutely not.


Minor note: the openbsd support (at least for 2.x) is amd64 only. Building for i386 at some point requires running a bootstrap process that doesn't fit in memory.


> Building for i386 at some point requires running a bootstrap process that doesn't fit in memory.

Seriously, it takes more than 4 GB to build PyPy? Is that also necessary for other platforms besides OpenBSD?


When you're compiling CPython, it's neatly broken into little bite-sized chunks (.c files), each of which has all the type-annotations and such that the compiler needs to produce efficient code.

When you're compiling PyPy, it basically has to load the entire Python interpreter structure into memory so it can do its various analyses and annotations, so compiling PyPy takes a long time. I think for a while it was excluded from certain Linux distros because their package-build-farm machines wouldn't handle it.


http://stackoverflow.com/questions/8452396/does-pypy-transla...

PyPy is written in RPython, a subset of the Python language. When it's 'compiled', the PyPy RPython code runs under CPython or PyPy to translate the PyPy source into C code, which is then compiled into a binary. Lots of tuning and such occurs at the same time, so the JIT runs well on the target machine. This is why it takes a long while, and lots of memory.

The build also prints a fractal while compiling. http://pypy.readthedocs.org/en/latest/faq.html#why-does-pypy...
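
For anyone curious, the documented translation step looks roughly like this (exact paths and flags vary by PyPy version, and it needs several GB of RAM and an hour or more):

    cd pypy/goal
    pypy ../../rpython/bin/rpython --opt=jit targetpypystandalone.py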


I think it's 2 and some change, but yeah. I don't know the specifics. Once bootstrapped, it's more reasonable, but building from source is pretty wicked.


4GB is literally nothing. My laptop has 16, most servers I use have 128+. 4GB is netbook territory.


... I think the implication is that more than 4GB would exceed the pre-[PAE][1] memory limit[2]. A form of cross-compilation might work, though the PyPy build isn't exactly a simple, 'classical' build process. :P

Edit: also, looking at your comments[3E] it looks like surely you know this (sorry) so I'm now really not sure what you're getting at... :P

[1]: http://en.wikipedia.org/wiki/Physical_Address_Extension

[2]: and even with PAE you still need to split into multiple processes/address spaces to do anything useful

[3E]: https://news.ycombinator.com/threads?id=sitkack


My point is that requiring a lot of RAM for a build is not a problem. Yes, it would be nice to support low-end devices for PyPy compilation, but the intersection of people on extremely constrained hardware and people doing PyPy development who need to build from source is, well, by definition zero.

32 bit is dead except for ARM, and it will be dead on ARM in 4 years.


> 32 bit is dead except for ARM, and it will be dead on ARM in 4 years.

Uh... sure? ... but the parent post was about how building for 32 bit _today_ simply does not work and will not work.

Whilst it's not necessarily best to build for technology that's almost gone, there will definitely continue to be 32-bit devices that people expect to run Python on for quite a number of years yet - today's 32-bit ARM chips aren't going anywhere for a while, and not every form factor (say, non-desktop) is well suited to a 64-bit architecture. :/


Remember we are talking about _building_, actually JITing a JIT using a dynamic language _for_ a dynamic language.

I haven't run a 32 bit desktop or server system since 2004. 32 bit is quite dead. In 4 years, only the cheapest ARM SoCs will be 32 bits. In embedded devices, yes 32 bits will be around for a great long while.


4GB is not literally nothing, it's 25% of the memory available on your laptop. That's a significant chunk.


That's not true. $1000 ultrabooks often have 4GB. Hell, the base model rMBP has 4GB (I paid the extra for 8GB).


The people surfing the web and buying some music on iTunes are not building PyPy from source. It makes no sense to put the engineering work into supporting such memory constrained dev environments.


I've just made a small donation.


This is awesome, now just to wait for a Python 3.4 PyPy release :D


And Numpy! And ctypes (for Matplotlib)!

Although I must say, numpypy is quite usable already!


PyPy has had ctypes support for a great long while.


True. Not complete enough for Matplotlib, though.


I think you're talking about the c extension api.


I created a simple Terminal instance that compares Python and PyPy in a performance test:

https://terminal.com/tiny/shkhWWkcEV

(this lets you compare the performance on a real Linux system, without installing anything)
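
If you'd rather not use that, a toy CPU-bound script of this sort (purely illustrative) run under both interpreters shows the same effect:

    # Deliberately naive, loop-heavy code: run this file once with "python"
    # and once with "pypy" and compare the wall-clock times.
    import time

    def count_primes(limit):
        count = 0
        for n in range(2, limit):
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                count += 1
        return count

    start = time.time()
    total = count_primes(200000)
    print(total, "primes found in", time.time() - start, "seconds")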


The PyPy people themselves have a benchmark portal at http://speed.pypy.org/ with graphs and everything.


PyPy seems to be 7x faster!


On a silly piece of code that nobody would ever have any use for. I have tried PyPy for "real" data and numerical tasks from time to time, and never have I noticed any sort of speedup. Usually it's slower than CPython. Perhaps this latest version will be different, who knows.


I'm using it in production, and speedups tend to be on the order of 4-5x for my app (the compute-intensive part involves hierarchical agglomerative clustering of documents by text similarity, so it's data/numbers-heavy). Obviously it'll depend on your individual application (and non-CPU-bound tasks won't benefit much), but we switched to PyPy because it showed major improvements in profiling of our app on production data (and we switched around PyPy's 1.9 release, so it's even better now). It's not like everyone's just imagining the speed improvements...


I've just finished writing "High Performance Python" for O'Reilly (due August). We have a chapter on Lessons from the Field, and one chap talks about his successful many-machine roll-out of a complex production system using PyPy for a 2x overall speed gain. We also cover Numba, Cython, profiling, numpy etc - all the topics you'd expect.


It's not like everyone's just imagining that it's slower for many work loads either.


Not disagreeing, but they implied that this benchmark only showed a speed improvement because it's a toy, and that real workloads with real data are usually slower. That hasn't been the case in my experience.


Help us make your code faster: please report it :)


You do remember that it's a jit and the first run is not fast? You have to let it run for a while to generate fast code and only benchmark after that.
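
A rough sketch of what that means in practice (the helper here is illustrative): time the same call several times within one process and only look at the later iterations, after the JIT has had a chance to compile the hot code.

    import time

    def timed_runs(f, *args, **kwargs):
        timings = []
        for _ in range(10):
            t0 = time.time()
            f(*args, **kwargs)
            timings.append(time.time() - t0)
        # On PyPy the first few iterations include tracing and compilation;
        # the later ones reflect steady-state JIT performance.
        return timings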


You might try again since things have changed. If you don't get any kind of speedup, the PyPy project would likely consider it a bug and it would be helpful to document that it was slower. Please consider finding some way of reporting the specific measurable issues you find!


Excellent. I've been waiting for this for a long time.


How does the performance of PyPy and Jython compare?


According to Jython's (a little dated) FAQ <https://wiki.python.org/jython/JythonFaq/GeneralInfo>, "Jython is approximately as fast as CPython--sometimes faster, sometimes slower. Because most JVMs--certainly the fastest ones--do long running, hot code will run faster over time."

PyPy aims to be (and is in many cases) faster than CPython.

The advantage with Jython isn't a performance one: it's the ability to call Java code directly.


Jython is usually slower than CPython, I believe, though it has no GIL.


Wonder if it can be faster under higher parallelism conditions. Multiple threads doing some CPU intensive work?


Jython can utilize threads as well as Java can, so on a many-core machine Jython wins by a pretty large margin.
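
A sketch of the kind of workload where that shows up (purely illustrative): CPU-bound work on two threads runs roughly serially under CPython's GIL, while a GIL-free runtime like Jython can keep both cores busy.

    import threading

    def spin(n):
        # pure-Python, CPU-bound busywork
        total = 0
        for i in range(n):
            total += i * i
        return total

    # Two threads of CPU-bound work: parallel on Jython, serialized by the GIL on CPython.
    threads = [threading.Thread(target=spin, args=(5000000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()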


Thank you.


I don't know or use Python, but why does an implementation that is trying to be "superior" still have the GIL?


This donation page has some background on how PyPy is proposing to replace the GIL with software transactional memory:

http://pypy.org/tmdonate2.html#introduction


What DO you know or use? Did you think that the GIL was an obvious and stupid oversight made by stupid people for no good reason?


Obviously the GIL was shortsighted, yes. Leave the people out of it. The idea was stupid. There was a reason, but it wasn't a good reason.


What would you replace it with?


No GIL.


Your username is apt, but novelty accounts aren't a thing here. What exactly are you hoping to communicate?


My name is Gil. Look at my comment history and apologize.


you need something to allow concurrent access to internal interpreter data structures...


STM or MVar


Because it's very tricky to remove.

Ruby also has a GIL.


> Ruby also has a GIL.

MRI has a GIL; major alternative implementations (JRuby, Rubinius) do not.

OTOH, addressing the downsides of a GIL is not the only reasonable motivation for an alternative implementation, so there's no reason that a better-than-stock Python (or Ruby) fundamentally must remove the GIL (the current "MRI" used to be an alternative implementation, YARV, to the old MRI, and both had GILs).


True. Jython and IronPython also do not have a GIL.



