Wow, this is a very exciting moment for the Python world.
And they hadn't even reached their funding goal for "py3k in pypy" [1]. This is dedication. I encourage everyone to fund this incredible project!
Awesome, I hadn't realized this project was this far along. If they get PyPy 3.4/3.5 going with NumPy, it will make a really nice combination: fast Python code for the high-level logic, paired with fast low-level number crunching. This could also help speed up the adoption of Python 3.
It makes sense to use pypy if you're writing pure python code. The second you need a C extension, you're pretty much out of luck. This kills a lot of the appeal for people in the scientific/analytics side of things, who make heavy use of legacy C and Fortran routines.
Well, I'm not talking about every C extension in the world. I'm talking about thin bindings around C routines, which is what NumPy uses for FFT, for example; someone wrote a basic equivalent for numpypy in a few hours with no prior numpypy/cffi knowledge.
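For the curious, here's a minimal sketch of that kind of thin binding using cffi's ABI mode. The names are purely illustrative: sqrt from libm stands in for the FFT routines mentioned above.

    # Thin cffi binding around a C routine (illustrative sketch).
    import ctypes.util
    from cffi import FFI

    ffi = FFI()
    ffi.cdef("double sqrt(double x);")                # declare only the signature we need
    libm = ffi.dlopen(ctypes.util.find_library("m"))  # locate and load libm

    print(libm.sqrt(2.0))  # 1.4142135623730951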
Unfortunately, NumPy uses deep knowledge of the CPython API in quite a few places, which is one of the reasons implementing NumPyPy has been so challenging.
No, you are wrong. PyPy supports both ctypes and cffi, both of which should be the go-to for calling native code. Relying on the raw PyObject API has been the stupid choice for over 4 years.
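To make that concrete, here's the same style of binding done with ctypes from the standard library, which runs unmodified on both CPython and PyPy. Again, sqrt from libm is just a stand-in for whatever native routine you actually need:

    # Thin ctypes binding around the same C routine (illustrative sketch).
    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.sqrt.restype = ctypes.c_double     # declare the return type
    libm.sqrt.argtypes = [ctypes.c_double]  # and the argument types

    print(libm.sqrt(2.0))  # 1.4142135623730951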
> I wish the community would just switch entirely to pypy.
What purpose would that serve?
> Being able to just slightly performance sensitive code in python is a huge win.
I think you slightly mangled this phrase, but aside from that, pypy does not work for everybody and everything (e.g. at best it's no slower for sphinx; it really doesn't like the way docutils works). It's not like pypy's a magic wand.
If PyPy became the official/canonical implementation, PyPy would receive more attention and third-party library compatibility would be a requirement. Complaints about Python's slowness would be somewhat less relevant, and Python might see wider adoption. The RPython toolchain would receive more attention, and that could be useful to other languages. There are plenty of reasons, but the simplest is that PyPy is usually a free speedup for your Python application. Who's going to complain about that?
> pypy does not work for everybody and everything
True, but as the official implementation of Python, compatibility with PyPy would then be a must, and this situation would be greatly improved.
I agree with you, but it will never happen. GvR wants an as-simple-as-possible reference implementation; for one thing, he has to maintain it with a volunteer dev team. Also, there's a split in the Python community between guys like you and me, and the scientific squad. Until the scientific stuff works 100% in PyPy, you'd lose a significant portion of the Python userbase by dumping CPython.
GvR has done enough damage to Python with Python 3. I don't intend to encourage him to make any more changes. We Python web developers are better off using what we have (non-reference implementations, which don't hurt anyone), or just using Node.js.
I don't think it's fair to put the blame for the unfortunate way things have gone with Python 3 solely on the shoulders of GvR. Afaik, a huge part of the community felt this was the way to go. Unfortunately, it wasn't.
I agree entirely. What's kind of a pity is that until NumPy is ported over, all of the scientific stack is basically unusable on PyPy, and right now there are several incredibly good NumPy-specific JITs (numexpr, numba, parakeet).
Convincing distros to package it as the default "python" should be the priority. Until that happens, Python 3 will see limited adoption. The path of least resistance will always have the most traffic.
The first step is to get everything Python 3 compatible and have each script use a hashbang or other mechanism to select the right interpreter. Once that happens, the default interpreter has no real meaning: everything will use the right one.
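For example, a script that pins itself to Python 3, so the distro's default "python" no longer matters for it:

    #!/usr/bin/env python3
    # The hashbang above selects the interpreter for this script,
    # regardless of what the system's default "python" points at.
    import sys
    print(sys.version_info)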
Someone has to take the first step and break the cycle to get the chicken-egg problem undone. Arch has had Python 3 as the default Python interpreter for a couple of years now and it's been working pretty much fine. Many libraries now support Python 3, and I've done full sites in Py 3. Almost all Python scripts I write these days are Python 3. I don't think I've had to downgrade a script that started in 3 down to 2 for a couple of years now. It's as ready as it's going to get.
The groundwork is done, and I think everyone who is going to support Py 3 without any extra prodding has already done so. Now we need the distros to come through and give that extra nudge to the maintainers that are still slacking, or encourage people to replace those libraries that refuse to update.
For my PyCon Russia talk, I pulled down the data for all 44,402 packages (as of May 31). 13.5% of all packages on PyPI support some version of Python 3. 75.5% of the top 200 packages by download count claim to support some Python 3 version (according to their setup.py classifiers). Additionally, 64% of the top 500 support some Python 3 version.
Another interesting thing I saw was that of those 44K packages, 44% of them have seen a release within the last 12 months (representing 82% of the last month's download share), and 22% of those packages released in the last year support some version of Python 3.
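The talk's actual scraper isn't reproduced here, but a rough sketch of the classifier check looks like this (it uses today's pypi.org JSON API; the endpoint and the helper name are illustrative, not the talk's code):

    # Check a package's trove classifiers for Python 3 support (sketch).
    import json
    import urllib.request

    def supports_py3(package):
        url = "https://pypi.org/pypi/%s/json" % package
        with urllib.request.urlopen(url) as resp:
            info = json.load(resp)["info"]
        return any(c.startswith("Programming Language :: Python :: 3")
                   for c in info.get("classifiers", []))

    print(supports_py3("requests"))  # True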
Minor note: the openbsd support (at least for 2.x) is amd64 only. Building for i386 at some point requires running a bootstrap process that doesn't fit in memory.
When you're compiling CPython, it's neatly broken into little bite-sized chunks (.c files), each of which has all the type-annotations and such that the compiler needs to produce efficient code.
When you're compiling PyPy, it basically has to load the entire Python interpreter structure into memory so it can do its various analyses and annotations, so compiling PyPy takes a long time. I think for a while it was excluded from certain Linux distros because their package-build-farm machines wouldn't handle it.
PyPy is written in RPython, a subset of the Python language. When it's 'compiled', the RPython toolchain runs on CPython or PyPy and translates the PyPy source into C code, which is then compiled into a binary. Lots of tuning and such occurs at the same time, so the JIT runs well on the target machine. This is why it takes a long while, and lots of memory.
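To give a feel for it, here is a minimal translatable RPython program, modeled on the RPython documentation; the filename and the exact command line are illustrative:

    # targethello.py -- a smallest-useful RPython program (illustrative).
    def entry_point(argv):
        print "hello from RPython"   # RPython of this era is a Python 2 subset
        return 0

    def target(*args):
        # The translation toolchain calls target() to find the entry point.
        return entry_point, None

    # Translated (not interpreted) with roughly:
    #   python rpython/bin/rpython targethello.py
    # Translating PyPy itself (targetpypystandalone.py, with -Ojit) is the
    # step that takes hours and gigabytes of RAM.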
I think it's 2 GB and some change, but yeah, I don't know the specifics. Once bootstrapped, it's more reasonable, but building from source is pretty wicked.
... I think the implication is that more than 4GB would exceed the pre-PAE memory limit. A form of cross-compilation might work, though the PyPy build isn't exactly a simple, 'classical' build process. :P
Edit: also, looking at your comments it looks like you surely know this (sorry), so now I'm really not sure what you're getting at... :P
My point is that requiring a lot of RAM for a build is not a problem. Yes, it would be nice to support low-end devices for PyPy compilation, but the intersection of people on extremely constrained hardware and people doing PyPy development who need to build from source is, well, practically zero.
32 bit is dead except for ARM, and it will be dead on ARM in 4 years.
> 32 bit is dead except for ARM, and it will be dead on ARM in 4 years.
Uh... sure? ... but the parent post was about how building for 32 bit _today_ simply does not work and will not work.
Whilst it's not necessarily wise to build for technology that's almost gone, there will definitely continue to be 32 bit devices that people expect to run Python on for quite a number of years yet. Today's 32 bit ARM chips aren't going anywhere for a while, and not every form factor (say, non-desktop) is well suited to a 64-bit architecture. :/
Remember we are talking about _building_: translating a JIT written in a dynamic language _for_ a dynamic language.
I haven't run a 32 bit desktop or server system since 2004. 32 bit is quite dead. In 4 years, only the cheapest ARM SoCs will be 32 bits. In embedded devices, yes 32 bits will be around for a great long while.
The people surfing the web and buying some music on iTunes are not building PyPy from source. It makes no sense to put the engineering work into supporting such memory constrained dev environments.
On a silly piece of code that nobody would ever have any use for. I have tried PyPy for "real" data and numerical tasks from time to time, and never have I noticed any sort of speedup. Usually it's slower than CPython. Perhaps this latest version will be different, who knows.
I'm using it in production, and speedups tend to be on the order of 4-5x for my app (the compute-intensive part involves hierarchical agglomerative clustering of documents by text similarity, so it's data/numbers-heavy). Obviously it'll depend on your individual application (and non-CPU-bound tasks won't benefit much), but we switched to PyPy because it showed major improvements in profiling of our app on production data (and we switched around PyPy's 1.9 release, so it's even better now). It's not like everyone's just imagining the speed improvements...
I've just finished writing "High Performance Python" for O'Reilly (due August). We have a chapter on Lessons from the Field, and one chap talks about his successful many-machine rollout of a complex production system using PyPy for a 2x overall speed gain. We also cover Numba, Cython, profiling, numpy etc.; all the topics you'd expect.
Not disagreeing, but they implied that this benchmark only showed a speed improvement because it's a toy, and that real workloads with real data are usually slower. That hasn't been the case in my experience.
You might try again since things have changed. If you don't get any kind of speedup, the PyPy project would likely consider it a bug and it would be helpful to document that it was slower. Please consider finding some way of reporting the specific measurable issues you find!
According to Jython's (a little dated) FAQ <https://wiki.python.org/jython/JythonFaq/GeneralInfo>, "Jython is approximately as fast as CPython--sometimes faster, sometimes slower. Because most JVMs--certainly the fastest ones--do long running, hot code will run faster over time."
PyPy aims to be (and is in many cases) faster than CPython.
The advantage with Jython isn't a performance one: it's the ability to call Java code directly.
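A quick illustration of what that interop looks like (standard Jython behaviour; the java.util classes here are arbitrary examples):

    # In Jython, Java classes import like ordinary Python modules.
    from java.util import ArrayList, Collections

    names = ArrayList()
    for n in ["guido", "jim", "frank"]:
        names.add(n)
    Collections.sort(names)  # calling a static Java method directly
    print(names)             # [frank, guido, jim]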
MRI has a GIL; major alternative implementations (JRuby, Rubinius) do not.
OTOH, addressing the downsides of a GIL is not the only reasonable motivation for an alternative implementation, so there's no reason that a better-than-stock Python (or Ruby) fundamentally must remove the GIL (the current "MRI" used to be an alternative, YARV, to the old MRI, and both had GILs).
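For anyone who hasn't watched a GIL in action, here's a quick sketch in Python; the same shape of experiment applies to MRI:

    # Under a GIL, CPU-bound threads don't scale: four threads take about
    # as long as doing the same work four times sequentially.
    import threading
    import time

    def burn():
        n = 0
        for _ in range(10 ** 7):
            n += 1

    start = time.time()
    threads = [threading.Thread(target=burn) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # On a GIL-free runtime (e.g. Jython, or JRuby on the Ruby side)
    # the equivalent program can actually use four cores.
    print("4 threads took %.2fs" % (time.time() - start))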
[1]: http://pypy.org/py3donate.html