
[flagged]



It's worth reading the article. In this case, it seems to have been a hardware issue - as such, not directly related to Rust, C, or Python, but triggered by an instruction that was only called by some file loading routines. It's a very cool deep dive into debugging these sorts of issues.


True, it is a great article.

It states that Python is faster than C; that is not possible, since Python is built with C. There could be other reasons, such as libraries or implementation.

Also note that the issue he had was not resolved.

The comment was about Python being seen as slow, but that is not always the case.

Once a dev understands the difference between the Python and C parts, Python can be quite performant and efficient with memory.

But if one actually created an application that does more than just read a file, it would be slow again compared to C and Rust.


It's not stating Python is faster than C in general. This is just one very specific case where non-page-aligned memory reading on AMD is involved.


It does make me wonder why pymalloc and jemalloc used page aligned memory, but glibc didn't. That is odd. Other questions never answered, why did pyo3 add so much overhead? it was over half the difference between the two.


> It does make me wonder why pymalloc and jemalloc used page aligned memory, but glibc didn't.

The root cause is not about page alignment. In fact, all allocators are aligned.

The root cause is that the AMD CPU didn't implement FSRM correctly when copying data in the range 0x1000 * n to 0x1000 * n + 0x10.
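To make the address range above concrete, here is a minimal Python sketch that classifies an address by its offset within a 4 KiB page. The helper names are hypothetical, and whether the upper bound 0x10 is itself affected is not stated in the comment; the sketch assumes both ends are inclusive.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, matching the 0x1000 in the comment


def page_offset(addr: int) -> int:
    """Return the offset of addr from the start of its page."""
    return addr % PAGE_SIZE


def in_reported_slow_range(addr: int, upper: int = 0x10) -> bool:
    """True if addr lies in 0x1000 * n .. 0x1000 * n + upper.

    Assumption: both ends of the range are treated as inclusive.
    """
    return page_offset(addr) <= upper


# Made-up example addresses: page-aligned, shifted by 0x10, shifted by 0x20.
for addr in (0x7F0000000000, 0x7F0000000010, 0x7F0000000020):
    print(hex(addr), hex(page_offset(addr)), in_reported_slow_range(addr))
```

This only describes which buffer addresses fall in the reported range; it does not reproduce the slowdown itself.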

> Other questions never answered, why did pyo3 add so much overhead? it was over half the difference between the two.

OpenDAL Python Binding v0.42 does have many places to improve: for example, we could allocate the buffer in advance or use `read_buf` into an uninitialized vec. I skipped this part since it is not the root cause.
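As a rough illustration of "allocate the buffer in advance", here is a Python sketch that sizes a `bytearray` once and fills it with `readinto`. This is an analogy only, not OpenDAL's code: the function name `read_into_preallocated` is hypothetical, and it assumes a regular file whose size does not change while it is being read.

```python
import os
import tempfile


def read_into_preallocated(path: str) -> bytearray:
    """Read a whole file into a buffer that is allocated once up front."""
    size = os.path.getsize(path)
    buf = bytearray(size)      # single allocation, sized in advance
    view = memoryview(buf)     # lets us fill slices without copying
    filled = 0
    with open(path, "rb", buffering=0) as f:
        while filled < size:
            n = f.readinto(view[filled:])
            if not n:          # EOF reached earlier than expected
                break
            filled += n
    view.release()             # required before the bytearray can shrink
    del buf[filled:]           # drop any unfilled tail
    return buf


# Usage: write a scratch file, then read it back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello" * 1000)
    scratch_path = tmp.name
print(len(read_into_preallocated(scratch_path)))
os.remove(scratch_path)
```

The point of the design is that the read loop never grows or reallocates the buffer; each `readinto` call writes directly into the space reserved at the start.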


> It does make me wonder why pymalloc and jemalloc used page aligned memory, but glibc didn't. That is odd.

Other way around: with glibc it was page-aligned; with the others, it wasn't.

This weird Zen performance quirk aside, I'd prefer page alignment so that an allocation like this which is a nice multiple of the page size doesn't waste anything (RAM or TLB), with the memory allocator's own bookkeeping in a separate block. Pretty surprising to me that the other allocators do something else.
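The point about not wasting RAM or TLB entries can be checked with simple arithmetic. The sketch below counts how many 4 KiB pages a buffer touches depending on where it starts; `pages_touched` is a hypothetical helper, and the addresses and the 64 KiB size are made up for illustration.

```python
PAGE_SIZE = 4096


def pages_touched(addr: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Count the pages covered by the byte range [addr, addr + size)."""
    if size <= 0:
        return 0
    first = addr // page_size
    last = (addr + size - 1) // page_size
    return last - first + 1


size = 16 * PAGE_SIZE  # a 64 KiB buffer, an exact multiple of the page size
print(pages_touched(0x7F0000000000, size))         # page-aligned start: 16
print(pages_touched(0x7F0000000000 + 0x10, size))  # shifted by 16 bytes: 17
```

A buffer that is a multiple of the page size fits exactly when it starts on a page boundary; any nonzero offset spills it onto one extra page.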


The context of my initial comment is that python is slow, but can be fast.

From the article.

> In conclusion, the issue isn't software-related. Python outperforms C/Rust due to an AMD CPU bug.


> It states that Python is faster than C; that is not possible, since Python is built with C. There could be other reasons, such as libraries or implementation.

In a really strict sense it's impossible to talk about the speed of languages, since any Turing-complete language could be implemented in any other. When people say X is faster than Y, they mean in practice, as actually used; it's completely possible, for instance, that if you ask a large pool of C programmers to... I dunno, sum ten billion integers, and the same to a large pool of Python programmers, most of the C devs will reach for a `for` loop and most of the Python devs will reach for numpy and get vectorization for free, and if that's the case then it's reasonable to say that Python is faster. Or in the actual case at hand, writing the same(ish) program in Rust and Python on the same hardware does result in the Python version being faster, even though it's a bug from that exact hardware not getting along with something under the hood in the Rust version.


The NumPy library doesn't utilize Python's C layer for its memory management.

Instead, it maintains its own memory space. Consequently, transferring data from the Python environment into NumPy or vice versa is relatively slow.

The process of opening a file and traversing its data within Python relies heavily on the C code behind the scenes, resulting in near-C performance.

However, if one were to write an algorithm along the lines of LeetCode, say one with a time complexity of n^2, Python's performance will be slower compared to other languages. This difference could range from a factor of one to potentially even a hundred.
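A small way to see the split between the C-backed parts and pure-Python code is to time the same reduction both ways. This sketch assumes CPython, where the built-in `sum` is implemented in C; the function names are made up, and the timings it prints depend on the machine, so no particular ratio is claimed.

```python
import timeit


def sum_with_loop(values):
    """Add the values one by one in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


def sum_with_builtin(values):
    """Delegate the loop to the built-in sum."""
    return sum(values)


data = list(range(1_000_000))
assert sum_with_loop(data) == sum_with_builtin(data)

loop_s = timeit.timeit(lambda: sum_with_loop(data), number=5)
builtin_s = timeit.timeit(lambda: sum_with_builtin(data), number=5)
print(f"loop: {loop_s:.3f}s  builtin: {builtin_s:.3f}s")
```

Both functions compute the same result; only where the loop runs differs.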


In this case it is a hardware bug and in no way attributable to Python being fast.


The bug is the other way around :)


reminds me of HP48 programming, there was sysrpl which was near asm speed, and then user rpl which was slooow



