Distros are backporting security patches into their releases, so no harm done. If you rely on the python.org releases and don't build from source, then yes, that is a bit sad.
Case in point: The Debian security tracker, see their notes section referencing each commit.
The python:3.8 and python:3.9 container images, if used to build web services such as Django with GIS extensions, may be open to RCE until the python.org sources are updated.
This happens when getting the string representation of a foreign function float. It doesn't affect the standard builtin float type. So it's not extremely common.
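To make the distinction concrete, here is a minimal sketch of the two repr paths. It deliberately uses a small value so it is harmless on any interpreter; on an unpatched build, the same call with a huge value such as 1e300 is what reportedly overflows the buffer. The exact repr text varies between versions, so no specific output is promised here.

```python
import ctypes

# Builtin float: repr goes through Python's own float formatting,
# not the ctypes code path discussed in this thread.
builtin_repr = repr(1.5)

# ctypes parameter object: from_param() wraps the value, and repr() of that
# wrapper is the code path that formatted the float into a fixed-size buffer.
# A small value is used on purpose so this stays safe on unpatched versions.
param = ctypes.c_double.from_param(1.5)
ctypes_repr = repr(param)

print(builtin_repr)
print(ctypes_repr)
```

So the attacker-controlled float has to reach a ctypes parameter object and then be formatted, which is why plain float handling is unaffected.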
But I can imagine this cropping up if you e.g. find a way to cause an exception that formats a value. Or if there's aggressive logging.
Their build system is interesting. One of the quirks is you define your application, and every library/package that you depend on. They don't depend on system libraries for their application code. The idea is to get as consistent an operating environment as possible. It's both generally amazing, and an absolute pain in the arse (usually when you least want it to be, because some upstream package changed their dependencies and you end up with version conflicts to unpick).
When Heartbleed came out, the patches landed in the Amazon build system for the OpenSSL package something like midnight. By the time I got in to the office in the morning, almost every service had been fully rebuilt with patches, and services that do CI/CD had already had the patches deployed. Services that didn't have CI/CD were already kicking off deployments of their front end fleets. IIRC they were paging teams when relevant packages were complete to make sure deployments got kicked off ASAP.
So in this case, someone will have patched Python packages for relevant versions, built an updated version, and everything that depends on it will have immediately recompiled, and from there all the packages that depend on those, and so on down the line. Given how much Python is used for operations etc., I wouldn't be surprised to find a significant chunk of Amazon got rebuilt today, even in cases where Python wasn't being exposed to external users, or even used by the service directly. That's a lot of components, and it probably left zero capacity for anything unrelated, and no doubt there will be quibbles about the ordering in which things got built.
I'd be extremely impressed if someone actually managed to get RCE out of this, considering you'd only be able to use '0' through '9', or 0x30 through 0x39, in your payload.
Hey! Original discoverer here. I would like to clarify some things. It would certainly be possible given that we can have numeric shell-code (check the reference). The real problem here is security mechanisms like stack cookies; brute-forcing those would have been a PITA with only numbers. But I'd have to take a closer look at the stack, god knows what GCC could have done there :)
On a system without these protection mechanisms it would be an easy win.
If the float is part of an error message or logging message, the attacker may control portions of the string that appear after the float. This would let them inject a much larger range of bytes, perhaps even UTF-8.
The sprintf is to a temporary buffer that's converted to a PyUnicode object before returning, so subsequent portions of the string are written elsewhere.
It sounds like an extreme minimal instruction set computer (MISC), and it's surprising how much is possible with even just one or two opcodes, which can easily be Turing-complete.
This inspired me to grep the cpython code for sprintf.
I found one used with the return value from alloca (inside FindAddress in _ctypes.c). It checks alloca for NULL (does alloca ever return NULL?), but I could imagine it might be exploitable. FindAddress is a static function that can be called during DLL loading. I imagine that there is very little code that accepts untrusted arguments to DLL loading though (if so, there are bigger problems...).
There is also a lot of use of fixed 32-byte buffers, for up-to-64 bit numbers, which is fine, but hopefully if in the future they can be 128-bit, people remember to fix it!
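A quick back-of-the-envelope check of that claim, as a sketch: the helper name max_decimal_chars and the BUFFER_SIZE constant are made up for illustration, and this only counts a plain decimal rendering (no padding, no surrounding text).

```python
def max_decimal_chars(bits: int, signed: bool = True) -> int:
    """Longest decimal string (excluding the trailing NUL) for an integer of this width."""
    if signed:
        lowest = -(2 ** (bits - 1))  # most negative value, so the '-' sign is counted
        return len(str(lowest))
    return len(str(2 ** bits - 1))   # largest unsigned value

BUFFER_SIZE = 32  # the fixed buffer size mentioned above

for bits in (64, 128):
    needed = max_decimal_chars(bits) + 1  # +1 for the terminating NUL in C
    verdict = "fits" if needed <= BUFFER_SIZE else "overflows"
    print(bits, needed, verdict)
```

A signed 64-bit value needs 21 bytes including the NUL, so 32 is comfortable; a signed 128-bit value needs 41, which would not fit.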
In other cases, like getnameinfo, the stack-allocated buffer is way too large! %d can never be 512 bytes, so it's just thrashing your cache for no reason.
Anyway surely this has been carefully audited, since grepping for sprintf is so simple!
Extremely misleading title. This only affects code that puts untrusted floats into ctypes. A relatively uncommon case. Please stop the clickbait!
In my experience here I find that it’s not uncommon for correct statements to get downvoted more than incorrect ones.
Why? I think it’s because incorrect statements can be corrected in a reply. OTOH correct statements that one disagrees with or otherwise dislikes cannot be corrected but they can be downvoted.
Because moaning about clickbait is really boring, often correlated with people being surprised because they didn't read the article first, and once again not very useful if you read the article.
I read the article, actually I read it in a concerned state, and then I realized there was little need for concern. Clickbait is a daily problem on this site, there’s no need to insult or attack my character for raising the issue.
With 1e300, the resulting string doesn't fit into 256 bytes, thus overflowing the buffer. Exploitation might be interesting, as you (probably?) can only use numbers (ASCII range 0x30-0x39).
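The length is easy to check from Python itself. This is only a sketch: it uses Python's own "%f" formatting rather than C's sprintf, on the assumption that both expand the value to the same number of digits, and the f_length helper and BUFFER_SIZE constant are names invented for the example. Any surrounding text in the real format string would add to the total.

```python
import sys

BUFFER_SIZE = 256  # the buffer size quoted above

def f_length(value: float) -> int:
    # "%f" never switches to scientific notation, so large values
    # are written out digit by digit, plus ".000000".
    return len("%f" % value)

big = "%f" % 1e300

print(f_length(1e300), f_length(1e300) > BUFFER_SIZE)
print(f_length(sys.float_info.max))
# Every character written is a digit or the decimal point.
print(set(big) <= set("0123456789."))
```

1e300 comes out at 308 characters (301 digits, the point, and six decimals), and the largest double at 316, so both overrun a 256-byte buffer before the terminating NUL is even counted.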
I am not sure, but since the affected files are in ctypes, I think it can only affect applications with bindings that are written with ctypes.
As far as I know, ALL bindings in CPython are written with the C APIs and not ctypes, including the JSON library. (I can't guarantee that but it shouldn't be that hard to audit.)
I use one ctypes binding for convenience (a single function for CommonMark) but I prefer to use C APIs for this reason ... it's a lot of complexity and unsafety. Using ctypes wrong can crash your process due to invoking undefined behavior.
It would be nice to have a list of bindings created with ctypes, since I think most consumers probably have no idea. I thought that PyOpenGL is done with ctypes but don't quote me on that ...
Deleting sprintf by itself wouldn't work; even the "safe" replacement functions rely on the developer providing the safety information manually. C needs a string type. Sadly large parts of the C standard are built around turning plain character arrays into exploits by pretending that they are a sane choice for string manipulation.
asprintf is fine for most cases. For the other cases you probably don't want to dynamically allocate memory so snprintf with a fixed buffer size is fine.
You would think converting floats to and from strings was a solved problem by now...
I'm surprised no one ever looked at that code --- even when writing it --- and thought "how long can a %f get?" I've been writing C for a long time and that's just something which comes naturally, being ingrained into memory since the beginning. If I see a fixed-size buffer I will always question whether it's big enough (and also if it's perhaps even too big.)
The "psychology" around buffer overflows has always seemed strange to me; a real-world analogy is someone who has no idea how big his car is, finds a parking spot that "looks big enough", and just rams it in without a second thought, sometimes crashing into the surroundings. Not many people would do that in the real world. Yet countless programmers seemingly can't get something simple like this right?
That, of course, is easier, but you have to realize that %f never uses scientific notation for its output (something I must have known at some time, but if you had asked, I would have had to google it), and that DBL_MAX has a lot of digits (it is about 1.8 × 10^308).
I think it is unfortunate that %f, the easiest type field to remember for floating point numbers, is used for this behavior. Most of the time, you’ll want to use %g, which uses scientific notation if it is shorter.
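A small side-by-side, sketched with Python's % operator on the assumption that it follows the same %f and %g rules as C for these values (the compare helper is just a name for the example):

```python
def compare(value: float) -> tuple[str, str]:
    # Returns the same value formatted with %f and with %g.
    return "%f" % value, "%g" % value

for value in (1.5, 1e10, 1e300):
    fixed, general = compare(value)
    print(len(fixed), len(general), general)
```

The catch is that bare %g keeps only six significant digits, so 123456789.0 prints as 1.23457e+08. It bounds the length, but it is not a drop-in replacement when the exact value matters.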
I guess linters should warn about bare %f without length fields.
3.6.13 and 3.7.10 have already been released with the fix: https://www.python.org/downloads/release/python-3613/ https://www.python.org/downloads/release/python-3710/
The 3.8.8 and 3.9.2 release candidates with the fix will be promoted on Monday March 1st: https://discuss.python.org/t/python-3-9-2rc1-and-3-8-8rc1-ar...
If you're on 3.5 or lower, these versions are no longer receiving security fixes and will not be patched: https://python-release-cycle.glitch.me/