Hacker News
Python 3.x: RCE in Python applications that accept floats as untrusted input (mitre.org)
217 points by bitshifta on Feb 18, 2021 | 69 comments



Already fixed in Python 3.6, 3.7, 3.8 and 3.9: https://bugs.python.org/issue42938

3.6.13 and 3.7.10 have already been released with the fix: https://www.python.org/downloads/release/python-3613/ https://www.python.org/downloads/release/python-3710/

The 3.8.8 and 3.9.2 release candidates with the fix will be promoted on Monday March 1st: https://discuss.python.org/t/python-3-9-2rc1-and-3-8-8rc1-ar...

If you're on 3.5 or lower, these versions are no longer receiving security fixes and will not be patched: https://python-release-cycle.glitch.me/
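A minimal sketch of a version check based only on the release numbers listed above. `has_upstream_fix` and the `FIXED` table are hypothetical helpers. Treating every 3.10+ build as patched is an assumption (3.10 final shipped after the fix). Distros backport patches without bumping the version, so `False` only means the version number alone does not prove the fix is present.

```python
import sys

# First patched upstream release per minor line, from the release notes above.
# Assumption: anything 3.10 or newer already contains the fix.
FIXED = {(3, 6): (3, 6, 13), (3, 7): (3, 7, 10), (3, 8): (3, 8, 8), (3, 9): (3, 9, 2)}

def has_upstream_fix(version=None):
    """Return True if the version number is at or above the upstream fix.

    Only the version number is inspected; distro backports are invisible here.
    """
    major, minor, micro = tuple(version or sys.version_info)[:3]
    if (major, minor) >= (3, 10):
        return True
    fixed = FIXED.get((major, minor))
    if fixed is None:
        return False  # 3.5 and older (and 2.x) get no upstream fix
    return (major, minor, micro) >= fixed

print(has_upstream_fix())  # checks the running interpreter
```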


Surely delaying an RCE fix for two weeks on supported versions of Python is a mistake.


Distros are backporting security patches into their releases, so no harm done. If you rely on the python.org releases and don't build from source, then yes, that is a bit sad.

Case in point: the Debian security tracker; see the notes section, which references each commit.

https://security-tracker.debian.org/tracker/CVE-2021-3177


The python:3.8 and python:3.9 container images, if used to build web services such as Django with GIS extensions, may be exposed to this RCE until the python.org sources are updated.


Why can't the base image receive those patches as well?


Those images pull from python.org sources, see:

https://github.com/docker-library/python/blob/master/3.8/bus...


The release candidates are already available to use if you don't want to wait 2 weeks.


Update: 3.8.8 and 3.9.2 have been released. https://discuss.python.org/t/python-3-9-2-and-3-8-8-are-now-...


Minimal example:

  >>> import ctypes
  >>> x = ctypes.c_double.from_param(1e300)
  >>> repr(x)
  Segmentation fault
This happens when getting the string representation of a foreign function float. It doesn't affect the standard builtin float type. So it's not extremely common.

But I can imagine this cropping up if you e.g. find a way to cause an exception that formats a value. Or if there's aggressive logging.
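Because the minimal example crashes the interpreter, a safer way to probe a given Python is to run it in a child process and look at the exit status. A minimal sketch; `repr_survives` is a hypothetical helper. On an unpatched build the overflow is undefined behavior and is not guaranteed to segfault, so `True` does not prove the build is safe.

```python
import subprocess
import sys

# The child parses the value from argv and reprs a ctypes double argument,
# so a segfault kills only the child interpreter.
PROBE = "import ctypes, sys; repr(ctypes.c_double.from_param(float(sys.argv[1])))"

def repr_survives(value, python=sys.executable):
    """Return True if the child interpreter exits cleanly after the repr()."""
    result = subprocess.run([python, "-c", PROBE, repr(value)], capture_output=True)
    # A crash shows up as a nonzero exit status (on POSIX, -11 for SIGSEGV).
    return result.returncode == 0

print(repr_survives(1e300))
```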


For Fedora 32 and 33 it seems to be already fixed:

    >>> repr(x)
    "<cparam 'd' (1e+300)>"


FWIW this also crashes Python 2.7 on macOS Catalina.


Python 2 is EOL


Remediation for this vulnerability basically caused complete gridlock for the internal tools at a certain FAANG company today.


That was handled poorly to say the least.


hahha, I happen to be working on this at said faang...


I bet it’d be an even bigger deal at a couple financial services companies if they took security updates to internal codebases seriously.


Yip, everyone was paged last night, been in a bad way since. Got grief for commenting as much in thread.


Facebook?


Amazon.


FAANG?


In the early 2000s COBRA reorganized as FAANG in response to changing geopolitical realities.


I read that as CORBA (Java). Took me a while.


Wait what was COBRA? I looked it up but couldn't find it.


From GI Joe.


Who’s GI Joe now?


After his tour ended he ran for president and won.


Wasn't that CORBA?


Facebook, Apple, Amazon, Netflix, Google


What's the one that also includes Microsoft? FANMAG? I recall there was something but can't remember what...


IIRC that would be FAMANG, but I like FanMag.


There are other more offensive anagrams I've seen used...


I prefer the Spoonerism but I don't think it'll take off


The "elite" engineers are using ctypes in production?

ctypes has never been considered even remotely secure; it can call any library function, including (drumroll) sprintf!


There's hints elsewhere it's Amazon.

Their build system is interesting. One of the quirks is you define your application, and every library/package that you depend on. They don't depend on system libraries for their application code. The idea is to get as consistent an operating environment where possible. It's both generally amazing, and an absolute pain in the arse (usually when you least want it to be, because some upstream package changed their dependencies and you end up with version conflicts to unpick).

When Heartbleed came out, the patches landed in the Amazon build system for the OpenSSL package something like midnight. By the time I got in to the office in the morning, almost every service had been fully rebuilt with patches, and services that do CI/CD had already had the patches deployed. Services that didn't have CI/CD were already kicking off deployments of their front end fleets. IIRC they were paging teams when relevant packages were complete to make sure deployments got kicked off ASAP.

So in this case, someone will have patched Python packages for relevant versions, built an updated version, and everything that depends on it will have immediately recompiled, and from there all the packages that depend on those, and so on down the line. Given how much Python is used for operations etc., I wouldn't be surprised to find a significant chunk of Amazon got rebuilt today, even in cases where Python wasn't being exposed to external users, or even used by the service directly. That's a lot of components, and it probably left zero capacity for anything unrelated, and no doubt there will be quibbles about the ordering in which things got built.
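The cascade described above boils down to "find every transitive dependent of the patched package, then rebuild them in dependency order". A sketch of that idea over a made-up graph; it says nothing about how Amazon's actual build system is implemented, and `rebuild_order` and all package names are hypothetical.

```python
from collections import deque

def rebuild_order(deps, patched):
    """Return every package affected by a change to `patched`, ordered so each
    package comes after the affected packages it depends on.

    `deps` maps package -> list of packages it depends on.
    """
    # Invert the graph: package -> packages that depend on it.
    dependents = {}
    for pkg, reqs in deps.items():
        for req in reqs:
            dependents.setdefault(req, []).append(pkg)

    # Breadth-first walk to collect all transitive dependents.
    affected, queue = {patched}, deque([patched])
    while queue:
        for user in dependents.get(queue.popleft(), []):
            if user not in affected:
                affected.add(user)
                queue.append(user)

    # Kahn's topological sort restricted to the affected subgraph.
    indegree = {p: sum(r in affected for r in deps.get(p, [])) for p in affected}
    ready = deque(sorted(p for p, n in indegree.items() if n == 0))
    order = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for user in sorted(dependents.get(pkg, [])):
            if user in affected:
                indegree[user] -= 1
                if indegree[user] == 0:
                    ready.append(user)
    return order

deps = {
    "python": [],
    "http-lib": ["python"],
    "ops-tools": ["python", "http-lib"],
    "web-service": ["ops-tools"],
    "java-service": [],
}
print(rebuild_order(deps, "python"))
```

Unaffected packages such as `java-service` never enter the queue; everything downstream of `python` does, even if it only uses Python indirectly.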


I’m sure the issue was more about patching their Python.


I'd be extremely impressed if someone actually managed to get RCE out of this, considering you'd only be able to use '0' through '9', or 0x30 through 0x39, in your payload.


Hey! Original discoverer here. I would like to clarify some things. It would certainly be possible, given that we can have numeric shellcode (check the reference). The real problem here is security mechanisms like stack cookies; brute-forcing those would have been a PITA with only numbers. But I'd have to take a closer look at the stack; god knows what GCC could have done there :)

On a system without these protection mechanisms it would be an easy win.

Reference: https://haxx.in/posts/numeric-shellcode/


Wow that's very impressive.


If the float is part of an error message or logging message, the attacker may control portions of the string that appear after the float. This would let them inject a much larger range of bytes, perhaps even UTF-8.


It's not that powerful, if I'm reading it correctly: https://github.com/python/cpython/commit/34df10a9a16b38d5442...

The sprintf is to a temporary buffer that's converted to a PyUnicode object before returning, so subsequent portions of the string are written elsewhere.


It sounds like an extremely minimal instruction set computer (MISC), and it's surprising how much is possible with even just one or two opcodes; such machines can easily be Turing-complete.
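To make the one-instruction idea concrete, here is a sketch of SUBLEQ ("subtract and branch if less than or equal to zero"), a well-known one-instruction set computer. It is a swapped-in illustration of the general point, not related to the numeric-shellcode technique itself; `run_subleq` and the memory layout are hypothetical.

```python
def run_subleq(memory, max_steps=10_000):
    """Run a SUBLEQ program. Each instruction is three cells a, b, c:
    mem[b] -= mem[a]; if the result is <= 0 jump to c, else go to the next
    instruction. Jumping to a negative address halts. Returns final memory."""
    mem = list(memory)
    pc = 0
    for _ in range(max_steps):
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        if pc < 0:
            return mem
    raise RuntimeError("step limit reached without halting")

# Adds A (cell 9) into B (cell 10) using a scratch cell Z (cell 11).
A, B, Z = 9, 10, 11
add_program = [
    A, Z, 3,    # Z = Z - A, i.e. -A
    Z, B, 6,    # B = B - Z, i.e. B + A
    Z, Z, -1,   # Z = 0, then jump to -1 (halt)
    3, 4, 0,    # data: A = 3, B = 4, Z = 0
]
print(run_subleq(add_program)[B])  # 7
```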


Semi-related: I discovered that you can divert control flow to an arbitrary point on the TI MSP430 using only the bytes in ascii alphanumeric range.

https://docs.google.com/presentation/d/19K7SK1L49reoFgjEPKCF...


Perhaps . too? Maybe , in different locales?
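For finite values, "%f" emits only digits, a '.', and a leading '-' for negatives, so both guesses are close. A small sketch using Python's `%` formatting, which always uses '.'; C's sprintf takes the decimal point from the LC_NUMERIC locale, so a ',' is plausible there but is not modeled here. `f_alphabet` is a hypothetical helper.

```python
def f_alphabet(*values):
    """Set of characters that Python's "%f" emits for the given floats."""
    chars = set()
    for v in values:
        chars.update("%f" % v)
    return chars

print(sorted(f_alphabet(1e300, -1e300, 0.5)))
```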


This inspired me to grep the cpython code for sprintf.
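A sketch of that kind of grep, written so it skips the bounded variants. It is purely textual (it will also flag sprintf mentioned in comments), and `sprintf_lines` and `find_sprintf_calls` are hypothetical helpers.

```python
import re
from pathlib import Path

# \b stops the pattern from matching inside snprintf, vsprintf or PyOS_snprintf.
SPRINTF_CALL = re.compile(r"\bsprintf\s*\(")

def sprintf_lines(text):
    """Return (line number, stripped line) for each line with a bare sprintf( call."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if SPRINTF_CALL.search(line)
    ]

def find_sprintf_calls(root):
    """Yield (path, line number, line) for .c and .h files under `root`."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in {".c", ".h"} and path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in sprintf_lines(text):
                yield path, lineno, line
```

Usage, from a CPython checkout: `for hit in find_sprintf_calls("Modules"): print(*hit)`.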

I found one used with the return value from alloca (inside FindAddress in _ctypes.c). It checks alloca for NULL (does alloca ever return NULL?), but I could imagine it might be exploitable. FindAddress is a static function that can be called during DLL loading. I imagine that there is very little code that accepts untrusted arguments to DLL loading though (if so, there are bigger problems...).

There is also a lot of use of fixed 32-byte buffers, for up-to-64 bit numbers, which is fine, but hopefully if in the future they can be 128-bit, people remember to fix it!

In other cases, like getnameinfo, the stack-allocated buffer is way too large! %d can never be 512 bytes, so it's just thrashing your cache for no reason.
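Both buffer-size claims are easy to check by computing the longest decimal rendering of an N-bit integer (including a minus sign) plus one byte for the NUL terminator. A sketch; `max_decimal_chars` is a hypothetical helper, and "%d" is assumed to format a 32-bit int.

```python
def max_decimal_chars(bits, signed=True):
    """Longest decimal string (including any '-') for a `bits`-wide integer."""
    if signed:
        return len(str(-(2 ** (bits - 1))))  # the minimum has the most characters
    return len(str(2 ** bits - 1))

for bits in (32, 64, 128):
    need = max(max_decimal_chars(bits), max_decimal_chars(bits, signed=False)) + 1  # + NUL
    print(f"{bits}-bit: {need} bytes, fits in 32: {need <= 32}")
```

This prints 12, 21 and 41 bytes respectively: a 32-byte buffer covers any 64-bit value but not a signed 128-bit one, and a 32-bit "%d" needs nowhere near 512 bytes.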

Anyway surely this has been carefully audited, since grepping for sprintf is so simple!


> Anyway surely this has been carefully audited, since grepping for sprintf is so simple!

That is rarely a safe assumption.


Are Flask/Django APIs that take floats as parameters vulnerable to this under any circumstances? Any POC code that can be used for testing?


If it (or a transitive dependency) eventually passes the value to the ctypes module.
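As defense in depth (not a substitute for upgrading), an API can refuse absurd floats before they reach any ctypes code. A minimal sketch; `parse_bounded_float` and the 1e15 limit are arbitrary illustrative choices, not something Flask or Django provide. At that limit "%f" output stays around two dozen characters, far below the 256-byte buffer discussed in this thread.

```python
import math

def parse_bounded_float(text, limit=1e15):
    """Parse untrusted text as a float, rejecting NaN, infinities and
    magnitudes above `limit` (an application-chosen bound)."""
    value = float(text)  # raises ValueError on malformed input
    if not math.isfinite(value) or abs(value) > limit:
        raise ValueError(f"float out of accepted range: {text!r}")
    return value

print(parse_bounded_float("12.5"))
```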


Extremely misleading title. This only affects code that puts untrusted floats into ctypes. A relatively uncommon case. Please stop the clickbait!


Why the downvotes? CyberRabbi is correct. The title makes it sound like all programs that accept floats from untrusted input are vulnerable.


In my experience here I find that it’s not uncommon for correct statements to get downvoted more than incorrect ones.

Why? I think it’s because incorrect statements can be corrected in a reply. OTOH correct statements with which one disagrees or otherwise dislikes cannot be corrected but they can be downvoted.


Because moaning about clickbait is really boring, often correlated with people being surprised because they didn't read the article first, and once again not very useful if you read the article.


I read the article, actually I read it in a concerned state, and then I realized there was little need for concern. Clickbait is a daily problem on this site, there’s no need to insult or attack my character for raising the issue.


What I said has nothing to do with your character. Besides, I recognize your name, so I know you aren't dumb.


Does this affect standard json parsing libraries?


No guarantee, but from what I understand the issue is with objects created by ctypes parsing only. I doubt that any json parser would use that:

    >>> import ctypes;x = ctypes.c_double.from_param(1e300);print(type(x))
    <class 'CArgObject'>
When repr'd, an sprintf call with a statically sized buffer of 256 bytes is used to produce a string:

   https://github.com/python/cpython/commit/d9b8f138b7df3b455b54653ca59f491b4840d6fa#diff-4e23b3237d0aa08bf4c434d75fab19200a80837bd147051fefccd98b7f2480faL500-L507
With 1e300, the resulting string doesn't fit into 256 bytes, thus overflowing the buffer. Exploitation might be interesting, as you (probably?) can only use numbers (ascii range 0x30-0x39):

    >>> len("%f" % 1e300)
    308
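A small sketch of the size check that was missing, using the 256-byte buffer size from the comment above. Python's "%f" is assumed to produce the same number of digits as C's for this purpose, and the sketch ignores the extra text the real format string wraps around the number, so the true overflow threshold is somewhat lower. `f_fits` is a hypothetical helper.

```python
BUFFER_SIZE = 256  # fixed buffer size described above

def f_fits(value, bufsize=BUFFER_SIZE):
    """True if "%f" % value plus the terminating NUL fits in `bufsize` bytes."""
    return len("%f" % value) + 1 <= bufsize

print(f_fits(1.0), f_fits(1e300))  # True False
```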


I am not sure, but since the affected files are in ctypes, I think it can only affect applications with bindings that are written with ctypes.

As far as I know, ALL bindings in CPython are written with the C APIs and not ctypes, including the JSON library. (I can't guarantee that but it shouldn't be that hard to audit.)

The official page is not any more clear on this:

https://python-security.readthedocs.io/vuln/ctypes-buffer-ov...

I use one ctypes binding for convenience (a single function for CommonMark) but I prefer to use C APIs for this reason ... it's a lot of complexity and unsafety. Using ctypes wrong can crash your process due to invoking undefined behavior.

It would be nice to have a list of bindings created with ctypes, since I think most consumers probably have no idea. I thought that PyOpenGL is done with ctypes but don't quote me on that ...
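One rough way to build such a list for your own dependencies is a static scan for ctypes imports. A sketch using the standard `ast` module; `imports_ctypes` is a hypothetical helper, and it misses dynamic imports such as `importlib.import_module("ctypes")` as well as code that only uses ctypes through another package.

```python
import ast

def imports_ctypes(source):
    """Return True if Python source text imports ctypes or a ctypes submodule."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        else:
            continue  # relative imports and other nodes are ignored
        if any(n == "ctypes" or n.startswith("ctypes.") for n in names):
            return True
    return False

print(imports_ctypes("from ctypes import c_double"))  # True
```

Running it over every `.py` file under `site-packages` gives a first-pass list of which installed packages touch ctypes directly.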


With a quick grep I find this:

- `uuid` used ctypes until 3.9 to get information like the IP address (without floats, so it's not vulnerable)

- `platform` uses ctypes in 2.7 to get the Windows version (without floats)

- `multiprocessing` uses ctypes to provide ctypes objects in shared memory (`multiprocessing.sharedctypes`)

That's everything, on the versions I checked. `json` for example uses a native Python module `_json`, so it doesn't use ctypes.


Django's GIS extensions use c_double, that's a large surface area.


We need to delete 'sprintf' forever.


Deleting sprintf by itself wouldn't work; even the "safe" replacement functions rely on the developer providing the safety information manually. C needs a string type. Sadly, large parts of the C standard are built around turning plain character arrays into exploits by pretending that they are a sane choice for string manipulation.


That is why I use Pascal and never had a safety problem when processing strings. It is completely baffling that people still use C in production.

Only issue is that the default string type requires heap allocations


asprintf is fine for most cases. For the other cases you probably don't want to dynamically allocate memory so snprintf with a fixed buffer size is fine.
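What makes snprintf with a fixed buffer workable is its contract: it writes at most size - 1 bytes plus a NUL and returns the length the full output would have had, so the caller can detect truncation (asprintf instead allocates a buffer of the needed size). A Python model of that contract, not a binding to the real function; `snprintf_model` is hypothetical and counts characters rather than bytes, which only matches for ASCII output.

```python
def snprintf_model(size, fmt, *args):
    """Model C snprintf: keep at most size - 1 characters (the last byte is
    reserved for NUL) and return the length the full output would have had."""
    full = fmt % args
    written = full[: max(size - 1, 0)]
    return written, len(full)

text, needed = snprintf_model(8, "%f", 1e300)
if needed >= 8:  # same truncation test a C caller would write
    print(f"truncated: needed {needed + 1} bytes, kept {text!r}")
```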


Nope, there's still %n in snprintf. snprintf_s is better, but still a security nightmare.


Sure, but you can just not use %n and not accept format strings from users?


You would think converting floats to and from strings was a solved problem by now...

I'm surprised no one ever looked at that code (even when writing it) and thought "how long can a %f get?" I've been writing C for a long time and that's just something which comes naturally, being ingrained into memory since the beginning. If I see a fixed-size buffer I will always question whether it's big enough (and also if it's perhaps even too big).
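"How long can a %f get?" has a concrete answer for IEEE 754 doubles: the longest output comes from the most negative finite value. A small sketch computing it with Python's "%f", assumed here to lay out digits the same way C's printf does (no exponent, six decimals by default); `worst_case_f_len` is a hypothetical helper.

```python
import sys

def worst_case_f_len():
    """Length of "%f" for the most negative finite double (-DBL_MAX)."""
    return len("%f" % -sys.float_info.max)

print(worst_case_f_len())      # 317: '-', 309 integer digits, '.', 6 decimals
print(worst_case_f_len() + 1)  # 318 bytes once the NUL terminator is counted
```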

The "psychology" around buffer overflows has always seemed strange to me; a real-world analogy is someone who has no idea how big his car is, finds a parking spot that "looks big enough", and just rams it in without a second thought, sometimes crashing into the surroundings. Not many people would do that in the real world. Yet countless programmers seemingly can't get something simple like this right?

Edit: downvoters, care to state your case?


“You would think converting floats to and from strings was a solved problem by now”

It is, but it is an extremely difficult problem. I think “How to print floating-point numbers accurately” (https://dl.acm.org/doi/10.1145/93548.93559) was the first correct implementation. That is from 1990 and, according to its authors “was almost 20 years in the making” (http://kurtstephens.com/files/p372-steele.pdf)

(Faster versions have since been published)
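CPython's own float repr picks the shortest string that parses back to the same double. A quick sketch comparing it with a fixed 17-significant-digit format, which also round-trips for any double but is often longer; `shortest_and_17` is a hypothetical helper.

```python
def shortest_and_17(x):
    """Return (repr(x), "%.17g" % x); both should parse back to exactly x."""
    shortest = repr(x)     # CPython: shortest string that round-trips
    fixed17 = "%.17g" % x  # 17 significant digits always suffice for a double
    return shortest, fixed17

for x in (0.1, 1 / 3, 1e300):
    shortest, fixed17 = shortest_and_17(x)
    assert float(shortest) == x and float(fixed17) == x  # both round-trip
    print(shortest, fixed17)
```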


...or at least knowing how long the result would be?


That, of course, is easier, but you have to realize that %f never uses scientific notation for its output (something I must have known at some time, but if you had asked, I would have had to google it), and that DBL_MAX has a lot of digits (it is about 1.8 × 10^308)

I think it is unfortunate that %f, the easiest-to-remember type field for floating-point numbers, is used for this behavior. Most of the time, you'll want to use %g, which uses scientific notation if it is shorter.
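A quick comparison of the two conversions on a few values, using Python's `%` operator as a stand-in for printf. Note that %g's default precision is 6 significant digits, so its short output comes at the cost of precision; `format_lengths` is a hypothetical helper.

```python
import sys

def format_lengths(x):
    """Return (len("%f" % x), len("%g" % x)) to compare the two conversions."""
    return len("%f" % x), len("%g" % x)

for x in (1.5, 1e300, -sys.float_info.max):
    print(repr(x), format_lengths(x))
```

With the default precision, %g stays at 13 characters even for -DBL_MAX ("-1.79769e+308"), while %f needs 317.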

I guess linters should warn about bare %f without length fields.


Didn't downvote, but I agree, nobody wants to work on the low level stacks anymore, not as sexy as CRUD apps.

I would have expected this issue to be a solved problem too, and now we get extremely insecure Python 3 codebases as a result of this vuln.

So much for moving from Python 2 to 3, will now wait for Python 4.



