Python 3.x: RCE in Python applications that accept floats as untrusted input

di · on Feb 18, 2021

Already fixed in Python 3.6, 3.7, 3.8 and 3.9: https://bugs.python.org/issue42938

3.6.13 and 3.7.10 have already been released with the fix: https://www.python.org/downloads/release/python-3613/ https://www.python.org/downloads/release/python-3710/

The 3.8.8 and 3.9.2 release candidates with the fix will be promoted on Monday March 1st: https://discuss.python.org/t/python-3-9-2rc1-and-3-8-8rc1-ar...

If you're on 3.5 or lower, these versions are no longer receiving security fixes and will not be patched: https://python-release-cycle.glitch.me/

AaronFriel · on Feb 19, 2021

Surely delaying an RCE fix for two weeks on supported versions of Python is a mistake.

hexa- · on Feb 19, 2021

Distros are backporting security patches into their releases, so no harm done. If you rely on the python.org releases and don't build from source, then yes, that is a bit sad.

Case in point: The Debian security tracker, see their notes section referencing each commit.

https://security-tracker.debian.org/tracker/CVE-2021-3177

AaronFriel · on Feb 19, 2021

The python:3.8 and python:3.9 container images, if used to build web services such as Django with GIS extensions, may have an RCE until the python.org sources are updated.

hexa- · on Feb 19, 2021

Why can't the base image receive those patches as well?

AaronFriel · on Feb 19, 2021

Those images pull from python.org sources, see:

https://github.com/docker-library/python/blob/master/3.8/bus...

di · on Feb 19, 2021

The release candidates are already available to use if you don't want to wait 2 weeks.

uranusjr · on Feb 19, 2021

Update: 3.8.8 and 3.9.2 have been released. https://discuss.python.org/t/python-3-9-2-and-3-8-8-are-now-...

duckerude · on Feb 18, 2021

Minimal example:

  >>> import ctypes
  >>> x = ctypes.c_double.from_param(1e300)
  >>> repr(x)
  Segmentation fault

This happens when getting the string representation of a foreign function float. It doesn't affect the standard builtin float type. So it's not extremely common.

But I can imagine this cropping up if you e.g. find a way to cause an exception that formats a value. Or if there's aggressive logging.
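To check whether a given interpreter still has the bug without taking down your own session, one option is to run the minimal example in a child process and look at its exit status. This is only a sketch: the `repr_survives` name is made up here, and any non-zero exit is treated as "probably unpatched" without telling a segfault apart from other failures.

```python
import subprocess
import sys

# Same statements as the minimal example, run in a throwaway interpreter.
SNIPPET = "import ctypes; repr(ctypes.c_double.from_param(1e300))"


def repr_survives(python=sys.executable):
    """Return True if `python` can repr() the ctypes parameter without dying.

    Hypothetical helper: a crash (SIGSEGV, or an abort from a stack
    protector) shows up as a non-zero return code from the child process.
    """
    result = subprocess.run(
        [python, "-c", SNIPPET],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("survived" if repr_survives() else "crashed: likely unpatched")
```

Passing a different path as `python` lets you probe another installed interpreter from a known-good one.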

erdewit · on Feb 19, 2021

For Fedora 32 and 33 it seems to be already fixed:

    >>> repr(x)
    "<cparam 'd' (1e+300)>"

PhantomGremlin · on Feb 19, 2021

FWIW this also crashes Python 2.7 on macOS Catalina.

pantalaimon · on Feb 19, 2021

Python 2 is EOL

LennyWhiteJr · on Feb 18, 2021

Remediation for this vulnerability basically caused complete gridlock for the internal tools at a certain FAANG company today.

belval · on Feb 19, 2021

That was handled poorly to say the least.

logicslave · on Feb 19, 2021

hahha, I happen to be working on this at said faang...

koolba · on Feb 18, 2021

I bet it’d be an even bigger deal at a couple financial services companies if they took security updates to internal codebases seriously.

biosed · on Feb 19, 2021

Yip, everyone was paged last night, been in a bad way since. Got grief for commenting as much in thread.

saagarjha · on Feb 19, 2021

Facebook?

granzymes · on Feb 19, 2021

Amazon.

kiwijamo · on Feb 18, 2021

FAANG?

giantrobot · on Feb 18, 2021

In the early 2000s COBRA reorganized as FAANG in response to changing geopolitical realities.

gonzo41 · on Feb 19, 2021

I read that as CORBA (Java). Took me a while.

why_only_15 · on Feb 19, 2021

Wait what was COBRA? I looked it up but couldn't find it.

Asooka · on Feb 19, 2021

From GI Joe.

basementcat · on Feb 19, 2021

Who’s GI Joe now?

whatshisface · on Feb 19, 2021

After his tour ended he ran for president and won.

abiogenesis · on Feb 19, 2021

Wasn't that CORBA?

TheAdamAndChe · on Feb 18, 2021

Facebook, Apple, Amazon, Netflix, Google

bialpio · on Feb 19, 2021

What's the one that also includes Microsoft? FANMAG? I recall there was something but can't remember what...

Tijdreiziger · on Feb 19, 2021

IIRC that would be FAMANG, but I like FanMag.

grenoire · on Feb 19, 2021

There are other more offensive anagrams I've seen used...

SturgeonsLaw · on Feb 19, 2021

I prefer the Spoonerism but I don't think it'll take off

mkl100 · on Feb 18, 2021

The "elite" engineers are using ctypes in production?

ctypes has never been considered even remotely secure, it can call any library function, including, drumroll, sprintf!

Twirrim · on Feb 19, 2021

There's hints elsewhere it's Amazon.

Their build system is interesting. One of the quirks is you define your application, and every library/package that you depend on. They don't depend on system libraries for their application code. The idea is to get as consistent an operating environment as possible. It's both generally amazing, and an absolute pain in the arse (usually when you least want it to be, because some upstream package changed their dependencies and you end up with version conflicts to unpick).

When Heartbleed came out, the patches landed in the Amazon build system for the OpenSSL package something like midnight. By the time I got in to the office in the morning, almost every service had been fully rebuilt with patches, and services that do CI/CD had already had the patches deployed. Services that didn't have CI/CD were already kicking off deployments of their front end fleets. IIRC they were paging teams when relevant packages were complete to make sure deployments got kicked off ASAP.

So in this case, someone will have patched Python packages for relevant versions, built an updated version, and everything that depends on it will have immediately recompiled, and from there all the packages that depend on those, and so on down the line. Given how much Python is used for operations etc., I wouldn't be surprised to find a significant chunk of Amazon got rebuilt today, even in cases where Python wasn't being exposed to external users, or even used by the service directly. That's a lot of components, and it probably left zero capacity for anything unrelated, and no doubt there will be quibbles about the ordering in which things got built.
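The cascade described here amounts to walking the reverse-dependency graph outward from the patched package. A minimal sketch of that idea follows. The package names and the `rebuild_set` helper are invented for illustration and say nothing about how the real build system is implemented.

```python
from collections import deque


def rebuild_set(dependents, changed):
    """List every package affected by a change, in breadth-first order.

    `dependents` maps a package to the packages that directly depend on it.
    Breadth-first order is only a rough approximation of a build schedule;
    a real system would need a proper topological sort.
    """
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        package = queue.popleft()
        order.append(package)
        for consumer in dependents.get(package, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return order


# Hypothetical dependency data, purely for illustration.
DEPENDENTS = {
    "python": ["ops-tools", "build-scripts"],
    "ops-tools": ["frontend-service"],
    "build-scripts": ["frontend-service", "batch-service"],
}

print(rebuild_set(DEPENDENTS, "python"))
```

Even in this toy graph, services that never import Python themselves end up in the rebuild list because something beneath them does.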

saagarjha · on Feb 19, 2021

I’m sure the issue was more about patching their Python.

Sohcahtoa82 · on Feb 19, 2021

I'd be extremely impressed if someone actually managed to get RCE out of this, considering you'd only be able to use '0' through '9', or 0x30 through 0x39, in your payload.

jordyzomer · on Feb 19, 2021

Hey! Original discoverer here. I would like to clarify some things. It would certainly be possible given that we can have numeric shell-code (check the reference). The real problem here is security mechanisms like stack cookies; brute-forcing those would have been a PITA with only numbers. But I'd have to take a closer look at the stack; god knows what GCC could have done there :)

On a system without these protection mechanisms it would be an easy win.

Reference: https://haxx.in/posts/numeric-shellcode/

Sohcahtoa82 · on Feb 20, 2021

Wow that's very impressive.

mleonhard · on Feb 19, 2021

If the float is part of an error message or logging message, the attacker may control portions of the string that appear after the float. This would let them inject a much larger range of bytes, perhaps even UTF-8.

duckerude · on Feb 19, 2021

It's not that powerful, if I'm reading it correctly: https://github.com/python/cpython/commit/34df10a9a16b38d5442...

The sprintf is to a temporary buffer that's converted to a PyUnicode object before returning, so subsequent portions of the string are written elsewhere.

herendin2 · on Feb 19, 2021

It sounds like an extreme minimal instruction set computer (MISC), and it's surprising how much is possible with even just one or two opcodes - that can easily be Turing-complete

SilasX · on Feb 19, 2021

Semi-related: I discovered that you can divert control flow to an arbitrary point on the TI MSP430 using only the bytes in ascii alphanumeric range.

https://docs.google.com/presentation/d/19K7SK1L49reoFgjEPKCF...

cozzyd · on Feb 19, 2021

Perhaps . too? Maybe , in different locales?
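One way to see which characters `%f` can emit at all is to collect them from a few extreme values. The sketch below uses Python's `%` operator as a stand-in for C's `sprintf`. Python's version ignores the locale, so it cannot show the `,` case; the `output_alphabet` helper is made up for this example.

```python
import sys


def output_alphabet(values):
    """Collect every distinct character that '%f' emits for the given floats."""
    seen = set()
    for value in values:
        seen.update("%f" % value)
    return "".join(sorted(seen))


print(output_alphabet([1e300, -1e300, sys.float_info.max]))
```

For finite values this yields only digits plus `-` and `.`, and the `-` can only ever be the first byte.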

cozzyd · on Feb 19, 2021

This inspired me to grep the cpython code for sprintf.

I found one used with the return value from alloca (inside FindAddress in _ctypes.c). It checks alloca for NULL (does alloca ever return NULL?), but I could imagine it might be exploitable. FindAddress is a static function that can be called during DLL loading. I imagine that there is very little code that accepts untrusted arguments to DLL loading though (if so, there are bigger problems...).

There is also a lot of use of fixed 32-byte buffers, for up-to-64 bit numbers, which is fine, but hopefully if in the future they can be 128-bit, people remember to fix it!
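The arithmetic behind that is quick to check. The sketch below (the `decimal_buffer_needed` helper is a hypothetical name) computes how many bytes a decimal rendering of an N-bit integer needs, counting the sign and the terminating NUL.

```python
def decimal_buffer_needed(bits):
    """Bytes needed to print any `bits`-wide integer in decimal.

    Considers both the most negative signed value and the largest
    unsigned value, then adds one byte for the NUL terminator.
    """
    widest_signed = len(str(-(2 ** (bits - 1))))
    widest_unsigned = len(str(2 ** bits - 1))
    return max(widest_signed, widest_unsigned) + 1


for bits in (32, 64, 128):
    print(bits, decimal_buffer_needed(bits))
```

A 64-bit value needs 21 bytes, so 32 is comfortable. A 128-bit value needs 41, which would no longer fit.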

In other cases, like getnameinfo, the stack-allocated buffer is way too large! %d can never be 512 bytes, so it's just thrashing your cache for no reason.

Anyway surely this has been carefully audited, since grepping for sprintf is so simple!

macintux · on Feb 19, 2021

> Anyway surely this has been carefully audited, since grepping for sprintf is so simple!

That is rarely a safe assumption.

henning · on Feb 19, 2021

Are Flask/Django APIs that take floats as parameters vulnerable to this under any circumstances? Any POC code that can be used for testing?

fulafel · on Feb 19, 2021

Only if the application (or one of its transitive dependencies) eventually passes the value to the ctypes module.
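For code that does hand request data to ctypes, one defensive pattern is to build log and error messages from the plain Python float rather than from the ctypes parameter object. This is a sketch under assumptions: both helper names are hypothetical, and it does not replace upgrading to a patched interpreter.

```python
import ctypes
import math


def parse_untrusted_double(text):
    """Hypothetical request-handling helper: parse and sanity-check a float."""
    number = float(text)
    if not math.isfinite(number):
        raise ValueError("refusing non-finite value: %r" % (text,))
    return number


def log_line(number):
    # Format the plain Python float. Its repr() uses Python's own float
    # formatting, not the repr() of the object returned by
    # c_double.from_param(), which is the code path that crashed.
    return "calling native code with %r" % (number,)


value = parse_untrusted_double("1e300")
native_arg = ctypes.c_double(value)  # what would be handed to a C function
print(log_line(native_arg.value))
```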

_dh54 · on Feb 19, 2021

Extremely misleading title. This only affects code that puts untrusted floats into ctypes. A relatively uncommon case. Please stop the clickbait!

xtanx · on Feb 19, 2021

Why the downvotes? CyberRabbi is correct. The title makes it sound like all programs that accept floats from untrusted input are vulnerable.

_dh54 · on Feb 19, 2021

In my experience here I find that it’s not uncommon for correct statements to get downvoted more than incorrect ones.

Why? I think it’s because incorrect statements can be corrected in a reply. OTOH correct statements with which one disagrees or otherwise dislikes cannot be corrected but they can be downvoted.

mhh__ · on Feb 20, 2021

Because moaning about clickbait is really boring, often correlated with people being surprised because they didn't read the article first, and once again not very useful if you read the article.

_dh54 · on Feb 20, 2021

I read the article, actually I read it in a concerned state, and then I realized there was little need for concern. Clickbait is a daily problem on this site, there’s no need to insult or attack my character for raising the issue.

mhh__ · on Feb 20, 2021

What I said has nothing to do with your character. Besides, I recognize your name so know you aren't dumb

lend000 · on Feb 18, 2021

Does this affect standard json parsing libraries?

dividuum · on Feb 18, 2021

No guarantee, but from what I understand the issue is with objects created by ctypes parsing only. I doubt that any json parser would use that:

    >>> import ctypes;x = ctypes.c_double.from_param(1e300);print(type(x))
    <class 'CArgObject'>

When repr'd, an sprintf call with a statically sized buffer of 256 bytes is used to produce a string:

   https://github.com/python/cpython/commit/d9b8f138b7df3b455b54653ca59f491b4840d6fa#diff-4e23b3237d0aa08bf4c434d75fab19200a80837bd147051fefccd98b7f2480faL500-L507

With 1e300, the resulting string doesn't fit into 256 bytes, thus overflowing the buffer. Exploitation might be interesting, as you (probably?) can only use numbers (ascii range 0x30-0x39):

    >>> len("%f" % 1e300)
    308
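To make the size comparison explicit, here is a small sketch that measures the `%f` rendering of a few values against the 256-byte figure quoted above. The helper names are invented, Python's `%` operator stands in for C's `sprintf`, and the fixed text around the number in the real format string is ignored, so the real margin is somewhat smaller.

```python
import sys

BUFFER_SIZE = 256  # size of the static buffer quoted above


def percent_f_length(value):
    """Number of characters '%f' produces for `value`."""
    return len("%f" % value)


def fits(value, buffer_size=BUFFER_SIZE):
    """True if the '%f' text plus a terminating NUL fits in the buffer."""
    return percent_f_length(value) + 1 <= buffer_size


for sample in (1.0, 1e100, 1e300, sys.float_info.max):
    print(sample, percent_f_length(sample), fits(sample))
```

The largest finite double renders to 316 characters, so even a buffer sized for 1e300 would not cover every input.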

chubot · on Feb 18, 2021

I am not sure, but since the affected files are in ctypes, I think it can only affect applications with bindings that are written with ctypes.

As far as I know, ALL bindings in CPython are written with the C APIs and not ctypes, including the JSON library. (I can't guarantee that but it shouldn't be that hard to audit.)

The official page is not any more clear on this:

https://python-security.readthedocs.io/vuln/ctypes-buffer-ov...

I use one ctypes binding for convenience (a single function for CommonMark) but I prefer to use C APIs for this reason ... it's a lot of complexity and unsafety. Using ctypes wrong can crash your process due to invoking undefined behavior.

It would be nice to have a list of bindings created with ctypes, since I think most consumers probably have no idea. I thought that PyOpenGL is done with ctypes but don't quote me on that ...

duckerude · on Feb 18, 2021

With a quick grep I find this:

- `uuid` used ctypes until 3.9 to get information like the IP address (without floats, so it's not vulnerable)

- `platform` uses ctypes in 2.7 to get the Windows version (without floats)

- `multiprocessing` uses ctypes for interoperability with ctypes

That's everything, on the versions I checked. `json` for example uses a native (C extension) module `_json`, so it doesn't use ctypes.
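A rough Python equivalent of that grep, for anyone who wants to repeat it on their own tree, might look like this. It is a plain text match on import lines, so it misses forms like `import os, ctypes` and dynamic imports; the `modules_importing_ctypes` name is made up.

```python
import re
from pathlib import Path

# Matches lines that start with "import ctypes" or "from ctypes ...".
CTYPES_IMPORT = re.compile(r"^\s*(?:import|from)\s+ctypes\b", re.MULTILINE)


def modules_importing_ctypes(root):
    """Return sorted relative paths of .py files under `root` that import ctypes."""
    root = Path(root)
    hits = []
    for path in root.rglob("*.py"):
        try:
            source = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if CTYPES_IMPORT.search(source):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Pointing it at `sysconfig.get_paths()["stdlib"]` scans the standard library. Expect the `ctypes` package itself and the test suite to show up, plus `site-packages` if it lives under that directory.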

AaronFriel · on Feb 19, 2021

Django's GIS extensions use c_double, that's a large surface area.

mleonhard · on Feb 19, 2021

We need to delete 'sprintf' forever.

josefx · on Feb 19, 2021

Deleting sprintf by itself wouldn't work, even the "safe" replacement functions rely on the developer providing the safety information manually. C needs a string type. Sadly large parts of the C standard are built around turning plain character arrays into exploits by pretending that they are a sane choice for string manipulation.

benibela · on Feb 19, 2021

That is why I use Pascal and never had a safety problem when processing strings. It is completely baffling that people still use C in production

Only issue is that the default string type requires heap allocations

cozzyd · on Feb 19, 2021

asprintf is fine for most cases. For the other cases you probably don't want to dynamically allocate memory so snprintf with a fixed buffer size is fine.

rurban · on Feb 21, 2021

Nope, there's still %n in snprintf. snprintf_s is better, but still a security nightmare.

cozzyd · on Feb 22, 2021

Sure, but you can just not use %n and not accept format strings from users?

userbinator · on Feb 19, 2021

You would think converting floats to and from strings was a solved problem by now...

I'm surprised no one ever looked at that code --- even when writing it --- and thought "how long can a %f get?" I've been writing C for a long time and that's just something which comes naturally, being ingrained into memory since the beginning. If I see a fixed-size buffer I will always question whether it's big enough (and also if it's perhaps even too big.)

The "psychology" around buffer overflows has always seemed strange to me; a real-world analogy is someone who has no idea how big his car is, finds a parking spot that "looks big enough", and just rams it in without a second thought, sometimes crashing into the surroundings. Not many people would do that in the real world. Yet countless programmers seemingly can't get something simple like this right?

Edit: downvoters, care to state your case?

Someone · on Feb 19, 2021

“You would think converting floats to and from strings was a solved problem by now”

It is, but it is an extremely difficult problem. I think “How to print floating-point numbers accurately” (https://dl.acm.org/doi/10.1145/93548.93559) was the first correct implementation. That is from 1990 and, according to its authors “was almost 20 years in the making” (http://kurtstephens.com/files/p372-steele.pdf)

(Faster versions have since been published)

userbinator · on Feb 19, 2021

...or at least knowing how long the result would be?

Someone · on Feb 19, 2021

That, of course, is easier, but you have to realize that %f never uses scientific notation for its output (something I must have known at some time, but if you had asked, I would have had to google it), and that DBL_MAX has a lot of digits (it is about 1.8 × 10^308)

I think it is unfortunate that %f, the easiest to remember type field for floating point numbers, is used for this behavior. Most of the time, you’ll want to use %g, which uses scientific notation if it is shorter.

I guess linters should warn about bare %f without length fields.
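The difference between the conversions is easy to see side by side. This sketch uses Python's `%` operator, which follows the same conversion rules for these cases; the `widths` helper is a made-up name. One caveat: in C a field width is a minimum, not a cap, so a width on `%f` would not bound the output by itself.

```python
import sys


def widths(value):
    """Length of the text produced by a few float conversions for `value`."""
    return {
        "%f": len("%f" % value),
        "%e": len("%e" % value),
        "%g": len("%g" % value),
        "%.17g": len("%.17g" % value),
    }


for sample in (0.5, 1e300, -sys.float_info.max):
    print(repr(sample), widths(sample))
```

With `%g` or `%e` the length stays small no matter how large the value is, while `%f` grows with the exponent.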

yannoninator · on Feb 19, 2021

Didn't downvote, but I agree, nobody wants to work on the low level stacks anymore, not as sexy as CRUD apps.

I would have expected this issue to be a solved problem too, and now we get extremely insecure Python 3 codebases as a result of this vuln.

So much for moving from Python 2 to 3, will now wait for Python 4.