Exploiting Misuse of Python's “Pickle” (nelhage.com)
61 points by jahan on June 5, 2016 | 34 comments



I'd argue that any use of pickle is a misuse of pickle. Language-specific binary serialization formats should not exist, much less be used.


Although designed for Python, pickle is not language-specific: there are implementations for .NET and Java that work well, for example https://github.com/irmen/Pyrolite


What do you think of ``eval``?

I'll go ahead and keep using pickle. It makes me more productive, and I never unpickle untrusted data.

By the way, reading XML is a security risk, too...


> What do you think of ``eval``?

Never ever.


I guess you can never make a REPL then.

Or namedtuple, one of my favorite tools.


Pickle is used in Python's multiprocessing library to communicate with other Python processes over a pipe. Seems sensible to me.
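
A minimal sketch of that behaviour, assuming Python 3 and, for simplicity, both ends of the pipe in one process (normally one end is handed to a child process). `Connection.send` pickles the object; the other end can either `recv` it or read the raw pickled bytes. The payloads are made up for illustration.

```python
import pickle
from multiprocessing import Pipe

# Both ends live in this process; multiprocessing would normally pass one to a child.
left, right = Pipe()

left.send({"foo": 42})        # send() pickles the dict and writes it to the pipe
raw = right.recv_bytes()      # read the pickled payload without decoding it
first = pickle.loads(raw)     # decode by hand; fine here because we wrote it ourselves

left.send({"bar": [1, 2, 3]})
second = right.recv()         # the usual path: recv() unpickles for us

left.close()
right.close()
```

Both ends are trusted here, which is the situation the comment above describes.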


The default pickle format is not binary, though it's about as human-readable as sendmail.cf:

  >>> import sys, cPickle  # Python 2
  >>> cPickle.dump(dict(foo=42), sys.stdout)
  (dp1
  S'foo'
  p2
  I42
  s.


What should I rely on if I want to serialize a function?


Mu. Unask the question. Serializing a function is itself the security hole, because who knows what that function does? A serialized function is just as likely to be malicious code as to be whatever you think you're using it for.


Being connected to the internet is a security hole. I'll balance security with practicality.


The GP's point is if you're doing code deserialization, the definition of security is different. The data format having RCE bugs won't be as much of a concern, while trusting the data source will be much more of a concern.


We redefine security for every project. Some projects can (de)serialize code and be secure. Others can't.


If someone is able to modify my local files and wants to change the code I'm running, what is it about pickle that makes me vulnerable? They'd already be in a position where they could change my actual code.


Why do you think the attacker has the ability to change your local files? That's not the attack surface. The attack surface is when it's used in a wire format for network communication. The attack surface is the attacker convincing you to download a file in a format that you don't realize is executable, and every file format that includes pickled objects is vulnerable. The attack surface is that you've confused data and code. There's a reason why OpenBSD now enforces W^X, and that is that these are very different domains of trust. I'd trust my neighbor down the street to send me a spreadsheet; I'd not trust him to send me a program (which is precisely why Office macro viruses were such a problem, because spreadsheets could contain programs).

No, pickling and unpickling files in your user's data-dir isn't a big deal, until it is, because your users wanted to share their data online. They'll sync their data folder with Dropbox, and then their account gets compromised, and the next time they launch your program they've got a virus. They'll download a 'completed savegame' from a sketchy site, and now they've got a virus. They'll get a phishing e-mail, but it's not one of those zips or exes or whatever, it's a file format they know belongs to your program, and surely that's safe, right?

Don't make your file formats insecure. Don't pickle.


My rather poorly made point is that whether it's a security problem or not depends on what you're using pickle for. My typical use case is simply as a local cache for some slow computation in ad-hoc scripts. Being able to simply dump objects and load them back again without needing to write serialisation code is a great timesaver, and the only ways I can see that causing a security problem rely on an attacker already having significantly more access to begin with.
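
A rough sketch of that kind of local cache, assuming Python 3. The helper name `cached`, the file name, and the stand-in computation are all hypothetical; the point is only that no serialisation code has to be written.

```python
import os
import pickle
import tempfile

def cached(path, compute):
    """Return the pickled result stored at `path`, computing and saving it if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)   # only reasonable if nobody else can write `path`
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    return result

calls = []

def slow_computation():
    # Stand-in for the slow part of an ad-hoc script.
    calls.append(1)
    return {"squares": [n * n for n in range(5)]}

cache_file = os.path.join(tempfile.mkdtemp(), "result.pkl")
first = cached(cache_file, slow_computation)    # computes and writes the cache
second = cached(cache_file, slow_computation)   # served from disk, no recomputation
```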

I don't like people making such strong statements about what others should and should not do, based on issues in some situations.


You can call functions across processes by using RPC libraries.


Cool, but not even slightly relevant.


Is your criticism about the fact that pickle is language-specific or about it being a binary format?


(not the person you're replying to, but) my criticism of python pickle is that it allows the deserialized thing to refer to objects outside of a restricted environment (in this case, subprocess.Popen). JSON deserialization can't refer to console.log or any other object.
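
To make that concrete, here is a sketch assuming Python 3. The class `CallsOnLoad` is a hypothetical payload that uses the harmless builtin `len` where an attacker would name something like `subprocess.Popen`. The `NoGlobalsUnpickler` subclass follows the `find_class` override described in the pickle documentation; it is one possible restriction, not a complete defence.

```python
import io
import json
import pickle

class CallsOnLoad:
    """Hypothetical payload: unpickling calls whatever __reduce__ names."""
    def __reduce__(self):
        # Harmless stand-in callable and arguments.
        return (len, ("abc",))

blob = pickle.dumps(CallsOnLoad())
result = pickle.loads(blob)   # calls len("abc") during load and returns its result

class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuse every global lookup, so only plain containers and scalars load."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def restricted_loads(data):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

plain = restricted_loads(pickle.dumps({"foo": 42}))   # no globals involved, loads fine
try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

from_json = json.loads('{"foo": 42}')   # JSON has no syntax for naming a callable at all
```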

Unless you're storing an HMAC alongside your pickled blob, and verifying that HMAC on the deserialization side, you should not trust pickle for anything. (Even then, it's still potentially dangerous, because in a large project some other dev who doesn't understand the issue could come along and write a new thing to deserialize the same pickles and not check the HMAC).
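
A minimal sketch of the HMAC approach, assuming Python 3 and a shared secret. `SECRET_KEY`, `signed_dumps`, and `signed_loads` are hypothetical names, and key management is out of scope here.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"   # hypothetical; load from config in practice
DIGEST_SIZE = hashlib.sha256().digest_size

def signed_dumps(obj, key=SECRET_KEY):
    """Pickle `obj` and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

def signed_loads(blob, key=SECRET_KEY):
    """Verify the tag before unpickling; raise ValueError on any mismatch."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # constant-time comparison
        raise ValueError("HMAC mismatch; refusing to unpickle")
    return pickle.loads(payload)

blob = signed_dumps({"foo": 42})
restored = signed_loads(blob)

tampered = blob[:-1] + bytes([blob[-1] ^ 1])   # flip one bit of the payload
try:
    signed_loads(tampered)
    rejected = False
except ValueError:
    rejected = True
```

The check only helps if every reader goes through `signed_loads`; a plain `pickle.loads` elsewhere in the codebase bypasses it, which is the caveat in the comment above.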


> Unless you're storing an HMAC alongside your pickled blob, and verifying that HMAC on the deserialization side, you should not trust pickle for anything.

I dislike such strong statements. When I use pickle, it's often a dump of where a computation is up to, or a simple cache. How can that be exploited? Someone would need to be meddling on my filesystem, in which case I'm already screwed. What risks am I opening myself up to? Why should I not trust it for my use case?


It depends on the kind of groups/organizations you write code in. In some, I think such strong statements are necessary.

You can mitigate the dangers of pickle by educating junior devs or people new to Python, by having a strong culture of code reviews with experienced reviewers, or by having a style guide that explicitly prohibits pickle except in exceptional cases.

... Or you can just not use it in the first place and make it trivial for newcomers to your codebase to use safer serialization methods.


Well, specifically, never use pickle for communication. It's perfectly reasonable as an on-disk storage format for some things.


If you want a "secure pickle" you should corrupt it on creation so that it needs to be uncorrupted to be read, preferably with the HMAC.


No, just switch from pickle to json.


But how would you get arbitrary code execution? Pickle does solve a lot of problems, but it creates so many that I think it should be removed from the battery pack. HMAC integrity checks can't be optional, and you make them mandatory by breaking the payload.


I use it to save my shell session between reboots. Very useful, yet no security concern. There is always a niche for any tech.


Interesting. How are you doing it? Do you do objects individually?


Yes. Basically I have a Python startup file that opens a store object at the beginning of the session. If you set an attribute on the object, it's saved in the store, and if you read it, it's loaded from the store.
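
Something along these lines, as a sketch rather than the actual startup file. It assumes Python 3 and the standard `shelve` module (which pickles values under the hood); the `Store` class and the path are hypothetical.

```python
import os
import shelve
import tempfile

class Store:
    """Attribute writes are persisted to a shelf; attribute reads load from it."""

    def __init__(self, path):
        # Bypass our own __setattr__ while attaching the backing shelf.
        object.__setattr__(self, "_shelf", shelve.open(path))

    def __setattr__(self, name, value):
        self._shelf[name] = value   # pickled and stored under the attribute name
        self._shelf.sync()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "_shelf":
            raise AttributeError(name)   # avoid recursion if __init__ never ran
        try:
            return self._shelf[name]
        except KeyError:
            raise AttributeError(name) from None

    def close(self):
        self._shelf.close()

path = os.path.join(tempfile.mkdtemp(), "session")
store = Store(path)
store.todo = ["fix bug", "write tests"]
store.close()

store = Store(path)      # e.g. in the next shell session
restored = store.todo
store.close()
```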


Cool. I'd love a way to completely persist a session to disk and reinstate it later, tho'. In other languages too http://stackoverflow.com/q/3966925/447514 - but only languages like Smalltalk or Forth with the concept of an "image" really do what I want.


OS-level support for checkpointing is also a thing that exists, e.g. https://criu.org/


I think we will come to that in the future, but the OS will handle that instead of the language.

Of course, one can argue you can already do it with docker :)


The future as in, say, 1979? https://en.wikipedia.org/wiki/System/38


Well you could do it for decades with VMs but that's pretty heavyweight ;-)


If you want a more efficient version of pickle, take a look at feather [1]

It's not a drop-in replacement, but it's very effective.

1 - https://github.com/wesm/feather



