While designed for Python, pickle is not language-specific; there are implementations for .NET and Java that work well, for example: https://github.com/irmen/Pyrolite
Mu. Unask the question. Serializing a function is itself the security hole, because who knows what that function does? A function is just as likely to be malicious code as whatever you think you're using it for.
The GP's point is that if you're deserializing code, the definition of security is different. RCE bugs in the data format become less of a concern, while trusting the data source becomes much more of a concern.
If someone is able to modify my local files and wants to change the code I'm running, what is it about pickle that makes me vulnerable? They'd already be in a position where they could change my actual code.
Why do you think the attacker has the ability to change your local files? That's not the attack surface. The attack surface is when it's used as a wire format for network communication. The attack surface is the attacker convincing you to download a file in a format that you don't realize is executable, and every file format that includes pickled objects is vulnerable. The attack surface is that you've confused data and code. There's a reason why OpenBSD now enforces W^X: data and code are very different domains of trust. I'd trust my neighbor down the street to send me a spreadsheet; I'd not trust him to send me a program (which is precisely why Office macro viruses were such a problem: spreadsheets could contain programs).
No, pickling and unpickling files in your user's data-dir isn't a big deal, until it is, because your users wanted to share their data online. They'll sync their data folder with Dropbox, and then their account gets compromised, and the next time they launch your program they've got a virus. They'll download a 'completed savegame' from a sketchy site, and now they've got a virus. They'll get a phishing e-mail, but it's not one of those zips or exes or whatever, it's a file format they know belongs to your program, and surely that's safe, right?
Don't make your file formats insecure. Don't pickle.
My rather poorly made point is that whether it's a security problem or not depends on what you're using pickle for. My typical use case is simply as a local cache for some slow computation in ad-hoc scripts. Being able to simply dump objects and load them back again without needing to write serialisation code is a great timesaver, and the only ways I can see that causing a security problem rely on an attacker already having significantly more access to begin with.
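For concreteness, the cache pattern described here can be as small as the sketch below. The helper name `cached` is made up for illustration; the point is only that there is no serialisation code to write.

```python
import os
import pickle


def cached(path, compute):
    """Return the pickled result stored at `path`, computing and saving it if absent."""
    if os.path.exists(path):
        # Only reasonable when nobody else can write to `path`:
        # pickle.load does whatever the file tells it to do.
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

In an ad-hoc script this would be used as something like `table = cached("table.pkl", build_table)`, where `build_table` stands for the slow computation.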
I don't like people making such strong statements about what others should and should not do, based on issues that only arise in some situations.
(not the person you're replying to, but) my criticism of Python's pickle is that it allows the deserialized thing to refer to objects outside of a restricted environment (in this case, subprocess.Popen). JSON deserialization can't refer to console.log or any other object.
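A small sketch of what "refer to objects outside" means in practice. It uses `os.getcwd` as a harmless stand-in for `subprocess.Popen`; the `Payload` and `NoGlobalsUnpickler` names are invented for this example. The restricted loader follows the `find_class` override that the standard library documents for limiting globals, and is deliberately the strictest possible version (it refuses every global).

```python
import io
import json
import os
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call os.getcwd()".
        # An attacker would name something like subprocess.Popen here.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # os.getcwd() is called during loading

# JSON can only produce dicts, lists, strings, numbers, booleans and None.
data = json.loads('{"fn": "os.getcwd"}')  # just a string; nothing is called


class NoGlobalsUnpickler(pickle.Unpickler):
    """Illustrative restricted loader: refuses every global lookup."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

With this, `restricted_loads` still accepts pickles made only of plain containers and scalars, but raises `pickle.UnpicklingError` on `blob`. It also rejects every pickled class instance, which is most of what people use pickle for.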
Unless you're storing an HMAC alongside your pickled blob, and verifying that HMAC on the deserialization side, you should not trust pickle for anything. (Even then, it's still potentially dangerous, because in a large project some other dev who doesn't understand the issue could come along and write a new thing to deserialize the same pickles and not check the HMAC).
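Roughly what that looks like, as a sketch rather than a vetted design. The function names and the tag-then-payload layout are assumptions made for this example, and it presumes both sides already share a secret `key` that the attacker cannot read.

```python
import hashlib
import hmac
import pickle

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def dumps_signed(obj, key: bytes) -> bytes:
    """Pickle `obj` and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob: bytes, key: bytes):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

The check only helps if every reader goes through `loads_signed`; a single direct `pickle.loads` elsewhere in the codebase brings the problem straight back.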
> Unless you're storing an HMAC alongside your pickled blob, and verifying that HMAC on the deserialization side, you should not trust pickle for anything.
I dislike such strong statements. When I use pickle, it's often a dump of where a computation is up to, or a simple cache. How can that be exploited? Someone would need to be meddling on my filesystem, in which case I'm already screwed. What risks am I opening myself up to? Why should I not trust it for my use case?
It depends on the kind of groups/organizations you write code in. In some, I think such strong statements are necessary.
You can mitigate the dangers of pickle by educating junior devs or people new to Python, by having a strong culture of code reviews with experienced reviewers, and by having a style guide that explicitly prohibits pickle except in exceptional cases.
... Or you can just not use it in the first place and make it trivial for newcomers to your codebase to use safer serialization methods.
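One possible shape for "trivial to use", sketched with JSON and a dataclass. `SaveGame` and the two helper names are hypothetical; the idea is that untrusted input only ever turns into plain data that gets validated explicitly.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SaveGame:  # hypothetical example type
    level: int
    player: str


def dumps_savegame(game: SaveGame) -> str:
    return json.dumps(asdict(game))


def loads_savegame(text: str) -> SaveGame:
    raw = json.loads(text)
    # Check the shape by hand; nothing in the file can name code to run.
    if not isinstance(raw, dict) or set(raw) != {"level", "player"}:
        raise ValueError("unexpected fields")
    if not isinstance(raw["level"], int) or not isinstance(raw["player"], str):
        raise ValueError("unexpected field types")
    return SaveGame(level=raw["level"], player=raw["player"])
```

It is more typing than `pickle.dump`, which is the trade-off being argued about in this thread.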
But how would you get arbitrary code execution? Pickle does solve a lot of problems, but it creates so many that I think it should be removed from the "batteries included" standard library. You can't make HMAC integrity checks optional, and you make them mandatory by breaking the payload.
Yes. Basically I have a Python startup file that opens a store object at the beginning of the session. If you set an attribute on the object, it's saved in the store, and if you read it, it's loaded from the store.
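The comment doesn't say what backs the store, so the following is only a guess at the shape of it: a hypothetical `Store` class on top of the standard `shelve` module (which itself uses pickle underneath).

```python
import shelve


class Store:
    """Attribute writes go to a shelve file; attribute reads come back from it."""

    def __init__(self, path):
        # Bypass our own __setattr__ so the handle lives on the instance.
        object.__setattr__(self, "_db", shelve.open(path))

    def __setattr__(self, name, value):
        self._db[name] = value
        self._db.sync()  # flush to disk straight away

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursing on a missing _db
        try:
            return self._db[name]
        except KeyError:
            raise AttributeError(name) from None

    def close(self):
        self._db.close()
```

A startup file (for example one pointed to by `PYTHONSTARTUP`) could then do `store = Store("/some/path/session")`, after which `store.x = 42` persists and `store.x` reads it back in a later session.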
Cool. I'd love a way to completely persist a session to disk and reinstate it later, tho'. In other languages too http://stackoverflow.com/q/3966925/447514 - but only languages like Smalltalk or Forth with the concept of an "image" really do what I want.