Hacker News
Oracle plans to dump risky Java serialization (infoworld.com)
260 points by s-macke on May 27, 2018 | 155 comments



I applaud the general decision, but I wonder what will happen to existing blobs of serialized data. Will there be any migration tools provided?

Apart from the horrible security, what annoyed me the most with serialization is the lack of control you have over the process. There doesn't seem to be a way to access serialized data as a simple parse tree or record sequence - you have to construct objects of the actual classes. If only one class is not available or has breaking changes, there is no (built-in) way to access anything inside the blob.

This is particularly fun if you want to refactor things. Suddenly package names, class names and names of private fields (!) are part of your public interface.

So if we could drop reflection/unsafe-based serialization and instead just got a simple parser/writer for java's binary object graph format, I'd be very happy.


There's really no such thing as a "binary object graph format." Objects are not data. Code is not data. It's why serializing objects is so very complicated and dangerous. Serialized "data" is a program that can do everything a normal Java program can do. In the 90s we called this feature "mobile code" and thought it might be the future of distributed computing. Today we call it a "code injection attack" and recognize how enormously dangerous it is.

If you want data then define a schema and write/read your data. Serialization should really never be used and certainly not for long-lived data storage.

There is one place where serialization becomes useful and that is for storing objects off-heap, IPC, and for short-term storage (ie snapshots like Android's parcelable). For these cases I'd like to see the JVM embrace not just immutable value objects but full-fledged structs that have a well-defined memory layout. You can do this today using off-heap Buffers and interfaces but language support is always good so there's a universal standard that everybody can build upon. Once that's in place there'd be no need to ever use serialization.
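A minimal sketch of that off-heap "struct" idea using today's tools: a handwritten view class over a direct ByteBuffer with a fixed, documented layout. The `PointStruct` name and layout are invented for illustration; nothing here is a JVM feature, which is exactly the comment's point.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// A hypothetical "struct" with a well-defined memory layout:
// two 4-byte ints, x at offset 0 and y at offset 4.
final class PointStruct {
    static final int SIZE = 8;
    private final ByteBuffer buf;

    PointStruct(ByteBuffer buf) {
        this.buf = buf.order(ByteOrder.LITTLE_ENDIAN);
    }

    int x() { return buf.getInt(0); }
    int y() { return buf.getInt(4); }
    void x(int v) { buf.putInt(0, v); }
    void y(int v) { buf.putInt(4, v); }
}

public class StructDemo {
    public static void main(String[] args) {
        // allocateDirect gives off-heap memory; the layout is fully
        // specified by the view class, not by the JVM's object model.
        ByteBuffer mem = ByteBuffer.allocateDirect(PointStruct.SIZE);
        PointStruct p = new PointStruct(mem);
        p.x(3);
        p.y(4);
        System.out.println(p.x() + "," + p.y());  // prints 3,4
    }
}
```

With language support (the kind of thing Project Valhalla aims at), this layout would be a guarantee rather than a hand-maintained convention.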

That said I can't imagine Oracle will simply remove support object serialization. It may be kicked out of the "core" JDK and become an optional module. The classes may be deprecated. But the functionality likely isn't going anywhere in the next ten years.

Even if they did remove it nobody should be using the standard object serialization anyways. If you're going to use serialization (and you shouldn't) then you should absolutely be using FST [0].

[0] https://github.com/RuedigerMoeller/fast-serialization


BTW, contrary to other comments this has nothing to do with the binary nature of serialization. The problem is that serialized "data" are programs. They're code. For some really scary stuff see the XMLDecoder [0] API which is XML-based " Java-bean serialization" but actually enables arbitrary code execution.

Note that even when it's not an explicit feature serialization libraries have a tendency to become unexpectedly Turing-complete [1]. There are many, many systems out there that are vulnerable to "surprise Turing" attacks.

The only real defense against this is to have well-documented schemas in place and to thoroughly validate all incoming data. Ironically all the guys churning out schema-less JSON microservices think they're okay. They're not. As they say the truth is even worse than it appears...

[0] http://blog.diniscruz.com/2013/08/using-xmldecoder-to-execut...

[1] https://medium.com/@cowtowncoder/on-jackson-cves-dont-panic-...


> In the 90s we called this feature "mobile code" and thought it might be the future of distributed computing. Today we call it a "code injection attack" and recognize how enormously dangerous it is.

This sentence neatly describes one of the major ways things have changed since the 90s. Computing is now a war zone, rendering many of the elegant and beautiful distributed systems concepts discussed back then off the table. Instead we have walled gardens, closed platforms, closed systems, and closed pretty much everything. Anything open is immediately spammed and exploited to death.


There's no such thing as "data". Every piece of "data" eventually gets interpreted, thus becoming a "command" in some level of abstraction. The core of every security vulnerability is the belief that "this is just a piece of data".



> There's no such thing as "data".

In the common vernacular,

data = non-Turing-complete commands

code = Turing-complete commands


> Objects are not data

This is not true. The JVM may choose to make serialization more complicated than it needs to be, but an object is nothing more than a collection of locations in memory.

What's fundamentally wrong is Sun/Oracle's insistence on running constructors, not serialization itself.


> Code is not data

Lisp would, of course, disagree. :-)


Lisp recognizes that the difference between code and data is the context they're interpreted in. A more accurate phrasing of "code is not data" should be, "data from untrusted sources should not be interpreted as code".


  (import (ice-9 match))

  (define data '(display code))

  (define code
    (let ((pe primitive-eval))
      (match data
        ((fun arg ...)
         (lambda () (apply (pe fun) arg))))))

  (code)


> serializing objects is so very complicated and dangerous. Serialized "data" is a program that can do everything a normal Java program can do.

As I recall, the default behavior of the Java (de)serializer is to write only the data fields, one by one, in binary format. The code itself, i.e. the bytecode of the methods and the class definition, has to be on the classpath at application startup, so how could this malign code be injected as part of the attack?


In the face of an attack, the serialization framework unexpectedly invokes arbitrary existing methods/constructors on untrusted data, including malformed private object state. So if the attacker can trick one class anywhere on the classpath into doing something bad, they can bootstrap an attack.

It is similar to "return-oriented programming", which is one way to escalate C stack overflows into arbitrary code execution.
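A toy sketch of that mechanism. `Gadget` is invented for the demo; real attacks chain methods of classes that already exist on the victim's classpath, but the key point is the same: the serialization runtime runs class code before the application ever sees the object.

```java
import java.io.*;

// A deliberately unsafe Serializable class standing in for a "gadget"
// that happens to be on the victim's classpath.
class Gadget implements Serializable {
    static boolean triggered = false;
    String command;

    // The serialization runtime calls this automatically during
    // ObjectInputStream.readObject(), before the application can
    // inspect the object or check its type.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        triggered = true;  // a real gadget might exec(command) here
    }
}

public class GadgetDemo {
    public static void main(String[] args) throws Exception {
        // The attacker builds the bytes offline...
        Gadget g = new Gadget();
        g.command = "do-something-evil";
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(g);

        // ...the victim deserializes, expecting (say) a String. By the
        // time any cast could fail, Gadget.readObject has already run.
        Object o = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println("readObject already ran: " + Gadget.triggered);
    }
}
```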


The IBM J9 JVM comes with a library called PackedObjects, i.e. objects whose memory can live off-heap too.


Project Valhalla had more community support than PackedObjects. PackedObjects was unfortunately removed from later versions of J9.


In fact the Java serialization stream format (contrary to most other serialization formats) is designed such that it has a well-defined grammar, and you can in theory build a parser that creates parse trees given only the definitions of the classes involved, without running any code from those classes. (I.e. the parts that depend on the behavior of writeObject()/readObject() are delimited inside the stream, and you only need to know whether a class implements such methods.)

Many other language-specific serialization mechanisms just write arbitrary binary data into the stream, which can then only be parsed by an ad-hoc recursive descent parser implemented by the equivalent of Java's readObject().
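As an illustration of that fixed grammar, the stream's header can be checked without loading or running any of the serialized classes. This sketch only reads the two leading constants (documented as STREAM_MAGIC and STREAM_VERSION in java.io.ObjectStreamConstants); a full parser would continue with the tagged content records (TC_OBJECT, TC_CLASSDESC, TC_STRING, ...) defined in the Object Serialization Stream Protocol.

```java
import java.io.*;

// Inspect a serialization stream's fixed header without touching
// any of the classes it describes.
public class StreamPeek {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject("hello");

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
        // Every stream starts with magic 0xACED and version 5.
        short magic = in.readShort();
        short version = in.readShort();
        System.out.printf("magic=%04x version=%d%n",
                magic & 0xffff, version);  // prints magic=aced version=5
    }
}
```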


There are so many great alternatives available. It's a no-brainer. I applaud.

[edit] so why the downvote? We have YAML, JSON, protobuf, Thrift, Avro. Yup, these serialize "contents" rather than "structure + contents" but one gets interop with other technologies for free. Every tech mentioned above is so simple to use that removing Java serialization is a no-brainer.


I suggest you look at the myriad security problems caused by Ruby's handling of YAML and JSON.

It's not necessarily the serialization format at fault, but rather developer assumptions.


(Here's a post describing some of the problems back in the day with Ruby's YAML and JSON: https://williamedwardscoder.tumblr.com/post/43394068341/ruby...

YAML is of course a format that is designed to instantiate objects as described in the source, rather than as checked by the destination, so I wouldn't want the world to adopt YAML instead of Java serialization.)


Yaml is primarily used as hierarchical config format because JSON and XML are too verbose for a human editor.


Exactly. The people loading up a YAML library and using it that way may have no idea it's a potential remote code execution vector into their application.


Luckily, for those that do know what they're doing, it's simple enough to use the yaml.safe_load function that disables support for arbitrary object instantiation.


I believe the safe_load function was only added because of the spate of exploits that the blog post talks about.

It's good that it's been added, although it would have been better if it had been there from the beginning, with the safe version as the default and an explicit 'unsafe_load' required for complex object instantiation, to hopefully encourage even novices to think twice before doing it with tainted input.


Archived copy that can be read without JS enabled:

https://archive.is/2JGiJ


> Only it then turns out that the Ruby JSON parser … yes, you’ve guessed it … the Ruby JSON parser can instantiate complex objects too.

Huh. Where can I read more about this?


Read through it, but I won't blame the language creators. It's a lot easier to point out that something is broken than to build something better.


And it's a lot easier to build a new broken thing than to fix something broken. Which is what the creators of YAML did.


Regarding YAML, it's not just Ruby:

* Python's yaml.load() happily executes arbitrary code.

* Some Perl YAML libraries deserialize objects by default:

http://blogs.perl.org/users/tinita/2018/02/safely-load-untru...


Python's YAML libraries come with a safe_load function that disables the dangerous !!python tags, so they can be used without instantiating arbitrary objects.


And they warn quite clearly about that. Though in hindsight, having load and unsafe_load would have been safer.


> And they warn quite clearly about that.

Do they?

https://github.com/yaml/pyyaml doesn't even mention safe_load().

And this is the docstring for yaml.load():

  Parse the first YAML document in a stream
  and produce the corresponding Python object.
This doesn't sound very scary, and certainly doesn't imply possible code execution.


While using a general data transport format instead of Java object serialization is great for most applications, it cannot replace all use cases.

JVM based distributed grid computing services use Java object serialization to distribute queries throughout the grid. This allows users to write their queries as arbitrary Java code which will be executed on every node without having to deploy the code on every node.


How does that work? You describe a situation in which code that does not exist on a given node is executed on that node nonetheless.


You write a query that is wrapped inside a serializable object. The object with your code inside is serialized using Java object serialization and distributed to every node. Each node is now able to execute your query based on Java code.

https://ignite.apache.org/features/computegrid.html


This isn't a native feature of Java serialization. Class implementations are not included in the serialized data.

> How do closures get shipped around?

> Every closure is an object of a particular class. When the closure is being sent it gets serialised to a binary form, send over the wire to a remote node and deserialised there. The remote node should have the closure's class in its classpath or enable peerClassLoading in order to load the class from the sender side.

https://apacheignite.readme.io/docs/faq


Don’t forget Kryo [1]. Also, Avro messages can contain the schema.

[1]: https://github.com/EsotericSoftware/kryo


Yes! But the Kryo docs explicitly mention that some things may have to be handled via Java serialization.


That would also kill RMI (at least JRMP), and that too would be a good thing; RMI doesn't even pass through a router (unless you use RMI-IIOP, and that is very different from regular RMI/JRMP).

For Java, backwards compatibility used to be very important, so this is big news for the platform.

I think this madness of remoting bytecode with serialization was once upon a time a very important part of RMI/serialization: back in the thin-client Java days this was supposed to be the way to distribute code across a network link. Security was not the very first priority in the nineties (beats me why they made JRMP non-routable).


If you mean that JRMP leaks internal IP addresses in the protocol itself for callbacks and thus can't be NATted, that can be fixed with a couple properties to force the use of outer-IP DNS:

  java.rmi.server.hostname=myhostname.com
  java.rmi.server.useLocalHostname=true
You can also tunnel JRMP through HTTP; there was a CGI script called java-rmi dating back to the late 1990s that I think was still distributed through Java 8 (!), and also an RMI Servlet Handler which was a bit more robust/performant. Spring also still has the RmiServiceExporter and HttpInvokerServiceExporter.

I remember building Java applets and servers that did fixed income quotes & bond trading systems via streamed encrypted serialized Java objects circa 1999-2000. What a security nightmare, but no one knew better.

I feel old.


If I remember correctly, you still need to open a port in the router so that the server can call back the client. Now good luck persuading anybody to do that kind of insanity.


JMX uses RMI underneath, so that would have to be fixed first before they can remove RMI


JMX protocols are pluggable, so this likely will be doable. The main issue will be updating all the ancient servers running Java < v7.


Does not pass through firewall/NAT to be more specific


Alternatives or not, you open that 'risky' door the moment you have to read the payload to know what type of data it is. Once it's read, it's already too late.

Java serialization has few ways around this; you have to trust the sources.

XML, JSON, ... used naively with reflection (e.g. new XStream().fromXML(...)) expose the exact same issues. Custom homemade parsers are also very likely vulnerable somehow.

A deserialization attack is a way to make the deserializer exploit vulnerable classes that exist in your classpath. It's not generating or executing malicious code by itself.

The standard classes should be ok, or at least fixed quickly.

Things like commons-collection will likely never be fixed. It might be considered as a feature from some point of view.

Check this out: https://github.com/frohoff/ysoserial

My 2 cents:

- Secure your sources

- Know your format; do not rely on reflection to parse text or binary data

- Watch your classpath, but you can never know what new vulnerability will pop up next week


In case anybody is wondering, this is the attack vector that was used for the Equifax hack and is also used in a fair number of remote code execution exploits for Java servers.


Through the fact that Struts did not implement things correctly and Equifax did not upgrade their systems. It also sounds like multiple vulnerabilities were involved, like this one as well: https://www.cvedetails.com/cve/CVE-2017-5638/


Having used java serialization a few times for POC level work, I'll be sorry to see it go.

I wish they would just rename it something sufficiently ominous sounding that people wouldn't think about using it on untrusted data sources.

Maybe ArbitraryCodeAndDataSerialization


Might catch some cases, but just from a couple of weeks ago, by an org that supposedly puts security very high up the priority list we had this: https://news.ycombinator.com/item?id=17096022 (Signal's security hole created by using dangerouslySetInnerHTML)


Yes, it was a nice/quick/dirty way to persist an object graph, and I bet it's been richly abused in a lot of codebases out there. It'll be curious to see how this is handled in terms of breakage, i.e. the JDK that introduces this might be the Python 2/3 showdown (if that's still a thing).

It'd be curious to do a GitHub-wide grep for ObjectOutputStream or something similar and see what it's like in open source land.


This is made more exciting by the fact that Java has rarely introduced big breaking changes in the past. In particular, deprecated classes (java.util.Date...) were never actually removed. Starting this now is going to wreak havoc.


We're already starting to see the effects in Java 9. Our team attempted to use it on our current project and many of our dependencies broke because they were using deprecated parts of the API that were removed. We ended up having to drop back to Java 8.


Speaking of Python, pickle is `eval` soooo be a little careful with that one.


Coming from a Javascript context, would this be something roughly analogous to expecting JSON.parse and actually getting eval?


Pretty much. It's slightly less terrible than that -- the deserializer doesn't evaluate arbitrary Java expressions, but it can call no-argument constructors, run deserializer methods, and set field values (including private fields) on any loaded class that's marked Serializable.

As you can imagine, any sufficiently large codebase is likely to have a million different ways this can be leveraged to get arbitrary code execution.


> Maybe ArbitraryCodeAndDataSerialization

Wait, what? Java serialization does not serialize and deserialize code. The only thing encoding behavior when serializing is class names. The receiving system needs to know those classes/be able to load them.
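A quick way to see this for yourself: serialize a trivial class and look for its name in the raw bytes. The fully qualified class name travels in the stream, but the method bodies do not; the receiver must already have the class on its classpath. (`Payload` is a made-up class for the demo.)

```java
import java.io.*;

// The serialized form names the class but carries no bytecode.
class Payload implements Serializable {
    int n = 42;
}

public class NameInStream {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new Payload());

        // Decode the bytes 1:1 so we can search them as text.
        String dump = new String(bos.toByteArray(), "ISO-8859-1");
        System.out.println(dump.contains("Payload"));  // prints true
    }
}
```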


... unless you override Java's default serialization behavior and build a whole lot of black magic.

Apparently there are some vulnerabilities in the implementation, but they are not an inherent part of it as some people seem to think.


> unless you override Java's default serialization behavior and build a whole lot of black magic.

...but then, if the developer is writing custom deserialization code that does black magic (like executing the equivalent of eval() on the contents of a String field), any serialization format is affected, be it binary, JSON, YAML, etc.


Does anybody have a source for this? The publication doesn't exactly have the best reputation.

A lot of things are currently built on top of serialization, from JMX to almost everything in Java EE, including servlet sessions.


I was trying to understand if C# has the same issue. From what I can tell, as long as you use the default serialization, it seems to be safe. But I can't really tell.

https://www.alphabot.com/security/blog/2017/net/How-to-confi...


The biggest cause of vulnerabilities in Java serialization is that the class name is part of the serialization format, so an attacker can cause the deserialization to produce classes that you aren't expecting.

Json.NET seems to allow the same behavior, but has it disabled by default.

> In fact the only kind that is not vulnerable is the default: TypeNameHandling.None


I'm rusty on C#, but I believe the equivalent would probably be [Serializable] types being read from a Stream using a BinaryFormatter or SoapFormatter. A malicious stream could include any known types in the system marked as [Serializable], and as they are deserialized any associated static constructor/no argument constructor/property setters could be called.

In the JSON case linked, I presume there is a root type given when you are attempting to deserialize a document. However, if one of the properties of that type is ambiguous (say System.Object), and the deserialization algorithm looks for a 'type' property in the JSON with a class name to determine what is instantiated, then there can be all sorts of unintentional types that might be built by the processing of that malicious JSON.


They moved away from that. Many newer parts of the .NET framework (e.g. SOAP 1.2 implementation in WCF) use data contract serializers by default. With them, a complete list of known types must be provided to deserializer, it's typically done with [KnownType] attribute.


Yes, it’s just as vulnerable (example [1]), but I think .NET serialization is exposed to untrusted inputs less often than Java, with its myriad of enterprise software.

[1] https://googleprojectzero.blogspot.co.uk/2017/04/exploiting-...

Full disclosure, I’m the author of that blog post.


There is a dangerous form https://docs.microsoft.com/en-us/dotnet/standard/serializati...

> Binary serialization can be dangerous. Never deserialize data from an untrusted source and never round-trip serialized data to systems not under your control.

They killed it in .NET Core 1.x but brought it back in .NET Core 2.x due to back-compat and interop complaints


> They killed it in .NET Core 1.x but brought it back in .NET Core 2.x due to back-compat and interop complaints

I've been using .NET binary serialization for dirt cheap local snapshots for the purpose of undo/redo. It's perfect for this.


There are plenty of other options though, such as JSON, MessagePack, Protocol Buffers, etc. And BinaryFormatter has always performed poorly against 3rd-party libs, especially other binary formatters such as MessagePack.


Do any of these options support proper handling of cycles in the object graph out of the box?


Yes.

Using the Newtonsoft library you can configure it via `JsonSerializerSettings.ReferenceLoopHandling`, and with protobuf-net you set `AsReference` on your class's `ProtoContractAttribute`. I don't think MessagePack-CSharp or msgpack-cli support cyclic references though.


This is the entire point of NSSecureCoding as well. You must provide the list of allowed types and any unexpected ones trigger errors.


It's less useful now than it used to be. In an age of ubiquitous JSON there's much less need for sending Java objects over the wire.


Schema migration in vanilla Java serialization is a PITA. It's absolutely an anti-pattern to use it at all, as far as I'm concerned. So this is all good news.


And even more seamless...protobuf.


Blushing because this is the first I've heard of it - I owe you one :)


Protos are amazing and have so many advantages over json. You'll love them!


Wouldn't they be a pain to debug, given that they are binary? I wouldn't want to use a hex editor to verify if my data serializes properly.


I am not sure: what exactly do you mean by "verifying" that your data serializes properly?

Do you mean that you don't trust the protobuf implementation itself to write the correct bytes, or are you worried you may have written the proto file wrong?

If it's the first case, protobuf is a widely used format that has been rigorously tested in the field. Provided you use it for a major language, you should be fine.

If it's the second case - could you not just serialize and then deserialize, and check that the objects that pop out again are the same as the ones that you sent in?


There is a robust though mostly undocumented text format.

I've even seen Google projects use text protos for config files.

Or to decode binary files, use the protoc tool.


Where have you been hiding.


I’m beginning to think there really is an xkcd for everything: https://www.xkcd.com/1053/

(Everything that tends to come up on HN, anyway)


I meant it in a friendly way, knowing that there is a tonne of shit I don't know. But ASCII can't convey that. Maybe Unicode can. Should have put a smiley.


I would hate to see serialization go completely. It has its uses. Maybe Java should copy the attributes that the C# data contract serializer uses to mark the subset of classes that may appear in serialized form. It could be retrofitted and would be a less drastic change at the same time.


Ugh, don't remind me about the data contract serializer. Silent failure for unexpected ordering of XML elements is boneheaded default behavior.


Java has had the "transient" keyword for that since 1.0.
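For reference, a small sketch of what transient does under default serialization: a transient field is skipped by the serializer and comes back as its type's default value after a round trip. (`Session` is a made-up example class.)

```java
import java.io.*;

// Fields marked transient are never written to the stream.
class Session implements Serializable {
    String user = "alice";
    transient String password = "hunter2";  // skipped by the serializer
}

public class TransientDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new Session());

        Session s = (Session) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(s.user + " / " + s.password);  // prints alice / null
    }
}
```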


It also didn't have serialization in 1.0, neatly avoiding the entire problem.


You're right, it was introduced in 1.1 (February 1997)


Ahh, the old "TempleOS is immune to <insert latest vulnerability>" defense


Is it really going away, though? The article mentions that it's just being replaced with a safer implementation.


I may be wrong, but to me it seems like they will replace the current implementation with something fundamentally incompatible. My impression is that dependencies like RMI will go away, too. But that would be a huge breaking change.


Sure it's going to be incompatible and break old code. The title, however, makes it seem like Oracle is getting rid of serialization when it's just replacing it with a safer method.


Just for the sake of arguing semantics: "just replacing" is incorrect in my view. The new implementation is designed specifically not to be one. It is a different mechanism with a different interface and a different design goal.


Can someone briefly explain the problems and how general they are to other serialization interfaces?


The problem is that any class (that's serializable) on the target machine's classpath can be used as a serializable object. So if I want to screw with a server, I can create a payload with any of those classes and send it to the server. The server will load each class when it sees it in the payload, and will execute any code in the class's static initializer before trying to instantiate an object. All of this happens before the host application regains control of the deserialization and realizes that the objects it's deserializing are not what it expects. So there have been a bunch of problems where classes no one expected to be used in serialization were exploited, because it's really simple for a class to be marked serializable through inheritance (be it class or interface).


> create a payload with any of those classes, and send it to the server.

But on a deeper level, if a programmer doesn't know anything about security, then this sort of hole will continue to happen even if Java serialization is disabled (it will just be a bit harder to screw up). I'm not that big a fan of making security decisions without the programmer's input, since you'd assume a professional programmer should know better anyway.


The security hole is in the design of java serialization. Any class marked serializable (in your code, your included libraries, or the JVM itself) is a potential security vulnerability as soon as you use this feature. Mark estimated that over half of JRE/JDK vulnerabilities have been due to Serialization.

Recent releases added an opt-in feature to filter which classes are allowed to be deserialized, but there's still a horrible amount of open, unauthenticated network ports that take in serialized java.

The programmer trying to get the same failures in a post-serialization world would presumably have to find or build a new system with the same design issues.
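That opt-in filter is JEP 290's ObjectInputFilter (Java 9+; older releases expose a process-wide variant via the jdk.serialFilter system property). A sketch of per-stream use: only the classes matched by the pattern may appear in the stream, and everything else is rejected before any of its code runs.

```java
import java.io.*;

// Allow-list deserialization with JEP 290's ObjectInputFilter.
public class FilterDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new java.util.Date());

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
        // Pattern syntax: ';'-separated entries, '!' rejects, '*' wildcard.
        // Here: allow java.util.Date, reject everything else.
        in.setObjectInputFilter(ObjectInputFilter.Config.createFilter(
                "java.util.Date;!*"));

        Object o = in.readObject();  // would throw InvalidClassException
                                     // for any class not on the allow-list
        System.out.println(o.getClass().getName());  // prints java.util.Date
    }
}
```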


> Any class marked serializable (in [...] included libraries [...])

This is the big one: as soon as you deserialize incoming data, any library on the classpath becomes a potential source of remote-callable snippets. And when a vulnerability resides in a library, exploits will tend to be compatible across applications, which makes them far more likely to actually hit you than any custom weaknesses.


From the article it sounds like you’ll only be able to use the replacement on Java data classes, sort of like JSON structures or C structs.

Since there won’t be any code that gets executed, that particular vector is closed.

Of course if the code reading that object does something stupid…


Programmers should be more careful, but then again footguns exist (or in this case leg cannons). You shouldn't trust any 3rd-party data before validation, but how do you validate it if you can't even deserialize it to see what it contains? It's silly.


I can see how it's potentially a problem, but I still don't understand where the vulnerability occurs. The static initializers will be run before the data (which the attacker controls) is inserted into those instances. Yet the attacker has to be able to somehow "execute" code via the data he sends.

There has to be a few specific classes that are common, and that have methods that the attacker can expect will be run (such as toString conversion, comparison etc), where the attacker can control the data used.

E.g. if he knows that deserializing a certain collection type containing Font objects will use some platform-native code that reads the font data, he can then pass a corrupt font in and fool the deserializer into an out-of-bounds read. Or something like that. I vaguely remember hearing about one of these attacks and I can't find it. It would be interesting to hear about some real-world attacks.


> and how general they are to other serialization interfaces?

This class of problems was very common when that serialization appeared in Java. It was possible to inject code into pretty much anything: e-mails (1), office documents (2), web pages. Since then, most developers/managers have learned their lesson and started to pay attention / allocate resources, so modern software/protocols/formats tend to be more secure, at least on average.

(1) https://en.wikipedia.org/wiki/ILOVEYOU

(2) https://en.wikipedia.org/wiki/Melissa_(computer_virus)


Except that (1) was a completely normal executable program (albeit in a not-well-known "format", of which Windows has many) and was not in any way executed automatically (and has nothing to do with Java). And (2) is a VBA macro virus, which again has nothing to do with Java.


The comment I replied to specifically asked about other serialization interfaces.

Both that VBS and those DOC/XLS files contain some code inside. But users thought they were just data files, so they opened them, running the code.

Also, old MS Office files were not limited to embedded VBA. They are OLE compound files, so a specially crafted file can create and run ActiveX objects (often implemented as Win32 DLLs registered under the HKEY_CLASSES_ROOT key of the registry) installed on the user's system. This is very similar to the way the Java serializer creates and runs objects when deserializing data received from an untrusted source.


Here's an article which explains in detail how pickles/serialization work in Python, and how pickles can be constructed to evaluate malicious code when they are deserialized. That's specifically about Python, but the same issue exists in many other languages.

[1] - https://intoli.com/blog/dangerous-pickles/


Many runtimes include object serialization capabilities, which allow serialization of essentially arbitrary object structures with little code. Unpacking that kind of structure always means that you're constructing object instances of potentially any object you can construct, which generally means you can run arbitrary code.

Examples: Java Serialization, Python marshal and pickle, Ruby marshal, Perl Data::Dumper.


So the problems don't apply to, say, rust because rust doesn't run arbitrary initialization when building structures?


This is a key point. The problem is significantly worse because of Java's "everything is an object" philosophy. In languages where data is just data and not a combination of data and behavior, you don't get this kind of problem.


You also don't get this behavior if you restrict the scope of serialization. The problem here was that Java wanted to make virtually any object potentially serializable. This required all sorts of deep hacks that circumvent various language guarantees and open the door to vulnerabilities (as well as encumber new features, as serialization could interact with anything). A new serialization mechanism would be restricted to classes for which serialization makes sense, basically classes that, as you say, represent data rather than data with complex behavior.



> Oracle has received many reports about application servers running on the network with unprotected ports taking serialization streams

I’m just not seeing how this is a problem with the language. Leave anything on an unprotected port taking unsanitised input and it’s vulnerable no matter what it is written in.


The problem is that the API pretends that it already sanitizes user input to a degree (from the design perspective, not necessarily the documentation).

Sanitizing the input yourself means not using it at all. You can't sanitize it.


> Leave anything on an unprotected port taking unsanitised input and it’s vulnerable no matter what it is written in

I don’t think you’re comparing apples with apples.

Even in a default configuration, the majority of services that show up on a network don’t give you instant code exec. Maybe tomcat servers with default passwords, and a few other things.

But (de)serialization isn’t securable at all. You can’t add auth, you can’t WAF it, you can’t fix the underlying vulnerabilities.

There is absolutely nothing which has that level of vulnerability and lack of security on a modern network.


> There is absolutely nothing which has that level of vulnerability and lack of security on a modern network.

OK, let's say I overwrite part of your system with something evil, knowing that your app will Class.forName() it. Is that a problem with the ClassLoader or is the problem that your perimeter is already compromised anyway?


I don’t think it matters, at all.

Java has serialization, and everyone uses it, and it’s a security nightmare.

As uncomfortable as it may feel to me, I kinda agree with Oracles approach here: kill serialization, make things more secure. If they have to change core language/libraries, so be it.

You were trying to say that anything left on a network socket will be compromised, and that's simply not true. Most software is pretty solid. Anything in C will have more than its fair share of upcoming patches, but unlike Java serialization, you can't fix a single lib to fix 99% of the bugs... or we surely would have done that already.


If I understand correctly, the problems with Java deserialization are 1) gadgets for code execution and maybe 2) DoS. And this is bad because even if it looks secure now, a gadget could be discovered later.

The default Java serialization is one of the easiest ways to serialize instances of objects, but there are many other ways, and many other risky ways among them.

It seems to be always the same problem: ClassLoader access. Couldn't there be a way to let the deserializers use a specific ClassLoader?

I mean some sort of (Sandboxed)ObjectInputStream that uses a specific ClassLoader defined in the JRE config. The sandboxed contexts could be defined in something like java.security, .policy, to define what it is supposed to know and when/where it is supposed to be used.
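Something close to this did ship as JEP 290 (Java 9+): `ObjectInputFilter` lets you reject classes before the stream instantiates them. A minimal sketch of an allow-list filter (the filter pattern here is just an example):

```java
import java.io.*;

public class FilteredRead {
    // Deserialize, but only permit classes under java.lang; "!*" rejects
    // everything else before it can be instantiated.
    public static Object readAllowed(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes))) {
            ois.setObjectInputFilter(
                ObjectInputFilter.Config.createFilter("java.lang.*;!*"));
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(Integer.valueOf(42));
        }
        System.out.println(readAllowed(bos.toByteArray()));
    }
}
```

It's strictly a mitigation, though: it bounds which gadgets are reachable, but the allowed classes' `readObject` logic still runs.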


Won't this have serious implications for Spark, Hadoop, and other frameworks that distribute workloads across multiple JVM instances?


Spark uses its own serialization system which, while similar to the built-in serialization, is designed to give much better performance.


Only partly. Spark's RDD API, which is still used quite heavily, requires an external serializer. Most of the time you would use Kryo, but it does not work for objects larger than 2 GB, or sometimes for custom classes that are not explicitly registered as Kryo serializable.

In these cases, the changes would break existing code. However, who knows when Oracle will decide to remove the Java serialization API. I expect it will take a few years, and the situation on the Spark side will be different by then.


AFAIK, Hadoop uses protocol buffers for message passing, not Java serialization.


Hadoop uses Avro actually, but the point still stands :)


Hadoop expects your keys and values to implement Writable and serialize themselves (a lot of these are actually hand-written expecting the instance to get reused for each input tuple). There's optional and fairly clumsy glue that makes Avro work in a key or value.


Last time I dived into the details, the communication protocol for HDFS was serialized Java objects using the built-in serialization mechanism.

Edit: they switched to Protocol Buffers in 0.23.

https://wiki.apache.org/hadoop/HadoopRpc


I think wildfly also uses serialization to store session state when scaling horizontally.


The JPA spec requires entity IDs to be serializable, and Tomcat requires session variables to be serializable too.

This is going to break a lot of code.


So to be clear, most "code"-shipping frameworks in JVM land use jar files. Big data systems are not using native serialization for either code or data.


That's not quite true. Both Akka and Flink use Java serialization in many key aspects.


All web applications use serialization to do session replication. Now this is safe, as both stream ends are controlled by the developer, and only if they are stupid (instead of malicious) will there be problems. Still, if Java serialization goes away, this will be the most widespread impacting change in the Java ecosphere wrt changing how serialization works.


Using Java serialization for sessions is a terrible idea. It's mildly convenient to start with, but bites you as soon as you want anything non-Java to be able to read your session store.


In theory this can be solved by using a security policy, but I don't think most are doing anything like that for deserialization.


There are some valid uses for it, but the interface is unsafe by default. How about instead, requiring a MAC key in the serialization protocol?


You can package up the data so that there is authentication to evaluate before you hit the serialization layer, and integrity behind that authentication.

This might be your approach if say your session cookie is based on serialized Java.

(However, most people give up on this approach - Java serialization is also very inefficient space-wise, and the cookie will get too big for the browser to honor.)
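A rough sketch of that "authenticate before you deserialize" pattern, using the standard `javax.crypto` HMAC API (key handling is deliberately simplified here, and the payload stands in for serialized bytes): the receiver verifies an HMAC tag and never feeds unverified bytes to `ObjectInputStream`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class MacWrap {
    static byte[] hmac(byte[] key, byte[] payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(payload);
    }

    // Constant-time comparison; deserialize only if this returns true.
    static boolean verify(byte[] key, byte[] payload, byte[] tag) throws Exception {
        return MessageDigest.isEqual(hmac(key, payload), tag);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "demo-secret-key".getBytes(StandardCharsets.UTF_8);
        byte[] payload = "serialized-bytes-here".getBytes(StandardCharsets.UTF_8);
        byte[] tag = hmac(key, payload);
        System.out.println(verify(key, payload, tag)); // genuine payload
        payload[0] ^= 1;                               // simulate tampering
        System.out.println(verify(key, payload, tag)); // rejected
    }
}
```

Note this only proves the bytes came from a key holder; it does nothing if the legitimate sender's data model itself contains a gadget.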


cool. can we kill pickle next?


One of the attack vectors in Java was classes storing native pointers as integer fields and calling free on those pointers in the finalizer. So the moment one can force deserialization of such classes, one ends up with a corrupted heap and trivially weaponized exploits. Does pickle in Python suffer from the same problem?


The primary difference between pickle and all the other problematic serialization mechanisms that have been causing security issues is that pickle has been labeled for decades now as being unsuitable for any use with public data. No sarcasm, I'm quite serious. It hasn't been 100% successful at preventing issues, there have been a couple, but it does seem to have been enough to keep it from being the star of its own massive disaster, even though the potential is theoretically there.

As with many of the dynamic language deserializers, such as the Ruby YAML one, for pickle it's not even an exploit or something... it's a feature of the code that it can call methods, and getting to arbitrary methods isn't that hard.


Well, there's this comment in the docs:

> Warning

> The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.



Thx for the good read


That only works because Python is an interpreter. It won't do a thing on Java.


Deserialization in Java requires reflection, which puts you firmly in the land of "I'm basically interpreted now". Java's serialization has basically the same set of vulnerabilities.


The JVM is a Java Bytecode interpreter; don't be so sure you can't make it run things.


The JRE has had arbitrary code execution attacks on serialization. The leaked classes eventually invoke a class loader and instantiate your binary code as a new Java class.


Does the serialization work with bytecode though? Doesn't it stream just data members, not method implementations?


Why the heck are those classes marked as `implements Serializable`?...


Possibly because a superclass or interface is marked, and it is inherited? The problem is that (excluding the new filtering mechanism) deserialization is unscoped - any class in your code, a third-party library, or the core VM is fair game in an attack.
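The inheritance point is easy to demonstrate (the `Config`/`DbConfig` names below are made up): because `Serializable` is inherited through supertypes, a class can become serializable without ever declaring it.

```java
import java.io.Serializable;

// Some base interface that opted in, perhaps years ago.
interface Config extends Serializable {}

// No "implements Serializable" in sight, yet instances are serializable.
class DbConfig implements Config {}

public class InheritDemo {
    public static void main(String[] args) {
        System.out.println(new DbConfig() instanceof Serializable); // true
    }
}
```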


It is a marker interface. You are supposed to consciously opt in.


Exactly, so I am not sure why wrappers to native pointers are Serializable.


Pickle is an entirely reasonable way to move data between processes when you're using Python for distributed processing.


> Serialization was a “horrible mistake”

RMI required it. Mr. Reinhold has apparently forgotten the scene in the late 90s. CORBA, anyone?


Would this affect network class loading?


I really hope they choose something like RDF, which is the only serialization format I know of that handles graphs and types natively.


Use Lisp, where data and code are the same thing.


It's funny that serialization is noted as a "horrible mistake" from 1997 but no mention of RMI, the even worse mistake. I guess Java has a lot of bad mistakes. I wish Oracle would end Java so I could move on to something else.

Or maybe I'll just move on to something else anyways as I'm really rather sick of writing these syntactically crippled lambdas. The streaming stuff is almost good.


Good news: removing serialization will irrevocably break RMI.

And yeah, it would be nice if eventually a spring cleaning could remove some of the mistakes - Enumeration, Date/Calendar, Hashtable, ClassLoader, etc.


What's the problem with ClassLoader?


Could you point me to some information on why RMI is a bad mistake? (Genuine question.)


RMI (and similar technologies like CORBA) is a "mistake" in today's world because it makes every public method an attack surface. It also hides important details that the client really can't avoid dealing with, namely network issues.

I put mistake in quotes because there are situations where RMI (and Java serialization) work fine: trusted, reliable networks like cluster or grid computing.


In OOP, objects have identity and state, having a remote endpoint with both these attributes is undesirable. IIRC, RMI also encourages one to abstract over the network, which can lead to very fine-grained, brittle and possibly insecure communication. This is somewhat ironic given that Sun itself authored "The fallacies of distributed computing".



