JVM Internals (2013) (jamesdbloom.com)
144 points by mohsinhijazee on April 19, 2015 | 24 comments





I'm a CLR man myself. Just the path my career has taken me really. Does anyone know if there are equivalent documents for the CLR or CoreCLR?

I don't see LLVM, the JVM or the CLR going anywhere. They'll evolve, get smaller, reach more devices and platforms, etc., but personally I'd like to appreciate the how and the why better.

It'd be useful to know if there are any reasons why I may pick one over the other in certain circumstances. I appreciate that their ecosystems and cultures are different, but I'm curious from a purely technical perspective too.


I jump between JVM and CLR depending on the project, with some occasional C++.

This is what I am aware of on the CLR side, maybe you already know them.

There used to be the "CLR Inside Out" column at the MSDN magazine.

The "CLR via C#" book.

https://www.microsoft.com/learning/en-us/book.aspx?id=15528&...

Channel 9 Presentations

https://channel9.msdn.com/Tags/clr

https://channel9.msdn.com/Tags/clr+4

https://channel9.msdn.com/Search?term=mdil#ch9Search

The ECMA documentation also provides valuable information.


I second the 'CLR via C#' recommendation - this made a huge difference to my understanding of the language.



Well, LLVM is something else altogether, as it doesn't try to abstract away the OS, while the JVM does[1]. As to the JVM, well, first of all, there is no "the JVM". There are many JVMs by many vendors, and many types of JVMs (from smart-cards, through various embedded devices, and all the way to desktops, servers and mainframes). Some of the most interesting -- and most important -- ones are made by companies you've probably never heard of, like Aicas and Atego that make hard realtime JVMs for use in avionics, medical devices and other safety-critical systems. The best-known JVMs are HotSpot, OpenJDK's JVM (which serves as the basis for Oracle and Azul's JVMs) and J9, IBM's JVM. When many people say "the JVM" they mean the SE (i.e. "standard edition") of HotSpot.

While HotSpot and the CLR are more similar -- at least in their goals -- than different, and each is ahead of the other in some respects and behind in others, I'd say that the most interesting features of HotSpot are its JIT and GCs (it comes with a few), which are years ahead of anything else used in any other environment in the industry (well, except Azul, which sells a modified HotSpot with a "pauseless" GC that never pauses the application for more than a few microseconds). In my opinion, the most exciting development regarding HotSpot is the work being done on HotSpot's next-generation optimizing JIT, called Graal, which comes with a complementary language toolkit called Truffle[2]. It pushes the envelope further with regard to HotSpot's defining feature -- speculative optimization and deoptimization -- and lets the user (if they wish) control the code generation through a clever API.
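As a small aside on "it comes with a few": which collectors a given JVM is actually running can be checked from inside the process with the standard java.lang.management API. This is only a minimal sketch; the class name GcListSketch is made up for illustration, and the names printed depend entirely on the JVM and the GC flags it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any JVM distribution.
public class GcListSketch {

    /** Returns the names of the garbage collectors the running JVM reports. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        // One MXBean is exposed per active collector (e.g. a young and an old generation collector).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JVM vendor, version and command-line GC selection.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running it under different GC selection flags should show different collector names, which is a quick way to confirm which tradeoff you have actually picked.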

Another important feature of any compliant SE JVM is bytecode manipulation. It allows modifying libraries after they've been compiled -- either on disk or during the class loading process. One use is dependency namespacing, aka shadowing (if your program depends on libraries A and B, and they each depend on different, incompatible versions of library C, shadowing can hide the two versions of C from each other). But one of the coolest things is the ability to transform (and de-transform) bytecode of running classes (to inject, say, tracing).

Since HotSpot is part of OpenJDK, all work on it (including Graal and Truffle) is open-source (and has been for a few years now). The OpenJDK as a whole is probably the world's second largest open-source project (in terms of contributors), after the Linux kernel. The main contributors are, of course, Oracle and IBM (the latter to a much lesser degree), but also Google, Intel, Apple, Twitter and many more.

[1]: https://www.youtube.com/watch?v=uL2D3qzHtqY

[2]: https://wiki.openjdk.java.net/display/Graal/Publications+and...


> When many people say "the JVM" they mean the SE (i.e. "standard edition") of HotSpot

This changed in Java 7. OpenJDK is now the reference implementation, and thus "the JVM"


The fallacy is that those of us that deal with commercial JVMs have more to choose from than "the JVM".


OpenJDK is a lot more than just the JVM (it's also the runtime libraries). OpenJDK's JVM is named HotSpot.


I think the only JVM named HotSpot is the one distributed by Oracle.


They are one and the same. Oracle's JVM is the OpenJDK's JVM, slightly modified (they mostly add monitoring capabilities that were part of the JRockit JVM), and Oracle's JDK (which contains the JVM) is OpenJDK with some minor additions.


Thank you for taking the time to explain. In the past couple of years I've noticed businesses are less beholden to one specific platform. Heterogeneous environments are becoming the norm. As a .NET consultant I've noticed a pattern of Node websites, CLR (C#) domains and the JVM's rich ecosystem offering systems like Elasticsearch or Storm, knitted together with something like REST APIs and RabbitMQ.

I really want to appreciate more of the wider ecosystem and the tech that underpins it. I don't see the point in reimplementing on the CLR what is already proven on the JVM.

Concentrate on the business's business (the core domain), not plumbing. In short, learning more about these techs is paramount.


Do you have, or know anyone who has, experience using a JVM in a safety-critical application? I'm trying to decide if I want to use it in a medical device, or go the more conventional MISRA-C on QNX route.


The most mission critical one is probably the Aegis Ballistic Missile Defense System.

Here's Lockheed Martin announcing that they chose a (proprietary) JVM to run "several critical subsystems" of the Aegis system in 2006[0].

Here's them announcing said project passing operational trials four years later[1]. Surprisingly fast for a military project.

[0] http://www.atego.com/pressreleases/pressitem/lockheed-martin...

[1] http://www.atego.com/pressreleases/pressitem/aonix-perc-ultr...


I used a hard realtime JVM for a safety-critical missile-defense related application, but that was almost ten years ago. The JVMs to consider are IBM's WebSphere Real Time[1], Aicas JamaicaVM[2] (their front page shows examples of use in medical devices) and Atego (formerly Aonix) Perc[3]. Both Aicas and Atego focus on the embedded market (don't know about WebSphere RT).

See here for some background and some related links (http://www.javaworld.com/article/2906981/java-app-dev/little...). There's recently been a lot of activity around RTSJ 2.0 (real-time specification for Java)[4], JSR-282, led by Aicas, and JSR-302[5] -- also led by Aicas -- which is built on top of RTSJ and meant to support DO-178B certified, safety-critical applications. I see that JSR-302 is marked inactive[6], which means it hasn't had a milestone in the past year, but I think that the relevant JVMs already implement it (or something similar), even if it hasn't been ratified as a standard.

I can tell you that for us, at least 10 years ago, it was a remarkable success, and saved us a lot of time vs. developing in C or Ada, but that was for code running on large servers. I haven't had experience with RTSJ on embedded devices. Many others have, but I doubt they read HN.

You probably want to keep DSP in C, but for everything else Java may save you a lot of time and money.

[1]: http://www-03.ibm.com/software/products/en/real-time

[2]: https://www.aicas.com/

[3]: http://www.atego.com/products/atego-perc/

[4]: https://www.aicas.com/cms/en/rtsj

[5]: http://download.oracle.com/otn-pub/jcp/safety_critical-0_94-...

[6]: https://jcp.org/en/jsr/detail?id=302


Thanks so much for taking the time on a thoughtful response.

> I can tell you that for us, at least 10 years ago, it was a remarkable success, and saved us a lot of time vs. developing in C or Ada, but that was for code running on large servers.

To what do you attribute the time savings? I'm interested in using the JVM because I'm thinking I'll have an advantage in debugging over C. Would that be true, in your experience? Do you think that it was easier to develop using the JVM under your quality system than it would have been under C or Ada?

> I haven't had experience with RTSJ on embedded devices. Many others have, but I doubt they read HN.

Yeah... Do you know of a good mailing list or forum for embedded systems?


> To what do you attribute the time savings?

It's just much easier to produce correct code in Java than in C (or even Ada). The bigger and more complex the software -- the bigger the gains (it's easy to write simple/small programs in any language). We also used NASA's open-source Java PathFinder to verify parts of the code (there are model checkers for C, too, and others for Java). This paper (https://ti.arc.nasa.gov/m/groups/rse/papers/lindstrom-rtembe...) describes applying JPF to the realtime scheduling aspects of RTSJ (something we didn't do).

> Do you know of a good mailing list or forum for embedded systems?

Sadly, no, but others here might.


Thank you for reminding me of Graal. One of the most interesting projects I have ever seen, and I have just gotten a lot of free time to explore it.


My understanding, which may be wrong as I know a lot more about the HotSpot and ART JVMs than CoreCLR, is that the differences boil down to:

• CoreCLR has support for some static language features that C# has but Java lacks, like the ability to embed unsafe code easily. In theory you can do that in Java too using sun.misc.Unsafe but nobody actually does.

• HotSpot has historically had a tighter and more focused approach to compilation, with profile guided optimisation being relatively more important. HotSpot always starts executing code with a (highly optimised) interpreter, to gather profile data. The .NET runtime, in contrast, compiles all code immediately on first run, and relies on more traditional compiler techniques with less profile guided and speculative optimisation. They seem to have gone back and forth on different compilation strategies and haven't been able to really pick one: e.g. they started out with a simple straight-line JITC, then they rewrote it a couple of times (?), then they did NGen for ahead of time compilation based on the MSVC++ backend, and also something called RyuJIT that is (again) an immediate mode compiler without any kind of interpreted guidance. HotSpot also has multiple compilers and always has, but the general design seems more stable - interpreter for gathering data, then a quick compile by C1, and then another slower compile by C2 for really hot methods.

• As just mentioned, .NET can do ahead of time compilation when an app is installed, whereas HotSpot cannot. The ART runtime used on Android nowadays also does AOT compilation (actually a mix of AOT and JIT). It seems like AOT works better for desktop style apps where the overhead of a JIT compiler is harder to tolerate and the benefits are smaller. As .NET has had a stronger desktop focus for longer it's not really a big surprise that .NET has experimented with this more, whereas the HotSpot guys and many of the competing JVM vendors focused exclusively on the server for a long time.

• HotSpot has focused a lot more on dynamic language support in recent years; as far as I can tell, .NET's support for this isn't as good. HotSpot supports "invokedynamic", which is basically a very advanced programmable dynamic linker and gives a major performance boost to implementations of dynamic scripting languages like JavaScript, Ruby, Python, etc. The post by pron below goes into more detail on Graal/Truffle, which I agree is very interesting work: they're pushing profile guided JITC technology dramatically further, and it represents a major change to the JVM's architecture (no more bytecode, for dynamic languages).

• HotSpot has several garbage collectors whereas CoreCLR only seems to have one. Each HotSpot GC represents a different set of tradeoffs. The most advanced is called G1 but it's intended for servers that have lots of memory as the additional RAM overhead it imposes is somewhat high, which is why HotSpot still supports ParCMS (parallel concurrent mark sweep). For more desktopy workloads ParCMS is lower overhead and pause times are still OK. G1 has a lot of nifty features, for instance it can deduplicate strings that are lying around in RAM. I think the .NET GC is simply not as good as G1 and this tends to show up in benchmarks.
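To make the "programmable dynamic linker" description of invokedynamic in the list above a bit more concrete, here is a rough sketch using the java.lang.invoke API that invokedynamic is built on. It does not emit an actual invokedynamic instruction (javac only does that for things like lambdas); it just shows the call-site idea: a call is linked to one target, then relinked to another at runtime. The class and method names (DynamicLinkSketch, greetPlain, greetLoud) are invented for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical example class; a dynamic language runtime would do this linking in its bootstrap methods.
public class DynamicLinkSketch {

    // Two stand-in implementations that a language runtime might choose between.
    static String greetPlain(String name) { return "hello " + name; }
    static String greetLoud(String name) { return "HELLO " + name.toUpperCase(); }

    /** Links a call site to greetPlain, calls it, relinks it to greetLoud, and calls it again. */
    public static String[] demo(String name) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class, String.class);
            MethodHandle plain = lookup.findStatic(DynamicLinkSketch.class, "greetPlain", type);
            MethodHandle loud = lookup.findStatic(DynamicLinkSketch.class, "greetLoud", type);

            // The call site holds the current link target; the invoker always calls whatever is linked.
            MutableCallSite site = new MutableCallSite(plain);
            MethodHandle invoker = site.dynamicInvoker();

            String first = (String) invoker.invokeExact(name);
            site.setTarget(loud); // relink, e.g. after the runtime learns more about the receiver
            String second = (String) invoker.invokeExact(name);
            return new String[] { first, second };
        } catch (Throwable t) {
            // invokeExact declares Throwable; wrap it so callers need no checked handling.
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        for (String line : demo("jvm")) {
            System.out.println(line);
        }
    }
}
```

The point of the design is that the JIT can treat the current target as a constant and inline through it, then deoptimize if the site is relinked, which is where the speedup for dynamic languages is supposed to come from.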

As an aside, the HotSpot/OpenJDK source code is dramatically easier to read and less messy than the .NET source code. The G1 collector code for instance is neatly modularised into different files, the classes are straightforward, etc. The .NET GC is a giant single C++ file so large that GitHub refuses to render it. It's also riddled with #ifdefs and utility functions scattered around without any apparent rhyme or reason.


> In theory you can do that in Java too using sun.misc.Unsafe but nobody actually does.

You wouldn't believe how many libraries use sun.misc.Unsafe, which is why it is being considered as a public API in Java 9 (this is necessary because access to it will be restricted by the new module system). It's used by people building their own concurrency primitives (it offers direct access to memory fences), as well as by people doing low-latency, low-garbage processing (very common in the UK financial trading industry).
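For anyone who hasn't seen it, this is roughly what that kind of usage looks like: a minimal sketch that grabs the Unsafe instance reflectively and round-trips a long through off-heap memory. It assumes a HotSpot-style JDK where the singleton lives in a private static field named theUnsafe; the class name UnsafeSketch is made up, and newer JDKs warn about (and may eventually remove) these memory-access methods, so treat it as an illustration rather than something to copy into production.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical example class; relies on an unsupported, JDK-internal API.
public class UnsafeSketch {

    /** Obtains the Unsafe singleton via reflection (assumes the field is named "theUnsafe"). */
    static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible on this JVM", e);
        }
    }

    /** Writes a long to freshly allocated off-heap memory and reads it back. */
    public static long roundTrip(long value) {
        Unsafe unsafe = loadUnsafe();
        long address = unsafe.allocateMemory(8); // 8 bytes, outside the garbage-collected heap
        try {
            unsafe.putLong(address, value);
            return unsafe.getLong(address);
        } finally {
            unsafe.freeMemory(address); // manual memory management: nothing frees this for us
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

The low-garbage crowd uses the same calls to keep large data structures off the heap so the GC never has to scan them.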

> .NET can do ahead of time compilation when an app is installed, whereas HotSpot cannot

Right, but all realtime JVMs offer AOT compilation (it's the usual tradeoff: slower code but more predictable), and even HotSpot may offer either AOT (less likely) or JIT caching (more likely) in Java 9.


While this is a very interesting article, and it provides a broad overview of what to expect from any JVM implementing the "JVM Specification", not all JVMs are made alike.

Examples include JikesRVM, Aonix, IBM WebSphere Real Time, the OS/400 JVM (which makes use of the OS/400 kernel JIT and TIMI) and many others.


I'd like to see this as a book.


Java 8 update 40 has a commercial feature called "Application Class Data Sharing" [1] that extends Class Data Sharing to application classes. However, almost no documentation is available. It is likely almost useless, because it probably only works with the default system class loader.

[1] http://www.oracle.com/technetwork/java/javase/8u40-relnotes-...



